CalBERT - Code-mixed Adaptive Language representations using BERT

Ashwini M Joshi, Deeksha D, Aronya Baksy, Ansh Sarkar, Aditeya Baral


Abstract

A code-mixed language is one that combines two or more language varieties in its script or speech. Analysis of code-mixed text is difficult because the language present is not consistent and is not amenable to existing monolingual approaches. We propose a novel approach to improve Transformer performance by introducing an additional step called "Siamese Pre-Training", which allows pre-trained monolingual Transformers to adapt their language representations to code-mixed languages with only a few examples of code-mixed data. The proposed architectures beat the state-of-the-art F1-score on the Sentiment Analysis for Indian Languages (SAIL) dataset, with the highest improvement being 5.1 points, while also achieving state-of-the-art accuracy on the IndicGLUE Product Reviews dataset, beating the benchmark by 0.4 points.
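The core idea of Siamese Pre-Training, as the abstract describes it, is to feed a code-mixed sentence and its monolingual counterpart through the same pre-trained encoder and pull their embeddings together. The sketch below illustrates that setup in PyTorch under stated assumptions: a tiny embedding-plus-mean-pooling network stands in for the pre-trained BERT encoder, the (code-mixed, monolingual) pairs are random token-id tensors, and MSE between sentence vectors is used as the alignment loss. None of these stand-ins are taken from the paper; they only make the shared-weights training loop concrete.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pre-trained monolingual Transformer encoder.
# (In CalBERT this would be a BERT-style model; a small embedding +
# mean-pooling network keeps the sketch self-contained and runnable.)
class ToyEncoder(nn.Module):
    def __init__(self, vocab_size=1000, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):
        # Mean-pool token embeddings into one sentence vector.
        return self.emb(token_ids).mean(dim=1)

torch.manual_seed(0)
encoder = ToyEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # assumed alignment loss for this sketch

# Hypothetical token-id batches standing in for tokenized
# (code-mixed sentence, monolingual translation) pairs.
code_mixed = torch.randint(0, 1000, (8, 12))
monolingual = torch.randint(0, 1000, (8, 12))

for step in range(100):
    optimizer.zero_grad()
    # Siamese step: the SAME encoder (shared weights) embeds both
    # sides, so minimizing the loss moves code-mixed inputs toward
    # their monolingual equivalents in representation space.
    z_cm = encoder(code_mixed)
    z_mono = encoder(monolingual)
    loss = loss_fn(z_cm, z_mono)
    loss.backward()
    optimizer.step()

print(f"final alignment loss: {loss.item():.4f}")
```

After this adaptation step, the encoder would normally be fine-tuned on the downstream task (e.g. sentiment classification); the key design point is that both inputs pass through one set of weights, which is what makes the network "siamese".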

Benchmarks

Benchmark                                   | Methodology | Metrics
sentiment-analysis-on-iitp-product-reviews  | CalBERT     | Accuracy: 79.4
sentiment-analysis-on-sail-2017             | CalBERT     | F1: 62.0, Precision: 61.8, Recall: 61.8
