CalBERT - Code-mixed Adaptive Language representations using BERT
Ashwini M Joshi, Deeksha D, Aronya Baksy, Ansh Sarkar, Aditeya Baral

Abstract
A code-mixed language combines two or more language varieties in its script or speech. Analysis of code-mixed text is difficult because the language used is not consistent, and existing monolingual approaches do not transfer to it. We propose a novel approach to improve Transformer performance by introducing an additional step called "Siamese Pre-Training", which allows pre-trained monolingual Transformers to adapt their language representations to code-mixed languages from only a few examples of code-mixed data. The proposed architectures beat the state-of-the-art F1-score on the Sentiment Analysis for Indian Languages (SAIL) dataset, with the largest improvement being 5.1 points, and also achieve state-of-the-art accuracy on the IndicGLUE Product Reviews dataset, beating the benchmark by 0.4 points.
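To make the "Siamese Pre-Training" step concrete, below is a minimal sketch of how such a step could be set up with the sentence-transformers library: pairs consisting of a base-language sentence and its code-mixed counterpart are encoded by a shared pre-trained monolingual Transformer, and a contrastive loss pulls each pair's embeddings together. The base model choice, the use of MultipleNegativesRankingLoss, and the Hinglish example pairs are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a Siamese pre-training step (assumptions noted above).
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Start from a pre-trained monolingual Transformer (hypothetical choice);
# sentence-transformers wraps it with a pooling layer automatically.
model = SentenceTransformer("bert-base-cased")

# Each pair: (base-language sentence, code-mixed equivalent).
# These Hinglish pairs are invented for illustration.
pairs = [
    ("this movie was really good", "yeh movie bahut achhi thi"),
    ("I will call you tomorrow", "main tumhe kal call karunga"),
]
train_examples = [InputExample(texts=[base, mixed]) for base, mixed in pairs]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# A contrastive objective standing in for the paper's distance-minimization
# loss: it pulls each pair's embeddings together while pushing apart
# in-batch negatives.
train_loss = losses.MultipleNegativesRankingLoss(model)

# One short pass is enough to illustrate the adaptation step.
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```

Because both sentences in a pair pass through the same shared encoder, only a small amount of paired code-mixed data is needed to shift the representation space, which is the few-shot property the abstract highlights.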
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| Sentiment Analysis on IITP Product Reviews | CalBERT | Accuracy: 79.4 |
| Sentiment Analysis on SAIL 2017 | CalBERT | F1: 62.0, Precision: 61.8, Recall: 61.8 |