CalBERT - Code-mixed Adaptive Language representations using BERT

Ashwini M Joshi, Deeksha D, Aronya Baksy, Ansh Sarkar, Aditeya Baral


Abstract

A code-mixed language is one that combines two or more language varieties in its script or speech. Analysis of code-mixed text is difficult because the language present is not consistent and is not amenable to existing monolingual approaches. We propose a novel approach to improve Transformer performance by introducing an additional step called "Siamese Pre-Training", which allows pre-trained monolingual Transformers to adapt their language representations to code-mixed languages with only a few examples of code-mixed data. The proposed architectures beat the state-of-the-art F1-score on the Sentiment Analysis for Indian Languages (SAIL) dataset, with the highest improvement being 5.1 points, while also achieving state-of-the-art accuracy on the IndicGLUE Product Reviews dataset, beating the benchmark by 0.4 points.
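The core idea of Siamese Pre-Training, as the abstract describes it, is to feed a code-mixed sentence and its monolingual counterpart through the same pre-trained encoder and pull their embeddings together. The sketch below illustrates that setup in PyTorch under stated assumptions: a tiny embedding-plus-mean-pooling network stands in for the pre-trained BERT encoder, the (code-mixed, monolingual) pairs are random token-id tensors, and MSE between sentence vectors is used as the alignment loss. None of these stand-ins are taken from the paper; they only make the shared-weights training loop concrete.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pre-trained monolingual Transformer encoder.
# (In CalBERT this would be a BERT-style model; a small embedding +
# mean-pooling network keeps the sketch self-contained and runnable.)
class ToyEncoder(nn.Module):
    def __init__(self, vocab_size=1000, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):
        # Mean-pool token embeddings into one sentence vector.
        return self.emb(token_ids).mean(dim=1)

torch.manual_seed(0)
encoder = ToyEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # assumed alignment loss for this sketch

# Hypothetical token-id batches standing in for tokenized
# (code-mixed sentence, monolingual translation) pairs.
code_mixed = torch.randint(0, 1000, (8, 12))
monolingual = torch.randint(0, 1000, (8, 12))

for step in range(100):
    optimizer.zero_grad()
    # Siamese step: the SAME encoder (shared weights) embeds both
    # sides, so minimizing the loss moves code-mixed inputs toward
    # their monolingual equivalents in representation space.
    z_cm = encoder(code_mixed)
    z_mono = encoder(monolingual)
    loss = loss_fn(z_cm, z_mono)
    loss.backward()
    optimizer.step()

print(f"final alignment loss: {loss.item():.4f}")
```

After this adaptation step, the encoder would normally be fine-tuned on the downstream task (e.g. sentiment classification); the key design point is that both inputs pass through one set of weights, which is what makes the network "siamese".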

Benchmarks

Benchmark                                   | Methodology | Metrics
sentiment-analysis-on-iitp-product-reviews  | CalBERT     | Accuracy: 79.4
sentiment-analysis-on-sail-2017             | CalBERT     | F1: 62.0, Precision: 61.8, Recall: 61.8
