BanglaBook: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews

Mohsinul Kabir, Obayed Bin Mahfuz, Syed Rifat Raiyan, Hasan Mahmud, Md Kamrul Hasan

Abstract

The analysis of consumer sentiment, as expressed through reviews, can provide a wealth of insight regarding the quality of a product. While sentiment analysis has been widely studied in many popular languages, relatively little attention has been given to Bangla, mostly due to a lack of relevant data and cross-domain adaptability. To address this limitation, we present BanglaBook, a large-scale dataset of Bangla book reviews consisting of 158,065 samples classified into three broad categories: positive, negative, and neutral. We provide a detailed statistical analysis of the dataset and employ a range of machine learning models to establish baselines, including SVM, LSTM, and Bangla-BERT. Our findings demonstrate a substantial performance advantage of pre-trained models over models that rely on manually crafted features, emphasizing the need for additional training resources in this domain. Additionally, we conduct an in-depth error analysis by examining sentiment unigrams, which may provide insight into common classification errors in under-resourced languages like Bangla. Our code and data are publicly available at https://github.com/mohsinulkabir14/BanglaBook.
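
The n-gram baselines in the benchmark list (e.g. "word 2-gram + word 3-gram", "char 2-gram + char 3-gram") are instances of the manually crafted features the abstract contrasts with pre-trained models. The sketch below shows how such features could be counted for a single review using only the Python standard library. It is an illustration under assumptions, not the authors' pipeline: the helpers `word_ngrams` and `char_ngrams` are hypothetical, tokenization is naive whitespace splitting, and the released code may instead use a vectorizer such as scikit-learn's `CountVectorizer` or `TfidfVectorizer`.

```python
from collections import Counter
from typing import Iterable


def word_ngrams(text: str, ns: Iterable[int] = (2, 3)) -> Counter:
    """Count word n-grams for every n in `ns` (default: word 2-gram + word 3-gram).

    Hypothetical helper: plain whitespace tokenization is an assumption,
    not necessarily what the BanglaBook baselines use.
    """
    tokens = text.split()
    counts: Counter = Counter()
    for n in ns:
        # Slide a window of n tokens across the review.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def char_ngrams(text: str, ns: Iterable[int] = (2, 3)) -> Counter:
    """Count character n-grams for every n in `ns` (default: char 2-gram + char 3-gram).

    Hypothetical helper: "characters" are Unicode code points, and runs of
    whitespace are collapsed to a single space before counting.
    """
    normalized = " ".join(text.split())
    counts: Counter = Counter()
    for n in ns:
        # Slide a window of n code points across the normalized string.
        for i in range(len(normalized) - n + 1):
            counts[normalized[i:i + n]] += 1
    return counts


if __name__ == "__main__":
    review = "বইটি খুব ভালো"  # roughly: "the book is very good"
    print(word_ngrams(review))            # word 2-grams and 3-grams
    print(word_ngrams(review, ns=(1,)))   # word unigrams
    print(char_ngrams(review).most_common(3))
```

Passing `ns=(1,)` yields word unigrams, the unit the error analysis is built on. Because the character n-grams here run over Unicode code points, a Bangla 2-gram can pair a dependent vowel sign (such as the া in ভালো) with the following consonant; segmenting by grapheme clusters instead would be a different design choice.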

Code Repositories

mohsinulkabir14/banglabook (official)

Benchmarks

Benchmark: sentiment-analysis-on-banglabook (metric: weighted average F1-score)

Methodology                                       Weighted avg. F1
-----------------------------------------------   ----------------
Logistic Regression (word 2-gram + word 3-gram)   0.8964
Random Forest (word 1-gram)                       0.9043
Bangla-BERT (base-uncased)                        0.9064
XGBoost (word 2-gram + word 3-gram)               0.8651
Random Forest (word 2-gram + word 3-gram)         0.9106
LSTM (GloVe)                                      0.0991
Multinomial NB (word 2-gram + word 3-gram)        0.8663
Multinomial NB (BoW)                              0.8564
Bangla-BERT (large)                               0.9331
Logistic Regression (char 2-gram + char 3-gram)   0.8978
SVM (word 1-gram)                                 0.8519
SVM (word 2-gram + word 3-gram)                   0.9053
XGBoost (char 2-gram + char 3-gram)               0.8723
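
All scores above are weighted average F1. Assuming the standard definition (the one scikit-learn implements as `f1_score(y_true, y_pred, average="weighted")`), per-class F1 scores are averaged with weights proportional to each class's support: F1_weighted = Σ_c (n_c / N) · F1_c, where n_c is the number of reviews whose true label is c and N is the total number of reviews. A minimal standard-library sketch follows; the function name `weighted_f1` is hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = len(y_true)
    support = Counter(y_true)    # n_c: number of true instances of class c
    predicted = Counter(y_pred)  # number of times class c was predicted
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    score = 0.0
    # Only classes present in y_true are visited; a class with zero support
    # would receive zero weight anyway.
    for label, n_true in support.items():
        tp = true_pos[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += (n_true / total) * f1
    return score


if __name__ == "__main__":
    gold = ["positive", "positive", "negative", "neutral"]
    pred = ["positive", "negative", "negative", "neutral"]
    print(round(weighted_f1(gold, pred), 4))  # 0.75
```

Because the weights follow class frequencies, the weighted average is dominated by the most frequent classes when labels are imbalanced, so it can differ noticeably from the unweighted (macro) average of the same per-class scores.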