Itihasa: A large-scale corpus for Sanskrit to English translation

Rahul Aralikatte, Miryam de Lhoneux, Anoop Kunchukuttan, Anders Søgaard

Abstract

This work introduces Itihasa, a large-scale translation dataset containing 93,000 pairs of Sanskrit shlokas and their English translations. The shlokas are extracted from two Indian epics, namely the Ramayana and the Mahabharata. We first describe the motivation behind curating such a dataset and then present an empirical analysis that brings out its nuances. We then benchmark standard translation models on this corpus and show that even state-of-the-art transformer architectures perform poorly, which underlines the complexity of the dataset.

Benchmarks

Benchmark | Methodology | Metrics
machine-translation-on-itihasa | Baseline (sn->en) | SacreBLEU: 7.49
machine-translation-on-itihasa | Baseline (en->sn) | SacreBLEU: 7.59
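
The scores above are reported as SacreBLEU values. As a rough illustration of what a BLEU-style metric measures, the following is a minimal pure-Python sketch of corpus-level BLEU with one reference per hypothesis. It is not the SacreBLEU implementation: it assumes plain whitespace tokenization and no smoothing, and the function names `ngram_counts` and `corpus_bleu` are made up for this example, so its numbers are not comparable with the table.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale.

    Assumptions (not taken from the paper): whitespace tokenization,
    a single reference per hypothesis, and no smoothing.
    """
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = 0
    ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens = hyp.split()
        ref_tokens = ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(
                min(count, ref_counts[gram]) for gram, count in hyp_counts.items()
            )
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the king went to the forest"]
    refs = ["the king went into the forest"]
    print(round(corpus_bleu(hyps, refs), 2))
```

A single-digit score on this scale means that few of the longer n-grams in the system output also appear in the reference translation, which is consistent with the abstract's remark that standard models perform poorly on this corpus.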
