Comparative study of models trained on synthetic data for Ukrainian grammatical error correction
Andrii Fedorych, Andrii Shportko, Artem Yushko, Maksym Bondarenko

Abstract
The task of Grammatical Error Correction (GEC) has been extensively studied for the English language. However, its application to low-resource languages, such as Ukrainian, remains an open challenge. In this paper, we develop sequence-tagging and neural machine translation models for the Ukrainian language, as well as a set of algorithmic correction rules to augment those systems. We also develop synthetic data generation techniques for Ukrainian to create high-quality, human-like errors. Finally, we determine the best combination of synthetically generated data for augmenting the existing UA-GEC corpus and achieve state-of-the-art results with an F0.5 score of 0.663 on the newly established UA-GEC benchmark. The code and trained models will be made publicly available on GitHub and HuggingFace.
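To illustrate what rule-based synthetic error generation can look like, the following is a minimal sketch that injects character- and word-level noise into clean Ukrainian sentences. The error types, probabilities, and function names here are illustrative assumptions, not the specific noising rules used in the paper.

```python
import random

# Illustrative alphabet and noising operations; the actual rules used to
# produce human-like Ukrainian errors in the paper may differ.
UA_CHARS = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"

def swap_adjacent(word: str) -> str:
    """Transpose two adjacent characters to simulate a typo."""
    if len(word) < 2:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def replace_char(word: str) -> str:
    """Replace one character with a random Ukrainian letter."""
    if not word:
        return word
    i = random.randrange(len(word))
    return word[:i] + random.choice(UA_CHARS) + word[i + 1:]

def drop_word(tokens: list[str]) -> list[str]:
    """Delete a random token to simulate a missing word."""
    if len(tokens) < 2:
        return tokens
    i = random.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]

def corrupt(sentence: str, p_word: float = 0.15) -> str:
    """Inject word- and character-level errors into a clean sentence."""
    tokens = sentence.split()
    if random.random() < 0.1:
        tokens = drop_word(tokens)
    noisy = []
    for tok in tokens:
        if random.random() < p_word:
            op = random.choice([swap_adjacent, replace_char])
            tok = op(tok)
        noisy.append(tok)
    return " ".join(noisy)

if __name__ == "__main__":
    clean = "Я дуже люблю читати книжки українською мовою."
    print(corrupt(clean))
```

The corrupted sentence paired with the original clean sentence yields a synthetic (source, target) training pair for a GEC model.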
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| grammatical-error-correction-on-ua-gec | mBART-based model with synthetic data | F0.5: 68.17 |
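For reference, the F0.5 metric reported above weights precision twice as heavily as recall, which is the standard choice in GEC evaluation because unnecessary edits are penalized more than missed ones. Below is a minimal sketch of the computation from edit-level counts; the counts in the example are illustrative, not taken from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F_beta from true-positive, false-positive, and false-negative edit counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Example: a system whose corrections are precise but conservative
# still scores well under F0.5.
print(round(f_beta(tp=80, fp=20, fn=60), 3))  # precision 0.8, recall ~0.571 -> ~0.741
```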