Effective Deep Learning Models for Automatic Diacritization of Arabic Text

Ali Mustafa Qamar, Mokthar Ali Hasan Madhfar

Abstract

While building a text-to-speech system for the Arabic language, we found that the system synthesized speech with many pronunciation errors. The primary source of these errors is the lack of diacritics in Modern Standard Arabic writing. Diacritics are small strokes that appear above or below letters to provide pronunciation and grammatical information. We propose three deep learning models to recover Arabic text diacritics, building on our work on a deep-learning-based text-to-speech synthesis system. The first is a baseline model used to test how a simple deep learning model performs on the corpora. The second is based on an encoder-decoder architecture that resembles our text-to-speech synthesis model, with many modifications to suit this problem. The last is based on the encoder part of the text-to-speech model and achieves state-of-the-art performance in both word error rate and diacritic error rate. These models will benefit a wide range of natural language processing applications such as text-to-speech, part-of-speech tagging, and machine translation.
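To make the task concrete, the following minimal sketch separates a diacritized Arabic string into its bare letters and one diacritic label per letter, which is one common way to frame diacritic recovery as per-character labeling. It is an illustration only: the function name `split_diacritics` is hypothetical, the diacritic set is assumed to be the Unicode range U+064B to U+0652, and nothing here is taken from the authors' preprocessing.

```python
# Assumption: the diacritics of interest are the eight marks in U+064B..U+0652
# (tanween forms, fatha, damma, kasra, shadda, sukun).
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def split_diacritics(text):
    """Return (bare_text, labels); labels[i] holds the marks attached to bare_text[i]."""
    letters, labels = [], []
    for ch in text:
        if ch in DIACRITICS:
            # Attach the mark to the preceding letter; ignore a stray leading mark.
            if letters:
                labels[-1] += ch
        else:
            letters.append(ch)
            labels.append("")
    return "".join(letters), labels


# "kataba" written with a fatha on each of its three letters.
diacritized = "\u0643\u064E\u062A\u064E\u0628\u064E"
bare, labels = split_diacritics(diacritized)
print(bare, labels)
```

The bare string is what undiacritized text typically looks like on input, and the label list is what a diacritization model would be asked to predict.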

Benchmarks

Benchmark: arabic-text-diacritization-on-tashkeela-1
Methodology: CBHG model
Metrics: Diacritic Error Rate (DER): 0.0113; Word Error Rate (WER): 0.0443
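The two metrics can be illustrated with a short sketch. It assumes a simple definition: the diacritic error rate is the fraction of characters whose predicted diacritic label differs from the reference, and the word error rate is the fraction of words containing at least one such character. The evaluation protocol behind the figures above is not described on this page (for example, whether undiacritized characters or case endings are counted), so the helper names and counting rules below are assumptions rather than the authors' procedure. Under this reading, the listed values would correspond to about 1.13% and 4.43%.

```python
# Assumption: diacritics are the Unicode marks in U+064B..U+0652.
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def labels_of(word):
    """Return one diacritic label per base character of `word`."""
    labels = []
    for ch in word:
        if ch in DIACRITICS:
            if labels:
                labels[-1] += ch
        else:
            labels.append("")
    # Sort marks so that shadda+fatha and fatha+shadda compare as equal.
    return ["".join(sorted(label)) for label in labels]


def error_rates(reference, prediction):
    """Return (DER, WER) as fractions for two whitespace-tokenized strings."""
    ref_words, pred_words = reference.split(), prediction.split()
    if not ref_words or len(ref_words) != len(pred_words):
        raise ValueError("reference and prediction must have the same, non-zero word count")
    char_total = char_errors = word_errors = 0
    for ref_word, pred_word in zip(ref_words, pred_words):
        ref_labels, pred_labels = labels_of(ref_word), labels_of(pred_word)
        if len(ref_labels) != len(pred_labels):
            raise ValueError("words must have the same base characters")
        wrong = sum(r != p for r, p in zip(ref_labels, pred_labels))
        char_total += len(ref_labels)
        char_errors += wrong
        word_errors += 1 if wrong else 0
    return char_errors / char_total, word_errors / len(ref_words)


# Two three-letter words; the prediction gets the final mark of the second word wrong.
reference = "\u0643\u064E\u062A\u064E\u0628\u064E \u0648\u064E\u0644\u064E\u062F\u064F"
prediction = "\u0643\u064E\u062A\u064E\u0628\u064E \u0648\u064E\u0644\u064E\u062F\u064E"
der, wer = error_rates(reference, prediction)
print(f"DER={der:.4f} WER={wer:.4f}")
```

In this toy case one of six characters and one of two words are wrong, which shows why the word error rate is usually the larger of the two numbers: a single wrong mark counts against the whole word.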
