Character-based Thai Word Segmentation with Multiple Attentions

Manabu Okumura, Hidetaka Kamigaito, Thodsaporn Chay-intr

Abstract

Character-based word-segmentation models have been extensively applied to agglutinative languages, including Thai, due to their high performance. These models estimate word boundaries from a character sequence. However, a character unit in sequences has no essential meaning, compared with word, subword, and character cluster units. We propose a Thai word-segmentation model that uses various types of information, including words, subwords, and character clusters, from a character sequence. Our model applies multiple attentions to refine segmentation inferences by estimating the significant relationships among characters and various unit types. The experimental results indicate that our model can outperform other state-of-the-art Thai word-segmentation models.
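To make "estimating word boundaries from a character sequence" concrete, the following is a minimal sketch of the usual framing: each character receives a label, and the labels are decoded back into words. The binary scheme (1 marks the first character of a word) and the helper names `words_to_labels` and `labels_to_words` are illustrative assumptions, not details taken from the paper; the model itself, including its attention layers, is not reproduced here.

```python
def words_to_labels(words):
    """Encode a segmentation as one label per character (1 = word start).

    Assumption: a binary boundary scheme; the paper may use a different tag set.
    """
    labels = []
    for word in words:
        if not word:
            continue  # skip empty strings so every label maps to a character
        labels.extend([1] + [0] * (len(word) - 1))
    return labels


def labels_to_words(text, labels):
    """Decode per-character boundary labels back into a list of words."""
    if len(text) != len(labels):
        raise ValueError("text and labels must have the same length")
    words = []
    for ch, label in zip(text, labels):
        if label == 1 or not words:
            words.append(ch)   # start a new word
        else:
            words[-1] += ch    # continue the current word
    return words


# Round trip on a short Thai example ("I eat rice").
gold = ["ฉัน", "กิน", "ข้าว"]
labels = words_to_labels(gold)
print(labels_to_words("".join(gold), labels))
```

A character-based segmenter predicts the label sequence; the decoding step above is what turns those predictions into the final word list.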

Benchmarks

Benchmark | Methodology | Metrics
thai-word-tokenization-on-best-2010 | Multiple Attentions (char-word-cc) | F1-Score: 0.9899
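
For readers unfamiliar with how a segmentation F1 score is typically computed, here is a minimal sketch of a word-level F1 based on matching character spans. Whether the benchmark figure above is word-level or character-level is not stated on this page, so this is an assumption, and the function names `to_spans` and `word_f1` are hypothetical.

```python
def to_spans(words):
    """Convert a word list into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def word_f1(gold_words, pred_words):
    """Word-level F1: a predicted word counts only if both boundaries match."""
    gold = to_spans(gold_words)
    pred = to_spans(pred_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# One of three words is recovered exactly in this toy example.
print(word_f1(["ab", "cd", "e"], ["ab", "c", "de"]))
```

Both segmentations must cover the same underlying character sequence for the span comparison to be meaningful.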
