Khuyagbaatar Batsuren; Gábor Bella; Aryaman Arora; Viktor Martinović; Kyle Gorman; Zdeněk Žabokrtský; Amarsanaa Ganbold; Šárka Dohnalová; Magda Ševčíková; Kateřina Pelegrinová; Fausto Giunchiglia; Ryan Cotterell; Ekaterina Vylomova

Abstract
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections. Subtask 1, word-level morpheme segmentation, covered 5 million words in 9 languages (Czech, English, Spanish, Hungarian, French, Italian, Russian, Latin, Mongolian) and received 13 system submissions from 7 teams. The best system averaged a 97.29% F1 score across all languages, ranging from English (93.84%) to Latin (99.38%). Subtask 2, sentence-level morpheme segmentation, covered 18,735 sentences in 3 languages (Czech, English, Mongolian) and received 10 system submissions from 3 teams. The best systems outperformed all three state-of-the-art subword tokenization methods (BPE, ULM, Morfessor2) by 30.71% absolute. To facilitate error analysis and support future studies, we released all system predictions, the evaluation script, and all gold standard datasets.
Benchmarks
| Benchmark | Methodology | f1 macro avg (subtask 2) | lev dist (subtask 2) | macro avg (subtask 1) |
|---|---|---|---|---|
| morpheme-segmentaiton-on-unimorph-4-0 | WordPiece | 40.59 | 17.54 | 15.89 |
| morpheme-segmentaiton-on-unimorph-4-0 | Morfessor2 | 50.65 | 12.08 | 25.57 |
| morpheme-segmentaiton-on-unimorph-4-0 | ULM | 45.99 | 14.28 | 20.61 |
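
The table reports F1 and Levenshtein distance. As a rough illustration of what such metrics measure, the sketch below computes a morpheme-level F1 for a single word and a character-level edit distance between two segmentation strings. It is not the released evaluation script: the function names `morpheme_f1` and `levenshtein`, the multiset matching of morphemes, and the space-separated segmentation format are all assumptions made for this example.

```python
from collections import Counter


def morpheme_f1(gold, pred):
    """Precision, recall and F1 for one word (hypothetical helper).

    Assumption: morphemes are matched as multisets, ignoring position.
    """
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Multiset intersection counts morphemes present in both segmentations.
    overlap = sum((gold_counts & pred_counts).values())
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def levenshtein(a, b):
    """Edit distance between two strings, via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution, free on a match
            ))
        prev = curr
    return prev[-1]


# Illustrative example, not taken from the shared-task data.
gold = ["un", "break", "able"]
pred = ["un", "breakable"]
p, r, f1 = morpheme_f1(gold, pred)
dist = levenshtein(" ".join(gold), " ".join(pred))
```

Here the prediction recovers one of the three gold morphemes, so precision is 1/2 and recall is 1/3, while the two segmentation strings differ by a single missing separator. A macro average, as named in the table, would average such per-language scores with equal weight.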