Beyond Characters: Subword-level Morpheme Segmentation

Andre F. T. Martins, Ben Peters

Abstract

This paper presents DeepSPIN’s submissions to the SIGMORPHON 2022 Shared Task on Morpheme Segmentation. We make three submissions, all to the word-level subtask. First, we show that entmax-based sparse sequence-to-sequence models deliver large improvements over conventional softmax-based models, echoing results from other tasks. Then, we challenge the assumption that models for morphological tasks should be trained at the character level by building a transformer that generates morphemes as sequences of unigram language model-induced subwords. This subword transformer outperforms all of our character-level models and wins the word-level subtask. Although we do not make an official submission to the sentence-level subtask, we show that this subword-based approach is highly effective there as well.
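To illustrate what "sparse" means in the contrast between entmax and softmax, here is a minimal sketch of the α = 1.5 member of the entmax family, computed by bisection on the threshold. This is not the authors' implementation: the function names `softmax` and `entmax15`, the bisection approach, and the example scores are all assumptions made for illustration. The point is only that entmax can assign exactly zero probability to low-scoring candidates, while softmax never does.

```python
import math


def softmax(scores):
    """Standard softmax: every entry receives strictly positive probability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entmax15(scores, n_iter=60):
    """1.5-entmax via bisection (illustrative sketch, not the paper's code).

    For alpha = 1.5 the output has the form p_i = max(z_i / 2 - tau, 0) ** 2,
    where the threshold tau is chosen so that the probabilities sum to 1.
    """
    half = [s / 2.0 for s in scores]
    hi = max(half)      # at tau = hi the total mass is 0 (too small)
    lo = hi - 1.0       # at tau = lo the largest entry alone has mass 1
    for _ in range(n_iter):
        tau = (lo + hi) / 2.0
        mass = sum(max(h - tau, 0.0) ** 2 for h in half)
        if mass >= 1.0:
            lo = tau    # too much mass: raise the threshold
        else:
            hi = tau    # too little mass: lower the threshold
    tau = (lo + hi) / 2.0
    probs = [max(h - tau, 0.0) ** 2 for h in half]
    total = sum(probs)  # renormalise away the residual bisection error
    return [p / total for p in probs]


# Hypothetical scores for three candidate output symbols.
scores = [2.0, 1.0, -2.0]
dense = softmax(scores)
sparse = entmax15(scores)
print("softmax :", dense)
print("entmax15:", sparse)
```

With these example scores the lowest-scoring candidate gets an exact zero under `entmax15`, whereas softmax still gives it a small positive probability.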

Benchmarks

Benchmark | Methodology | Metrics
morpheme-segmentaiton-on-unimorph-4-0 | Char LSTM (DeepSPIN-2; soft-attention, 1.5-entmax) | macro avg (subtask 1): 97.15
morpheme-segmentaiton-on-unimorph-4-0 | Subword-ULM transformer (DeepSPIN-3; soft-attention, 1.5-entmax) | macro avg (subtask 1): 97.29
morpheme-segmentaiton-on-unimorph-4-0 | Char LSTM (DeepSPIN-1; soft-attention) | macro avg (subtask 1): 96.32
