Motion2Language, unsupervised learning of synchronized semantic motion segmentation
Karim Radouane, Andon Tchechmedjiev, Julien Lagarde, Sylvie Ranwez

Abstract
In this paper, we investigate building a sequence-to-sequence architecture for motion-to-language translation and synchronization. The aim is to translate motion capture inputs into English natural-language descriptions, such that the descriptions are generated synchronously with the actions performed, enabling semantic segmentation as a byproduct, but without requiring synchronized training data. We propose a new recurrent formulation of local attention that is suited for synchronous/live text generation, as well as an improved motion encoder architecture better suited to smaller data and to synchronous generation. We evaluate both contributions in individual experiments, using the standard BLEU-4 metric as well as a simple semantic equivalence measure, on the KIT Motion-Language dataset. In a follow-up experiment, we assess the quality of the synchronization of generated text in our proposed approaches through multiple evaluation metrics. We find that both contributions, to the attention mechanism and to the encoder architecture, additively improve not only the quality of generated text (BLEU and semantic equivalence) but also its synchronization. Our code is available at https://github.com/rd20karim/M2T-Segmentation/tree/main
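The abstract does not spell out the recurrent local-attention formulation, so the following is a minimal PyTorch sketch of one plausible reading: a Luong-style Gaussian attention window whose centre position advances recurrently and monotonically across the encoded motion frames. The class name, the monotonic update rule, and all hyperparameters are illustrative assumptions, not the paper's exact equations.

```python
import torch
import torch.nn as nn

class RecurrentLocalAttention(nn.Module):
    """Illustrative local attention whose window centre advances
    recurrently across encoder frames (assumed Luong-style Gaussian
    window with a monotonic position update)."""

    def __init__(self, hidden_size, window_size=10, max_step=5.0):
        super().__init__()
        self.window_size = window_size  # width of the local attention window
        self.max_step = max_step        # largest forward move per decoding step
        self.step_net = nn.Linear(hidden_size, 1)  # predicts the position increment
        self.score_net = nn.Linear(hidden_size, hidden_size)

    def forward(self, decoder_state, encoder_outputs, prev_position):
        # decoder_state: (batch, hidden); encoder_outputs: (batch, frames, hidden)
        batch, frames, _ = encoder_outputs.shape
        # Monotonic recurrence: p_t = p_{t-1} + max_step * sigmoid(W h_t)
        position = prev_position + self.max_step * torch.sigmoid(
            self.step_net(decoder_state)).squeeze(-1)
        position = position.clamp(max=frames - 1)
        # Luong "general" alignment scores over all frames
        scores = torch.bmm(encoder_outputs,
                           self.score_net(decoder_state).unsqueeze(-1)).squeeze(-1)
        # Gaussian window centred on the predicted position
        idx = torch.arange(frames, device=scores.device).float().unsqueeze(0)
        gauss = torch.exp(-((idx - position.unsqueeze(-1)) ** 2)
                          / (2 * (self.window_size / 2) ** 2))
        weights = torch.softmax(scores, dim=-1) * gauss
        weights = weights / weights.sum(dim=-1, keepdim=True)
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)
        return context, weights, position
```

Because the window centre only moves forward, the attended frame index at each generated word can be read back as a rough action boundary, which is the sense in which synchronized generation yields semantic segmentation as a byproduct.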
Code Repositories
https://github.com/rd20karim/M2T-Segmentation/tree/main
Benchmarks
| Benchmark | Methodology | BERTScore | BLEU-4 |
|---|---|---|---|
| motion-captioning-on-humanml3d | MLP+GRU | 37.2 | 23.4 |
| motion-captioning-on-kit-motion-language | MLP+GRU | 42.1 | 25.4 |
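For reference, here is a minimal sketch of how the two reported metrics are commonly computed, using nltk for BLEU-4 and the bert-score package for BERTScore. The example sentences are hypothetical stand-ins, not samples from either dataset.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction
from bert_score import score as bert_score

# Hypothetical reference/candidate pair standing in for motion captions.
references = [["a person walks forward and then turns around".split()]]
candidates = ["a person walks forwards then turns".split()]

# BLEU-4: geometric mean of 1- to 4-gram precision with brevity penalty
# (smoothing keeps short sentences from zeroing out).
bleu4 = corpus_bleu(references, candidates,
                    weights=(0.25, 0.25, 0.25, 0.25),
                    smoothing_function=SmoothingFunction().method1)

# BERTScore: token-level semantic similarity from contextual embeddings.
P, R, F1 = bert_score([" ".join(c) for c in candidates],
                      [" ".join(r[0]) for r in references], lang="en")

print(f"BLEU-4: {bleu4:.3f}  BERTScore-F1: {F1.mean():.3f}")
```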