Jiang Biao; Chen Xin; Liu Wen; Yu Jingyi; Yu Gang; Chen Tao

Abstract
Though the advancement of pre-trained large language models unfolds, the exploration of building a unified model for language and other multi-modal data, such as motion, remains challenging and untouched so far. Fortunately, human motion displays a semantic coupling akin to human language, often perceived as a form of body language. By fusing language data with large-scale motion models, motion-language pre-training that can enhance the performance of motion-related tasks becomes feasible. Driven by this insight, we propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks. Specifically, we employ discrete vector quantization for human motion and transfer 3D motion into motion tokens, similar to the generation process of word tokens. Building upon this "motion vocabulary", we perform language modeling on both motion and text in a unified manner, treating human motion as a specific language. Moreover, inspired by prompt learning, we pre-train MotionGPT with a mixture of motion-language data and fine-tune it on prompt-based question-and-answer tasks. Extensive experiments demonstrate that MotionGPT achieves state-of-the-art performance on multiple motion tasks including text-driven motion generation, motion captioning, motion prediction, and motion in-between.
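The abstract's two central ideas are (1) quantizing continuous 3D motion into discrete "motion tokens" via a learned codebook and (2) merging those tokens with a text vocabulary so one language model handles both modalities. The sketch below illustrates this in PyTorch under stated assumptions; it is not the authors' implementation, and the encoder features, codebook size, text vocabulary size, and the `<som>`/`<eom>` boundary tokens are all illustrative assumptions.

```python
# Minimal sketch of motion tokenization and a unified motion-text token stream.
# All shapes, sizes, and special-token conventions below are assumptions for
# illustration, not the MotionGPT codebase.

import torch

def quantize_motion(features: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map each per-frame motion feature to the index of its nearest codebook entry.

    features: (T, D) motion features from a motion encoder (assumed given).
    codebook: (K, D) codebook of a VQ-style quantizer (assumed learned).
    returns:  (T,) integer motion-token ids in [0, K).
    """
    dists = torch.cdist(features, codebook)  # (T, K) pairwise Euclidean distances
    return dists.argmin(dim=-1)              # nearest codebook index per frame

# Toy example: 16 frames of 8-dim motion features, a 512-entry codebook.
T, D, K = 16, 8, 512
motion_features = torch.randn(T, D)
codebook = torch.randn(K, D)
motion_ids = quantize_motion(motion_features, codebook)

# Treat motion as a "language": offset motion ids so they sit after the text
# vocabulary, then wrap them in hypothetical <som>/<eom> boundary tokens so a
# single language model can consume one unified token sequence.
text_vocab_size = 32000                                         # assumed
som_id, eom_id = text_vocab_size + K, text_vocab_size + K + 1   # assumed
unified_ids = [som_id] + [text_vocab_size + int(i) for i in motion_ids] + [eom_id]
print(unified_ids[:5])
```

In a full system, the codebook would come from a trained motion VQ-VAE and the unified ids would be fed to the language model alongside ordinary text tokens during the mixed motion-language pre-training and prompt-based fine-tuning stages described above.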
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| Motion Captioning on HumanML3D | MotionGPT | BERTScore: 32.4, BLEU-4: 12.47 |