Mustango: Toward Controllable Text-to-Music Generation

Melechovsky Jan ; Guo Zixun ; Ghosal Deepanway ; Majumder Navonil ; Herremans Dorien ; Poria Soujanya

Abstract

The quality of the text-to-music models has reached new heights due to recent advancements in diffusion models. The controllability of various musical aspects, however, has barely been explored. In this paper, we propose Mustango: a music-domain-knowledge-inspired text-to-music system based on diffusion. Mustango aims to control the generated music, not only with general text captions, but with more rich captions that can include specific instructions related to chords, beats, tempo, and key. At the core of Mustango is MuNet, a Music-Domain-Knowledge-Informed UNet guidance module that steers the generated music to include the music-specific conditions, which we predict from the text prompt, as well as the general text embedding, during the reverse diffusion process. To overcome the limited availability of open datasets of music with text captions, we propose a novel data augmentation method that includes altering the harmonic, rhythmic, and dynamic aspects of music audio and using state-of-the-art Music Information Retrieval methods to extract the music features which will then be appended to the existing descriptions in text format. We release the resulting MusicBench dataset which contains over 52K instances and includes music-theory-based descriptions in the caption text. Through extensive experiments, we show that the quality of the music generated by Mustango is state-of-the-art, and the controllability through music-specific text prompts greatly outperforms other models such as MusicGen and AudioLDM2.
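The caption-augmentation idea described in the abstract (extracted music features appended to an existing description in text format) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `MusicFeatures` container, the `enrich_caption` helper, and the sentence templates are hypothetical and are not the format actually used in MusicBench. The feature values are assumed to come from some upstream Music Information Retrieval step that is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MusicFeatures:
    """Hypothetical container for features an MIR step might extract."""
    tempo_bpm: Optional[float] = None
    key: Optional[str] = None
    beats_per_bar: Optional[int] = None
    chords: Optional[List[str]] = None


def enrich_caption(caption: str, features: MusicFeatures) -> str:
    """Append one plain-text sentence per available feature to a caption.

    The sentence templates are illustrative assumptions, not the
    wording used by the paper's dataset.
    """
    sentences = []
    if features.tempo_bpm is not None:
        sentences.append(f"The tempo is {features.tempo_bpm:.0f} bpm.")
    if features.key is not None:
        sentences.append(f"The key is {features.key}.")
    if features.beats_per_bar is not None:
        sentences.append(f"The beat count is {features.beats_per_bar}.")
    if features.chords:
        sentences.append(
            "The chord progression is " + ", ".join(features.chords) + "."
        )

    # Make sure the original caption ends with sentence punctuation
    # before the feature sentences are appended.
    base = caption.strip()
    if base and not base.endswith((".", "!", "?")):
        base += "."

    parts = ([base] if base else []) + sentences
    return " ".join(parts)


if __name__ == "__main__":
    feats = MusicFeatures(
        tempo_bpm=120.0,
        key="C major",
        beats_per_bar=4,
        chords=["C", "G", "Am", "F"],
    )
    print(enrich_caption("A calm piano piece", feats))
```

Keeping each feature as its own sentence means a missing feature simply drops out of the caption, which is one plausible way to handle tracks where a feature could not be extracted.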

Benchmarks

Benchmark | Methodology | Metrics
text-to-music-generation-on-musicbench | Mustango (non-pretrained) | FAD: 2.09
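
For context on the FAD figure: Fréchet Audio Distance is commonly defined as the Fréchet distance between two multivariate Gaussians, one fitted to embeddings of reference audio and one to embeddings of generated audio. The notation below is the conventional one and is not taken from this page. The embedding model behind the reported score is not stated here. Lower values indicate generated audio whose embedding statistics are closer to the reference set.

```latex
% \mu_r, \Sigma_r: mean and covariance of reference-audio embeddings
% \mu_g, \Sigma_g: mean and covariance of generated-audio embeddings
\mathrm{FAD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g
  - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```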
