FLUX that Plays Music

Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang

Abstract

This paper explores a simple extension of diffusion-based rectified flow Transformers for text-to-music generation, termed FluxMusic. Building on the design of the advanced Flux model (https://github.com/black-forest-labs/flux), we transfer it into a latent VAE space of the mel-spectrogram. The architecture first applies a sequence of independent attention blocks to the double text-music stream, followed by a stacked single music stream for denoised patch prediction. We employ multiple pre-trained text encoders to sufficiently capture caption semantics and to allow flexibility at inference. Coarse textual information, in conjunction with time-step embeddings, is used in a modulation mechanism, while fine-grained textual details are concatenated with the music patch sequence as input. Through an in-depth study, we demonstrate that rectified flow training with an optimized architecture significantly outperforms established diffusion methods for the text-to-music task, as evidenced by various automatic metrics and human preference evaluations. Our experimental data, code, and model weights are publicly available at https://github.com/feizc/FluxMusic.
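The rectified flow training mentioned in the abstract uses a straight-line interpolation between Gaussian noise and the clean latent, with the model regressing the constant velocity along that path. The following is a minimal NumPy sketch of that objective; `velocity_model` and the latent shapes are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def rectified_flow_loss(velocity_model, x1, rng):
    """Rectified-flow (straight-path) training loss for one batch.

    x1: clean latents, e.g. VAE-encoded mel-spectrogram patches, shape (B, N, D).
    velocity_model(xt, t) -> predicted velocity, same shape as xt (assumed interface).
    """
    x0 = rng.standard_normal(x1.shape)           # Gaussian noise endpoint
    t = rng.uniform(size=(x1.shape[0], 1, 1))    # one timestep per sample in [0, 1)
    xt = (1.0 - t) * x0 + t * x1                 # straight-line interpolation
    target = x1 - x0                             # constant velocity along the path
    pred = velocity_model(xt, t)
    return np.mean((pred - target) ** 2)         # simple MSE regression loss
```

At inference, generation integrates the learned velocity field from noise (t = 0) toward data (t = 1), which is what makes the straight-path formulation attractive: fewer integration steps are needed than with curved diffusion trajectories.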

Code Repositories

black-forest-labs/flux (official, PyTorch)
feizc/fluxmusic (official, PyTorch)

Benchmarks

Benchmark: text-to-music generation on MusicCaps
Method: FluxMusic
FAD: 1.43
IS: 2.98
KL_passt: 1.25
