Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

Abstract
Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FD_openl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1 kHz.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| audio-generation-on-audiocaps | Stable Audio Open | CLAP_LAION: 0.35, CLAP_MS: 0.34, FD_openl3: 78.24, KL_passt: 2.14 |
| text-to-music-generation-on-musiccaps | Stable Audio Open | CLAP_LAION: 0.48, CLAP_MS: 0.49, FAD: 3.51, FD: 36.42, FD_openl3: 127.20, IS: 2.93, KL_passt: 1.32 |
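
The FD_openl3 values in the table are Fréchet distances computed on OpenL3 embeddings of reference and generated audio. As a rough illustration of what such a number measures, the sketch below fits a Gaussian to each set of embeddings and computes the Fréchet distance between the two. It is not the evaluation code behind the reported results: the function name `frechet_distance` is hypothetical, and the embeddings are random placeholders rather than real OpenL3 features.

```python
import numpy as np

def frechet_distance(emb_ref, emb_gen):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    Each input is an array of shape (num_samples, embedding_dim).
    """
    mu_r, mu_g = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    cov_r = np.cov(emb_ref, rowvar=False)
    cov_g = np.cov(emb_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) is the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)

# Placeholder embeddings (assumption): random vectors standing in for
# OpenL3 features of reference and generated audio.
rng = np.random.default_rng(0)
ref = rng.normal(size=(500, 8))
gen = rng.normal(loc=0.5, size=(500, 8))

same = frechet_distance(ref, ref)     # identical sets: distance close to zero
shifted = frechet_distance(ref, gen)  # shifted mean: clearly larger distance
```

Lower values indicate that the generated embeddings are statistically closer to the reference set, which is why the abstract reads FD_openl3 as a measure of realism.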