Audio Generation On Audiocaps
Metrics
FAD (Fréchet Audio Distance)
FD (Fréchet Distance)
Results
Performance results of various models on this benchmark
| Model Name | FAD | FD | Paper Title | Repository |
|---|---|---|---|---|
| Make-An-Audio 2 | 1.80 | 11.75 | Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation | - |
| Stable Audio | - | - | Fast Timing-Conditioned Latent Audio Diffusion | - |
| Audiobox Sound | 0.77 | 8.30 | Audiobox: Unified Audio Generation with Natural Language Prompts | - |
| GenAu-Large | 1.21 | 16.51 | Taming Data and Transformers for Audio Generation | - |
| Tango-AF&AC-FT-AC | 2.54 | 17.19 | Improving Text-To-Audio Models with Synthetic Captions | - |
| AudioLDM 2-AC-Large | 1.42 | - | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | - |
| Re-AudioLDM-L | 1.37 | - | Retrieval-Augmented Text-to-Audio Generation | - |
| Auffusion-Full | 1.76 | 23.08 | Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | - |
| ETTA | 2.51 | 13.12 | ETTA: Elucidating the Design Space of Text-to-Audio Models | - |
| AudioGen | 3.13 | - | AudioGen: Textually Guided Audio Generation | - |
| Make-An-Audio | 2.66 | 18.32 | Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models | - |
| TangoFlux | - | - | TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | - |
| ETTA-FT-AC-100k | 2.03 | 10.10 | ETTA: Elucidating the Design Space of Text-to-Audio Models | - |
| Diffsound | 7.75 | 47.68 | Diffsound: Discrete Diffusion Model for Text-to-sound Generation | - |
| AudioLDM2-large | 2.02 | 26.18 | AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining | - |
| Stable Audio 2.0 | - | - | Long-form music generation with latent diffusion | - |
| Consistency TTA (Single-step generation) | 2.18 | 20.44 | ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation | - |
| Auffusion | 1.63 | 21.99 | Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation | - |
| CoDi | 1.80 | 22.90 | Any-to-Any Generation via Composable Diffusion | - |
| AudioLDM-L-Full | 1.96 | 23.31 | AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | - |
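
The results above are listed in no particular order, so a small script can help when comparing models. The following is a minimal sketch, not part of the leaderboard itself: it copies the FAD and FD scores listed above into a list and ranks the models on each metric, assuming that lower is better for both Fréchet-based distances and that a "-" entry means no score was reported. The names `RESULTS` and `rank_by` are hypothetical and chosen only for illustration.

```python
from typing import Optional

# (model, FAD, FD) copied from the results above; None marks a "-" entry.
RESULTS: list[tuple[str, Optional[float], Optional[float]]] = [
    ("Make-An-Audio 2", 1.80, 11.75),
    ("Stable Audio", None, None),
    ("Audiobox Sound", 0.77, 8.30),
    ("GenAu-Large", 1.21, 16.51),
    ("Tango-AF&AC-FT-AC", 2.54, 17.19),
    ("AudioLDM 2-AC-Large", 1.42, None),
    ("Re-AudioLDM-L", 1.37, None),
    ("Auffusion-Full", 1.76, 23.08),
    ("ETTA", 2.51, 13.12),
    ("AudioGen", 3.13, None),
    ("Make-An-Audio", 2.66, 18.32),
    ("TangoFlux", None, None),
    ("ETTA-FT-AC-100k", 2.03, 10.10),
    ("Diffsound", 7.75, 47.68),
    ("AudioLDM2-large", 2.02, 26.18),
    ("Stable Audio 2.0", None, None),
    ("Consistency TTA (Single-step generation)", 2.18, 20.44),
    ("Auffusion", 1.63, 21.99),
    ("CoDi", 1.80, 22.90),
    ("AudioLDM-L-Full", 1.96, 23.31),
]


def rank_by(results, column: int) -> list[tuple[str, float]]:
    """Return (model, score) pairs sorted ascending, skipping missing scores.

    column is 1 for FAD and 2 for FD. Ascending order puts the best model
    first under the assumption that lower distances are better.
    """
    scored = [(row[0], row[column]) for row in results if row[column] is not None]
    return sorted(scored, key=lambda pair: pair[1])


fad_ranking = rank_by(RESULTS, 1)
fd_ranking = rank_by(RESULTS, 2)

if __name__ == "__main__":
    print("Top 3 by FAD:", fad_ranking[:3])
    print("Top 3 by FD:", fd_ranking[:3])
```

Models without a reported score are left out of the corresponding ranking rather than placed last, since a missing entry says nothing about how the model would perform on that metric.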