HyperAI超神经




Video-to-Sound Generation on VGGSound

Evaluation Metrics

FAD (Fréchet Audio Distance; lower is better)

FD (Fréchet Distance; lower is better)
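Both metrics are Fréchet distances. Each one fits a multivariate Gaussian to embeddings of reference audio and of generated audio, then computes ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)). FAD was introduced with VGGish embeddings. This page does not say which embedding model each entry used, so the sketch below is a minimal NumPy version that assumes the embeddings are already computed and passed in as (n_samples, dim) arrays. The function name `frechet_distance` is hypothetical and is not the official scorer of any entry.

```python
import numpy as np


def frechet_distance(emb_real: np.ndarray, emb_gen: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets.

    Hypothetical helper: each input is an (n_samples, dim) array of
    precomputed audio embeddings (the embedding model is an assumption).
    """
    mu_r, mu_g = emb_real.mean(axis=0), emb_gen.mean(axis=0)
    s_r = np.cov(emb_real, rowvar=False)
    s_g = np.cov(emb_gen, rowvar=False)

    # Tr((S_r S_g)^{1/2}) equals Tr((S_r^{1/2} S_g S_r^{1/2})^{1/2}).
    # The inner matrix is symmetric PSD, so the trace of its square root
    # is the sum of the square roots of its eigenvalues.
    w, v = np.linalg.eigh(s_r)
    s_r_sqrt = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    inner = s_r_sqrt @ s_g @ s_r_sqrt
    eig = np.linalg.eigvalsh(inner)
    tr_covmean = np.sqrt(np.clip(eig, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(s_r) + np.trace(s_g) - 2.0 * tr_covmean)


rng = np.random.default_rng(0)
x = rng.normal(size=(500, 4))
print(frechet_distance(x, x))        # approximately 0 for identical sets
print(frechet_distance(x, x + 3.0))  # approximately 36: mean shift of 3 in 4 dims
```

Computing the matrix square root through a symmetric eigendecomposition keeps the sketch free of dependencies beyond NumPy; `scipy.linalg.sqrtm` applied to the product S_r S_g is a common alternative.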

Evaluation Results

Performance of each model on this benchmark:

Model	FAD	FD	Paper Title	Repository
VATT-LLama	2.38	-	Tell What You Hear From What You See -- Video to Audio Generation Through Text
ReWas	2.16	15.24	Read, Watch and Scream! Sound Generation from Text and Video
MaskVAT_Hybrid	2.04	-	Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity	-
V-AURA	1.92	-	Temporally Aligned Audio for Video with Autoregression
Frieren	1.32	12.26	Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching
MMAudio-L-44.1kHz	0.97	4.72	Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
V2A-Mapper	0.841	24.168	V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
MMAudio-S-16kHz	0.79	5.22	Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
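The table is ordered by FAD from worst to best, and several entries report no FD. The short sketch below ranks the rows per metric, lowest (best) first, skipping entries where that metric is missing. The scores are copied as reported; the names `ROWS` and `rank` are illustrative only.

```python
# Rows copied from the leaderboard above: (model, FAD, FD).
# None marks an FD score reported as "-".
ROWS = [
    ("VATT-LLama", 2.38, None),
    ("ReWas", 2.16, 15.24),
    ("MaskVAT_Hybrid", 2.04, None),
    ("V-AURA", 1.92, None),
    ("Frieren", 1.32, 12.26),
    ("MMAudio-L-44.1kHz", 0.97, 4.72),
    ("V2A-Mapper", 0.841, 24.168),
    ("MMAudio-S-16kHz", 0.79, 5.22),
]


def rank(rows, col):
    """Return model names sorted by metric column col (1 = FAD, 2 = FD),
    lowest (best) first, skipping rows where that metric is missing."""
    scored = [r for r in rows if r[col] is not None]
    return [r[0] for r in sorted(scored, key=lambda r: r[col])]


print(rank(ROWS, 1)[0])  # MMAudio-S-16kHz
print(rank(ROWS, 2)[0])  # MMAudio-L-44.1kHz
```

The two orderings disagree at the top: MMAudio-S-16kHz has the lowest FAD, while MMAudio-L-44.1kHz has the lowest FD. V2A-Mapper is second on FAD but last among the five entries that report FD.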

