Video Generation on UCF-101

Evaluation Metrics

FVD16

Inception Score

KVD16

Results

Performance of each model on this benchmark.

Model	FVD16	Inception Score	KVD16	Paper Title	Repository
MCVD	2460	-	148	Latent Video Diffusion Models for High-Fidelity Long Video Generation
VDM	1396	-	116	Latent Video Diffusion Models for High-Fidelity Long Video Generation
TGAN-v2 (128x128)	1209	-	-	Latent Video Diffusion Models for High-Fidelity Long Video Generation
MCVD (64x64)	1143	-	-	MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
MoCoGAN-HD (256x256, unconditional)	700	33.95	-	A Good Image Generator Is What You Need for High-Resolution Video Synthesis
MagicVideo (256x256, text-conditional)	699	-	-	MagicVideo: Efficient Video Generation With Latent Diffusion Models	-
TATS (256x256)	635	-	55	Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
DIGAN (128x128, unconditional)	577	32.70	-	Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks
LVDM (256x256, unconditional)	552	-	42	Latent Video Diffusion Models for High-Fidelity Long Video Generation
Video LDM (320x512, text-conditional)	550.61	33.45	-	Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
LAVIE (320x512, text-conditional)	526.30	-	-	LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models
DIGAN (128x128, class-conditional)	465	59.68	39.6	Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks
MeBT (128x128, unconditional)	438	65.93	-	Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers
TATS (128x128, unconditional)	420	57.63	-	Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer
MMVG (128x128, unconditional)	395	58.3	-	Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
LVDM (256x256, unconditional)	372	-	27	Latent Video Diffusion Models for High-Fidelity Long Video Generation
Make-A-Video (Zero-shot, 256x256, class-conditional)	367.23	33	-	Make-A-Video: Text-to-Video Generation without Text-Video Data
PYoCo (Zero-shot, 64x64, text-conditional)	355.19	47.76	-	Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models	-
VideoPoet (text-conditional)	355	38.44	-	VideoPoet: A Large Language Model for Zero-Shot Video Generation	-
VideoAssembler (Zero-shot, 256x256, class-conditional)	346.84	48.01	-	MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
