4DNeX: Feed-Forward 4D Generative Modeling Made Easy

Zhaoxi Chen, Tianqi Liu, Long Zhuo, Jiawei Ren, Zeng Tao, He Zhu, Fangzhou Hong, Liang Pan, Ziwei Liu

Abstract

We present 4DNeX, the first feed-forward framework for generating 4D (i.e., dynamic 3D) scene representations from a single image. In contrast to existing methods that rely on computationally intensive optimization or require multi-frame video inputs, 4DNeX enables efficient, end-to-end image-to-4D generation by fine-tuning a pretrained video diffusion model. Specifically, 1) to alleviate the scarcity of 4D data, we construct 4DNeX-10M, a large-scale dataset with high-quality 4D annotations generated using advanced reconstruction approaches. 2) We introduce a unified 6D video representation that jointly models RGB and XYZ sequences, facilitating structured learning of both appearance and geometry. 3) We propose a set of simple yet effective adaptation strategies to repurpose pretrained video diffusion models for 4D modeling. 4DNeX produces high-quality dynamic point clouds that enable novel-view video synthesis. Extensive experiments demonstrate that 4DNeX outperforms existing 4D generation methods in efficiency and generalizability, offering a scalable solution for image-to-4D modeling and laying the foundation for generative 4D world models that simulate dynamic scene evolution.
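To make the "6D video" idea concrete, the following is a minimal NumPy sketch of how an RGB sequence and a per-pixel XYZ sequence could be packed into one six-channel video and then read back as a colored dynamic point cloud. It is an illustration only: the abstract does not say how 4DNeX fuses the two modalities, so the channel-wise concatenation, the (T, H, W, 3) array layout, and the helper names `pack_6d_video` and `unpack_point_clouds` are all assumptions, not the authors' implementation.

```python
import numpy as np


def pack_6d_video(rgb, xyz):
    """Stack RGB and XYZ sequences of shape (T, H, W, 3) into (T, H, W, 6).

    Assumption: channel-wise concatenation. The paper may fuse the
    two modalities differently.
    """
    if rgb.shape != xyz.shape or rgb.shape[-1] != 3:
        raise ValueError("expected matching (T, H, W, 3) arrays")
    return np.concatenate([rgb, xyz], axis=-1)


def unpack_point_clouds(video_6d):
    """Split a (T, H, W, 6) video into per-frame points and colors.

    Returns two arrays of shape (T, H*W, 3): one 3D point and one
    RGB color per pixel, per frame.
    """
    t = video_6d.shape[0]
    colors = video_6d[..., :3].reshape(t, -1, 3)
    points = video_6d[..., 3:].reshape(t, -1, 3)
    return points, colors


# Toy data standing in for a generated clip: 4 frames of 8x8 pixels.
rng = np.random.default_rng(0)
rgb = rng.random((4, 8, 8, 3), dtype=np.float32)
xyz = rng.standard_normal((4, 8, 8, 3)).astype(np.float32)

video = pack_6d_video(rgb, xyz)
points, colors = unpack_point_clouds(video)
print(video.shape, points.shape, colors.shape)
```

Under this layout each frame already is a colored point cloud with one point per pixel, so rendering a novel view reduces to projecting `points` into a new camera, which is consistent with the abstract's claim that the dynamic point clouds enable novel-view video synthesis.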