
Learning 3D Photography Videos via Self-supervised Diffusion on Single Images


Abstract

3D photography renders a static image into a video with appealing 3D visual effects. Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints, and finally use an inpainting model to fill the missing/occluded regions. The inpainting model plays a crucial role in rendering quality, but it is normally trained on out-of-domain data. To reduce the gap between training and inference, we propose a novel self-supervised diffusion model as the inpainting module. Given a single input image, we automatically construct a training pair of the masked occluded image and the ground-truth image with random cycle-rendering. The constructed training samples are closely aligned with the testing instances, without the need for data annotation. To make full use of the masked images, we design a Masked Enhanced Block (MEB), which can be easily plugged into the UNet to enhance the semantic conditions. Towards real-world animation, we present a novel task, out-animation, which extends the space and time of input objects. Extensive experiments on real datasets show that our method achieves competitive results with existing SOTA methods.
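The self-supervised pair construction described above can be sketched in a few lines. The snippet below is a simplified, assumption-laden stand-in for the paper's random cycle-rendering: it replaces full 3D reprojection with a horizontal, depth-dependent disparity shift, and treats pixels left uncovered by the forward warp as the occlusion mask to inpaint. The function name, the disparity model, and all parameters are illustrative, not taken from the paper.

```python
import numpy as np

def cycle_render_pair(image, depth, max_shift=8, rng=None):
    """Build a self-supervised (masked input, hole mask, ground truth) triple.

    Hypothetical sketch: a random virtual camera shift warps each pixel by a
    depth-dependent disparity; pixels with no source after the warp become
    the occluded holes that the diffusion inpainter learns to fill, with the
    original image serving as the ground-truth target.
    """
    rng = rng or np.random.default_rng()
    h, w = depth.shape
    shift = rng.integers(1, max_shift + 1)  # random virtual camera motion
    # nearer pixels (small depth) move farther -- a crude stereo-style model
    disparity = (shift / np.maximum(depth, 1e-6)).astype(int)

    warped = np.zeros_like(image)
    covered = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            nx = x + disparity[y, x]
            if 0 <= nx < w:
                warped[y, nx] = image[y, x]
                covered[y, nx] = True

    mask = ~covered                # disoccluded regions to be inpainted
    masked_image = warped.copy()
    masked_image[mask] = 0
    return masked_image, mask, image
```

Because the target is just the input image itself, every single photo yields a training pair whose holes match the disocclusions seen at test time, which is the point of the cycle-rendering construction.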

Benchmarks

Benchmark: image-outpainting-on-mscoco
Methodology: NUWA-3D
Metrics:
  CLIP Similarity: 32.26
  FID: 10.65
  Inception score: 38.61
