2 months ago

Han Lin, Xichen Pan, Ziqi Huang, Ji Hou, Jialiang Wang, Weifeng Chen, Zecheng He, Felix Juefei-Xu, Junzhe Sun, Zhipeng Fan, and 3 more

Abstract

Multimodal learning has rapidly advanced visual understanding, largely via multimodal large language models (MLLMs) that use powerful LLMs as cognitive cores. In visual generation, however, these core models are typically reduced to global text encoders for diffusion models, leaving most of their reasoning and planning ability unused. This creates a gap: current MLLMs can parse complex layouts, attributes, and knowledge-intensive scenes, yet struggle to generate images or videos with equally precise and structured control. We propose MetaCanvas, a lightweight framework that lets MLLMs reason and plan directly in spatial and spatiotemporal latent spaces and interface tightly with diffusion generators. We implement MetaCanvas on three diffusion backbones and evaluate it across six tasks, including text-to-image generation, text/image-to-video generation, image/video editing, and in-context video generation, each requiring precise layouts, robust attribute binding, and reasoning-intensive control. MetaCanvas consistently outperforms global-conditioning baselines, suggesting that treating MLLMs as latent-space planners is a promising direction for narrowing the gap between multimodal understanding and generation.



Exploring MLLM-Diffusion Information Transfer with MetaCanvas
