5 months ago

Learning Flow Fields in Attention for Controllable Person Image Generation

Abstract

Controllable person image generation aims to generate a person image conditioned on reference images, allowing precise control over the person's appearance or pose. However, prior methods often distort fine-grained textural details from the reference image, despite achieving high overall image quality. We attribute these distortions to inadequate attention to corresponding regions in the reference image. To address this, we propose learning flow fields in attention (Leffa), which explicitly guides the target query to attend to the correct reference key in the attention layer during training. Specifically, it is realized via a regularization loss on top of the attention map within a diffusion-based baseline. Our extensive experiments show that Leffa achieves state-of-the-art performance in controlling appearance (virtual try-on) and pose (pose transfer), significantly reducing fine-grained detail distortion while maintaining high image quality. Additionally, we show that our loss is model-agnostic and can be used to improve the performance of other diffusion models.
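The abstract does not spell out the form of the regularization loss, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the attention map is turned into a flow field by taking, for each target query, the attention-weighted average of reference key coordinates; the reference image is then warped along that flow and compared with the target using a mean squared error. All function names, the temperature parameter, and the single-channel toy images are hypothetical, and plain Python is used so the example runs without dependencies.

```python
import math

def softmax(row, temperature=1.0):
    """Numerically stable softmax over one query's attention logits."""
    m = max(row)
    exps = [math.exp((v - m) / temperature) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_to_flow(logits, ref_h, ref_w, temperature=1.0):
    """Assumed conversion: expected (y, x) reference coordinate per target query."""
    coords = [(y, x) for y in range(ref_h) for x in range(ref_w)]
    flow = []
    for row in logits:  # one row of key logits per target query
        weights = softmax(row, temperature)
        fy = sum(w * c[0] for w, c in zip(weights, coords))
        fx = sum(w * c[1] for w, c in zip(weights, coords))
        flow.append((fy, fx))
    return flow

def bilinear_sample(image, y, x):
    """Sample a single-channel image (list of rows) at a fractional location."""
    h, w = len(image), len(image[0])
    y0 = min(int(math.floor(y)), h - 1)
    x0 = min(int(math.floor(x)), w - 1)
    y1 = min(y0 + 1, h - 1)
    x1 = min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
    bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def flow_regularization_loss(logits, reference, target, temperature=1.0):
    """Hypothetical loss: MSE between the flow-warped reference and the target."""
    ref_h, ref_w = len(reference), len(reference[0])
    flow = attention_to_flow(logits, ref_h, ref_w, temperature)
    warped = [bilinear_sample(reference, y, x) for y, x in flow]
    target_flat = [v for row in target for v in row]
    return sum((a - b) ** 2 for a, b in zip(warped, target_flat)) / len(warped)

def sharp_logits(key_indices, num_keys, peak=50.0):
    """Toy logits in which query i attends almost entirely to key_indices[i]."""
    return [[peak if k == j else 0.0 for k in range(num_keys)] for j in key_indices]

# Toy case: the target is the reference flipped left to right.
reference = [[0.0, 1.0], [2.0, 3.0]]
target = [[1.0, 0.0], [3.0, 2.0]]
good_loss = flow_regularization_loss(sharp_logits([1, 0, 3, 2], 4), reference, target)
bad_loss = flow_regularization_loss(sharp_logits([0, 1, 2, 3], 4), reference, target)
```

In this toy case, attention that points each target query at its matching reference location yields a loss near zero, while attention that ignores the flip is penalized. In a real diffusion model the same idea would need a differentiable warp over batched, multi-channel tensors so that gradients reach the attention layer.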

Code Repositories

franciszzj/leffa (Official, PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
pose-transfer-on-deep-fashion | Leffa | FID: 4.23
virtual-try-on-on-dress-code | Leffa | FID: 2.06
virtual-try-on-on-viton-hd | Leffa | FID: 4.54
