PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Li Zhen ; Cao Mingdeng ; Wang Xintao ; Qi Zhongang ; Cheng Ming-Ming ; Shan Ying

Abstract
Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing personalized generation methods cannot simultaneously satisfy the requirements of high efficiency, promising identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method, which mainly encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information. Such an embedding, serving as a unified ID representation, can not only encapsulate the characteristics of the same input ID comprehensively, but also accommodate the characteristics of different IDs for subsequent integration. This paves the way for more intriguing and practically valuable applications. Besides, to drive the training of our PhotoMaker, we propose an ID-oriented data construction pipeline to assemble the training data. Under the nourishment of the dataset constructed through the proposed pipeline, our PhotoMaker demonstrates better ID preservation ability than test-time fine-tuning based methods, yet provides significant speed improvements, high-quality generation results, strong generalization capabilities, and a wide range of applications. Our project page is available at https://photo-maker.github.io/
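The abstract's central idea, turning an arbitrary number of ID images into one stacked representation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stack_id_embeddings` is hypothetical, the per-image embeddings are assumed to come from some image encoder that is not shown, and the toy three-dimensional vectors are made up for illustration only.

```python
from typing import List, Sequence

Vector = List[float]


def stack_id_embeddings(per_image_embeddings: Sequence[Sequence[float]]) -> List[Vector]:
    """Stack N per-image embeddings into one N x D representation.

    Hypothetical helper: each input vector is assumed to be the output of an
    image encoder (not shown) applied to one ID image.
    """
    if not per_image_embeddings:
        raise ValueError("need at least one ID image embedding")
    dim = len(per_image_embeddings[0])
    stacked: List[Vector] = []
    for index, embedding in enumerate(per_image_embeddings):
        # All embeddings must share one dimension so they can be stacked.
        if len(embedding) != dim:
            raise ValueError(f"embedding {index} has length {len(embedding)}, expected {dim}")
        stacked.append([float(value) for value in embedding])
    return stacked


# Two images of the same person give a stack of length 2.
same_id = stack_id_embeddings([[0.1, 0.2, 0.3], [0.0, 0.5, 0.4]])

# Images of different people can share one stack; the length simply grows.
mixed_ids = stack_id_embeddings([[0.1, 0.2, 0.3], [0.9, 0.8, 0.7], [0.3, 0.3, 0.3]])

print(len(same_id), len(mixed_ids))
```

Because the stack grows with the number of inputs rather than requiring a fixed count, the same representation can hold several views of one identity or a mix of identities, which is the property the abstract highlights.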
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| diffusion-personalization-tuning-free-on | PhotoMaker | Cosine Similarity: 0.287; FID: 8.410; LPIPS: 0.424 |
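For readers unfamiliar with the first metric, the sketch below shows the standard definition of cosine similarity between two embedding vectors. The page does not describe the benchmark's evaluation protocol, so which embeddings are compared and how scores are averaged are not covered here; the function name and the toy vectors are assumptions for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of a reference and a generated image.
reference = [1.0, 0.0, 0.0]
generated = [1.0, 1.0, 0.0]
score = cosine_similarity(reference, generated)
print(round(score, 3))
```

The value lies between -1 and 1, with values closer to 1 indicating that the two vectors point in more similar directions.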