WithAnyone: Towards Controllable and ID Consistent Image Generation
Abstract
Identity-consistent generation has become an important focus in text-to-image research, with recent models achieving notable success in producing images aligned with a reference identity. Yet, the scarcity of large-scale paired datasets containing multiple images of the same individual forces most approaches to adopt reconstruction-based training. This reliance often leads to a failure mode we term copy-paste, where the model directly replicates the reference face rather than preserving identity across natural variations in pose, expression, or lighting. Such over-similarity undermines controllability and limits the expressive power of generation. To address these limitations, we (1) construct a large-scale paired dataset, MultiID-2M, tailored for multi-person scenarios, providing diverse references for each identity; (2) introduce a benchmark that quantifies both copy-paste artifacts and the trade-off between identity fidelity and variation; and (3) propose a novel training paradigm with a contrastive identity loss that leverages paired data to balance fidelity with diversity. These contributions culminate in WithAnyone, a diffusion-based model that effectively mitigates copy-paste while preserving high identity similarity. Extensive qualitative and quantitative experiments demonstrate that WithAnyone significantly reduces copy-paste artifacts, improves controllability over pose and expression, and maintains strong perceptual quality. User studies further validate that our method achieves high identity fidelity while enabling expressive controllable generation.
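The abstract does not specify the form of the contrastive identity loss. As intuition for how paired data can decouple identity fidelity from pixel-level copying, below is a minimal illustrative sketch, assuming face embeddings from a frozen recognizer and an InfoNCE-style objective; the function name, tensor shapes, and temperature are hypothetical, not the paper's actual formulation.

    import torch
    import torch.nn.functional as F

    def contrastive_identity_loss(gen_emb, ref_embs, other_embs, temperature=0.1):
        """Illustrative InfoNCE-style identity loss (not the paper's exact loss).

        gen_emb:    (D,)   face embedding of the generated image, from a
                           frozen face recognizer.
        ref_embs:   (P, D) embeddings of *other* photos of the same identity
                           (the paired references), used as positives.
        other_embs: (N, D) embeddings of different identities, used as negatives.
        """
        gen_emb = F.normalize(gen_emb, dim=-1)
        pos = F.normalize(ref_embs, dim=-1)
        neg = F.normalize(other_embs, dim=-1)

        pos_sim = pos @ gen_emb / temperature   # (P,) similarity to positives
        neg_sim = neg @ gen_emb / temperature   # (N,) similarity to negatives

        # Pull the generated face toward the identity cluster spanned by its
        # paired references and push it away from other identities. Because
        # the positives are different photos of the same person, the objective
        # rewards identity agreement without rewarding replication of any
        # single reference image -- the intuition behind mitigating copy-paste.
        logits = torch.cat([pos_sim, neg_sim])            # (P + N,)
        log_prob = logits - torch.logsumexp(logits, 0)    # log-softmax
        return -log_prob[: pos.shape[0]].mean()

Under this reading, reconstruction-based training has only a single reference to match, so the optimum is exact replication; a multi-positive contrastive objective instead targets the shared identity across diverse references, leaving pose, expression, and lighting free to follow the prompt.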