
Abstract

We propose a unified Generative Adversarial Network (GAN) for controllable image-to-image translation, i.e., transferring an image from a source domain to a target domain guided by controllable structures. In addition to conditioning on a reference image, we show how the model can generate images conditioned on controllable structures, e.g., class labels, object keypoints, human skeletons, and scene semantic maps. The proposed model consists of a single generator and a discriminator, both taking a conditional image and the target controllable structure as input. In this way, the conditional image provides the appearance information and the controllable structure provides the structure information for generating the target result. Moreover, our model learns the image-to-image mapping through three novel losses: a color loss, a controllable structure guided cycle-consistency loss, and a controllable structure guided self-content preserving loss. We also present the Fréchet ResNet Distance (FRD) to evaluate the quality of the generated images. Experiments on two challenging image translation tasks, hand gesture-to-gesture translation and cross-view image translation, show that our model generates convincing results and significantly outperforms other state-of-the-art methods on both tasks. Meanwhile, the proposed framework is a unified solution and can thus be applied to other controllable structure guided image translation tasks, such as landmark guided facial expression translation and keypoint guided person image generation. To the best of our knowledge, we are the first to make a single GAN framework work on all such controllable structure guided image translation tasks. Code is available at https://github.com/Ha0Tang/GestureGAN.
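The abstract names a controllable structure guided cycle-consistency loss but does not define it. The following is a minimal sketch of one common form of such a loss, not the authors' implementation: the generator translates image x toward target structure s_y, then translates the result back using the source structure s_x, and the reconstruction is compared with x. The L1 distance, the flattened-list "images", and the names `l1_distance`, `structure_guided_cycle_loss`, and `toy_generator` are all hypothetical assumptions made for illustration; the linked repository is the reference for the actual losses.

```python
from typing import Callable, List

# Assumption: images and structures are flattened lists of floats,
# standing in for real image/structure tensors.
Vector = List[float]
Generator = Callable[[Vector, Vector], Vector]


def l1_distance(a: Vector, b: Vector) -> float:
    """Mean absolute error between two equally sized flattened images."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def structure_guided_cycle_loss(g: Generator, x: Vector, s_x: Vector, s_y: Vector) -> float:
    """Hypothetical cycle loss: x -> (guided by s_y) -> y_hat -> (guided by s_x) -> x_rec."""
    # Forward translation: appearance from x, target structure s_y.
    y_hat = g(x, s_y)
    # Backward translation: generated image conditioned on the source structure s_x.
    x_rec = g(y_hat, s_x)
    # Penalize how far the reconstruction drifts from the original image.
    return l1_distance(x, x_rec)


def toy_generator(image: Vector, structure: Vector) -> Vector:
    # Hypothetical stand-in for a learned generator: averages appearance and structure.
    return [0.5 * p + 0.5 * s for p, s in zip(image, structure)]


if __name__ == "__main__":
    x = [1.0, 0.0]
    s_x = [1.0, 0.0]
    s_y = [0.0, 1.0]
    print(structure_guided_cycle_loss(toy_generator, x, s_x, s_y))  # prints 0.25
```

With the toy generator, the forward pass gives [0.5, 0.5] and the backward pass gives [0.75, 0.25], so the loss is 0.25. In a real model both passes would share one generator, which is what lets a single network serve every structure type.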
