LatteGAN: Visually Guided Language Attention for Multi-Turn Text-Conditioned Image Manipulation

Shoya Matsumori, Yuki Abe, Kosuke Shingyouchi, Komei Sugiura, Michita Imai

Abstract

Text-guided image manipulation has recently gained attention in the vision-and-language community. While most prior studies focused on single-turn manipulation, this paper addresses the more challenging multi-turn image manipulation (MTIM) task. Previous models for this task successfully generate images iteratively, given a sequence of instructions and the previously generated image. However, this approach suffers from under-generation (failing to draw some of the objects described in the instructions) and from poor quality of the generated objects, which degrades overall performance. To overcome these problems, we present a novel architecture called the Visually Guided Language Attention GAN (LatteGAN). LatteGAN addresses the limitations of previous approaches by introducing a Visually Guided Language Attention (Latte) module, which extracts fine-grained text representations for the generator, and a Text-Conditioned U-Net discriminator, which discriminates both the global and local representations of fake and real images. Extensive experiments on two distinct MTIM datasets, CoDraw and i-CLEVR, demonstrate that the proposed model achieves state-of-the-art performance.
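As a rough illustration of how image features can guide attention over an instruction, the sketch below implements a generic scaled dot-product cross-attention in which every spatial location of the previous image's feature map acts as a query over the instruction's word embeddings, yielding a location-specific text feature. This is a simplification under stated assumptions, not the authors' Latte module: the function name `visually_guided_attention`, the lack of learned query/key/value projections, and the single attention head are hypothetical choices made for brevity.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def visually_guided_attention(image_feats, word_feats):
    """Hypothetical sketch: each image location attends over the instruction words.

    image_feats: list of H*W vectors (one per spatial location), dimension d
    word_feats:  list of T vectors (one per instruction token), dimension d
    Returns (text_map, weights): one attended text vector and one attention
    distribution per spatial location.
    """
    d = len(word_feats[0])
    scale = 1.0 / math.sqrt(d)
    text_map, weights = [], []
    for q in image_feats:
        # Scaled dot-product score between this location and every word.
        scores = [dot(q, w) * scale for w in word_feats]
        alpha = softmax(scores)
        # Weighted sum of word vectors gives a location-specific text feature.
        attended = [sum(a * w[i] for a, w in zip(alpha, word_feats)) for i in range(d)]
        text_map.append(attended)
        weights.append(alpha)
    return text_map, weights


# Toy usage: two instruction tokens and two spatial locations.
words = [[1.0, 0.0], [0.0, 1.0]]
locations = [[5.0, 0.0], [0.0, 5.0]]
text_map, weights = visually_guided_attention(locations, words)
```

Each location places most of its weight on the word whose vector it aligns with, so different regions of the canvas receive different text features; a single sentence embedding would instead give every location the same vector, which is the kind of coarse conditioning a fine-grained representation is meant to avoid.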
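The Text-Conditioned U-Net discriminator judges an image both globally and locally. A minimal sketch of how such two-headed outputs could be trained is given below, assuming a hinge loss applied to the encoder's image-level logit and to the decoder's per-pixel logits, and assuming projection-style text conditioning for the global logit. The helper names (`projection_logit`, `hinge_d_loss`, `unet_d_loss`) and the choice of hinge loss are illustrative assumptions; the paper's exact losses and conditioning scheme may differ.

```python
def relu(x):
    return max(0.0, x)


def mean(xs):
    return sum(xs) / len(xs)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def projection_logit(pooled_feat, w, b, text_emb):
    """Hypothetical text-conditioned global logit (projection-discriminator style):
    an unconditional linear score plus the inner product of the text embedding
    with the pooled image feature."""
    return dot(w, pooled_feat) + b + dot(text_emb, pooled_feat)


def hinge_d_loss(real_logits, fake_logits):
    """Discriminator hinge loss: push real logits above +1 and fake logits below -1."""
    return (mean([relu(1.0 - r) for r in real_logits])
            + mean([relu(1.0 + f) for f in fake_logits]))


def unet_d_loss(real_global, fake_global, real_pixels, fake_pixels):
    """Assumed total loss: global (encoder) term plus local (per-pixel decoder) term.
    real_pixels / fake_pixels are flattened per-pixel logit maps."""
    return hinge_d_loss(real_global, fake_global) + hinge_d_loss(real_pixels, fake_pixels)


# Toy usage
g = projection_logit([1.0, 1.0], w=[1.0, 0.0], b=0.5, text_emb=[0.0, 2.0])
print(g)  # 3.5
loss = unet_d_loss([g], [-2.0], [2.0, 0.0], [-2.0, -1.0])
print(loss)  # 0.5
```

In this toy case the global term is already satisfied, and the whole loss comes from one real pixel whose logit is not yet above the margin. The per-pixel term is intended to let the discriminator penalize a locally missing or malformed object even when the image looks plausible overall.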

Code Repositories

smatsumori/lattegan (official, PyTorch)

Benchmarks

Benchmark: text-to-image-generation-on-geneva-codraw
Methodology: LatteGAN
F1-score: 77.51 ± 0.52
rsim: 54.16 ± 0.21

Benchmark: text-to-image-generation-on-geneva-i-clevr
Methodology: LatteGAN
F1-score: 97.26 ± 1.56
rsim: 83.21 ± 1.70
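The F1-score compares the objects detected in generated images with those in the corresponding ground-truth images, and the reported values appear to be expressed as percentages. A minimal micro-averaged version is sketched below, assuming each image is summarized as a set of object labels. The function name `detection_f1` and this input format are hypothetical, and the official GeNeVA evaluation, which relies on a trained object detector, may aggregate differently.

```python
def detection_f1(gt_objects, gen_objects):
    """Micro-averaged F1 between objects in ground-truth and generated images.

    gt_objects / gen_objects: lists (one entry per image) of sets of object
    labels, e.g. as produced by an object detector (hypothetical format).
    """
    tp = fp = fn = 0
    for gt, gen in zip(gt_objects, gen_objects):
        tp += len(gt & gen)  # objects correctly generated
        fp += len(gen - gt)  # objects generated but not requested
        fn += len(gt - gen)  # requested objects that are missing (under-generation)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gt = [{"red cube", "blue sphere"}, {"gray cylinder"}]
gen = [{"red cube", "green cylinder"}, {"gray cylinder"}]
f1 = detection_f1(gt, gen)
print(round(100 * f1, 2))  # 66.67
```

Missing objects lower recall while spurious ones lower precision, so under-generation shows up directly as a drop in this score.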
