8 months ago

Abstract

Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilizes the noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. We achieve state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval (+2.7% in average recall@1), image captioning (+2.8% in CIDEr), and VQA (+1.6% in VQA score). BLIP also demonstrates strong generalization ability when directly transferred to video-language tasks in a zero-shot manner. Code, models, and datasets are released at https://github.com/salesforce/BLIP.
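The caption bootstrapping step described above can be illustrated with a minimal sketch. This is not the released BLIP code: the function `bootstrap_captions` and the callables `captioner` and `is_match` are hypothetical stand-ins for a trained captioning model and a trained image-text filter. The sketch also assumes the filter is applied to both the original web caption and the synthetic one, which the abstract does not spell out.

```python
from typing import Callable, Iterable, List, Tuple

Image = str  # placeholder type: an image identifier or file path


def bootstrap_captions(
    pairs: Iterable[Tuple[Image, str]],
    captioner: Callable[[Image], str],
    is_match: Callable[[Image, str], bool],
) -> List[Tuple[Image, str]]:
    """Return a cleaned list of (image, caption) pairs.

    For each web pair, a synthetic caption is generated, and every
    candidate caption is kept only if the filter accepts it.
    """
    cleaned: List[Tuple[Image, str]] = []
    for image, web_caption in pairs:
        # Candidates: the original web caption plus one synthetic caption.
        candidates = [web_caption, captioner(image)]
        for caption in candidates:
            if is_match(image, caption):
                cleaned.append((image, caption))
    return cleaned


# Toy stand-ins (hypothetical): a lookup-table "captioner" and a
# keyword-based "filter" that checks the image's label appears in the text.
toy_pairs = [
    ("img_dog", "a dog on grass"),
    ("img_cat", "buy cheap shoes online"),  # noisy web caption
]
toy_synthetic = {"img_dog": "a dog running", "img_cat": "a cat on a sofa"}


def toy_captioner(image: Image) -> str:
    return toy_synthetic[image]


def toy_filter(image: Image, caption: str) -> bool:
    return image.split("_")[1] in caption


result = bootstrap_captions(toy_pairs, toy_captioner, toy_filter)
```

In this toy run the noisy caption for `img_cat` is dropped and replaced by its synthetic caption, while `img_dog` keeps both its web caption and its synthetic one. In a real pipeline the two callables would wrap model inference rather than string checks.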

Visual Question Answering

Zihang Dai Yonghui Wu Chengkai Zhang Qiwei Li Yiming Yang Xun Huang Zhiheng Huang Yonghong Li

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
