8 months ago

Xiyang Wu Tianrui Guan Dianqi Li Shuaiyi Huang Xiaoyu Liu Xijun Wang Ruiqi Xian Abhinav Shrivastava Furong Huang Jordan Lee Boyd-Graber

Abstract

Large vision-language models (LVLMs) hallucinate: certain context cues in an image may trigger the language module's overconfident and incorrect reasoning on abnormal or hypothetical objects. Though a few benchmarks have been developed to investigate LVLM hallucinations, they mainly rely on hand-crafted corner cases whose fail patterns may hardly generalize, and finetuning on them could undermine their validity. These motivate us to develop the first automatic benchmark generation approach, AUTOHALLUSION, that harnesses a few principal strategies to create diverse hallucination examples. It probes the language modules in LVLMs for context cues and uses them to synthesize images by: (1) adding objects abnormal to the context cues; (2) for two co-occurring objects, keeping one and excluding the other; or (3) removing objects closely tied to the context cues. It then generates image-based questions whose ground-truth answers contradict the language module's prior. A model has to overcome contextual biases and distractions to reach correct answers, while incorrect or inconsistent answers indicate hallucinations. AUTOHALLUSION enables us to create new benchmarks at the minimum cost and thus overcomes the fragility of hand-crafted benchmarks. It also reveals common failure patterns and reasons, providing key insights to detect, avoid, or control hallucinations. Comprehensive evaluations of top-tier LVLMs, e.g., GPT-4V(ision), Gemini Pro Vision, Claude 3, and LLaVA-1.5, show a 97.7% and 98.7% success rate of hallucination induction on synthetic and real-world datasets of AUTOHALLUSION, paving the way for a long battle against hallucinations.
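The three manipulation strategies in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: a scene is reduced to a tuple of object names instead of a synthesized image, and every name here (`Probe`, `add_abnormal`, `keep_one_of_pair`, `remove_tied`, `is_hallucination`) is hypothetical, chosen only for illustration. What it shows is how each strategy yields a question whose ground-truth answer goes against what the context would suggest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """One benchmark item (hypothetical structure, not from the paper)."""
    strategy: str
    objects: tuple   # object names present in the edited scene
    question: str
    answer: str      # ground truth: "yes" or "no"


def existence_question(obj: str) -> str:
    return f"Is there a {obj} in this image?"


def add_abnormal(scene: tuple, abnormal: str) -> Probe:
    # Strategy (1): add an object that is abnormal for the context.
    # The prior says it should be absent, but the ground truth is "yes".
    objects = tuple(scene) + (abnormal,)
    return Probe("add_abnormal", objects, existence_question(abnormal), "yes")


def keep_one_of_pair(scene: tuple, kept: str, dropped: str) -> Probe:
    # Strategy (2): of two co-occurring objects, keep one and exclude the other.
    # The kept object suggests its partner, but the ground truth is "no".
    objects = tuple(o for o in scene if o != dropped)
    if kept not in objects:
        objects += (kept,)
    return Probe("keep_one_of_pair", objects, existence_question(dropped), "no")


def remove_tied(scene: tuple, removed: str) -> Probe:
    # Strategy (3): remove an object closely tied to the context.
    # The context suggests it is present, but the ground truth is "no".
    objects = tuple(o for o in scene if o != removed)
    return Probe("remove_tied", objects, existence_question(removed), "no")


def is_hallucination(probe: Probe, model_answer: str) -> bool:
    # Simplifying assumption: the model replies with a bare "yes" or "no",
    # optionally capitalized or followed by punctuation.
    normalized = model_answer.strip().lower().rstrip(".!")
    return normalized != probe.answer


office = ("desk", "monitor", "keyboard")
probes = [
    add_abnormal(office, "octopus"),
    keep_one_of_pair(office, kept="monitor", dropped="keyboard"),
    remove_tied(office, "desk"),
]
```

In this sketch, a model that answers "Yes" to the second probe's question ("Is there a keyboard in this image?") would be counted as hallucinating, since the keyboard was excluded from the scene. The abstract also counts inconsistent answers as hallucinations, which this single-answer check does not cover.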

AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models