AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

Abstract

Large vision-language models (LVLMs) hallucinate: certain context cues in an image may trigger the language module's overconfident and incorrect reasoning on abnormal or hypothetical objects. Though a few benchmarks have been developed to investigate LVLM hallucinations, they mainly rely on hand-crafted corner cases whose fail patterns may hardly generalize, and finetuning on them could undermine their validity. These motivate us to develop the first automatic benchmark generation approach, AUTOHALLUSION, that harnesses a few principal strategies to create diverse hallucination examples. It probes the language modules in LVLMs for context cues and uses them to synthesize images by: (1) adding objects abnormal to the context cues; (2) for two co-occurring objects, keeping one and excluding the other; or (3) removing objects closely tied to the context cues. It then generates image-based questions whose ground-truth answers contradict the language module's prior. A model has to overcome contextual biases and distractions to reach correct answers, while incorrect or inconsistent answers indicate hallucinations. AUTOHALLUSION enables us to create new benchmarks at the minimum cost and thus overcomes the fragility of hand-crafted benchmarks. It also reveals common failure patterns and reasons, providing key insights to detect, avoid, or control hallucinations. Comprehensive evaluations of top-tier LVLMs, e.g., GPT-4V(ision), Gemini Pro Vision, Claude 3, and LLaVA-1.5, show a 97.7% and 98.7% success rate of hallucination induction on synthetic and real-world datasets of AUTOHALLUSION, paving the way for a long battle against hallucinations.
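
The three image-manipulation strategies and the hallucination criterion described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it models a scene as a plain set of object names instead of a real image, and every name in it (`Probe`, `abnormal_insertion`, `paired_exclusion`, `correlated_removal`, `is_hallucination`) is a hypothetical helper introduced here for illustration. It assumes yes/no existence questions and that model answers have already been reduced to "yes" or "no".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """One synthesized test case: an edited scene plus a question with a known answer."""
    strategy: str
    scene_objects: frozenset
    question: str
    ground_truth: str  # "yes" or "no"; chosen to contradict the contextual prior


def abnormal_insertion(scene, abnormal_obj):
    # Strategy (1): add an object that does not fit the context.
    # The prior suggests "no", but the object really is present.
    objs = frozenset(scene) | {abnormal_obj}
    return Probe("abnormal_insertion", objs,
                 f"Is there a {abnormal_obj} in the image?", "yes")


def paired_exclusion(scene, kept_obj, excluded_obj):
    # Strategy (2): of two co-occurring objects, keep one and exclude the other.
    # Seeing kept_obj suggests excluded_obj is present, but it is not.
    objs = (frozenset(scene) | {kept_obj}) - {excluded_obj}
    return Probe("paired_exclusion", objs,
                 f"Is there a {excluded_obj} in the image?", "no")


def correlated_removal(scene, tied_obj):
    # Strategy (3): remove an object closely tied to the context.
    objs = frozenset(scene) - {tied_obj}
    return Probe("correlated_removal", objs,
                 f"Is there a {tied_obj} in the image?", "no")


def is_hallucination(probe, answers):
    """Flag a probe when the answers are incorrect or mutually inconsistent.

    `answers` holds the model's replies to the same question, asked one or
    more times (an assumption of this sketch).
    """
    normalized = [a.strip().lower() for a in answers]
    incorrect = any(a != probe.ground_truth for a in normalized)
    inconsistent = len(set(normalized)) > 1
    return incorrect or inconsistent
```

For example, `correlated_removal({"desk", "monitor", "keyboard"}, "keyboard")` yields a probe whose ground truth is "no"; a model that answers "yes" because keyboards usually accompany monitors would be flagged by `is_hallucination`.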

Code Repositories

wuxiyang1996/AutoHallusion (official, PyTorch, mentioned in GitHub)
tianyi-lab/hallusionbench (mentioned in GitHub)
zli12321/videohallu (PyTorch, mentioned in GitHub)

Benchmarks

Benchmark: visual-question-answering-vqa-on-5
Methodology: GPT-4V
Metrics: Overall Accuracy: 66.0
