The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task

Yifan Wu Pengchuan Zhang Wenhan Xiong Barlas Oguz James C. Gee Yixin Nie

Abstract

This study examines whether the Chain-of-Thought approach, which performs well on language tasks by breaking them into sub-tasks and intermediate steps, also improves vision-language tasks that demand sophisticated perception and reasoning. We present the "Description then Decision" strategy, inspired by how humans process signals. The strategy improves probing-task performance by 50%, laying the groundwork for future research on reasoning paradigms in complex vision-language tasks.
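The abstract names the "Description then Decision" strategy without spelling out its prompts, so the following is only a minimal sketch of how such a two-stage flow might look. It assumes the strategy means "first ask the model to describe the image, then ask it to choose a caption given that description". The `model` callable, the prompt wording, and the function name `description_then_decision` are all hypothetical and are not taken from the paper.

```python
import re
from typing import Callable, Sequence

# Hypothetical model interface (an assumption, not a real API):
# takes a text prompt and an image reference, returns the model's text reply.
VisionLanguageModel = Callable[[str, str], str]


def description_then_decision(
    model: VisionLanguageModel,
    image: str,
    captions: Sequence[str],
) -> int:
    """Return the 0-based index of the caption the model picks for the image."""
    # Stage 1 (description): ask for a description before any decision is made.
    description = model("Describe the image in detail.", image)

    # Stage 2 (decision): ask for a choice, conditioned on the description.
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(captions))
    decision_prompt = (
        f"Image description:\n{description}\n\n"
        f"Which caption matches the image best?\n{numbered}\n"
        "Answer with the number only."
    )
    answer = model(decision_prompt, image)

    # Parse the first integer in the reply and check that it is a valid option.
    match = re.search(r"\d+", answer)
    if match is None:
        raise ValueError(f"No option number found in reply: {answer!r}")
    index = int(match.group()) - 1
    if not 0 <= index < len(captions):
        raise ValueError(f"Option {index + 1} is out of range")
    return index
```

Any function with the same two-argument signature can stand in for `model`, so the flow can be exercised with a stub before it is wired to a real vision-language model.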

Benchmarks

Benchmark | Methodology | Metrics
visual-reasoning-on-winoground | GPT-4V (CoT, pick b/w two options) | Group Score: 58.75; Image Score: 68.75; Text Score: 75.25
visual-reasoning-on-winoground | GPT-4V (pick b/w two options) | Group Score: 39.25; Image Score: 46.25; Text Score: 69.25
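To put the scores listed above in proportion, the short calculation below computes the relative gain of the CoT variant over the plain variant for each metric. The group-score gain comes out close to the 50% quoted in the abstract, though this page does not say which metric that figure refers to, so the link is an assumption.

```python
# Scores copied from the two Winoground rows listed above.
baseline = {"text": 69.25, "image": 46.25, "group": 39.25}  # GPT-4V (pick b/w two options)
with_cot = {"text": 75.25, "image": 68.75, "group": 58.75}  # GPT-4V (CoT, pick b/w two options)


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100


gains = {name: round(relative_gain(baseline[name], with_cot[name]), 1) for name in baseline}
print(gains)  # {'text': 8.7, 'image': 48.6, 'group': 49.7}
```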
