Situation Recognition: Visual Semantic Role Labeling for Image Understanding

Mark Yatskar, Luke Zettlemoyer, Ali Farhadi

Abstract

This paper introduces situation recognition, the problem of producing a concise summary of the situation an image depicts, including: (1) the main activity (e.g., clipping); (2) the participating actors, objects, substances, and locations (e.g., man, shears, sheep, wool, and field); and, most importantly, (3) the roles these participants play in the activity (e.g., the man is clipping, the shears are his tool, the wool is being clipped from the sheep, and the clipping is in a field). We use FrameNet, a verb and role lexicon developed by linguists, to define a large space of possible situations and collect a large-scale dataset containing over 500 activities, 1,700 roles, 11,000 objects, 125,000 images, and 200,000 unique situations. We also introduce structured prediction baselines and show that, in activity-centric images, situation-driven prediction of objects and activities outperforms independent object and activity recognition.
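Concretely, a situation is a structured frame: one verb together with a mapping from that verb's roles to the entities that fill them. The sketch below shows this representation in Python for the sheep-shearing example from the abstract; the role names are simplified for illustration, and the released imSitu annotations encode entities as WordNet synset IDs rather than plain nouns.

```python
from dataclasses import dataclass

@dataclass
class Situation:
    """One verb plus its role-to-entity assignments (a frame)."""
    verb: str
    roles: dict[str, str]  # role name -> entity filling that role

# The clipping example from the abstract, written as a frame.
# Role names here are illustrative; imSitu derives the role set
# for each verb from FrameNet.
clipping = Situation(
    verb="clipping",
    roles={
        "agent": "man",      # who performs the clipping
        "source": "sheep",   # what the wool is clipped from
        "tool": "shears",    # instrument used
        "item": "wool",      # what is being clipped
        "place": "field",    # where the activity happens
    },
)
```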

Benchmarks

| Benchmark | Methodology | Top-1 Verb | Top-1 Verb & Value | Top-5 Verbs | Top-5 Verbs & Value |
|---|---|---|---|---|---|
| grounded-situation-recognition-on-swig | CRF | 32.34 | 24.64 | 58.88 | 42.76 |
| situation-recognition-on-imsitu | CRF | 32.34 | 24.64 | 58.88 | 42.76 |
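For reference, the "Top-k Verb" metric counts an image as correct when the ground-truth verb appears among the model's k highest-scoring verbs, and the "Verb & Value" metric additionally scores each role value, counting a role only when the verb itself is correct. Below is a minimal illustrative scorer under simplifying assumptions (one ground-truth frame per image and a single correct noun per role; the official imSitu evaluation averages over three annotator frames). The function and variable names are hypothetical.

```python
# Illustrative scorer for imSitu-style metrics (a sketch, not the
# official evaluation code). Each prediction is a ranked list of
# (verb, {role: noun}) candidates; each ground truth is one frame.

def score(predictions, ground_truths, k=1):
    """Return (verb accuracy, verb & value accuracy) at top-k, in percent."""
    verb_hits, value_hits, value_total = 0, 0, 0
    for ranked, (gt_verb, gt_roles) in zip(predictions, ground_truths):
        top_k = ranked[:k]
        # Verb metric: the true verb appears among the top-k candidates.
        match = next(((v, r) for v, r in top_k if v == gt_verb), None)
        if match is not None:
            verb_hits += 1
            _, pred_roles = match
            # Value metric: a role counts only if the verb was right
            # and the predicted noun for that role matches.
            value_hits += sum(
                pred_roles.get(role) == noun for role, noun in gt_roles.items()
            )
        # Roles still count toward the denominator when the verb is wrong.
        value_total += len(gt_roles)
    n = len(ground_truths)
    return 100.0 * verb_hits / n, 100.0 * value_hits / value_total


# Example: one image of a man clipping wool from a sheep.
gt = [("clipping", {"agent": "man", "source": "sheep",
                    "tool": "shears", "item": "wool", "place": "field"})]
pred = [[("clipping", {"agent": "man", "source": "sheep",
                       "tool": "scissors", "item": "wool", "place": "field"})]]
print(score(pred, gt, k=1))  # (100.0, 80.0): verb right, 4 of 5 role values right
```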
