4 months ago

Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering

Arun Mallya; Svetlana Lazebnik

Abstract

This paper proposes deep convolutional network models that utilize local and global context to make human activity label predictions in still images, achieving state-of-the-art performance on two recent datasets with hundreds of labels each. We use multiple instance learning to handle the lack of supervision on the level of individual person instances, and weighted loss to handle unbalanced training data. Further, we show how specialized features trained on these datasets can be used to improve accuracy on the Visual Question Answering (VQA) task, in the form of multiple choice fill-in-the-blank questions (Visual Madlibs). Specifically, we tackle two types of questions on person activity and person-object relationship and show improvements over generic features trained on the ImageNet classification task.
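The two training ideas named in the abstract, multiple instance learning over person instances and a weighted loss for unbalanced labels, can be illustrated with a minimal sketch. The abstract does not specify the exact formulation, so the choices below are assumptions made for illustration: the image-level score for each label is taken as the maximum over per-person scores (a common multiple instance learning choice), and the loss is a binary cross-entropy with a larger weight on positive labels. The function names `mil_image_scores` and `weighted_bce` and the weight values are hypothetical and not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def mil_image_scores(instance_scores):
    """Aggregate per-person scores into one image-level score per label.

    instance_scores: one list of per-label scores for each detected person.
    Assumption: max pooling over instances, since only image-level labels
    are available and it is unknown which person performs the action.
    """
    num_labels = len(instance_scores[0])
    return [max(person[k] for person in instance_scores)
            for k in range(num_labels)]


def weighted_bce(logits, targets, pos_weight=10.0, neg_weight=1.0):
    """Mean binary cross-entropy with separate positive/negative weights.

    Assumption: pos_weight > neg_weight, so that rare positive labels
    contribute more to the loss than the many negatives.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total -= (pos_weight * y * math.log(p + eps)
                  + neg_weight * (1 - y) * math.log(1.0 - p + eps))
    return total / len(logits)


# Two detected people, three candidate activity labels (made-up scores).
scores = [[2.0, -1.0, -3.0],
          [-0.5, 0.5, -2.0]]
image_logits = mil_image_scores(scores)
loss = weighted_bce(image_logits, targets=[1, 0, 0])
```

In this sketch only the highest-scoring person for a given label influences that label's loss, which is one way to train without knowing which person in the image the label refers to.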

Benchmarks

Benchmark | Methodology | Metrics
human-object-interaction-detection-on-hico-1 | Mallya & Lazebnik | mAP: 36.1
