ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph

Fei Yu, Jiji Tang, Weichong Yin, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang

Abstract

We propose ERNIE-ViL, a knowledge-enhanced approach that incorporates structured knowledge obtained from scene graphs to learn joint vision-language representations. ERNIE-ViL aims to build detailed semantic connections (objects, attributes of objects, and relationships between objects) across vision and language, which are essential to vision-language cross-modal tasks. Using scene graphs of visual scenes, ERNIE-ViL constructs Scene Graph Prediction tasks in the pre-training phase: Object Prediction, Attribute Prediction, and Relationship Prediction. Specifically, these tasks are implemented by predicting nodes of different types in the scene graph parsed from the sentence. ERNIE-ViL can thus learn joint representations that characterize the alignment of detailed semantics across vision and language. After pre-training on large-scale image-text aligned datasets, we validate the effectiveness of ERNIE-ViL on 5 cross-modal downstream tasks. ERNIE-ViL achieves state-of-the-art performance on all of these tasks and ranks first on the VCR leaderboard with an absolute improvement of 3.7%.
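
The text side of the Scene Graph Prediction tasks described above can be illustrated with a minimal sketch. It is not the authors' implementation: the scene graph is written by hand instead of being produced by a parser, the `SceneGraph` class, the `mask_nodes` and `build_prediction_tasks` helpers, and the 0.3 default mask rate are all hypothetical choices made for illustration, and the image regions that the model would also condition on are left out entirely.

```python
import random
from dataclasses import dataclass

MASK = "[MASK]"

@dataclass
class SceneGraph:
    # Each node is a (start, end) token span into the sentence, end exclusive.
    # Hand-written here; a real pipeline would get these from a scene graph parser.
    objects: list
    attributes: list
    relationships: list

def mask_nodes(tokens, spans, mask_rate, rng):
    """Mask a random subset of node spans.

    Returns (masked_tokens, labels), where labels maps each masked
    token position to the original token the model should predict.
    """
    masked = list(tokens)
    labels = {}
    for start, end in spans:
        if rng.random() < mask_rate:
            for i in range(start, end):
                labels[i] = tokens[i]
                masked[i] = MASK
    return masked, labels

def build_prediction_tasks(tokens, graph, mask_rate=0.3, seed=0):
    """Build one masked example per node type (assumed mask rate, not from the source)."""
    rng = random.Random(seed)
    return {
        "object": mask_nodes(tokens, graph.objects, mask_rate, rng),
        "attribute": mask_nodes(tokens, graph.attributes, mask_rate, rng),
        "relationship": mask_nodes(tokens, graph.relationships, mask_rate, rng),
    }

# Toy caption with a hand-annotated scene graph.
tokens = "a black cat sits on the wooden table".split()
graph = SceneGraph(
    objects=[(2, 3), (7, 8)],        # "cat", "table"
    attributes=[(1, 2), (6, 7)],     # "black", "wooden"
    relationships=[(3, 5)],          # "sits on"
)

if __name__ == "__main__":
    # mask_rate=1.0 masks every node so the output is deterministic.
    for task, (masked, labels) in build_prediction_tasks(tokens, graph, 1.0).items():
        print(task, " ".join(masked), labels)
```

Each task hides only nodes of one type, so the prediction targets are objects, attributes, or relationships in turn, while the rest of the sentence stays visible as context.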

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| visual-question-answering-on-vcr-q-a-test | ERNIE-ViL-large (ensemble of 15 models) | Accuracy: 81.6 |
| visual-question-answering-on-vcr-q-ar-test | ERNIE-ViL-large (ensemble of 15 models) | Accuracy: 70.5 |
| visual-question-answering-on-vcr-qa-r-test | ERNIE-ViL-large (ensemble of 15 models) | Accuracy: 86.1 |
| visual-question-answering-on-vqa-v2-test-std | ERNIE-ViL-single model | number: 56.79; other: 65.24; overall: 74.93; yes/no: 90.83 |
