6 months ago

LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition

Li Jinyuan ; Li Han ; Sun Di ; Wang Jiahao ; Zhang Wenkun ; Wang Zan ; Pan Gang


Abstract

Grounded Multimodal Named Entity Recognition (GMNER) is a nascent multimodal task that aims to identify named entities, entity types and their corresponding visual regions. The GMNER task exhibits two challenging properties: 1) The weak correlation between image-text pairs in social media results in a significant portion of named entities being ungroundable. 2) There exists a distinction between coarse-grained referring expressions commonly used in similar tasks (e.g., phrase localization, referring expression comprehension) and fine-grained named entities. In this paper, we propose RiVEG, a unified framework that reformulates GMNER into a joint MNER-VE-VG task by leveraging large language models (LLMs) as a connecting bridge. This reformulation brings two benefits: 1) It maintains the optimal MNER performance and eliminates the need for employing object detection methods to pre-extract regional features, thereby naturally addressing two major limitations of existing GMNER methods. 2) The introduction of entity expansion expression and Visual Entailment (VE) module unifies Visual Grounding (VG) and Entity Grounding (EG). It enables RiVEG to effortlessly inherit the Visual Entailment and Visual Grounding capabilities of any current or prospective multimodal pretraining models. Extensive experiments demonstrate that RiVEG outperforms state-of-the-art methods on the existing GMNER dataset and achieves absolute leads of 10.65%, 6.21%, and 8.83% in all three subtasks.
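
To make the data flow of the MNER-VE-VG reformulation concrete, here is a minimal Python sketch of one plausible reading of the abstract: entities are extracted first, each is expanded into a coarser expression, a Visual Entailment check decides whether it is groundable, and only then is Visual Grounding asked for a box. It is not the authors' implementation. Every name in it (`ground_entities`, `Entity`, `GroundedEntity`, and the `toy_*` functions) is hypothetical, and the toy functions are simple stand-ins for the MNER model, the LLM, and the VE and VG models.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); the format is an assumption


@dataclass(frozen=True)
class Entity:
    span: str   # entity text as it appears in the sentence
    label: str  # entity type, e.g. "PER" or "LOC"


@dataclass(frozen=True)
class GroundedEntity:
    span: str
    label: str
    box: Optional[Box]  # None marks an entity judged ungroundable


def ground_entities(
    text: str,
    image: Any,
    mner: Callable[[str, Any], List[Entity]],
    expand: Callable[[Entity, str], str],
    entails: Callable[[Any, str], bool],
    locate: Callable[[Any, str], Box],
) -> List[GroundedEntity]:
    """Chain the three stages: MNER, then VE as a filter, then VG."""
    results = []
    for entity in mner(text, image):
        # Hypothetical LLM step: turn a fine-grained name into a coarser expression.
        expression = expand(entity, text)
        # VE decides groundability; VG is only called for entailed expressions.
        box = locate(image, expression) if entails(image, expression) else None
        results.append(GroundedEntity(entity.span, entity.label, box))
    return results


# Toy stand-ins (all hypothetical) so the sketch runs without any model.
def toy_mner(text: str, image: Any) -> List[Entity]:
    return [Entity("Alice", "PER"), Entity("Paris", "LOC")]


def toy_expand(entity: Entity, text: str) -> str:
    coarse = {"PER": "a person", "LOC": "a city"}
    return f"{entity.span}, {coarse[entity.label]}"


def toy_entails(image: dict, expression: str) -> bool:
    # The toy "image" is a dict mapping a visible region description to its box.
    return any(region in expression for region in image)


def toy_locate(image: dict, expression: str) -> Box:
    return next(box for region, box in image.items() if region in expression)


toy_image = {"a person": (10, 20, 110, 220)}
results = ground_entities(
    "Alice visits Paris", toy_image, toy_mner, toy_expand, toy_entails, toy_locate
)
for r in results:
    print(r)
```

In this toy run the person entity receives a box while the location entity is returned with `box=None`, which mirrors the ungroundable case the abstract highlights. Passing the stages in as callables reflects the abstract's point that the VE and VG components can be swapped for other pretrained models.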

Code Repositories

JinYuanLi0012/RiVEG
Official
pytorch
Mentioned in GitHub
jinyuanli0012/pgim
Official
pytorch

Benchmarks

Benchmark | Methodology | Metrics
grounded-multimodal-named-entity-recognition | RiVEG | F1: 67.06
