HyperAI超神经

Image To Text Retrieval On Flickr30K

Evaluation Metrics

Recall@1

Recall@10

Recall@5
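
For readers unfamiliar with these metrics, here is a minimal sketch of how Recall@K is commonly computed for image-to-text retrieval: each image is a query, all captions are ranked by similarity, and the query counts as a hit if at least one of its ground-truth captions appears in the top K. In Flickr30K each image has five reference captions. The function name `recall_at_k` and the toy scores are hypothetical, chosen only for illustration; the papers listed on this page may differ in evaluation details (test split, re-ranking).

```python
from typing import Sequence, Set


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    relevant: Sequence[Set[int]],
    k: int,
) -> float:
    """Percentage of image queries with at least one relevant caption in the top k.

    similarity[i][j] is the score between image i and caption j (higher is better).
    relevant[i] is the set of caption indices that describe image i.
    """
    hits = 0
    for scores, gold in zip(similarity, relevant):
        # Rank caption indices by descending similarity and keep the top k.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if gold.intersection(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy data (hypothetical): 2 images, 4 captions, 2 relevant captions per image.
toy_similarity = [
    [0.9, 0.1, 0.3, 0.2],  # image 0: top-ranked caption 0 is relevant
    [0.2, 0.8, 0.5, 0.1],  # image 1: top-ranked caption 1 is not relevant
]
toy_relevant = [{0, 1}, {2, 3}]

print(recall_at_k(toy_similarity, toy_relevant, 1))  # 50.0
print(recall_at_k(toy_similarity, toy_relevant, 2))  # 100.0
```

Because a hit only requires one relevant caption in the top K, Recall@K can never decrease as K grows, which is why the Recall@5 and Recall@10 columns saturate near 100 while Recall@1 still separates the models.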

Evaluation Results

Performance of each model on this benchmark

Model	Recall@1	Recall@10	Recall@5	Paper Title	Repository
InternVL-G-FT (finetuned, w/o ranking)	97.9	100	100	InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
BLIP-2 ViT-G (zero-shot, 1K test set)	97.6	100	100	BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
ONE-PEACE (finetuned, w/o ranking)	97.6	100	100	ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
InternVL-C-FT (finetuned, w/o ranking)	97.2	100	100	InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
BLIP-2 ViT-L (zero-shot, 1K test set)	96.9	100	100	BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
ERNIE-ViL 2.0	96.1	100.0	99.9	ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training
ALBEF	95.9	100.0	99.8	Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
ALBEF	92.6	99.9	99.3	HADA: A Graph-based Amalgamation Framework in Image-text Retrieval
UNITER	87.3	99.2	98	HADA: A Graph-based Amalgamation Framework in Image-text Retrieval
GSMN	76.4	97.3	94.3	A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval
LGSGM	71	96.1	91.9	A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval
