
Zero-Shot Video Retrieval on DiDeMo

Metrics

text-to-video R@1
text-to-video R@10
text-to-video R@5
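
The page lists the metrics without defining them. As a rough sketch, assuming the common convention that text-to-video R@K is the percentage of text queries whose ground-truth video appears among the K most similar videos, the computation could look like the following. The function name `recall_at_k`, the one-to-one pairing of query i with video i, and the toy similarity values are illustrative assumptions, not taken from the benchmark or from any listed paper.

```python
from typing import Sequence


def recall_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Percentage of text queries whose matching video ranks in the top k.

    Assumes sim[i][j] is the similarity of text query i to video j, and
    that video i is the single ground-truth match for query i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Zero-based rank of the correct video: how many videos score strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy 3x3 similarity matrix (made-up numbers, not benchmark data).
toy_sim = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked 1st
    [0.8, 0.5, 0.2],  # query 1: correct video ranked 2nd
    [0.7, 0.6, 0.1],  # query 2: correct video ranked 3rd
]
r1 = recall_at_k(toy_sim, 1)
r2 = recall_at_k(toy_sim, 2)
r3 = recall_at_k(toy_sim, 3)
```

Under this definition R@1 ≤ R@5 ≤ R@10 for any single model, which can serve as a quick sanity check when reading the results.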

Results

Performance results of various models on this benchmark

| Model Name | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 | Paper Title | Repository |
|---|---|---|---|---|---|
| Singularity-5M | 36.9 | 69.3 | 61.1 | Revealing Single Frame Bias for Video-and-Language Learning | - |
| InternVideo2-6B | 57.9 | 84.6 | 80.0 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | - |
| BT-Adapter | 35.6 | 72.6 | 61.9 | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | - |
| LanguageBind(ViT-H/14) | 39.9 | 74.6 | 66.1 | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | - |
| HiTeA-17M | 43.2 | 79.0 | 69.3 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | - |
| Clover | 29.5 | 66.3 | 55.2 | Clover: Towards A Unified Video-Language Alignment and Fusion Model | - |
| LanguageBind(ViT-L/14) | 39.7 | 73.8 | 65.5 | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | - |
| mPLUG-2 | 45.7 | 79.2 | 71.1 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | - |
| VAST | 55.5 | 79.6 | 74.3 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | - |
| Singularity-17M | 37.1 | 69.9 | 61.7 | Revealing Single Frame Bias for Video-and-Language Learning | - |
| VIOLET | 23.5 | 59.8 | 49.8 | - | - |
| MILES | 27.2 | 63.6 | 50.3 | - | - |
| GRAM | 54.2 | 80.7 | - | Gramian Multimodal Representation Learning and Alignment | - |
| ALPRO | 23.8 | 57.9 | 47.3 | Align and Prompt: Video-and-Language Pre-training with Entity Prompts | - |
| InternVideo | 31.5 | 68.2 | 57.6 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | - |
| VideoCLIP | 16.6 | - | 46.9 | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | - |
| FROZEN | 21.1 | 56.2 | 46.0 | Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | - |
| Y. Ge et. al. | 25.6 | 61.1 | 50.6 | Bridging Video-text Retrieval with Multiple Choice Questions | - |
| HiTeA-5M | 36.1 | 70.3 | 60.1 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | - |
| OA-Trans | 23.5 | 59.8 | 50.4 | - | - |