Video Question Answering on MSRVTT-MC
Evaluation Metric
Accuracy
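
For a multiple-choice benchmark, accuracy is typically the percentage of questions for which the model's top-scoring candidate is the correct one. The following is a minimal sketch under that assumption, not the official evaluation script for this benchmark; the function name `multiple_choice_accuracy` and the input layout (one list of candidate scores per question, plus the index of the correct candidate) are hypothetical.

```python
from typing import Sequence


def multiple_choice_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> float:
    """Return accuracy in percent for multiple-choice predictions.

    scores[i][j] is the model's score for candidate j of question i
    (assumed layout); labels[i] is the index of the correct candidate.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one question is required")

    correct = 0
    for candidate_scores, label in zip(scores, labels):
        # Pick the index of the highest-scoring candidate (first one on ties).
        predicted = max(range(len(candidate_scores)), key=candidate_scores.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy example with made-up scores: three of the four questions are answered correctly.
example_scores = [
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
example_labels = [1, 0, 2, 2]
example_accuracy = multiple_choice_accuracy(example_scores, example_labels)
```

The result is expressed as a percentage so that it is on the same scale as the Accuracy column of the leaderboard.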
Evaluation Results
Performance of each model on this benchmark
Model Name | Accuracy | Paper Title | Repository |
---|---|---|---|
Singularity-temporal | 93.7 | Revealing Single Frame Bias for Video-and-Language Learning | |
Norton | 92.7 | Multi-granularity Correspondence Learning from Long-term Noisy Videos | |
HiTeA | 97.4 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | |
VindLU | 95.5 | VindLU: A Recipe for Effective Video-and-Language Pretraining | |
VIOLETv2 | 97.6 | An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | |
Singularity | 92.1 | Revealing Single Frame Bias for Video-and-Language Learning | |
Clover | 95.2 | Clover: Towards A Unified Video-Language Alignment and Fusion Model | |