Question Answering on NExT-QA (Open-Ended)
Metrics
Accuracy
Confidence Score
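The page does not explain how these two metrics are computed. The sketch below shows one plausible aggregation. It assumes an LLM-judge protocol like the GPT-assisted evaluation introduced with Video-ChatGPT, where a judge gives each predicted answer a yes/no correctness verdict and an integer score from 0 to 5. Under that assumption, Accuracy is the percentage of answers judged correct and the Confidence Score is the mean rating. The `Judgment` record and `aggregate` function are hypothetical names used only for illustration, and the sample judgments are made-up data, not results from any paper.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    # Hypothetical per-question record; assumes the judge emits a yes/no
    # verdict plus an integer rating on a 0-5 scale.
    correct: bool
    score: int


def aggregate(judgments: list[Judgment]) -> tuple[float, float]:
    """Return (accuracy in percent, mean score) over all judged questions."""
    if not judgments:
        raise ValueError("no judgments to aggregate")
    for j in judgments:
        if not 0 <= j.score <= 5:
            raise ValueError(f"score out of assumed 0-5 range: {j.score}")
    n = len(judgments)
    # Accuracy: share of answers the judge marked correct, as a percentage.
    accuracy = 100.0 * sum(j.correct for j in judgments) / n
    # Score: plain arithmetic mean of the judge's ratings.
    mean_score = sum(j.score for j in judgments) / n
    return accuracy, mean_score


if __name__ == "__main__":
    # Made-up judgments for four questions, for illustration only.
    sample = [Judgment(True, 4), Judgment(False, 2), Judgment(True, 5), Judgment(True, 3)]
    acc, score = aggregate(sample)
    print(f"Accuracy: {acc:.1f}  Score: {score:.1f}")  # Accuracy: 75.0  Score: 3.5
```

Because both numbers come from the same judge, they usually move together, which is consistent with the table below, where higher accuracy tends to coincide with a higher score.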
Results
Performance of various models on this benchmark, ranked by accuracy
| Model | Accuracy (%) | Confidence Score | Paper Title | Repository |
|---|---|---|---|---|
| Flash-VStream | 61.6 | 3.4 | Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | |
| Vista-LLaMA | 60.7 | 3.4 | Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens | - |
| VideoChat | 56.6 | 3.2 | VideoChat: Chat-Centric Video Understanding | |
| MovieChat+ | 54.8 | 3.0 | MovieChat+: Question-aware Sparse Memory for Long Video Question Answering | |
| Video-ChatGPT | 54.6 | 3.2 | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | |
| MovieChat | 49.9 | 2.7 | MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | |