Command Palette
Search for a command to run...
Barrios Wayner ; Soldan Mattia ; Ceballos-Arroyo Alberto Mario ; Heilbron Fabian Caba ; Ghanem Bernard

Abstract
The recent introduction of the large-scale, long-form MAD and Ego4D datasetshas enabled researchers to investigate the performance of currentstate-of-the-art methods for video grounding in the long-form setup, withinteresting findings: current grounding methods alone fail at tackling thischallenging task and setup due to their inability to process long videosequences. In this paper, we propose a method for improving the performance ofnatural language grounding in long videos by identifying and pruning outnon-describable windows. We design a guided grounding framework consisting of aGuidance Model and a base grounding model. The Guidance Model emphasizesdescribable windows, while the base grounding model analyzes short temporalwindows to determine which segments accurately match a given language query. Weoffer two designs for the Guidance Model: Query-Agnostic and Query-Dependent,which balance efficiency and accuracy. Experiments demonstrate that ourproposed method outperforms state-of-the-art models by 4.1% in MAD and 4.52% inEgo4D (NLQ), respectively. Code, data and MAD's audio features necessary toreproduce our experiments are available at:https://github.com/waybarrios/guidance-based-video-grounding.
Code Repositories
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| natural-language-moment-retrieval-on-mad | VLG-Net + Guidance Model | R@1,IoU=0.1: 5.60 R@1,IoU=0.3: 4.28 R@1,IoU=0.5: 2.48 R@10,IoU=0.1: 23.64 R@10,IoU=0.3: 19.86 R@10,IoU=0.5: 13.72 R@100,IoU=0.1: 55.59 R@100,IoU=0.3: 49.38 R@100,IoU=0.5: 39.12 R@5,IoU=0.1: 16.07 R@5,IoU=0.5: 8.78 R@50,IoU=0.1: 45.35 R@50,IoU=0.3: 39.77 R@50,IoU=0.5: 30.22 |
| natural-language-moment-retrieval-on-mad | Zero-Shot CLIP + Guidance Model | R@1,IoU=0.1: 9.3 R@1,IoU=0.3: 4.65 R@1,IoU=0.5: 2.16 R@10,IoU=0.1: 24.30 R@10,IoU=0.3: 17.73 R@10,IoU=0.5: 11.09 R@100,IoU=0.1: 47.35 R@100,IoU=0.3: 39.58 R@100,IoU=0.5: 29.68 R@5,IoU=0.1: 18.96 R@5,IoU=0.3: 13.06 R@5,IoU=0.5: 7.4 R@50,IoU=0.1: 39.79 R@50,IoU=0.3: 32.23 R@50,IoU=0.5: 23.21 |
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.