
Abstract
In this report, we present our champion solutions to five tracks of the Ego4D challenge. We leverage InternVideo, our video foundation model, for five Ego4D tasks: Moment Queries, Natural Language Queries, Future Hand Prediction, State Change Object Detection, and Short-term Object Interaction Anticipation. InternVideo-Ego4D is an effective paradigm for adapting this strong foundation model to downstream egocentric video understanding tasks with simple head designs. On all five tasks, InternVideo-Ego4D surpasses both the baseline methods and the CVPR 2022 challenge champions, demonstrating the strong representation ability of InternVideo as a video foundation model. Our code will be released at https://github.com/OpenGVLab/ego4d-eccv2022-solutions
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| future-hand-prediction-on-ego4d | InternVideo | C.Disp(Left): 53.33; C.Disp(Right): 53.37; Disp(Total): 196.8; M.Disp(Left): 43.25; M.Disp(Right): 46.25 |
| short-term-object-interaction-anticipation-on | InternVideo | Noun (Top5 mAP): 24.6; Noun+TTC (Top5 mAP): 7.64; Noun+Verb (Top5 mAP): 9.18; Overall (Top5 mAP): 3.4 |
| state-change-object-detection-on-ego4d | InternVideo | AP: 37.19; AP50: 55.97; AP75: 38.44 |