Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning

Piergiovanni AJ ; Kuo Weicheng ; Angelova Anelia

Abstract

We present a simple approach which can turn a ViT encoder into an efficient video model, which can seamlessly work with both image and video inputs. By sparsely sampling the inputs, the model is able to do training and inference from both inputs. The model is easily scalable and can be adapted to large-scale pre-trained ViTs without requiring full finetuning. The model achieves SOTA results and the code will be open-sourced.
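
To make the effect of sparse sampling concrete, the following minimal sketch counts how many tokens a strided window produces over a clip. It is only an illustration of the general idea: the function name `num_tokens`, the clip size, and the tube kernel and stride values are hypothetical choices, not the configuration used by the authors.

```python
def num_tokens(input_shape, kernel, stride):
    """Count window positions of a strided kernel (no padding) over a (T, H, W) input."""
    count = 1
    for size, k, s in zip(input_shape, kernel, stride):
        if size < k:
            return 0  # the kernel does not fit along this axis
        count *= (size - k) // s + 1
    return count


# Hypothetical 32-frame clip at 224x224 resolution (assumed for illustration).
video = (32, 224, 224)

# Dense baseline: every frame is split into non-overlapping 16x16 patches.
dense = num_tokens(video, kernel=(1, 16, 16), stride=(1, 16, 16))

# Sparse tube (assumed shape): an 8x8x8 kernel with a stride larger than the kernel,
# so large parts of the clip are skipped.
sparse = num_tokens(video, kernel=(8, 8, 8), stride=(16, 32, 32))

# A single image treated as a one-frame input with 2D patches.
image = num_tokens((1, 224, 224), kernel=(1, 16, 16), stride=(1, 16, 16))

print(dense, sparse, image)
```

Under these assumed settings the sparse tube yields far fewer tokens than dense per-frame patching, which is the source of the efficiency the abstract refers to; a one-frame image passes through the same counting logic as a video.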

Code Repositories

daniel-code/TubeViT (PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
action-classification-on-charades | TubeViT-L | MAP: 66.2
action-classification-on-kinetics-400 | TubeViT-L (ImageNet-1k) | Acc@1: 90.2; Acc@5: 98.6; FLOPs (G) x views: 95300x4x3; Parameters (M): 307
action-classification-on-kinetics-400 | TubeViT-H (ImageNet-1k) | Acc@1: 90.9; Acc@5: 98.9; FLOPs (G) x views: 176400x4x3; Parameters (M): 632
action-classification-on-kinetics-400 | TubeViT-B (ImageNet-1k) | Acc@1: 88.6; Acc@5: 97.6; FLOPs (G) x views: 8700x3x4; Parameters (M): 86
action-classification-on-kinetics-600 | TubeViT-L | Top-1 Accuracy: 91.5; Top-5 Accuracy: 98.7
action-classification-on-kinetics-600 | TubeViT-B | Top-1 Accuracy: 90.9; Top-5 Accuracy: 97.3
action-classification-on-kinetics-600 | TubeViT-H | Top-1 Accuracy: 91.8; Top-5 Accuracy: 98.9
action-classification-on-kinetics-700 | TubeViT-L | Top-1 Accuracy: 83.8; Top-5 Accuracy: 96.6
action-recognition-in-videos-on-something | TubeViT-L | Top-1 Accuracy: 76.1; Top-5 Accuracy: 95.2
