Action Recognition On Diving 48
Metric: Accuracy

Performance results of various models on this benchmark:
| Model Name | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| LVMAE | 94.9 | Extending Video Masked Autoencoders to 128 frames | - |
| Video-FocalNet-B | 90.8 | Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition | - |
| AIM (CLIP ViT-L/14, 32x224) | 90.6 | AIM: Adapting Image Models for Efficient Video Action Recognition | - |
| DUALPATH | 88.7 | Dual-path Adaptation from Image to Video Transformers | - |
| StructViT-B-4-1 | 88.3 | Learning Correlation Structures for Vision Transformers | - |
| TFCNet | 88.3 | TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning | - |
| ORViT TimeSformer | 88.0 | Object-Region Video Transformers | - |
| GC-TDN | 87.6 | Group Contextualization for Video Recognition | - |
| BEVT | 86.7 | BEVT: BERT Pretraining of Video Transformers | - |
| PSB | 86.0 | Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition | - |
| VIMPAC | 85.5 | VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | - |
| RSANet-R50 (16 frames, ImageNet pretrained, a single clip) | 84.2 | Relational Self-Attention: What's Missing in Attention for Video Understanding | - |
| TQN | 81.8 | Temporal Query Networks for Fine-grained Video Understanding | - |
| PMI Sampler | 81.3 | PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition | - |
| TimeSformer-L | 81.0 | Is Space-Time Attention All You Need for Video Understanding? | - |
| TimeSformer-HR | 78.0 | Is Space-Time Attention All You Need for Video Understanding? | - |
| SlowFast | 77.6 | SlowFast Networks for Video Recognition | - |
| TimeSformer | 75.0 | Is Space-Time Attention All You Need for Video Understanding? | - |