Action Recognition In Videos On Something
Evaluation metrics: Top-1 Accuracy, Top-5 Accuracy

Evaluation results: the table further below lists each model's performance on this benchmark.
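Both metrics follow the standard definitions: Top-1 accuracy counts a clip as correct when the model's highest-scoring class matches the ground-truth label, Top-5 accuracy when the label appears among the five highest-scoring classes. A minimal sketch of this computation, assuming NumPy arrays of per-clip class scores (the function name and test setup are illustrative, not part of the HyperAI page):

```python
import numpy as np

def topk_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """scores: (N, num_classes) class scores per clip; labels: (N,) ground-truth class ids."""
    # Indices of the k highest-scoring classes for each clip.
    topk = np.argsort(scores, axis=1)[:, -k:]
    # A clip counts as correct if its label appears among those k classes.
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.normal(size=(8, 174))      # Something-Something has 174 action classes
    labels = rng.integers(0, 174, size=8)
    print("Top-1:", topk_accuracy(scores, labels, 1))
    print("Top-5:", topk_accuracy(scores, labels, 5))
```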
| Model | Top-1 Accuracy (%) | Top-5 Accuracy (%) | Paper Title |
| --- | --- | --- | --- |
| MVD (Kinetics400 pretrain, ViT-H, 16 frames) | 77.3 | 95.7 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| InternVideo | 77.2 | - | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| InternVideo2-1B | 77.1 | - | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| VideoMAE V2-g | 77.0 | 95.9 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking |
| MVD (Kinetics400 pretrain, ViT-L, 16 frames) | 76.7 | 95.5 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| Hiera-L (no extra data) | 76.5 | - | Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles |
| TubeViT-L | 76.1 | 95.2 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning |
| VideoMAE (no extra data, ViT-L, 32x2) | 75.4 | 95.2 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| Side4Video (EVA ViT-E/14) | 75.2 | 94.0 | Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning |
| MaskFeat (Kinetics600 pretrain, MViT-L) | 75.0 | 95.0 | Masked Feature Prediction for Self-Supervised Visual Pre-Training |
| MAR (50% mask, ViT-L, 16x4) | 74.7 | 94.9 | MAR: Masked Autoencoders for Efficient Action Recognition |
| ATM | 74.6 | 94.4 | What Can Simple Arithmetic Operations Do for Temporal Modeling? |
| MAWS (ViT-L) | 74.4 | - | The effectiveness of MAE pre-pretraining for billion-scale pretraining |
| VideoMAE (no extra data, ViT-L, 16 frames) | 74.3 | 94.6 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| MAR (75% mask, ViT-L, 16x4) | 73.8 | 94.4 | MAR: Masked Autoencoders for Efficient Action Recognition |
| ViC-MAE (ViT-L) | 73.7 | - | ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders |
| MVD (Kinetics400 pretrain, ViT-B, 16 frames) | 73.7 | 94.0 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| TAdaFormer-L/14 | 73.6 | - | Temporally-Adaptive Models for Efficient Video Understanding |
| TDS-CLIP-ViT-L/14 (8 frames) | 73.4 | 93.8 | TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning |
| AMD (ViT-B/16) | 73.3 | 94.0 | Asymmetric Masked Distillation for Pre-Training Small Foundation Models |
(Top 20 of 122 leaderboard entries shown.)