Action Recognition on AVA v2.2 (HyperAI SOTA benchmark)
Evaluation metric: mAP

Results: performance of each model on this benchmark.
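For reference, mAP on AVA is commonly reported as frame-level mean Average Precision (the unweighted mean of per-class AP, with detections matched at IoU 0.5). A minimal sketch of the mAP computation, using illustrative data rather than the official AVA evaluator:

```python
def average_precision(scores, labels):
    """AP = area under the precision-recall curve: rank predictions by
    confidence and average the precision at each true-positive hit."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = 0
    total_pos = sum(labels)
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:               # this ranked prediction is a true positive
            tp += 1
            ap += tp / rank         # precision at this recall step
    return ap / total_pos if total_pos else 0.0

def mean_average_precision(per_class):
    """mAP = unweighted mean of per-class AP.

    per_class: list of (scores, labels) pairs, one per action class.
    """
    aps = [average_precision(s, l) for s, l in per_class]
    return sum(aps) / len(aps)
```

The official AVA protocol additionally handles box matching and class filtering; this sketch only shows the ranking-based AP averaging that the leaderboard numbers summarize.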
| Model | mAP | Paper Title |
| --- | --- | --- |
| LART (Hiera-H, K700 PT+FT) | 45.1 | On the Benefits of 3D Pose and Tracking for Human Action Recognition |
| Hiera-H (K700 PT+FT) | 43.3 | Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles |
| VideoMAE V2-g | 42.6 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking |
| STAR/L | 41.7 | End-to-End Spatio-Temporal Action Localisation with Video Transformers |
| MVD (Kinetics400 pretrain+finetune, ViT-H, 16x4) | 41.1 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| InternVideo | 41.01 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| MVD (Kinetics400 pretrain, ViT-H, 16x4) | 40.1 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| MaskFeat (Kinetics-600 pretrain, MViT-L) | 39.8 | Masked Feature Prediction for Self-Supervised Visual Pre-Training |
| UMT-L (ViT-L/16) | 39.8 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models |
| VideoMAE (K400 pretrain+finetune, ViT-H, 16x4) | 39.5 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| VideoMAE (K700 pretrain+finetune, ViT-L, 16x4) | 39.3 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| MVD (Kinetics400 pretrain+finetune, ViT-L, 16x4) | 38.7 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| VideoMAE (K400 pretrain+finetune, ViT-L, 16x4) | 37.8 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| MVD (Kinetics400 pretrain, ViT-L, 16x4) | 37.7 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| VideoMAE (K400 pretrain, ViT-H, 16x4) | 36.5 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| VideoMAE (K700 pretrain, ViT-L, 16x4) | 36.1 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| MeMViT-24 | 35.4 | MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition |
| MViTv2-L (IN21k, K700) | 34.4 | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection |
| VideoMAE (K400 pretrain, ViT-L, 16x4) | 34.3 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| MVD (Kinetics400 pretrain+finetune, ViT-B, 16x4) | 34.2 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
(Top 20 of 38 entries shown.)