HyperAI超神经
Action Recognition In Videos On Something 1
Evaluation metrics
GFLOPs
Param.
Top 1 Accuracy
Top 5 Accuracy
Results
Performance of each model on this benchmark
| Model | GFLOPs | Param. | Top 1 Accuracy | Top 5 Accuracy | Paper Title | Repository |
|---|---|---|---|---|---|---|
| InternVideo | - | - | 70.0 | - | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | |
| VideoMAE V2-g | - | - | 68.7 | 91.9 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | |
| Side4Video (EVA ViT-E/14 | - | - | 67.3 | 88.8 | Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | |
| ATM | - | - | 65.6 | 88.6 | What Can Simple Arithmetic Operations Do for Temporal Modeling? | |
| TAdaFormer-L/14 | - | - | 63.7 | - | Temporally-Adaptive Models for Efficient Video Understanding | |
| TDS-CLIP-ViT-L/14(8frames) | - | - | 63.0 | 87.8 | TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning | |
| UniFormerV2-L | - | - | 62.7 | 88.0 | UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | - |
| StructVit-B-4-1 | - | - | 61.3 | - | Learning Correlation Structures for Vision Transformers | - |
| UniFormer-B (IN-1K + Kinetics400) | 259x3 | 50.1 | 60.9 | 87.3 | UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning | - |
| TAdaConvNeXtV2-B | - | - | 60.7 | - | Temporally-Adaptive Models for Efficient Video Understanding | |
| TPS | - | - | 58.3 | - | Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition | |
| MSMA (8+16frames) | - | - | 57.9 | - | Multi-scale Motion-Aware Module for Video Action Recognition | - |
| UniFormer-B (IN-1K + Kinetics600) | 41.8x3 | 21.4 | 57.6 | 84.9 | UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning | - |
| SIFA | - | - | 57.3 | - | Stand-Alone Inter-Frame Attention in Video Models | |
| TCM (Ensemble) | - | - | 57.2 | - | Motion-driven Visual Tempo Learning for Video-based Action Recognition | |
| EAN ResNet50 (single clip, center crop, 8+16 ensemble, with sparse Transformer) | - | - | 57.2 | 83.9 | EAN: Event Adaptive Network for Enhanced Action Recognition | |
| BQNEn (ImageNet + K400 pretrained) | - | - | 57.1 | 84.2 | Busy-Quiet Video Disentangling for Video Classification | |
| TDN ResNet101 (one clip, center crop, 8+16 ensemble, ImageNet pretrained, RGB only) | - | - | 56.8 | 84.1 | TDN: Temporal Difference Networks for Efficient Action Recognition | |
| CT-Net Ensemble (R50, 8+12+16+24) | - | - | 56.6 | - | CT-Net: Channel Tensorization Network for Video Classification | |
| MoDS (8+16frames) | - | - | 56.6 | - | Action Recognition With Motion Diversification and Dynamic Selection | - |