HyperAI
Action Classification On Charades
Evaluation Metric
MAP (mean average precision)
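Charades is a multi-label benchmark, so mean average precision is usually computed per class and then averaged over classes. The sketch below illustrates that computation under stated assumptions: the function names `average_precision` and `mean_average_precision` are hypothetical, classes with no positive examples are skipped, and ties are broken by input order. The official Charades evaluation script may differ in such details, so this should not be read as the exact procedure behind the numbers on this page.

```python
def average_precision(scores, labels):
    """AP for one class: mean of the precision at the rank of each positive."""
    # Rank items by descending score (Python's sort is stable, so ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    # Return None when the class has no positives (assumption: such classes are skipped).
    return sum(precisions) / len(precisions) if precisions else None


def mean_average_precision(score_matrix, label_matrix):
    """Mean of per-class AP, as a percentage.

    score_matrix[v][c] is the score of class c for video v;
    label_matrix[v][c] is 1 if class c is present in video v, else 0.
    """
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        ap = average_precision(
            [row[c] for row in score_matrix],
            [row[c] for row in label_matrix],
        )
        if ap is not None:
            aps.append(ap)
    if not aps:
        raise ValueError("no class has a positive example")
    return 100.0 * sum(aps) / len(aps)


# Toy example: 4 videos, 2 classes (made-up numbers, not benchmark data).
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
print(round(mean_average_precision(scores, labels), 2))  # 91.67
```

In the toy example, class 0 has AP (1/1 + 2/3) / 2 ≈ 0.833 and class 1 has AP 1.0, which gives a mean of about 91.67 on the 0 to 100 scale used in the table.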
Results
Performance of each model on this benchmark
| Model | MAP | Paper Title | Repository |
| --- | --- | --- | --- |
| TokenLearner | 66.3 | TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? | |
| TubeViT-L | 66.2 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning | |
| MoViNet-A6 | 63.2 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| DEEP-HAL with ODF+SDF (AssembleNet++) | 62.29 | Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors | - |
| AssembleNet++ 50 | 59.8 | AssembleNet++: Assembling Modality Representations via Attention Connections | |
| AssembleNet-101 | 58.6 | AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | |
| AssembleNet | 58.6 | AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | |
| VicTR (ViT-L/14) | 57.6 | VicTR: Video-conditioned Text Representations for Activity Recognition | - |
| AssembleNet++ 50 without object | 54.98 | AssembleNet++: Assembling Modality Representations via Attention Connections | |
| BIKE | 50.7 | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | |
| DEEP-HAL with ODF+SDF (I3D) | 50.16 | Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors | - |
| MoViNet-A4 | 48.5 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| AdaFocus (weak supervision, MViT-B-24, 32x3) | 47.8 | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | - |
| MViT-B-24, 32x3 (Kinetics-600 pretraining) | 47.7 | Multiscale Vision Transformers | |
| En-VidTr-L | 47.3 | VidTr: Video Transformer Without Convolutions | - |
| MViT-B, 32x3 (Kinetics-600 pretraining) | 47.1 | Multiscale Vision Transformers | |
| MViT-B-24, 32x3 (Kinetics-400 pretraining) | 46.3 | Multiscale Vision Transformers | |
| SlowFast (Kinetics-600 pretraining, NL) | 45.2 | SlowFast Networks for Video Recognition | |
| ActionCLIP (ViT-B/16) | 44.3 | ActionCLIP: A New Paradigm for Video Action Recognition | |
| MViT-B, 32x3 (Kinetics-400 pretraining) | 44.3 | Multiscale Vision Transformers | |
20 of 49 entries shown.