Object Detection on COCO minival
Evaluation Metrics
AP50
AP75
APL
APM
APS
box AP
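The metric names above follow the usual COCO detection conventions: AP50 and AP75 are average precision at IoU thresholds of 0.50 and 0.75, APS/APM/APL restrict evaluation to small, medium, and large objects, and box AP averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below only illustrates the two building blocks behind those names (box IoU and the object-size buckets); it is not the official evaluator, the helper names `box_iou`, `size_bucket`, and `counts_as_match` are hypothetical, and the handling of exact boundary areas is a simplifying assumption.

```python
# Illustrative sketch only; official numbers come from the COCO evaluation tooling.

def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def size_bucket(area):
    """Map an object area in pixels to the COCO size range used by APS/APM/APL.

    Assumption: areas exactly on a boundary are assigned to the larger bucket.
    """
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


# box AP averages AP over these ten IoU thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def counts_as_match(pred_box, gt_box, threshold):
    """A prediction can match a ground-truth box only if IoU >= threshold."""
    return box_iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    pred, gt = (0, 0, 10, 10), (0, 0, 10, 6)
    print(box_iou(pred, gt))                 # IoU of the two boxes
    print(counts_as_match(pred, gt, 0.50))   # threshold used by AP50
    print(counts_as_match(pred, gt, 0.75))   # threshold used by AP75
```

A detection with an IoU between 0.50 and 0.75 contributes to AP50 but not to AP75, which is why AP75 is lower than AP50 for every model on this page that reports both.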
Evaluation Results
Performance of each model on this benchmark
| Model | AP50 | AP75 | APL | APM | APS | box AP | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Co-DETR | - | - | - | - | - | 65.9 | DETRs with Collaborative Hybrid Assignments Training | |
| M3I Pre-training (InternImage-H) | - | - | - | - | - | 65.0 | Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information | |
| InternImage-H | - | - | - | - | - | 65.0 | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | |
| Co-DETR (Swin-L) | - | - | - | - | - | 64.7 | DETRs with Collaborative Hybrid Assignments Training | |
| Focal-Stable-DINO (Focal-Huge, no TTA) | 81.5 | 71.4 | 78.5 | 68.5 | 50.4 | 64.6 | A Strong and Reproducible Object Detector with Only Public Datasets | |
| EVA | 82.1 | 70.8 | 78.5 | 68.4 | 49.4 | 64.5 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | |
| ViT-CoMer | - | - | - | - | - | 64.3 | ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions | - |
| FocalNet-H (DINO) | - | - | - | - | - | 64.2 | Focal Modulation Networks | |
| InternImage-XL | - | - | - | - | - | 64.2 | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | |
| CP-DETR-L Swin-L (fine-tuned separately on COCO) | - | - | - | - | - | 64.1 | CP-DETR: Concept Prompt Guide DETR Toward Stronger Universal Object Detection | - |
| RevCol-H (DINO) | - | - | - | - | - | 63.8 | Reversible Column Networks | |
| DINO (Swin-L) | - | - | - | - | - | 63.2 | DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection | |
| Grounding DINO | - | - | - | - | - | 63.0 | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | |
| SwinV2-G (HTC++) | - | - | - | - | - | 62.5 | Swin Transformer V2: Scaling Up Capacity and Resolution | |
| GLEE-Pro | - | - | - | - | - | 62.0 | General Object Foundation Model for Images and Videos at Scale | |
| Florence-CoSwin-H | - | - | - | - | - | 62 | Florence: A New Foundation Model for Computer Vision | |
| ViTDet, ViT-H Cascade (multiscale) | - | - | - | - | - | 61.3 | Exploring Plain Vision Transformer Backbones for Object Detection | |
| GLIP (Swin-L, multi-scale) | - | - | - | - | - | 60.8 | Grounded Language-Image Pre-training | |
| Soft Teacher + Swin-L (HTC++, multi-scale) | - | - | - | - | - | 60.7 | End-to-End Semi-Supervised Object Detection with Soft Teacher | |
| UNINEXT-H | 77.5 | 66.7 | 75.3 | 64.8 | 45.1 | 60.6 | Universal Instance Perception as Object Discovery and Retrieval | |