HyperAI
Instance Segmentation on COCO
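Evaluation metrics
AP50: mask AP at an IoU threshold of 0.50
AP75: mask AP at an IoU threshold of 0.75
APL: mask AP on large objects (area > 96² px)
APM: mask AP on medium objects (32² px < area < 96² px)
APS: mask AP on small objects (area < 32² px)
mask AP: the primary COCO metric, AP averaged over the ten IoU thresholds 0.50, 0.55, ..., 0.95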
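The metric definitions above can be made concrete with a short sketch. It assumes masks are represented as sets of (row, col) pixel coordinates, and the helper names (`mask_iou`, `area_bucket`, `mask_ap`) are hypothetical, not pycocotools functions. It shows only the IoU test, the size buckets, and the final averaging; the category check, greedy one-to-one matching, and precision-recall integration that produce each per-threshold AP are omitted. Official COCO numbers are typically computed with pycocotools' `COCOeval` using `iouType="segm"`.

```python
"""Sketch of the COCO conventions behind the metric columns.

Assumptions: masks are sets of (row, col) pixel coordinates, and the helper
names below are hypothetical (they are not pycocotools functions).
"""

# The ten COCO IoU thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def mask_iou(pred, gt):
    """Intersection over union of two binary masks given as pixel sets."""
    union = len(pred | gt)
    return len(pred & gt) / union if union else 0.0


def area_bucket(area):
    """Size category used by APS / APM / APL (area in pixels).

    Areas exactly equal to 32**2 or 96**2 go to the larger bucket here;
    that tie-breaking is an assumption of this sketch.
    """
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def mask_ap(ap_per_threshold):
    """mask AP: mean of the per-threshold AP values over the ten thresholds."""
    if len(ap_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_per_threshold) / len(ap_per_threshold)


if __name__ == "__main__":
    # Two 10x10 squares shifted 2 px horizontally: 80 shared pixels, 120 in the union.
    gt = {(r, c) for r in range(10) for c in range(10)}
    pred = {(r, c) for r in range(10) for c in range(2, 12)}
    iou = mask_iou(pred, gt)
    print(f"IoU = {iou:.3f}")                          # 0.667
    print("match at 0.50:", iou >= 0.50)               # True: counts toward AP50
    print("match at 0.75:", iou >= 0.75)               # False: a miss for AP75
    print("thresholds passed:", [t for t in IOU_THRESHOLDS if iou >= t])
    print("size bucket of gt:", area_bucket(len(gt)))  # 100 px: small
```

In the example the two masks overlap with IoU 2/3, so under these assumptions the prediction would count as a match when computing AP50 but not AP75, and it clears only four of the ten thresholds that enter mask AP.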
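Results
Results reported by each model on this benchmark, sorted by mask AP (higher is better). "-" marks a metric the entry does not report.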
| Model | AP50 | AP75 | APL | APM | APS | mask AP | Paper |
|---|---|---|---|---|---|---|---|
| Co-DETR | 80.2 | 63.4 | 72.0 | 60.1 | 41.6 | 57.1 | DETRs with Collaborative Hybrid Assignments Training |
| CBNetV2 (EVA02, single-scale) | 80.3 | 62.1 | 70.9 | 59.3 | 39.7 | 56.1 | CBNet: A Composite Backbone Network Architecture for Object Detection |
| EVA | 80.0 | - | 72.4 | 58.0 | 36.3 | 55.5 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale |
| FD-SwinV2-G | - | - | - | - | - | 55.4 | Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation |
| Mask Frozen-DETR | 79.3 | 61.4 | 70.4 | 58.4 | 37.8 | 55.3 | Mask Frozen-DETR: High Quality Instance Segmentation with One GPU |
| BEiT-3 | - | - | - | - | - | 54.8 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks |
| Mask DINO (SwinL, multi-scale) | - | - | - | - | - | 54.7 | Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation |
| GLEE-Pro | - | - | - | - | - | 54.5 | General Object Foundation Model for Images and Videos at Scale |
| ViT-Adapter-L (HTC++, BEiTv2, O365, multi-scale) | - | - | - | - | - | 54.5 | Vision Transformer Adapter for Dense Predictions |
| SwinV2-G (HTC++) | - | - | - | - | - | 54.4 | Swin Transformer V2: Scaling Up Capacity and Resolution |
| GLEE-Plus | - | - | - | - | - | 53.3 | General Object Foundation Model for Images and Videos at Scale |
| ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale) | - | - | - | - | - | 53.0 | Vision Transformer Adapter for Dense Predictions |
| Soft Teacher + Swin-L (HTC++, multi-scale) | - | - | - | - | - | 53.0 | End-to-End Semi-Supervised Object Detection with Soft Teacher |
| Mask DINO (SwinL, single-scale) | - | - | - | - | - | 52.8 | Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation |
| ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale) | - | - | - | - | - | 52.5 | Vision Transformer Adapter for Dense Predictions |
| CBNetV2 (Dual-Swin-L HTC, multi-scale) | - | - | - | - | - | 52.3 | CBNet: A Composite Backbone Network Architecture for Object Detection |
| UNINEXT-H | 76.2 | 56.7 | 67.5 | 55.9 | 33.3 | 51.8 | Universal Instance Perception as Object Discovery and Retrieval |
| CBNetV2 (Dual-Swin-L HTC, single-scale) | - | - | - | - | - | 51.6 | CBNet: A Composite Backbone Network Architecture for Object Detection |
| Focal-L (HTC++, multi-scale) | 75.4 | 56.5 | 64.2 | - | 35.6 | 51.3 | Focal Self-attention for Local-Global Interactions in Vision Transformers |
| Swin-L (HTC++, multi-scale) | - | - | - | - | - | 51.1 | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows |
This page lists 20 of the 112 entries on the leaderboard.
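Because many entries leave every metric except mask AP blank, comparing models on a secondary column needs care. The sketch below copies a few rows from the table, treats "-" as a missing value, and ranks by a chosen column; the function and constant names (`parse`, `rank_by`, `ROWS`) are hypothetical and only illustrate the idea.

```python
"""Rank leaderboard rows by one metric column, skipping unreported values."""

# A subset of rows copied from the table above: (model, mask AP, APS).
# "-" marks a metric the entry does not report.
ROWS = [
    ("Co-DETR", "57.1", "41.6"),
    ("CBNetV2 (EVA02, single-scale)", "56.1", "39.7"),
    ("EVA", "55.5", "36.3"),
    ("FD-SwinV2-G", "55.4", "-"),
    ("Mask Frozen-DETR", "55.3", "37.8"),
    ("UNINEXT-H", "51.8", "33.3"),
]

MASK_AP, APS = 0, 1  # column indices into the metric cells of each row


def parse(cell):
    """Convert a table cell to float; '-' (not reported) becomes None."""
    return None if cell.strip() == "-" else float(cell)


def rank_by(rows, col):
    """Return (model, score) pairs sorted best-first, dropping missing scores."""
    scored = [(name, parse(cells[col])) for name, *cells in rows]
    present = [(name, score) for name, score in scored if score is not None]
    return sorted(present, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    by_mask_ap = rank_by(ROWS, MASK_AP)
    leader_score = by_mask_ap[0][1]
    for name, score in by_mask_ap:
        # Gap to the top entry, in mask AP points.
        print(f"{name:32s} {score:5.1f}  gap {leader_score - score:.1f}")
    print([name for name, _ in rank_by(ROWS, APS)])
```

Ranking the same rows by APS instead of mask AP moves Mask Frozen-DETR ahead of EVA (37.8 vs 36.3) and drops FD-SwinV2-G, which reports no APS, so secondary-metric orderings on this page cover only the subset of entries that report them.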