Instance Segmentation on COCO

Evaluation Metrics

AP50

AP75

APL

APM

APS

mask AP
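
As a rough illustration of what these metrics measure: in the standard COCO protocol, AP50 and AP75 count a predicted mask as correct when its IoU with a ground-truth mask reaches 0.50 or 0.75, mask AP averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05, and APS, APM and APL restrict evaluation to small, medium and large objects by pixel area. The sketch below shows only the mask IoU test and the size bucketing, not a full AP computation. The function names `mask_iou`, `is_match` and `size_bucket` are hypothetical helpers introduced here, and the handling of exact boundary areas is an assumption.

```python
def mask_iou(pred, gt):
    """IoU between two binary masks given as equal-sized 2D lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks have no union; treat that case as IoU 0 (assumption).
    return inter / union if union else 0.0


def is_match(pred, gt, iou_threshold):
    """True if the prediction overlaps the ground truth enough to count at this threshold."""
    return mask_iou(pred, gt) >= iou_threshold


def size_bucket(area):
    """COCO size ranges by pixel area: small below 32^2, medium up to 96^2, large above."""
    if area < 32 ** 2:
        return "small"
    if area <= 96 ** 2:  # boundary handling here is an assumption
        return "medium"
    return "large"


# Toy 2x4 masks: 3 pixels overlap, 5 pixels in the union, so IoU = 0.6.
gt = [[1, 1, 1, 1],
      [0, 0, 0, 0]]
pred = [[1, 1, 1, 0],
        [1, 0, 0, 0]]

print(mask_iou(pred, gt))        # IoU of the toy masks
print(is_match(pred, gt, 0.50))  # would count toward AP50
print(is_match(pred, gt, 0.75))  # would not count toward AP75
```

The leaderboard numbers themselves come from the full COCO evaluation, which additionally ranks predictions by confidence and integrates precision over recall per category.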

Results

Performance of each model on this benchmark

Model	AP50	AP75	APL	APM	APS	mask AP	Paper Title	Repository
Co-DETR	80.2	63.4	72.0	60.1	41.6	57.1	DETRs with Collaborative Hybrid Assignments Training
CBNetV2 (EVA02, single-scale)	80.3	62.1	70.9	59.3	39.7	56.1	CBNet: A Composite Backbone Network Architecture for Object Detection
EVA	80.0	-	72.4	58.0	36.3	55.5	EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
FD-SwinV2-G	-	-	-	-	-	55.4	Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation
Mask Frozen-DETR	79.3	61.4	70.4	58.4	37.8	55.3	Mask Frozen-DETR: High Quality Instance Segmentation with One GPU	-
BEiT-3	-	-	-	-	-	54.8	Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Mask DINO (SwinL, multi-scale)	-	-	-	-	-	54.7	Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation
GLEE-Pro	-	-	-	-	-	54.5	General Object Foundation Model for Images and Videos at Scale
ViT-Adapter-L (HTC++, BEiTv2, O365, multi-scale)	-	-	-	-	-	54.5	Vision Transformer Adapter for Dense Predictions
SwinV2-G (HTC++)	-	-	-	-	-	54.4	Swin Transformer V2: Scaling Up Capacity and Resolution
GLEE-Plus	-	-	-	-	-	53.3	General Object Foundation Model for Images and Videos at Scale
ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale)	-	-	-	-	-	53.0	Vision Transformer Adapter for Dense Predictions
Soft Teacher + Swin-L (HTC++, multi-scale)	-	-	-	-	-	53.0	End-to-End Semi-Supervised Object Detection with Soft Teacher
Mask DINO (SwinL, single-scale)	-	-	-	-	-	52.8	Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation
ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale)	-	-	-	-	-	52.5	Vision Transformer Adapter for Dense Predictions
CBNetV2 (Dual-Swin-L HTC, multi-scale)	-	-	-	-	-	52.3	CBNet: A Composite Backbone Network Architecture for Object Detection
UNINEXT-H	76.2	56.7	67.5	55.9	33.3	51.8	Universal Instance Perception as Object Discovery and Retrieval
CBNetV2 (Dual-Swin-L HTC, single-scale)	-	-	-	-	-	51.6	CBNet: A Composite Backbone Network Architecture for Object Detection
Focal-L (HTC++, multi-scale)	75.4	56.5	64.2	-	35.6	51.3	Focal Self-attention for Local-Global Interactions in Vision Transformers
Swin-L (HTC++, multi-scale)	-	-	-	-	-	51.1	Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
