Semantic Segmentation on ADE20K
Evaluation metrics: GFLOPs, Params (M), Validation mIoU
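Validation mIoU is the mean intersection-over-union averaged over ADE20K's 150 semantic classes on the validation set. A minimal sketch of how this metric is typically computed from flat label arrays via a confusion matrix (the function name and the toy labels below are illustrative, not from the leaderboard's own evaluation code):

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union from flat integer label arrays."""
    mask = target != ignore_index          # drop pixels marked "ignore"
    pred, target = pred[mask], target[mask]
    # Confusion matrix: rows = ground truth class, cols = predicted class.
    cm = np.bincount(target * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    valid = union > 0                      # skip classes absent from both
    return (intersection[valid] / union[valid]).mean()

# Toy example with 3 classes and 6 pixels.
pred   = np.array([0, 0, 1, 1, 2, 2])
target = np.array([0, 1, 1, 1, 2, 0])
print(mean_iou(pred, target, num_classes=3))  # → 0.5
```

Per-class IoU here is 1/3, 2/3, and 1/2, whose mean is 0.5; for ADE20K the same computation runs with `num_classes=150` over all validation pixels.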
Benchmark Results

Performance of each model on this benchmark:

| Model | GFLOPs | Params (M) | Validation mIoU | Paper Title |
| --- | --- | --- | --- | --- |
| ONE-PEACE | - | 1500 | 63.0 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities |
| M3I Pre-training (InternImage-H) | - | 1310 | 62.9 | Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information |
| InternImage-H | 4635 | 1310 | 62.9 | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions |
| BEiT-3 | - | 1900 | 62.8 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks |
| EVA | - | 1074 | 62.3 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale |
| ViT-Adapter-L (Mask2Former, BEiTv2 pretrain) | - | 571 | 61.5 | Vision Transformer Adapter for Dense Predictions |
| FD-SwinV2-G | - | 3000 | 61.4 | Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation |
| RevCol-H (Mask2Former) | - | 2439 | 61.0 | Reversible Column Networks |
| Mask DINO (SwinL, multi-scale) | - | 223 | 60.8 | Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation |
| ViT-Adapter-L (Mask2Former, BEiT pretrain) | - | 571 | 60.5 | Vision Transformer Adapter for Dense Predictions |
| DINOv2 (ViT-g/14 frozen model, w/ ViT-Adapter + Mask2former) | - | 1080 | 60.2 | DINOv2: Learning Robust Visual Features without Supervision |
| SwinV2-G (UperNet) | - | - | 59.9 | Swin Transformer V2: Scaling Up Capacity and Resolution |
| SERNet-Former | - | - | 59.35 | SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks |
| FocalNet-L (Mask2Former) | - | - | 58.5 | Focal Modulation Networks |
| ViT-Adapter-L (UperNet, BEiT pretrain) | - | 451 | 58.4 | Vision Transformer Adapter for Dense Predictions |
| RSSeg-ViT-L (BEiT pretrain) | - | 330 | 58.4 | Representation Separation for Semantic Segmentation with Vision Transformers |
| SeMask (SeMask Swin-L MSFaPN-Mask2Former) | - | - | 58.2 | SeMask: Semantically Masked Transformers for Semantic Segmentation |
| SegViT-v2 (BEiT-v2-Large) | - | - | 58.2 | SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers |
| SeMask (SeMask Swin-L FaPN-Mask2Former) | - | - | 58.2 | SeMask: Semantically Masked Transformers for Semantic Segmentation |
| DiNAT-L (Mask2Former) | - | - | 58.1 | Dilated Neighborhood Attention Transformer |
Top 20 of 230 leaderboard entries shown.