Semantic Segmentation on ADE20K
Evaluation metrics: GFLOPs, Params (M), Validation mIoU
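Validation mIoU is the mean intersection-over-union averaged over ADE20K's 150 semantic classes on the validation set. A minimal sketch of how this metric is typically computed from flat label arrays via a confusion matrix (the function name and the toy labels below are illustrative, not from the leaderboard's own evaluation code):

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union from flat integer label arrays."""
    mask = target != ignore_index          # drop pixels marked "ignore"
    pred, target = pred[mask], target[mask]
    # Confusion matrix: rows = ground truth class, cols = predicted class.
    cm = np.bincount(target * num_classes + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    valid = union > 0                      # skip classes absent from both
    return (intersection[valid] / union[valid]).mean()

# Toy example with 3 classes and 6 pixels.
pred   = np.array([0, 0, 1, 1, 2, 2])
target = np.array([0, 1, 1, 1, 2, 0])
print(mean_iou(pred, target, num_classes=3))  # → 0.5
```

Per-class IoU here is 1/3, 2/3, and 1/2, whose mean is 0.5; for ADE20K the same computation runs with `num_classes=150` over all validation pixels.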
Benchmark Results

Performance of each model on this benchmark:

| Model | GFLOPs | Params (M) | Validation mIoU | Paper Title |
| --- | --- | --- | --- | --- |
| ONE-PEACE | - | 1500 | 63.0 | ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities |
| M3I Pre-training (InternImage-H) | - | 1310 | 62.9 | Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information |
| InternImage-H | 4635 | 1310 | 62.9 | InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions |
| BEiT-3 | - | 1900 | 62.8 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks |
| EVA | - | 1074 | 62.3 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale |
| ViT-Adapter-L (Mask2Former, BEiTv2 pretrain) | - | 571 | 61.5 | Vision Transformer Adapter for Dense Predictions |
| FD-SwinV2-G | - | 3000 | 61.4 | Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation |
| RevCol-H (Mask2Former) | - | 2439 | 61.0 | Reversible Column Networks |
| Mask DINO (SwinL, multi-scale) | - | 223 | 60.8 | Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation |
| ViT-Adapter-L (Mask2Former, BEiT pretrain) | - | 571 | 60.5 | Vision Transformer Adapter for Dense Predictions |
| DINOv2 (ViT-g/14 frozen model, w/ ViT-Adapter + Mask2former) | - | 1080 | 60.2 | DINOv2: Learning Robust Visual Features without Supervision |
| SwinV2-G (UperNet) | - | - | 59.9 | Swin Transformer V2: Scaling Up Capacity and Resolution |
| SERNet-Former | - | - | 59.35 | SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks |
| FocalNet-L (Mask2Former) | - | - | 58.5 | Focal Modulation Networks |
| ViT-Adapter-L (UperNet, BEiT pretrain) | - | 451 | 58.4 | Vision Transformer Adapter for Dense Predictions |
| RSSeg-ViT-L (BEiT pretrain) | - | 330 | 58.4 | Representation Separation for Semantic Segmentation with Vision Transformers |
| SeMask (SeMask Swin-L MSFaPN-Mask2Former) | - | - | 58.2 | SeMask: Semantically Masked Transformers for Semantic Segmentation |
| SegViT-v2 (BEiT-v2-Large) | - | - | 58.2 | SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers |
| SeMask (SeMask Swin-L FaPN-Mask2Former) | - | - | 58.2 | SeMask: Semantically Masked Transformers for Semantic Segmentation |
| DiNAT-L (Mask2Former) | - | - | 58.1 | Dilated Neighborhood Attention Transformer |
Top 20 of 230 leaderboard entries shown.