DVIS-DAQ (ViT-L, Offline) | 86.1 | 72.2 | 49.6 | 70.7 | 64.5 | DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries | - |
CAVIS (ViT-L, Offline) | 87.3 | 73.2 | 49.7 | 70.3 | 65.3 | Context-Aware Video Instance Segmentation | - |
TarViS (Swin-L) | 81.4 | 67.6 | 47.6 | 64.8 | 60.2 | TarViS: A Unified Approach for Target-based Video Segmentation | - |
InstanceFormer (Swin-L) | 73.7 | 56.9 | 42.8 | 56.0 | 51.0 | InstanceFormer: An Online Video Instance Segmentation Framework | - |
DVIS++ (ViT-L, Online) | 82.7 | 70.2 | 49.5 | 68.0 | 62.3 | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | - |
RefineVIS (Swin-L, Online) | 84.1 | 68.5 | 48.3 | 65.2 | 61.4 | RefineVIS: Video Instance Segmentation with Temporal Attention Refinement | - |
GenVIS (Swin-L) | 80.9 | 66.5 | 49.1 | 64.7 | 60.1 | A Generalized Framework for Video Instance Segmentation | - |
DVIS (Swin-L) | 83.0 | 68.4 | 47.7 | 65.7 | 60.1 | DVIS: Decoupled Video Instance Segmentation Framework | - |
TarViS (Swin-T) | 71.6 | 56.6 | 42.2 | 57.2 | 50.9 | TarViS: A Unified Approach for Target-based Video Segmentation | - |
DVIS++ (ViT-L, Offline) | 86.7 | 71.5 | 48.8 | 69.5 | 63.9 | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | - |