Yuxiang Lu Shalayiding Sirejiding Yue Ding Chunlin Wang Hongtao Lu

Abstract
Task-conditional architectures offer an advantage in parameter efficiency but fall short in performance compared to state-of-the-art multi-decoder methods. Trading off performance against model size is therefore an important and difficult problem. In this paper, we introduce a simple and lightweight task-conditional model called Prompt Guided Transformer (PGT) to address this challenge. Our approach introduces a Prompt-conditioned Transformer block, which incorporates task-specific prompts into the self-attention mechanism to achieve global dependency modeling and parameter-efficient feature adaptation across multiple tasks. This block is integrated into both the shared encoder and decoder, enhancing the capture of intra- and inter-task features. Moreover, we design a lightweight decoder that further reduces parameter usage and accounts for only 2.7% of the total model parameters. Extensive experiments on two multi-task dense prediction benchmarks, PASCAL-Context and NYUD-v2, demonstrate that our approach achieves state-of-the-art results among task-conditional methods while using fewer parameters, and strikes a favorable balance between performance and parameter size.
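To make the idea of prompt-conditioned self-attention concrete, below is a minimal PyTorch sketch of one such block: a small set of learnable prompt tokens per task is injected into the attention computation, so the shared weights serve all tasks while only the prompts are task-specific. All names (`PromptConditionedAttention`, `num_prompts`, the prepend-and-drop design) are illustrative assumptions, not the paper's actual PGT implementation.

```python
# Illustrative sketch of a prompt-conditioned self-attention block.
# Hypothetical names and design; the paper's PGT block may inject prompts differently.
import torch
import torch.nn as nn

class PromptConditionedAttention(nn.Module):
    def __init__(self, dim, num_heads, num_tasks, num_prompts=4):
        super().__init__()
        # One small set of learnable prompt tokens per task: the only
        # task-specific parameters in this block (parameter-efficient).
        self.prompts = nn.Parameter(torch.randn(num_tasks, num_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.num_prompts = num_prompts

    def forward(self, x, task_id):
        # x: (B, N, dim) patch tokens produced by the shared backbone.
        b = x.size(0)
        p = self.prompts[task_id].unsqueeze(0).expand(b, -1, -1)  # (B, P, dim)
        tokens = torch.cat([p, x], dim=1)        # prepend task-specific prompts
        h = self.norm(tokens)
        out, _ = self.attn(h, h, h)              # global self-attention; prompts
                                                 # modulate attention for all tokens
        tokens = tokens + out                    # residual connection
        return tokens[:, self.num_prompts:]      # drop prompt tokens, keep patches


# Usage: identical shared weights, the task prompt selects the behavior.
block = PromptConditionedAttention(dim=96, num_heads=3, num_tasks=4)
feats = torch.randn(2, 196, 96)
seg_feats = block(feats, task_id=0)    # e.g. semantic segmentation
depth_feats = block(feats, task_id=1)  # e.g. depth estimation
```

Switching `task_id` swaps only a handful of prompt vectors, which is what keeps the parameter cost of adding tasks low compared to multi-decoder designs.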
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| boundary-detection-on-nyu-depth-v2 | PGT (Swin-T) | odsF: 77.05 |
| boundary-detection-on-nyu-depth-v2 | PGT (Swin-S) | odsF: 78.04 |
| monocular-depth-estimation-on-nyu-depth-v2 | PGT (Swin-S) | RMSE: 0.5468 |
| monocular-depth-estimation-on-nyu-depth-v2 | PGT (Swin-T) | RMSE: 0.59 |
| semantic-segmentation-on-nyu-depth-v2 | PGT (Swin-T) | Mean IoU: 41.61 |
| semantic-segmentation-on-nyu-depth-v2 | PGT (Swin-S) | Mean IoU: 46.43 |