Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation
Jin Youngwan; Park Incheol; Song Hanbin; Ju Hyeongjin; Nalcakan Yagiz; Kim Shiho

Abstract
This paper proposes Pix2Next, a novel image-to-image translation framework designed to address the challenge of generating high-quality Near-Infrared (NIR) images from RGB inputs. Our approach leverages a state-of-the-art Vision Foundation Model (VFM) within an encoder-decoder architecture, incorporating cross-attention mechanisms to enhance feature integration. This design captures detailed global representations and preserves essential spectral characteristics, treating RGB-to-NIR translation as more than a simple domain transfer problem. A multi-scale PatchGAN discriminator ensures realistic image generation at various detail levels, while carefully designed loss functions couple global context understanding with local feature preservation. We performed experiments on the RANUS dataset to demonstrate Pix2Next's advantages in quantitative metrics and visual quality, improving the FID score by 34.81% compared to existing methods. Furthermore, we demonstrate the practical utility of Pix2Next by showing improved performance on a downstream object detection task using generated NIR data to augment limited real NIR datasets. The proposed approach enables the scaling up of NIR datasets without additional data acquisition or annotation efforts, potentially accelerating advancements in NIR-based computer vision applications.
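The abstract describes fusing decoder features with VFM features via cross-attention. As a minimal sketch of that mechanism (the exact layer layout, dimensions, and function names here are illustrative assumptions, not the paper's implementation), single-head scaled dot-product cross-attention can be written as:

```python
import numpy as np

def cross_attention(q, k, v):
    """Illustrative single-head cross-attention (not the paper's exact layer).

    q: (N, C) query tokens, e.g. decoder features
    k, v: (M, C) key/value tokens, e.g. VFM features
    Returns (N, C): each query row becomes a softmax-weighted mix of v rows.
    """
    # Scaled dot-product scores between queries and keys: (N, M)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Convex combination of value rows per query
    return weights @ v

# Example: 256 decoder tokens attend over 196 VFM tokens, channel dim 64
out = cross_attention(np.random.randn(256, 64),
                      np.random.randn(196, 64),
                      np.random.randn(196, 64))
print(out.shape)  # (256, 64)
```

In the paper's framework this fusion would sit inside the encoder-decoder with residual connections and multiple heads; the sketch only shows the attention arithmetic itself.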
Benchmarks
| Benchmark | Methodology | PSNR | SSIM |
|---|---|---|---|
| image-to-image-translation-on-flir | Pix2Next | 23.45 | 0.66 |