Psomas Bill ; Kakogeorgiou Ioannis ; Efthymiadis Nikos ; Tolias Giorgos ; Chum Ondrej ; Avrithis Yannis ; Karantzalos Konstantinos

Abstract
This work introduces composed image retrieval to remote sensing. It allows querying a large image archive with image examples altered by a textual description, enriching the descriptive power over unimodal queries, either visual or textual. The textual part can modify various attributes, such as shape, color, or context. A novel method fusing image-to-image and text-to-image similarity is introduced. We demonstrate that a vision-language model possesses sufficient descriptive power and that no further learning step or training data is necessary. We present a new evaluation benchmark focused on color, context, density, existence, quantity, and shape modifications. Our work not only sets the state of the art for this task, but also serves as a foundational step in addressing a gap in the field of remote sensing image retrieval. Code at: https://github.com/billpsomas/rscir
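The idea of fusing image-to-image and text-to-image similarity can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the convex combination below, the weight `lam`, and the function names `l2_normalize`, `fused_scores`, and `rank` are all assumptions made for illustration and are not taken from the authors' code. The sketch also assumes that image and text embeddings come from a vision-language model with a shared embedding space and have already been computed.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def fused_scores(query_image_emb, query_text_emb, archive_embs, lam=0.5):
    """Score every archive image against a composed (image + text) query.

    Assumption: a convex combination of the two cosine similarities.
    lam = 0 uses only the query image, lam = 1 uses only the query text.
    """
    q_img = l2_normalize(np.asarray(query_image_emb, dtype=float))
    q_txt = l2_normalize(np.asarray(query_text_emb, dtype=float))
    db = l2_normalize(np.asarray(archive_embs, dtype=float))

    sim_img = db @ q_img  # image-to-image similarity, shape (N,)
    sim_txt = db @ q_txt  # text-to-image similarity, shape (N,)
    return (1.0 - lam) * sim_img + lam * sim_txt


def rank(scores):
    """Return archive indices sorted from most to least similar."""
    return np.argsort(-scores, kind="stable")


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real model outputs.
    archive = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    order = rank(fused_scores([1.0, 0.0], [0.0, 1.0], archive, lam=0.5))
    print(order)
```

In this toy setup, the archive item that agrees with both the query image and the query text is ranked first when the two similarities are weighted equally, which is the behavior a composed query is meant to produce.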
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| zero-shot-composed-image-retrieval-zs-cir-on-10 | WeiCom (CLIP) | mAP: 24.83 |
| zero-shot-composed-image-retrieval-zs-cir-on-10 | WeiCom (RemoteCLIP) | mAP: 30.19 |
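
The table reports mAP (mean average precision). As a reminder of how this metric is commonly computed for retrieval, here is a minimal sketch. It assumes each query produces a full ranking of the archive with binary relevance labels; the exact evaluation protocol of the benchmark is not stated here, and the function names are hypothetical. The sketch returns values in the range 0 to 1, whereas the table values appear to be on a 0 to 100 scale.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: list of booleans (or 0/1), ordered from the top-ranked
    result to the last, marking which retrieved items are relevant.
    """
    hits = 0
    precision_sum = 0.0
    for position, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / position  # precision at this cut-off
    # Normalize by the number of relevant items found in the ranking.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_ranked_relevance):
    """Mean of the per-query average precision values."""
    if not all_ranked_relevance:
        return 0.0
    total = sum(average_precision(r) for r in all_ranked_relevance)
    return total / len(all_ranked_relevance)


if __name__ == "__main__":
    # Two toy queries with hand-written relevance labels.
    print(mean_average_precision([[1, 0, 1], [0, 1]]))
```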