María A. Bravo, Sudhanshu Mittal, Simon Ging, Thomas Brox

Abstract
Vision-language modeling has enabled open-vocabulary tasks in which predictions can be queried with any text prompt in a zero-shot manner. Existing open-vocabulary tasks focus on object classes, whereas research on object attributes is limited by the lack of a reliable attribute-focused evaluation benchmark. This paper introduces the Open-Vocabulary Attribute Detection (OVAD) task and the corresponding OVAD benchmark. The objective of the new task and benchmark is to probe the object-level attribute information learned by vision-language models. To this end, we created a clean and densely annotated test set covering 117 attribute classes on the 80 object classes of MS COCO. It includes positive and negative annotations, which enable open-vocabulary evaluation. Overall, the benchmark consists of 1.4 million annotations. For reference, we provide a first baseline method for open-vocabulary attribute detection. Moreover, we demonstrate the benchmark's value by studying the attribute detection performance of several foundation models. Project page: https://ovad-benchmark.github.io
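To make "queried with any text prompt in a zero-shot manner" concrete, here is a minimal sketch of how attributes could be scored for one detected object: compare the object's embedding with the embedding of each attribute prompt by cosine similarity and rank the attributes. The embedding vectors are hypothetical toy values standing in for the image- and text-encoder outputs of a vision-language model (for example, a CLIP-style model), and the function names are illustrative only; this is not the OVAD baseline's implementation.

```python
import math
from typing import Dict, List

# Hypothetical sketch: the embeddings below are toy vectors, not model outputs.


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def score_attributes(object_emb: List[float],
                     attribute_embs: Dict[str, List[float]]) -> Dict[str, float]:
    """Score every attribute prompt independently against one object embedding.

    Attributes are multi-label (an object can be both "red" and "wooden"),
    so each score is kept as-is instead of being normalized across attributes.
    """
    return {name: cosine(object_emb, emb) for name, emb in attribute_embs.items()}


# Toy example: in practice the object embedding would come from the image
# encoder (object region) and each attribute embedding from the text encoder
# applied to a prompt such as "a red object" (the prompt wording is an assumption).
object_emb = [0.9, 0.1, 0.0]
attribute_embs = {
    "red": [1.0, 0.0, 0.0],
    "wooden": [0.0, 1.0, 0.0],
    "striped": [0.0, 0.0, 1.0],
}
scores = score_attributes(object_emb, attribute_embs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['red', 'wooden', 'striped']
```

Because any string can be turned into a text embedding, new attribute names can be scored without retraining; that is what makes the setting open-vocabulary. Scoring attributes independently, rather than with a softmax across them, reflects the fact that an object usually has several attributes at once.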
Benchmarks
| Benchmark | Method | Metric | Score |
|---|---|---|---|
| open-vocabulary-attribute-detection-on-ovad | OVAD-Baseline (ResNet50) | mean average precision (mAP) | 18.8 |
| open-vocabulary-attribute-detection-on-ovad-1 | OVAD-Baseline-Box | mean average precision (mAP) | 21.4 |
| open-vocabulary-object-detection-on-mscoco | OVAD-Baseline | AP at IoU 0.5 (AP50) | 30.0 |
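The attribute rows above report mean average precision. As a rough illustration of how such a number can be computed from positive and negative annotations, the sketch below computes a non-interpolated average precision (AP) per attribute and averages it over attributes. It relies on our own assumptions, not on a description of the official OVAD evaluation code: unannotated object-attribute pairs are dropped before ranking, and attributes without any positive annotation are skipped. The official protocol may differ in interpolation, tie handling, and how predicted boxes are matched to ground truth. The sketch returns a fraction in [0, 1]; multiplying by 100 gives the percentage scale on which values such as 18.8 are typically reported.

```python
from typing import Dict, List, Optional, Tuple

# Labels: 1 = positive annotation, 0 = negative annotation,
# None = unannotated (assumed here to be excluded from evaluation).
Labels = List[Optional[int]]


def average_precision(scores: List[float], labels: Labels) -> float:
    """Non-interpolated AP: mean of precision at the rank of each positive."""
    pairs = [(s, label) for s, label in zip(scores, labels) if label is not None]
    pairs.sort(key=lambda p: p[0], reverse=True)  # highest score first
    num_pos = sum(label for _, label in pairs)
    if num_pos == 0:
        raise ValueError("AP is undefined without positive annotations")
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(pairs, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / num_pos


def mean_average_precision(per_attribute: Dict[str, Tuple[List[float], Labels]]) -> float:
    """Average AP over attributes that have at least one positive annotation."""
    aps = [average_precision(s, labels)
           for s, labels in per_attribute.values()
           if any(label == 1 for label in labels)]
    return sum(aps) / len(aps)


# Hypothetical scores and annotations for two attributes over a few objects.
per_attribute = {
    "red": ([0.9, 0.8, 0.7, 0.6], [1, None, 0, 1]),
    "wooden": ([0.2, 0.5], [1, 0]),
}
print(round(mean_average_precision(per_attribute), 4))  # 0.6667
```

Dropping unannotated pairs matters: in the example, treating the unannotated "red" entry as a negative would push the second positive from rank 3 to rank 4 and lower that attribute's AP from 5/6 to 3/4. This is why explicit negative annotations are useful for open-vocabulary evaluation.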