3 months ago

EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment

Mykola Lavreniuk, Shariq Farooq Bhat, Matthias Müller, Peter Wonka

Abstract

This work presents EVP (Enhanced Visual Perception), a network architecture that builds on VPD, the earlier work that paved the way for using the Stable Diffusion network in computer vision tasks. We propose two major enhancements. First, we develop the Inverse Multi-Attentive Feature Refinement (IMAFR) module, which enhances feature learning by aggregating spatial information from higher pyramid levels. Second, we propose a novel image-text alignment module that improves feature extraction from the Stable Diffusion backbone. The resulting architecture is suitable for a wide variety of tasks. We demonstrate its performance on single-image depth estimation, using a specialized decoder with classification-based bins, and on referring segmentation, using an off-the-shelf decoder. Comprehensive experiments on established datasets show that EVP achieves state-of-the-art results in single-image depth estimation for indoor (NYU Depth v2, 11.8% RMSE improvement over VPD) and outdoor (KITTI) environments, as well as in referring segmentation (RefCOCO, 2.53 IoU improvement over ReLA). The code and pre-trained models are publicly available at https://github.com/Lavreniuk/EVP.
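The abstract mentions a depth decoder that uses classification-based bins. As a rough illustration of that general idea (not the EVP decoder itself, whose details are not given here), the sketch below turns per-pixel bin scores into a depth value by taking the probability-weighted average of bin centers. The function names, the uniform bin layout, and the depth range are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Convert raw bin scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_bin_centers(min_depth, max_depth, num_bins):
    """Hypothetical uniform binning of the depth range (an assumption)."""
    width = (max_depth - min_depth) / num_bins
    return [min_depth + (i + 0.5) * width for i in range(num_bins)]


def depth_from_bins(logits, centers):
    """Depth for one pixel: expected value of the bin centers."""
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, centers))


if __name__ == "__main__":
    centers = uniform_bin_centers(0.0, 10.0, 5)
    # Equal scores for every bin give the midpoint of the depth range.
    print(depth_from_bins([0.0] * 5, centers))
```

Treating depth as a soft classification over bins yields a continuous prediction while still letting the network express uncertainty across several candidate depths.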

Code Repositories

lavreniuk/evp (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
depth-estimation-on-nyu-depth-v2 | EVP | RMS: 0.224
monocular-depth-estimation-on-kitti-eigen | EVP | Delta < 1.25: 0.980; Delta < 1.25^2: 0.998; Delta < 1.25^3: 1.000; RMSE: 2.015; RMSE log: 0.073; Sq Rel: 0.136; absolute relative error: 0.048
monocular-depth-estimation-on-nyu-depth-v2 | EVP | Delta < 1.25: 0.976; Delta < 1.25^2: 0.997; Delta < 1.25^3: 0.999; RMSE: 0.224; absolute relative error: 0.061; log 10: 0.027
referring-expression-segmentation-on-refcoco-6 | EVP | IoU (%): 77.61
referring-expression-segmentation-on-refcoco-8 | EVP | Overall IoU: 78.75
referring-expression-segmentation-on-refcoco-9 | EVP | Overall IoU: 72.94
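The depth metrics listed above follow the definitions commonly used in monocular depth estimation. The sketch below shows one way to compute them over flat lists of predicted and ground-truth depths. It assumes all depths are positive and that invalid pixels have already been filtered out; the function names are illustrative and are not taken from the EVP code base.

```python
import math


def rmse(pred, gt):
    """Root mean squared error."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


def rmse_log(pred, gt):
    """RMSE computed on natural-log depths."""
    errs = [(math.log(p) - math.log(g)) ** 2 for p, g in zip(pred, gt)]
    return math.sqrt(sum(errs) / len(gt))


def abs_rel(pred, gt):
    """Absolute relative error: mean of |pred - gt| / gt."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def sq_rel(pred, gt):
    """Squared relative error: mean of (pred - gt)^2 / gt."""
    return sum((p - g) ** 2 / g for p, g in zip(pred, gt)) / len(gt)


def log10_error(pred, gt):
    """Mean absolute difference of base-10 log depths."""
    errs = [abs(math.log10(p) - math.log10(g)) for p, g in zip(pred, gt)]
    return sum(errs) / len(gt)


def delta_accuracy(pred, gt, power=1):
    """Fraction of pixels with max(pred/gt, gt/pred) < 1.25^power."""
    threshold = 1.25 ** power
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < threshold)
    return hits / len(gt)


def relative_improvement(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    gt = [1.0, 2.0, 4.0]
    pred = [1.0, 2.0, 5.0]
    print(rmse(pred, gt), abs_rel(pred, gt), delta_accuracy(pred, gt, 1))
    # Assumption: 0.254 as the VPD baseline RMSE on NYU Depth v2; that
    # value is not stated on this page and is used only for illustration.
    print(relative_improvement(0.254, 0.224))
```

Under the assumed baseline of 0.254, the relative improvement comes out at roughly 11.8%, which is consistent with the figure quoted in the abstract.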
