8 months ago

Object Detection

Multi-Task Learning

Method/Architecture

Computer Vision

Xiangyu Zhao, Yicheng Chen, Shilin Xu, Xiangtai Li, Xinjiang Wang, Yining Li, Haian Huang

Abstract

Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC). Its effectiveness has led to its widespread adoption as a mainstream architecture for various downstream applications. However, despite its significance, the original Grounding-DINO model lacks comprehensive public technical details due to the unavailability of its training code. To bridge this gap, we present MM-Grounding-DINO, an open-source, comprehensive, and user-friendly baseline, which is built with the MMDetection toolbox. It adopts abundant vision datasets for pre-training and various detection and grounding datasets for fine-tuning. We give a comprehensive analysis of each reported result and detailed settings for reproduction. The extensive experiments on the benchmarks mentioned demonstrate that our MM-Grounding-DINO-Tiny outperforms the Grounding-DINO-Tiny baseline. We release all our models to the research community. Code and trained models are released at https://github.com/open-mmlab/mmdetection/tree/main/configs/mm_grounding_dino.
