5 months ago

MVT: Multi-view Vision Transformer for 3D Object Recognition

Chen Shuo ; Yu Tan ; Li Ping

Abstract

Inspired by the great success achieved by CNNs in image recognition, view-based methods applied CNNs to model the projected views for 3D object understanding and achieved excellent performance. Nevertheless, multi-view CNN models cannot model the communications between patches from different views, limiting their effectiveness in 3D object recognition. Inspired by the recent success of the vision Transformer in image recognition, we propose a Multi-view Vision Transformer (MVT) for 3D object recognition. Since each patch feature in a Transformer block has a global receptive field, it naturally achieves communications between patches from different views. Meanwhile, it carries much less inductive bias than its CNN counterparts. Considering both effectiveness and efficiency, we develop a global-local structure for our MVT. Our experiments on two public benchmarks, ModelNet40 and ModelNet10, demonstrate the competitive performance of our MVT.
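The cross-view communication described in the abstract can be illustrated with a toy sketch. The code below is not the authors' implementation: it assumes one plausible reading of the "global-local structure", in which a local stage applies self-attention within each view separately and a global stage applies it over the patches of all views at once. The attention uses identity query/key/value projections with no learned weights, and the names `self_attention`, `local_stage`, and `global_stage` are hypothetical, chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    Assumption: a real Transformer block would use learned projections,
    multiple heads, residual connections, and an MLP; all are omitted here.
    Returns the attended tokens and the attention weight matrix.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, tokens)) for d in range(dim)]
        )
    return outputs, weights


def local_stage(views):
    """Hypothetical local stage: each patch attends only to patches of its own view."""
    return [self_attention(view)[0] for view in views]


def global_stage(views):
    """Hypothetical global stage: patches of all views form one token sequence."""
    tokens = [patch for view in views for patch in view]
    return self_attention(tokens)


# Toy input: 3 views, 4 patches per view, 2-dimensional patch features.
views = [[[float(v + p), float(v * p)] for p in range(4)] for v in range(3)]

local_out = local_stage(views)
global_out, attn = global_stage(local_out)

# Attention mass that the first patch of view 0 places on patches of other views.
cross_view_mass = sum(attn[0][4:])
```

In this sketch the local stage costs attention over 4 tokens per view, while the global stage attends over all 12 tokens, which is where patches from different views can exchange information. How the actual MVT balances the two stages is not stated in the abstract.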

Code Repositories

shanshuo/R2-MLP (pytorch, mentioned in GitHub)
shanshuo/MVT (official, pytorch, mentioned in GitHub)

Benchmarks

Benchmark: 3d-object-recognition-on-modelnet40
Methodology: MVT-small
Metrics: Accuracy: 97.5%
