Hierarchical Vector Quantized Transformer for Multi-class Unsupervised Anomaly Detection
Lu Ruiying ; Wu YuJie ; Tian Long ; Wang Dongsheng ; Chen Bo ; Liu Xiyang ; Hu Ruimin

Abstract
Unsupervised image Anomaly Detection (UAD) aims to learn robust and discriminative representations of normal samples. While separate solutions per class incur expensive computation and limited generalizability, this paper focuses on building a unified framework for multiple classes. Under such a challenging setting, popular reconstruction-based networks with a continuous latent representation assumption always suffer from the "identical shortcut" issue, where both normal and abnormal samples can be well recovered and are therefore difficult to distinguish. To address this pivotal issue, we propose a hierarchical vector quantized prototype-oriented Transformer under a probabilistic framework. First, instead of learning continuous representations, we preserve the typical normal patterns as discrete iconic prototypes, and confirm the importance of Vector Quantization in preventing the model from falling into the shortcut. The vector quantized iconic prototype is integrated into the Transformer for reconstruction, such that an abnormal data point is flipped to a normal data point. Second, we investigate an exquisite hierarchical framework to relieve the codebook collapse issue and replenish frail normal patterns. Third, a prototype-oriented optimal transport method is proposed to better regulate the prototypes and hierarchically evaluate the anomaly score. Evaluated on the MVTec-AD and VisA datasets, our model surpasses the state-of-the-art alternatives and possesses good interpretability. The code is available at https://github.com/RuiyingLu/HVQ-Trans.
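To make the vector quantization idea in the abstract concrete, the following is a minimal sketch of nearest-prototype quantization, not the authors' implementation. The function name `quantize`, the two-dimensional toy codebook, and the use of the distance to the chosen prototype as an anomaly score are all assumptions made for illustration; the paper itself describes a hierarchical, optimal-transport-based scoring scheme that is not reproduced here.

```python
import numpy as np


def quantize(features, codebook):
    """Map each feature vector to its nearest codebook prototype.

    features: array of shape (num_features, dim)
    codebook: array of shape (num_prototypes, dim)
    """
    # Pairwise squared Euclidean distances, shape (num_features, num_prototypes).
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)

    # Index of the closest prototype for every feature vector.
    indices = np.argmin(dists, axis=1)

    # Discrete representation: each feature is replaced by its prototype.
    quantized = codebook[indices]

    # Hypothetical toy score: distance from the feature to its prototype.
    scores = np.sqrt(dists[np.arange(len(features)), indices])
    return quantized, indices, scores


# Toy codebook of two "normal" prototypes (illustrative values only).
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])

# Two features near the prototypes and one far from both.
features = np.array([[0.1, -0.1], [0.9, 1.2], [5.0, 5.0]])

quantized, indices, scores = quantize(features, codebook)
print(indices)
print(scores)
```

In this toy setting the far-away feature is still mapped onto a normal prototype, which mirrors the abstract's description of an abnormal data point being flipped to a normal one, and its distance to that prototype is much larger than for the other two features.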
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| multi-class-anomaly-detection-on-mvtec-ad | HVQ-Trans | Detection AUROC: 98.0; Segmentation AUROC: 97.3 |