Wang Zhifeng; Zhang Kaihao; Luo Wenhan; Sankaranarayana Ramesh

Abstract
Facial expression is related to facial muscle contractions, and different muscle movements correspond to different emotional states. For micro-expression recognition, the muscle movements are usually subtle, which has a negative impact on the performance of current facial emotion recognition algorithms. Most existing methods use self-attention mechanisms to capture relationships between tokens in a sequence, but they do not take into account the inherent spatial relationships between facial landmarks. This can result in sub-optimal performance on micro-expression recognition tasks. Therefore, learning to recognize facial muscle movements is a key challenge in the area of micro-expression recognition. In this paper, we propose a Hierarchical Transformer Network (HTNet) to identify critical areas of facial muscle movement. HTNet includes two major components: a transformer layer that leverages local temporal features and an aggregation layer that extracts local and global semantic facial features. Specifically, HTNet divides the face into four facial areas: left lip area, left eye area, right eye area and right lip area. The transformer layer focuses on representing local minor muscle movements with local self-attention in each area. The aggregation layer learns the interactions between eye areas and lip areas. Experiments on four publicly available micro-expression datasets show that the proposed approach outperforms previous methods by a large margin. The code and models are available at: https://github.com/wangzhifengharrison/HTNet
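The hierarchical design described in the abstract lends itself to a compact sketch: local self-attention inside each of the four facial regions, followed by an aggregation step over the pooled region features. The PyTorch code below is a minimal illustration under that reading; the class names, feature dimensions, pooling choices and classifier head are assumptions for clarity, not the authors' released HTNet implementation (see the repository linked above for that).

```python
# Minimal sketch: four facial regions -> local transformer per region ->
# aggregation over region summaries -> emotion classifier.
# Shapes and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class LocalTransformerBlock(nn.Module):
    """Self-attention restricted to the tokens of a single facial region."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim)
        )

    def forward(self, x):                       # x: (B, N_tokens, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))


class HierarchicalFaceNet(nn.Module):
    """Left eye, right eye, left lip, right lip regions fused hierarchically."""

    def __init__(self, dim: int = 64, num_classes: int = 3):
        super().__init__()
        # One local transformer per region (4 regions in total).
        self.local_blocks = nn.ModuleList(
            [LocalTransformerBlock(dim) for _ in range(4)]
        )
        # Aggregation layer: attention over the four pooled region tokens,
        # modelling interactions between eye areas and lip areas.
        self.aggregate = LocalTransformerBlock(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, regions):                 # regions: list of 4 (B, N_i, dim)
        pooled = []
        for block, tokens in zip(self.local_blocks, regions):
            local = block(tokens)               # local self-attention per region
            pooled.append(local.mean(dim=1))    # (B, dim) summary per region
        region_tokens = torch.stack(pooled, dim=1)        # (B, 4, dim)
        fused = self.aggregate(region_tokens).mean(dim=1)  # global face feature
        return self.head(fused)


if __name__ == "__main__":
    # Dummy inputs: 4 regions, each a batch of 2 with 16 patch tokens of dim 64.
    dummy = [torch.randn(2, 16, 64) for _ in range(4)]
    logits = HierarchicalFaceNet()(dummy)
    print(logits.shape)                         # torch.Size([2, 3])
```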
Code Repositories
https://github.com/wangzhifengharrison/HTNet
Benchmarks
| Benchmark | Methodology | UAR | UF1 |
|---|---|---|---|
| micro-expression-recognition-on-casme-ii-1 | HTNet | 95.16 | 95.32 |
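The table above reports UAR (unweighted average recall) and UF1 (unweighted, i.e. macro-averaged, F1), the standard metrics for micro-expression benchmarks. The snippet below is a small sketch of how these are commonly computed with scikit-learn; the label arrays are placeholders, not values from the benchmark.

```python
# Sketch of UAR / UF1 computation; y_true and y_pred are placeholder labels.
import numpy as np
from sklearn.metrics import f1_score, recall_score

y_true = np.array([0, 0, 1, 1, 2, 2, 2])   # ground-truth emotion classes
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])   # model predictions

uar = recall_score(y_true, y_pred, average="macro")   # mean per-class recall
uf1 = f1_score(y_true, y_pred, average="macro")       # mean per-class F1

print(f"UAR: {uar:.4f}  UF1: {uf1:.4f}")
```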