Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities
Sun Jingyuan ; Li Mingxiao ; Chen Zijiao ; Zhang Yunhao ; Wang Shaonan ; Moens Marie-Francine

Abstract
Decoding visual stimuli from neural responses recorded by functional Magnetic Resonance Imaging (fMRI) presents an intriguing intersection between cognitive neuroscience and machine learning, promising advancements in understanding human visual perception and building non-invasive brain-machine interfaces. However, the task is challenging due to the noisy nature of fMRI signals and the intricate pattern of brain visual representations. To mitigate these challenges, we introduce a two-phase fMRI representation learning framework. The first phase pre-trains an fMRI feature learner with a proposed Double-contrastive Mask Auto-encoder to learn denoised representations. The second phase tunes the feature learner to attend to neural activation patterns most informative for visual reconstruction with guidance from an image auto-encoder. The optimized fMRI feature learner then conditions a latent diffusion model to reconstruct image stimuli from brain activities. Experimental results demonstrate our model's superiority in generating high-resolution and semantically accurate images, substantially exceeding previous state-of-the-art methods by 39.34% in the 50-way-top-1 semantic classification accuracy. Our research invites further exploration of the decoding task's potential and contributes to the development of non-invasive brain-machine interfaces.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| brain-visual-reconstruction-from-fmri-on-god | DC-LDM | 50-way top-1 classification accuracy: 17.999 |
| brain-visual-reconstruction-from-fmri-on-god | CAD (this paper) | 50-way top-1 classification accuracy: 25.080 |
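The 39.34% figure quoted in the abstract appears to be a relative improvement over the DC-LDM baseline rather than an absolute gain in accuracy. The short check below recomputes both quantities from the two table values. It assumes the table entries are percentages, and the variable names are illustrative only.

```python
# Top-1 accuracies copied from the benchmark table (assumed to be percentages).
dc_ldm_acc = 17.999   # DC-LDM baseline
cad_acc = 25.080      # CAD (this paper)

# Absolute gain in percentage points, and gain relative to the baseline.
absolute_gain = cad_acc - dc_ldm_acc
relative_gain = absolute_gain / dc_ldm_acc * 100

print(f"absolute gain: {absolute_gain:.3f} points")
print(f"relative gain: {relative_gain:.2f}%")
```

Under that reading, the absolute gain is about 7.08 points and the relative gain matches the 39.34% reported in the abstract.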