LLMs Can Capture Human Brain Responses to Everyday Scenes, Paving the Way for Smarter AI and Brain-Computer Interfaces
Large language models (LLMs) can now capture the way the human brain interprets everyday visual scenes, according to a groundbreaking study published in Nature Machine Intelligence. Researchers led by Ian Charest, associate professor of psychology at Université de Montréal and member of Mila—Quebec AI Institute, have shown that LLMs can generate a “language-based fingerprint” of complex visual scenes that closely matches actual brain activity.

When people view natural scenes, like a group of children playing or a bustling city skyline, their brains process far more than just individual objects. They interpret context, action, location, and relationships between elements. Until now, measuring this rich, holistic understanding has been a major challenge for neuroscience.

The team used LLMs, the same AI technology behind tools like ChatGPT, to analyze written descriptions of scenes. By processing these descriptions, the models created a semantic representation, essentially a digital blueprint of what each scene means. Remarkably, these representations aligned closely with brain activity patterns recorded via fMRI while people viewed the same scenes.

“This means we can now decode the visual scene a person has just seen, just from a sentence describing it,” Charest explained. “We can also predict exactly how the brain will respond to scenes involving food, places, or human faces, based solely on the LLM’s internal representations.”

Even more striking, the researchers trained artificial neural networks to predict these LLM-based fingerprints directly from images. These models outperformed many state-of-the-art computer vision systems, despite being trained on significantly less data.

The work was supported by Tim Kietzmann and his team at the University of Osnabrück, Germany, and led by first author Adrien Doerig of Freie Universität Berlin. The findings suggest that the human brain may represent complex visual scenes in a way that mirrors how modern language models process language.

“This is a major step toward understanding how the brain extracts meaning from what we see,” Charest said. “It opens new doors for decoding thoughts, improving brain-computer interfaces, and building AI systems that perceive the world more like humans do.”

Potential applications include smarter self-driving cars that better understand real-world scenes, and future visual prostheses that restore vision for people with severe impairments. Ultimately, the study deepens our understanding of how the human mind interprets the visual world, and how AI might one day emulate that ability.
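To make the core idea concrete, here is a minimal sketch, not the authors' code, of how a "language-based fingerprint" of a scene caption can be linked to brain responses. It assumes the sentence-transformers library as a stand-in for the study's LLM embeddings, uses a ridge-regression encoding model (a common choice in visual neuroscience, not necessarily the one used in the paper), and replaces real fMRI recordings with synthetic placeholder voxel data; the real study relied on a large set of captioned natural scenes rather than the handful of toy captions below.

```python
# Minimal sketch (not the authors' pipeline): map LLM caption embeddings to
# fMRI-like voxel responses with a ridge encoding model.
# Assumptions: sentence-transformers stands in for the study's LLM; the voxel
# matrix below is synthetic placeholder data, not real brain recordings.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Toy captions; a real experiment would use thousands of captioned scenes.
captions = [
    "A group of children playing soccer in a park",
    "A bustling city skyline at sunset",
    "A plate of pasta on a wooden table",
    "A smiling woman waving at the camera",
] * 50

# 1. "Language-based fingerprint": embed each caption with a sentence encoder.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode(captions)                      # (n_scenes, embed_dim)

# 2. Placeholder "brain data": one row of voxel responses per viewed scene,
#    simulated as a noisy linear function of the caption embedding.
rng = np.random.default_rng(0)
n_voxels = 500
W_true = rng.normal(size=(X.shape[1], n_voxels))
Y = X @ W_true + rng.normal(scale=1.0, size=(len(captions), n_voxels))

# 3. Encoding model: linearly predict voxel responses from caption embeddings.
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
model = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X_tr, Y_tr)

# 4. Evaluate: correlate predicted and held-out responses for each voxel.
Y_hat = model.predict(X_te)
voxel_r = [np.corrcoef(Y_hat[:, v], Y_te[:, v])[0, 1] for v in range(n_voxels)]
print(f"median held-out voxel correlation: {np.median(voxel_r):.2f}")
```

Run in the other direction, the same fitted mapping supports the decoding result described above: given a new pattern of brain activity, one can ask which caption embedding best explains it, and thereby identify the scene a person just viewed.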
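The article also mentions training artificial neural networks to predict these LLM-based fingerprints directly from images. The sketch below shows one plausible setup under stated assumptions: a torchvision ResNet-18 with its classifier replaced by an embedding head, trained with a cosine objective to match precomputed caption embeddings. The architecture, loss, and embedding size are illustrative choices, and random tensors stand in for real images and embeddings.

```python
# Minimal sketch (illustrative assumptions, not the paper's recipe): train a
# vision network to predict the LLM caption embedding ("fingerprint") of the
# scene shown in an image. Random tensors stand in for real images and for
# the caption embeddings a language model would supply.
import torch
import torch.nn as nn
from torchvision.models import resnet18

EMBED_DIM = 384                       # assumed size of the caption embedding

# Backbone: ResNet-18 with its classifier replaced by an embedding head.
model = resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, EMBED_DIM)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
cosine = nn.CosineEmbeddingLoss()     # pulls predictions toward the targets

# Placeholder batch: 16 RGB images and their precomputed caption embeddings.
images = torch.randn(16, 3, 224, 224)
target_embeddings = torch.randn(16, EMBED_DIM)

model.train()
for step in range(5):                 # a few toy optimization steps
    pred = model(images)
    loss = cosine(pred, target_embeddings, torch.ones(len(images)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: cosine loss {loss.item():.3f}")
```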