UC Riverside and Google develop AI system to detect deepfakes beyond face swaps and altered speech


In an era where manipulated videos pose significant threats to truth and trust, UC Riverside researchers have developed a groundbreaking AI system capable of detecting forgeries beyond traditional face swaps and speech alterations. The tool, called the Universal Network for Identifying Tampered and Synthetic Videos (UNITE), addresses a critical gap in current deepfake detection methods, which often fail on content without visible faces.

Amit Roy-Chowdhury, a professor of electrical and computer engineering at the University of California, Riverside, and doctoral candidate Rohit Kundu collaborated with Google scientists to create UNITE. The system uses a transformer-based deep learning model to examine entire video frames, including backgrounds and motion patterns, rather than focusing solely on facial features. This makes it one of the first tools designed to identify synthetic or altered videos that lack explicit facial content, a common tactic in modern disinformation campaigns.

Kundu, a key researcher on the project, highlighted the evolving nature of deepfake technology. “Deepfakes have advanced beyond face swaps,” he explained. “Today, entire scenes can be fabricated—from faces to backgrounds—using generative AI. Our system is built to catch these complex manipulations.” The rise of text-to-video and image-to-video platforms has made such forgeries more accessible, enabling even non-experts to create convincing fake content.

UNITE’s innovation lies in its ability to detect subtle spatial and temporal inconsistencies in videos. It is trained with an “attention-diversity loss,” which forces the model to attend to multiple visual regions in each frame instead of over-relying on faces (a sketch of the idea appears below). This allows the system to identify tampering in scenarios ranging from simple facial alterations to fully synthetic videos generated without any real footage.

The research, presented at the 2025 Conference on Computer Vision and Pattern Recognition (CVPR), builds on SigLIP, a Google vision-language foundation model whose features are not tied to specific individuals or objects, enabling broader detection capabilities (also sketched below). The collaboration with Google, where Kundu interned, provided access to the extensive datasets and computational resources needed to train the model on diverse synthetic content. Co-authors of the paper include Google researchers Hao Xiong, Vishal Mohanty, and Athula Balachandra.

While still in development, UNITE has potential applications for social media platforms, fact-checkers, and news organizations seeking to combat the spread of manipulated videos. Kundu emphasized the urgency of the work: “As AI becomes better at faking reality, we must improve our ability to reveal the truth,” he said.

The system’s universal design targets the growing complexity of synthetic media, which can distort narratives by altering scenes, environments, or entire visual contexts. Existing detectors often fail when no face is present, leaving gaps in identifying disinformation that relies on background changes or motion anomalies. UNITE’s holistic video analysis closes this gap, offering a more robust defense against evolving threats. The project underscores the importance of interdisciplinary collaboration in tackling AI’s dual-use challenges.
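The article does not give UNITE’s exact formulation, but the attention-diversity idea can be illustrated in a few lines of PyTorch. The sketch below is a hypothetical stand-in: the function name, tensor shapes, and pairwise-cosine penalty are assumptions, not the published loss. It penalizes attention heads that concentrate on the same spatial regions, which is the failure mode the researchers describe discouraging.

```python
# Hypothetical attention-diversity penalty (illustrative; not UNITE's published loss).
import torch
import torch.nn.functional as F

def attention_diversity_loss(attn: torch.Tensor) -> torch.Tensor:
    """attn: (batch, heads, tokens) -- each head's attention distribution
    over a frame's spatial patch tokens (assumes heads >= 2)."""
    a = F.normalize(attn, p=2, dim=-1)        # unit-normalize each head's map
    sim = torch.bmm(a, a.transpose(1, 2))     # (batch, heads, heads) cosine similarities
    heads = attn.size(1)
    eye = torch.eye(heads, device=attn.device)
    off_diag = (sim - eye).clamp(min=0)       # drop self-similarity, keep positive overlap
    # Mean similarity over the heads*(heads-1) off-diagonal head pairs:
    return off_diag.sum(dim=(1, 2)).mean() / (heads * (heads - 1))

# Example: 2 clips, 8 heads, 196 patch tokens per frame.
attn = torch.softmax(torch.randn(2, 8, 196), dim=-1)
loss = attention_diversity_loss(attn)  # scalar; would be added to the main detection loss
```

In training, a term like this would typically be weighted and added to the detector’s classification loss, nudging the transformer to spread attention across backgrounds and motion cues rather than faces alone.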
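Likewise, the SigLIP backbone can be approximated with Google’s publicly released checkpoints. The snippet below is a minimal sketch assuming the Hugging Face transformers library and the google/siglip-base-patch16-224 checkpoint; UNITE’s actual checkpoint, frame sampling, and pooling strategy are not specified in the article.

```python
# Minimal sketch: per-frame SigLIP embeddings (assumed checkpoint, not UNITE's own).
import torch
from PIL import Image
from transformers import AutoProcessor, SiglipVisionModel

CKPT = "google/siglip-base-patch16-224"
processor = AutoProcessor.from_pretrained(CKPT)
model = SiglipVisionModel.from_pretrained(CKPT).eval()

@torch.no_grad()
def frame_features(frames: list[Image.Image]) -> torch.Tensor:
    """Embed each sampled frame as a whole -- background included --
    rather than cropping to a detected face. Returns (num_frames, hidden_dim)."""
    inputs = processor(images=frames, return_tensors="pt")
    return model(**inputs).pooler_output
```

Per-frame embeddings like these would then feed a temporal model that looks for the spatial and temporal inconsistencies the article describes.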
Roy-Chowdhury, also co-director of UCR’s Artificial Intelligence Research and Education (RAISE) Institute, noted that the system reflects a shift in how synthetic media is created and detected. As generative AI tools become more widespread, such innovations are critical to preserving the integrity of digital content.
