CAS-VSR-W1k Lip Reading Recognition Dataset

CAS-VSR-W1k, formerly known as LRW-1000, is the largest publicly available word-level Mandarin lip reading dataset. It contains 1,000 word classes and about 700,000 samples from more than 2,000 speakers, with more than 1,000,000 Chinese character instances in total.
Each class corresponds to the syllables of a Mandarin word consisting of one or more Chinese characters. The dataset is designed to cover natural variation across speech modes and imaging conditions, so that it reflects the challenges encountered in real applications.