A team of researchers at Google DeepMind has developed an AI that can not only read your lips, but can read them more than three times as accurately as professional human lip readers.
“Using this architecture, we demonstrate lip reading performance that beats a professional lip reader on videos from BBC television. The model also surpasses the performance of all previous work on standard lip reading benchmark datasets, and we also demonstrate that visual information helps to improve speech recognition performance even when the audio is used.”
– Joon Son Chung et al.
Photo Credit: Google DeepMind (via The Verge)
Researchers from Google’s AI division DeepMind and the University of Oxford used thousands of hours of BBC footage to build a lip-reading neural network that annotated video footage with 46.8% accuracy. That isn’t as impressive as a neural network’s ability to transcribe audio. However, a professional human lip reader reviewing the same footage managed only 12.4%, which makes the significance of the lip-reading software clear.
FUTURE IMPLICATIONS
You might not be impressed by 46.8% accuracy, but unlike a human, Google’s AI can keep improving its ability to transcribe lip movements. Eventually it might even approach near-perfect accuracy, which could then be applied across many areas of society, from entertainment to intelligence gathering. Further down the road, people might even stop speaking aloud and rely on lip reading instead, with their words translated in real time by an open neural network used by nearly the entire population.