WHY THIS MATTERS IN BRIEF
As the barriers to creating high-quality DeepFake content disappear, we’ll all soon be able to create our very own DeepFakes.
Recent advances in Artificial Intelligence (AI) have made it far easier than ever before to create “fake” or “synthetic” audio, video and text content. As the technology develops, becomes democratised, and escapes from the labs into the wild and eventually into the apps on your smartphone, it will become ubiquitous and free to use, and we will witness an explosion in content creation like nothing we’ve ever seen before.
For example, recently we have seen AIs write good full length scientific books and papers, and “dangerously” good articles, scripts and text. We have seen them turn static paintings and photos of people into convincing DeepFake videos, generate “complex video” out of thin air, and create video from nothing more than text, and that’s before we discuss “regular” DeepFake technology. All of that is just the tip of the giant iceberg that governments, three years after the technology’s first emergence, are now scrambling to control, a problem that I highlighted nearly three years ago in a London keynote. Better late than never, I guess.
Anyway, one of the trends I’ve been watching with interest is the field of AI based “Text to Video” creation, where you simply type whatever you want and an AI generates the corresponding video for you. The reason I’ve been watching it intently for over two years now is that it democratises content creation for everyone, good and bad. Imagine, for example, creating an advert or a movie just from some text or a script. Hopefully I don’t have to spell out the sheer disruptive power of this amazing, and terrifying, technology for you.
Now the team behind the first ever Text to Video AI that I wrote about last year has gone one step further and developed an algorithm that simplifies the process of creating a DeepFake, of anyone or anything, to a terrifying degree: it lets people make a video subject say anything they want just by editing the video’s transcript, as you can see from the video. Unsurprisingly, even its creators are concerned about what might happen if the tech falls into the wrong hands.
The researchers hail from Stanford University, Princeton University, the Max Planck Institute for Informatics, and Adobe, who are also behind a similar breakthrough product that I discussed last year called Adobe VoCo, which “does for voice what Adobe Photoshop did for images.” They detail how their new algorithm works in a paper published on Stanford scientist Ohad Fried’s website.
First, the AI analyzes a source video of a person speaking, but it isn’t just looking at their words – it’s identifying each tiny unit of sound, or phoneme, the person utters, as well as what they look like when they speak each one.
There are only approximately 44 phonemes in the English language, and according to the researchers, as long as the source video is at least 40 minutes long, the AI will have enough data to gather all the pieces it needs to make the person appear to say anything. And yes, over time the amount of training data it needs will drop from 40 minutes to under a minute, so watch out for that article in, I’d say, about six months or so.
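To make the “enough data” idea a little more concrete, here is a minimal Python sketch of a coverage check: given a phoneme-aligned transcript of the source video, which phonemes from the target inventory never appear? This is purely my own illustration, not code from the researchers. The function name `phoneme_coverage`, the ARPAbet-style labels, and the tiny six-phoneme inventory are all assumptions made for the example.

```python
from collections import Counter

def phoneme_coverage(alignment, inventory):
    """Count how often each phoneme occurs and list the ones never seen.

    `alignment` is a hypothetical list of (phoneme, start_sec, end_sec)
    tuples, as a forced aligner might produce for the source video.
    """
    counts = Counter(phoneme for phoneme, _, _ in alignment)
    missing = sorted(set(inventory) - set(counts))
    return counts, missing

# Toy data: the sounds of the word "love" plus a leading "I".
alignment = [("AY", 0.00, 0.12), ("L", 0.12, 0.20),
             ("AH", 0.20, 0.28), ("V", 0.28, 0.36)]
# A deliberately tiny inventory; English has roughly 44 phonemes.
inventory = ["AY", "L", "AH", "V", "F", "R"]

counts, missing = phoneme_coverage(alignment, inventory)
print(missing)  # ['F', 'R']
```

In this toy case the source footage contains no “F” or “R”, so it could not yet be used to make the speaker say “French.” A longer video simply makes it more likely that every sound turns up at least once.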
After that, all a person has to do is edit the transcript of the video, and the AI will generate a DeepFake that matches the rewritten transcript by intelligently stitching together the necessary sounds and mouth movements.
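The stitching step can be pictured with a toy Python sketch. To be clear, this is my own simplification and not the researchers’ pipeline, which is certainly far more involved: it assumes the source video has already been indexed as (phoneme, start frame, end frame) spans, and it naively picks the longest available span for each phoneme in the edited transcript. The names `index_segments` and `stitch`, and the frame numbers, are hypothetical.

```python
def index_segments(alignment):
    """Map each phoneme to the list of (start_frame, end_frame) spans where it occurs."""
    index = {}
    for phoneme, start, end in alignment:
        index.setdefault(phoneme, []).append((start, end))
    return index

def stitch(new_phonemes, index):
    """Pick one source span per phoneme of the edited transcript."""
    plan = []
    for phoneme in new_phonemes:
        spans = index.get(phoneme)
        if not spans:
            raise KeyError(f"no footage for phoneme {phoneme!r}")
        # Naive heuristic (an assumption for this sketch): use the longest span.
        plan.append(max(spans, key=lambda span: span[1] - span[0]))
    return plan

# Toy source footage, indexed by frame number.
source = [("F", 0, 3), ("R", 3, 5), ("EH", 5, 9),
          ("N", 9, 11), ("CH", 11, 14), ("EH", 20, 22)]

# Phonemes of the word "French" from the edited transcript.
plan = stitch(["F", "R", "EH", "N", "CH"], index_segments(source))
print(plan)  # [(0, 3), (3, 5), (5, 9), (9, 11), (11, 14)]
```

The output is just an edit list of frame ranges to play back to back. The hard part in a real system is making those joins look and sound seamless, which this sketch does not attempt.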
Based on the video showing the new algorithm in action, it appears best suited for minor changes, which is understandable given its stage of development. In one example, the researchers demonstrate how the AI can replace “napalm” in the famous “Apocalypse Now” quote, “I love the smell of napalm in the morning,” with the far more innocuous “French toast.”
But even they worry that some might find far more destructive uses for the new algorithm. And let’s face it, we can all imagine this in “less noble” hands…
“We acknowledge that bad actors might use such technologies to falsify personal statements and slander prominent individuals,” they write in their paper, later adding that they “believe that a robust public conversation is necessary to create a set of appropriate regulations and laws that would balance the risks of misuse of these tools against the importance of creative, consensual use cases.”
And I for one would argue that the debate needs to happen now, not three years after the technology first appeared, as happened previously, and that it needs to address how to control where the technology is going before it gets there, not where it is today. By the time you find a way to control today’s technology we’ll already be multiple generations further on in its development, and all those conversations will have been wasted. After all, let’s face it, technology waits for no man: it only accelerates and gets better, cheaper, and faster.
In the meantime though I want one… it’s still an awesome piece of tech!