WHY THIS MATTERS IN BRIEF
Why bother setting up a studio, hiring videographers and actors, when a DeepFake can do it all for a fraction of the cost and faster?
This month advertising giant WPP will send unusual corporate training videos to tens of thousands of employees worldwide. A presenter will speak in the recipient’s language and address them by name, while explaining some basic concepts in Artificial Intelligence (AI). The videos themselves will be powerful demonstrations of what AI can do, because everything, including the face and the words it speaks, will be generated entirely from scratch by AI software.
WPP doesn’t bill them as such, but its synthetic training videos might be called DeepFakes, a loose term applied to images or videos generated using AI that look real, such as a synthetic newscaster that Reuters recently developed, or a DeepFake Elon Musk who recently bombed a Zoom call. Although DeepFakes are best known as tools of harassment, porn, or duplicity, image-generating AI is now being used by major corporations for purposes as anodyne as corporate training.
WPP’s unreal training videos, made with technology from London startup Synthesia, aren’t perfect, but they are steadily getting better. WPP chief technology officer Stephan Pretorius says the prosody of the presenters’ delivery can be off; that was the most jarring flaw in an early cut shown to journalists, which was otherwise visually smooth. But the ability to personalise and localise video to many individuals makes for more compelling footage than the usual corporate fare, he says.
“The technology is getting very good very quickly,” Pretorius says.
Deepfake-style production can also be cheap and quick, an advantage amplified by Covid-19 restrictions that have made conventional video shoots trickier and riskier. Pretorius says a company-wide internal education campaign might require 20 different scripts for WPP’s global workforce, each costing tens of thousands of dollars to produce.
“With Synthesia we can have avatars that are diverse and speak your name and your agency and in your language and the whole thing can cost $100,000,” he says. In this summer’s training campaign, the languages are limited to English, Spanish, and Mandarin. Pretorius hopes to distribute the clips, 20 modules of about 5 minutes each, to 50,000 employees this year.
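To put those figures side by side, here is a rough back-of-the-envelope sketch. It rests on two assumptions that are not in Pretorius’s remarks: that every employee receives all 20 modules, and that “tens of thousands of dollars” means at least $10,000 per script, which is a deliberately conservative floor rather than a quoted price.

```python
# Figures quoted in the article.
SCRIPTS = 20                    # scripts for a conventional campaign
MODULES = 20                    # modules in the synthetic campaign
MINUTES_PER_MODULE = 5          # "about 5 minutes each"
EMPLOYEES = 50_000
SYNTHETIC_BUDGET_USD = 100_000

# Assumption, not from the article: "tens of thousands of dollars" is read
# as at least 10,000 USD per script, a conservative lower bound.
ASSUMED_MIN_COST_PER_SCRIPT_USD = 10_000


def campaign_estimate() -> dict:
    """Return rough totals, assuming every employee gets every module."""
    minutes_per_employee = MODULES * MINUTES_PER_MODULE
    total_minutes = minutes_per_employee * EMPLOYEES
    return {
        "minutes_per_employee": minutes_per_employee,
        "total_personalised_hours": total_minutes / 60,
        "synthetic_cost_per_employee_usd": SYNTHETIC_BUDGET_USD / EMPLOYEES,
        "conventional_floor_usd": SCRIPTS * ASSUMED_MIN_COST_PER_SCRIPT_USD,
    }


if __name__ == "__main__":
    for key, value in campaign_estimate().items():
        print(f"{key}: {value:,.2f}")
```

Under those assumptions, the synthetic campaign works out to about $2 per employee for roughly 100 minutes of personalised footage each, while the conventional route would cost at least $200,000 without any per-person personalisation.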
The term deepfakes comes from the Reddit username of the person or persons who in 2017 released a series of pornographic clips modified using machine learning to include the faces of Hollywood actresses. Their code was released online, and various forms of AI video and image-generation technology are now available to any interested amateur. Deepfakes have become tools of harassment against activists, countries, and individuals, and a cause of concern among lawmakers and social media executives worried about political disinformation and fake news. They are also used for fun, such as to insert Nicolas Cage into movies he did not appear in.
Deepfakes made for titillation, harassment, or fun typically come with obvious giveaway glitches. Startups are now crafting AI technology that can generate video and images able to pass as substitutes for conventional corporate footage or marketing photos. This comes as synthetic content, and synthetic people, are becoming more mainstream. Prominent talent agency CAA recently signed Lil Miquela, a computer-generated Instagram influencer with more than 2 million followers.
Rosebud AI specialises in making the kind of glossy images used in ecommerce or marketing. Last year the company released a collection of 25,000 modelling photos of people that never existed, along with tools that can swap synthetic faces into any photo. More recently, it launched a service that can put clothes photographed on mannequins onto virtual but real-looking models.
Lisha Li, Rosebud’s CEO and founder, says the company can help small brands with limited resources produce more powerful portfolios of images, featuring more diverse faces.
“If you’re a brand that wanted to tell a visual story, you used to have to have a large creative team, or buy stock photos,” she says. Now, she says, brands can tap algorithms to make their portfolios instead.
JumpStory, a stock photo startup in Højbjerg, Denmark, has experimented with Rosebud’s technology. It had already built a business around in-house machine learning technology that tries to curate a library containing only the most visually striking photos. Using Rosebud’s technology, JumpStory tested a feature that would allow customers to alter the face in a stock photo with a few clicks, including to change a person’s apparent ethnicity, a task that would otherwise be impractical or require careful Photoshop work.
Jonathan Low, JumpStory’s CEO, says the company chose not to launch the feature, preferring to emphasise the authenticity of its images. But the technology was impressive.
“If it’s a portrait it works extremely well,” Low says. Results generally aren’t as good when faces are less prominent in an image, such as in a full-length shot, he says.
Synthesia, the London startup that powered WPP’s deepfake project, makes video featuring synthesised talking heads for corporate clients including Accenture and SAP. Last year, it helped David Beckham appear to deliver a PSA on malaria in several languages, including Hindi, Arabic, and Kinyarwanda, which is spoken by millions of people in Rwanda.
Victor Riparbelli, Synthesia’s CEO and cofounder, says widespread use of synthetic video is inevitable because consumers and companies have a larger appetite for video than can possibly be sated by conventional production.
“We’re saying let’s remove the camera from the equation,” he says. Riparbelli says interest in his technology has grown since Covid-19 shut down many video shoots and forced some companies to launch new employee education and training schemes.
Making a video with Synthesia’s tools can take seconds. Select an avatar from a list, type the script, and click a button labeled “Generate video.” The company’s avatars are based on real people, who receive royalties based on how much footage is made with their image. After digesting some real video of a person, Synthesia’s algorithms can generate new video frames to match the movements of their face to the words of a synthesised voice, which it can create in more than two dozen languages. Clients can create their own avatars by providing a few minutes of sample footage of a person, and customise their surroundings and voices too.
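The select-an-avatar, type-a-script workflow, combined with the per-employee personalisation WPP describes, can be pictured as filling in a template for each viewer. The sketch below is purely illustrative and is not Synthesia’s API: the `Employee` and `VideoJob` types, the `build_job` function, the template wording, and the avatar identifier are all hypothetical, and the actual video generation step is left out entirely.

```python
from dataclasses import dataclass

# Hypothetical script templates, one per language in WPP's campaign.
# The wording is invented for illustration only.
SCRIPT_TEMPLATES = {
    "en": "Hello {name}, welcome to the AI basics course for {agency}.",
    "es": "Hola {name}, te damos la bienvenida al curso de fundamentos de IA de {agency}.",
    "zh": "{name}，你好，欢迎参加{agency}的人工智能基础课程。",
}


@dataclass(frozen=True)
class Employee:
    name: str
    agency: str
    language: str  # language code such as "en", "es" or "zh"


@dataclass(frozen=True)
class VideoJob:
    """What a user would supply: an avatar choice plus a finished script."""
    avatar: str
    language: str
    script: str


def build_job(employee: Employee, avatar: str) -> VideoJob:
    """Fill the template for the employee's language with their details."""
    try:
        template = SCRIPT_TEMPLATES[employee.language]
    except KeyError:
        raise ValueError(
            f"no script template for language {employee.language!r}"
        ) from None
    script = template.format(name=employee.name, agency=employee.agency)
    return VideoJob(avatar=avatar, language=employee.language, script=script)


if __name__ == "__main__":
    job = build_job(Employee("Ana", "Example Agency", "es"), avatar="avatar_01")
    print(job.script)
```

Each resulting `VideoJob` stands in for one press of the “Generate video” button; in a campaign like WPP’s, the same loop would run once per employee and per module.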
Riparbelli and others working to commercialise deepfakes say they are proceeding with caution, not just rushing to cash in. Synthesia has posted ethics rules online and says that it vets its customers and their scripts. It requires formal consent from a person before it will synthesise their appearance, and won’t touch political content. Rosebud has its own, less detailed, ethics statement pledging to combat negative uses and effects of synthetic images.
Li, Rosebud’s CEO, says her technology should do more good than harm. Helping a broader range of people to compete, without large production budgets, should encourage a broadening of beauty standards, she says. Her technology can generate models of non-binary gender, as well as different ethnicities.
“A lot of the users I am working with are minority brand owners who want to create diverse imagery to represent their user base,” says Li, who worked on the side as a model for more than 10 years before gaining a Berkeley PhD in statistics and machine learning and working as a venture capitalist.
Subbarao Kambhampati, an AI professor at Arizona State University, says the technology is impressive but wonders whether some Rosebud clients may use diverse, synthetic models in place of real people from minority communities.
“It might lull us into a false sense of accomplishment in terms of representation without changing the ground reality,” he says.
As synthetic imagery moves into the corporate mainstream, big brands and their ad agencies will greatly influence how people experience the technology. Pretorius of WPP says his company is exploring many uses for AI-synthesized imagery, with creations so far including a Rembrandt-style portrait and digitally made models indistinguishable from real people.
“We can do it technically but we’re going slowly in terms of deploying that to the market,” he says. The company’s general counsel is working on a set of ethical standards for synthetic models and other imagery, including when and how to disclose that something is not in fact what it seems.