DeepMind's AI learned to create its own synthetic videos after watching YouTube


By Matthew Griffin Intelligence and the Senses 6th December 2020

WHY THIS MATTERS IN BRIEF

AIs are being trained to create all kinds of synthetic content, from books to movies, and in time they’ll transform the creative industry.


Perhaps you’ve heard of FaceApp, the mobile app that uses Artificial Intelligence (AI) to transform people’s selfies into versions of their older selves, or This Person Does Not Exist, which creates high-resolution computer-generated photos of fictional people. But what about an algorithm that can generate its own videos from scratch, like the ones I’ve written about many times before?

One of the newest papers from DeepMind, which is owned by Google parent company Alphabet, is entitled “Efficient Video Generation on Complex Datasets” and details recent advances in the budding field of using AI to generate videos from scratch.

See the videos AI made all by itself

According to the paper, thanks to “computationally efficient components, new techniques, and a new custom data set,” researchers at DeepMind say their best-performing model, Dual Video Discriminator GAN (DVD-GAN), can generate coherent 256 x 256-pixel videos of “notable fidelity” up to 48 frames in length. That makes it one of the world’s best, even though this is still a nascent area of research.
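To give a rough sense of scale, the sketch below counts the raw values in a single clip at the largest size quoted above. The three colour channels and the helper name `values_per_clip` are assumptions made for illustration, not details taken from the paper.

```python
def values_per_clip(frames: int, height: int, width: int, channels: int = 3) -> int:
    """Count the raw numbers a generator has to output for one clip.

    channels=3 (RGB) is an assumption; the article does not state it.
    """
    return frames * height * width * channels


clip = values_per_clip(frames=48, height=256, width=256)
image = values_per_clip(frames=1, height=256, width=256)

print(clip)           # 9437184 values in one 48-frame clip
print(clip // image)  # 48 times the output of a single 256 x 256 image
```

Under these assumptions a single clip is roughly 9.4 million values, which hints at why video is so much more demanding than still images.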

“Generation of synthetic video is an obvious challenge for generative modelling, but one that is plagued by increased data complexity and computational requirements,” wrote the coauthors. “For this reason, much prior work on synthetic video generation has revolved around relatively simple data sets, or tasks where strong temporal conditioning information is available. We focus on the tasks of video synthesis and video prediction … and aim to extend the strong results of generative image models to the video domain.”

… and it can go on creating them forever …

The team built their system using a cutting-edge AI architecture and introduced video-specific tweaks that enabled it to train on Kinetics-600, a data set of natural videos “an order of magnitude” larger than commonly used corpora. Specifically, the researchers leveraged scaled-up Generative Adversarial Networks, or GANs, which consist of two parts: a generator that produces the video samples, and a discriminator that attempts to distinguish the generated samples from real-world samples, which shows how convincing the generated samples are.
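The tug of war between the two parts of a GAN can be sketched with a pair of loss functions. This is a minimal illustration that assumes a hinge loss, a common choice for large-scale GANs; the article does not say which loss DVD-GAN uses, and the function names and scores here are hypothetical.

```python
from typing import Sequence


def discriminator_hinge_loss(real_scores: Sequence[float],
                             fake_scores: Sequence[float]) -> float:
    """Low when real clips score above +1 and generated clips below -1."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_hinge_loss(fake_scores: Sequence[float]) -> float:
    """Low when the discriminator gives generated clips high scores."""
    return -sum(fake_scores) / len(fake_scores)


# Hypothetical discriminator scores for two real and two generated clips.
real = [2.0, 0.5]
fake = [-2.0, 0.0]

d_loss = discriminator_hinge_loss(real, fake)
g_loss = generator_hinge_loss(fake)
print(d_loss, g_loss)  # 0.75 1.0
```

The discriminator lowers its loss by pushing real and generated scores apart, while the generator lowers its own loss by producing samples that earn higher scores, so each side's progress makes the other side's job harder.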

DVD-GAN contains dual discriminators: a spatial discriminator that critiques a single frame’s content and structure by randomly sampling full-resolution frames and processing them individually, and a temporal discriminator that provides a learning signal to generate movement. A separate module, a “Transformer”, then lets the learned information propagate across the entire AI model, which improves its output.
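The split between the two discriminators can be sketched by looking at what each one is given to judge. This is a toy illustration only: the number of sampled frames `k`, the downsampling `factor`, and the idea that the temporal discriminator sees reduced-resolution frames are assumptions made here, and all function names are hypothetical.

```python
import random
from typing import List

Frame = List[List[float]]  # height x width grid of pixel values
Video = List[Frame]        # frames in playback order


def sample_frames_for_spatial(video: Video, k: int, rng: random.Random) -> List[Frame]:
    """Pick k distinct full-resolution frames; each is judged on its own."""
    indices = sorted(rng.sample(range(len(video)), k))
    return [video[i] for i in indices]


def downsample_frame(frame: Frame, factor: int) -> Frame:
    """Average non-overlapping factor x factor blocks of pixels."""
    height, width = len(frame), len(frame[0])
    return [
        [
            sum(frame[y + dy][x + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(0, width - factor + 1, factor)
        ]
        for y in range(0, height - factor + 1, factor)
    ]


def frames_for_temporal(video: Video, factor: int) -> Video:
    """Keep every frame in order, at lower resolution (an assumption here)."""
    return [downsample_frame(frame, factor) for frame in video]


# Toy clip: 6 frames of 4 x 4 pixels, every pixel holding its frame index.
video = [[[float(t)] * 4 for _ in range(4)] for t in range(6)]
rng = random.Random(0)

spatial_input = sample_frames_for_spatial(video, k=2, rng=rng)
temporal_input = frames_for_temporal(video, factor=2)

print(len(spatial_input), len(spatial_input[0]), len(spatial_input[0][0]))
print(len(temporal_input), len(temporal_input[0]), len(temporal_input[0][0]))
```

In this sketch the spatial path sees a few sharp frames with no notion of order, while the temporal path sees the whole sequence in order, which is what lets it give feedback on movement.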

The training data set was made up of over 500,000 ten-second, high-resolution YouTube clips that the researchers described as “diverse” and “unconstrained.” They added that, after being trained on Google’s specialist Tensor Processing Units for between 12 and 96 hours, their DVD-GAN managed to create videos “with object composition, movement, and even complicated textures like the side of an ice rink.”

“We further wish to emphasise the benefit of training generative models on large and complex video data sets, such as Kinetics-600,” wrote the co-authors. “We envisage the strong baselines we established on this data set with DVD-GAN will be used as a reference point by the generative modelling [synthetic content] community moving forward. While much remains to be done before realistic videos can be consistently generated in an unconstrained setting, we believe DVD-GAN is a big step in that direction.”
