
Nvidia and MIT open source an AI that creates crazy good synthetic videos

WHY THIS MATTERS IN BRIEF

Creating, and then converting, video content is crazy laborious, so companies are building AIs that do the work for you, and they’re getting better fast.

 


Nvidia and MIT have announced that they’ve open sourced their stunning Video-to-Video Artificial Intelligence (AI) synthesis model. In short, they’ve just released a highly advanced AI that’s frighteningly good at creating synthetic content, in other words converting real video into synthetic video, and it could be used not just to create new VR content but also to make better fake content. And while I’m going to walk you through what it is and why it’s so interesting, frankly you might just want to watch the video, but put a cushion on the floor first because you’re going to fall off your chair when you see what they’ve created with it.

 


 

Anyway, onto the article… by using a Generative Adversarial Network (GAN) the team were able to “generate high resolution, photorealistic and temporally (time) coherent results with various input formats,” including segmentation masks, sketches, and poses – a huge leap forward in a field where big leaps happen almost daily.
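If you’re curious what that looks like in practice, here’s a minimal, hypothetical PyTorch sketch of the core idea: a conditional generator that produces each frame from the current semantic label map plus the frames it has already generated, so the output stays coherent over time. The names and tiny architecture are mine for illustration only, not the authors’ actual model.

import torch
import torch.nn as nn

class SeqGenerator(nn.Module):
    """Toy frame-by-frame generator: current label map + past frames -> next frame."""
    def __init__(self, label_channels=35, frame_channels=3, past_frames=2, width=64):
        super().__init__()
        in_channels = label_channels + frame_channels * past_frames
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, frame_channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, label_map, past):
        # label_map: (B, label_channels, H, W) one-hot segmentation for the current frame
        # past:      (B, frame_channels * past_frames, H, W) previously generated frames
        return self.net(torch.cat([label_map, past], dim=1))

# Roll the generator forward frame by frame, feeding each output back in,
# which is what keeps the clip temporally coherent rather than flickering.
G = SeqGenerator()
B, H, W = 1, 128, 256
past = torch.zeros(B, 3 * 2, H, W)              # blank "previous frames" to start
for t in range(5):
    label = torch.zeros(B, 35, H, W)            # stand-in for a real segmentation mask
    frame = G(label, past)                      # (B, 3, H, W) synthesized frame
    past = torch.cat([past[:, 3:], frame], 1)   # slide the window of past frames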

 

Take a look at the amazing results
 

Compared to Image-to-Image (I2I) translation and its close relative Text-to-Video (T2V) translation, which lets people type in text and have an AI auto-generate the corresponding video, like the ones I’ve discussed before, and which is amazing in itself, there’s been a lot less research into AIs that can perform Video-to-Video (V2V) translation and synthesis.

And why, you might ask, should anyone care about V2V? Well, for starters it would let you capture video of a city and instantly convert it into digital footage that you could use to create a realistic Virtual Reality (VR) world – with the added perk that you could then use another AI to modify that world on the fly in any way you like, as the video above demonstrates nicely by turning a city’s buildings into trees. And so on…
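To make the “modify the world on the fly” part concrete, here’s a hedged little sketch of how that buildings-into-trees trick can work when the video is driven by a semantic label map: you simply relabel the pixels you want to change before handing the map to the generator. The class IDs below follow the Cityscapes convention (building = 11, vegetation = 21), and the helper names are mine.

import torch
import torch.nn.functional as F

BUILDING_ID, VEGETATION_ID = 11, 21             # Cityscapes label IDs

def buildings_to_trees(label_ids: torch.Tensor) -> torch.Tensor:
    """label_ids: (H, W) integer class map for one frame; returns an edited copy."""
    edited = label_ids.clone()
    edited[edited == BUILDING_ID] = VEGETATION_ID
    return edited

def to_one_hot(label_ids: torch.Tensor, num_classes: int = 35) -> torch.Tensor:
    # (H, W) integer map -> (1, num_classes, H, W) one-hot tensor a generator could take
    return F.one_hot(label_ids.long(), num_classes).permute(2, 0, 1).unsqueeze(0).float()

raw = torch.randint(0, 35, (128, 256))          # stand-in for a real per-frame label map
edited = to_one_hot(buildings_to_trees(raw))    # feed this to the generator instead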

 


 

One of the challenges with V2V translation so far though has been the low visual quality and the temporal incoherency of the video that existing image synthesis approaches produce, both of which the team have managed to solve to the point that their new AI can create 2K resolution videos up to 30 seconds in length – another set of breakthroughs.
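How do you actually get that coherence? A standard ingredient in this line of work, and one the vid2vid authors lean on, is optical flow: warp the previous frame towards the current one and penalise the difference. The sketch below shows just that warping-and-loss step, with the flow network stubbed out by a zero field, so treat it as an illustration of the idea rather than the paper’s implementation.

import torch
import torch.nn.functional as F

def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """frame: (B, C, H, W); flow: (B, 2, H, W) in pixels (x then y); returns warped frame."""
    B, _, H, W = frame.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack([xs, ys], dim=0).float().unsqueeze(0).expand(B, -1, -1, -1)
    coords = base + flow
    # grid_sample wants a (B, H, W, 2) grid normalised to [-1, 1]
    coords[:, 0] = 2 * coords[:, 0] / (W - 1) - 1
    coords[:, 1] = 2 * coords[:, 1] / (H - 1) - 1
    return F.grid_sample(frame, coords.permute(0, 2, 3, 1), align_corners=True)

prev_frame = torch.rand(1, 3, 128, 256)
curr_frame = torch.rand(1, 3, 128, 256)
flow = torch.zeros(1, 2, 128, 256)              # a real pipeline would estimate this
temporal_loss = F.l1_loss(curr_frame, warp(prev_frame, flow))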

During their research the authors performed “extensive experimental validation on various datasets,” and “the model showed better results than existing approaches from both quantitative and qualitative perspectives.” In addition, when they extended the method to multimodal video synthesis with identical input data, the model produced new visual properties in the scene, again with both high resolution and coherency.
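The multimodal bit is easier to picture with a sketch too: keep the semantic input fixed and feed the generator a freshly sampled latent “style” code each time, so the same street layout can come out with different surfaces, colours or lighting. Again, the tiny generator and its names below are hypothetical stand-ins, not the paper’s architecture.

import torch
import torch.nn as nn

class StyleConditionedGenerator(nn.Module):
    def __init__(self, label_channels=35, z_dim=16, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(label_channels + z_dim, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, label_map, z):
        # Broadcast the style vector across the image and concatenate it with the labels.
        B, _, H, W = label_map.shape
        z_map = z.view(B, -1, 1, 1).expand(-1, -1, H, W)
        return self.net(torch.cat([label_map, z_map], dim=1))

G = StyleConditionedGenerator()
label = torch.zeros(1, 35, 128, 256)                      # one fixed semantic layout
looks = [G(label, torch.randn(1, 16)) for _ in range(3)]  # three different appearances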

 


 

The team then went on to suggest that the model could be improved in the future by adding additional 3D cues such as depth maps to better synthesise turning cars; using object tracking to ensure an object maintains its colour and appearance throughout the video; and training with coarser semantic labels to solve issues in semantic manipulation.

The Video-to-Video Synthesis paper is on arXiv, and the team’s model and data are here.
