WHY THIS MATTERS IN BRIEF
As we produce more systems that rely on machine vision to do their jobs or keep people safe, systems that can't see the world accurately pose a real risk.
Machine vision has come a long way since ImageNet, the large repository of labelled images that researchers use to train their newest Artificial Intelligence (AI) agents, was released, but to this day images with bad, tricky or just plain weird lighting can still confuse even the best algorithms and get them to misreport whatever it is they're looking at. And there are plenty of examples of an AI being tricked or confused in just this way, such as the AI running an autonomous train prototype that mistook a shadow for a rock and came to a dead stop on the track, or Nvidia's DAVE-2 self-driving software, which under certain lighting conditions would send a simulated car off a cliff, both of which exemplify the problem that machine vision researchers everywhere still face.
Over the past couple of years, in order to overcome the issue, researchers have either tried to create special hand-crafted rules about how light interacts with objects, or used data sets that cover as many lighting situations as possible. But the real world contains a nearly limitless combination of objects and lighting, and that handicaps both approaches.
Now, though, a paper from researchers at MIT and DeepMind details a new AI process that can identify images under different lighting without hand-crafted rules or a huge training data set. The process, called a Rendered Intrinsics Network, or RIN for short, automatically separates an image into reflectance, shape and lighting layers, and then recombines those layers into a reconstruction of the original image.
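To make the idea concrete, here is a minimal, hypothetical sketch of that decompose-then-recombine pipeline in PyTorch. It is not the authors' implementation: the class names, layer sizes and the four-value lighting code are all illustrative assumptions, but the overall structure, three intrinsic layers recombined into a reconstruction of the input, follows the description above.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())

class IntrinsicDecomposer(nn.Module):
    """Splits an RGB image into reflectance, shape (normals) and lighting estimates."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(conv_block(3, 32), conv_block(32, 32))
        self.reflectance_head = nn.Conv2d(32, 3, 3, padding=1)  # per-pixel albedo
        self.shape_head = nn.Conv2d(32, 3, 3, padding=1)        # per-pixel surface normals
        self.light_head = nn.Sequential(                        # global lighting code (size is an assumption)
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 4))

    def forward(self, img):
        features = self.backbone(img)
        return (self.reflectance_head(features),
                self.shape_head(features),
                self.light_head(features))

class Shader(nn.Module):
    """Learned shading model: turns shape plus lighting into a one-channel shading map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(3 + 4, 32), nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, shape, light):
        b, _, h, w = shape.shape
        light_map = light.view(b, -1, 1, 1).expand(-1, -1, h, w)  # broadcast lighting over the image
        return self.net(torch.cat([shape, light_map], dim=1))

class RINSketch(nn.Module):
    """Decompose, then recombine: image is approximated by reflectance * shading(shape, lighting)."""
    def __init__(self):
        super().__init__()
        self.decomposer = IntrinsicDecomposer()
        self.shader = Shader()

    def forward(self, img):
        reflectance, shape, light = self.decomposer(img)
        shading = self.shader(shape, light)
        return reflectance * shading, (reflectance, shape, light)

# Reconstructing the input image is the training signal, so no labels are needed.
model = RINSketch()
img = torch.rand(2, 3, 64, 64)
reconstruction, layers = model(img)
loss = nn.functional.mse_loss(reconstruction, img)
```

Because the loss only compares the reconstruction with the input image itself, no labelled examples are required, a point the researchers return to below.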
To train their RIN the researchers started by creating a data set of five shapes, namely cones, cubes, cylinders, spheres and toruses, and rendered each one in ten different orientations and over five hundred different colours.
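The combinatorics of that training set are easy to sketch. The snippet below only enumerates the shape, orientation and colour combinations; the actual rendering was done with a 3D renderer and is not reproduced here, and the random colour sampling shown is purely illustrative.

```python
import itertools
import random

shapes = ["cone", "cube", "cylinder", "sphere", "torus"]
orientations = range(10)                                   # ten orientations per shape
colours = [(random.random(), random.random(), random.random())
           for _ in range(500)]                            # 500 colours, sampled here for illustration

# Every rendering configuration is one (shape, orientation, colour) triple.
dataset_spec = [
    {"shape": s, "orientation": o, "colour": c}
    for s, o, c in itertools.product(shapes, orientations, colours)
]
print(len(dataset_spec))  # 5 * 10 * 500 = 25,000 configurations to render
```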
As a proof of concept they then showed how breaking an image down into the three layers could help a computer identify what an object in an image is, or at least work out the real shape of the objects in it. For example, after being trained only on the basic sample shapes, and without ever seeing specifically labelled examples, the model learned to spot and categorise much more complicated objects, such as the classic test models the Stanford bunny, the Utah teapot and Blender's Suzanne.
Beyond offering a new way around the problem of infinite lighting situations, RIN is also an example of learning from unlabelled data. Most AI still needs labelled data to learn, and preparing that data takes hours of repetitive human labour, so learning from unlabelled data is yet another AI frontier that needs to be overcome. That means the teams, and especially DeepMind, which recently created one of the world's first self-learning AIs, AlphaZero, have made progress on both fronts.
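Continuing the earlier sketch, a self-supervised training loop for such a model needs nothing beyond the images themselves. The random tensors below are a placeholder for a real data loader over the rendered shapes, and RINSketch refers to the hypothetical model defined above.

```python
import torch
import torch.nn.functional as F

model = RINSketch()                                  # the model sketched earlier in this piece
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(100):                              # stand-in for a loop over a real data loader
    batch = torch.rand(8, 3, 64, 64)                 # placeholder images; real training uses the rendered shapes
    reconstruction, _ = model(batch)
    loss = F.mse_loss(reconstruction, batch)         # the image itself is the only supervision
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```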