WHY THIS MATTERS IN BRIEF
If a bug in your self-driving car’s AI software drove you off a cliff, you’d probably wish it had been debugged first, and now it can be.
If you listen to the venture capitalists in Silicon Valley then you’ll hear the phrase “Software is eating the world” every time you sit down to have a coffee and every time you turn a corner, but the fact is that most software bugs won’t kill you. The worst that might happen is your PowerPoint presentation could crash and leave you hanging in front of an audience, but you’ll brush it off. Self-driving cars, on the other hand, are another story. A bug in that software and you’ll find yourself having a completely different type of crash experience.
Today most of the Artificial Intelligence (AI) platforms that underpin self-driving cars are neural network black boxes, and even the experts, like those from Google and Elon Musk’s OpenAI, admit that they don’t really understand how these systems learn or how they do the things they do, such as learning spontaneously.
Over the past year there have been a few attempts, from the likes of MIT and Nvidia, to get these black boxes to explain their decision making, and some progress is being made. Now a team of researchers from Columbia University, who also recently found a way to create more nimble robots, and Lehigh University have come up with another method to get these magical boxes of mystery to reveal their darkest secrets. Their bug hunting method, called DeepXplore, aims to expose any AI’s bad decision making, whether those AIs are deployed in online services or autonomous vehicles.
The new method uses at least three neural networks, the basic architecture of deep learning algorithms, to act as “cross-referencing oracles” in checking each other’s accuracy.
Originally the team designed DeepXplore to solve an optimization problem where they looked to strike the best balance between two objectives – maximizing the number of neurons activated within neural networks, and triggering as many conflicting decisions as possible among different neural networks.
By assuming that the majority of neural networks will generally make the right decision, DeepXplore automatically retrains the neural network that made the lone dissenting decision to follow the example of the majority in a given scenario.
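To make the “cross-referencing oracle” idea concrete, here is a minimal sketch of the majority vote described above. It is an illustration only and not the DeepXplore code: the model names, the steering labels and the `find_dissenters` helper are all hypothetical, and the retraining step is left out.

```python
from collections import Counter


def find_dissenters(predictions):
    """Return (majority_label, dissenting_model_names).

    `predictions` maps a model name to the label it produced for one input.
    If no label wins a strict majority, return (None, []) because there is
    no trustworthy "oracle" answer to compare against.
    """
    counts = Counter(predictions.values())
    label, votes = counts.most_common(1)[0]
    if votes * 2 <= len(predictions):
        return None, []
    dissenters = sorted(name for name, pred in predictions.items() if pred != label)
    return label, dissenters


# Hypothetical decisions from three models looking at the same road image.
predictions = {
    "model_a": "steer_left",
    "model_b": "steer_left",
    "model_c": "steer_right",
}
majority, dissenters = find_dissenters(predictions)
print(majority, dissenters)  # steer_left ['model_c']
```

In this sketch the input that exposed `model_c` would be labelled with the majority answer and fed back to that model as extra training data, which is the correction step the article describes.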
“This is a differential testing framework that can find thousands of errors in self-driving systems and in similar neural network systems,” says Yinzhi Cao, Assistant Professor of Computer Science at Lehigh University in Bethlehem, Pa.
Cao and his colleagues on the DeepXplore team recently won best paper after presenting their research at the 2017 Symposium on Operating Systems Principles (SOSP) held in Shanghai, China late last month, and their win may signal a growing recognition of the need for debugging tools in deep learning AI.
Typically, deep learning algorithms become better at certain tasks by filtering huge amounts of training data that humans have labelled with the correct answers, and that’s enabled such algorithms to achieve accuracies of well over 90 percent on certain test datasets that involve tasks such as identifying the correct human faces in Facebook photos or choosing the correct phrase in a Google translation between, say, Chinese and English. In these cases, it’s not the end of the world if a friend occasionally gets misidentified or if a certain esoteric phrase gets translated incorrectly.
But the consequences of mistakes rise sharply once tech companies begin using deep learning algorithms in applications such as controlling an armed military drone, or where a two-ton machine is moving at highway speeds. A wrong decision by a self-driving AI here could lead to the car crashing into a guard rail, colliding with another vehicle or, worse still, mowing down pedestrians and cyclists.
Bang you’re dead
In one example, DeepXplore compared two images of a hilltop road. One was normally exposed and the other was the identical image made slightly darker, and DeepXplore discovered that for the darker image Nvidia’s DAVE-2 self-driving car software would have sent the car crashing into the guard rail. And possibly off a cliff. Remind me not to get in that car… AI debugging software is quickly becoming my favourite type of software.
Similarly, government regulators will want to know for sure that self-driving cars can meet certain safety standards, and random test datasets may not uncover all those rare “corner cases,” as they’re called, that could lead an algorithm to make a catastrophic mistake.
“I think this push toward secure and reliable AI kind of fits in nicely with explainable AI,” says Suman Jana, an Assistant Professor of Computer Science at Columbia University in New York City. “Transparency, explanation and robustness all have to be improved a lot in machine learning systems before these systems can start working together with human beings or start running on roads.”
Jana and Cao come from a group of researchers who share backgrounds in software security and debugging. In their world, even software that is 99 percent error-free could still be vulnerable if malicious hackers can exploit that one lone bug in the system, and that has made them far less tolerant of errors than many deep learning researchers, who see mistakes as a natural part of the training process. It has also made them ideal candidates to figure out a new and more comprehensive approach for debugging deep learning.
Until now, debugging of the neural networks in self-driving cars has involved fairly tedious or random methods. One random testing approach involves human researchers manually creating test images and feeding those into the networks until they trigger a wrong decision. A second approach, called adversarial testing, automatically creates a sequence of test images by slightly tweaking one particular image until it trips up the neural network.
DeepXplore took a different approach by automatically creating the test images most likely to cause three or more neural networks to make conflicting decisions. For example, DeepXplore might look for just the right amount of lighting in a given image that could lead two neural networks to identify a vehicle as a car while a third neural network identifies it as a face. A similar kind of perception failure arguably led to the death of a Tesla driver in Florida last year, when the car’s Autopilot failed to distinguish a white semi-trailer from the brightly lit sky behind it.
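The lighting example can be sketched in a few lines. The code below is a simplified stand-in rather than DeepXplore’s actual search: it sweeps a fixed list of brightness factors instead of solving an optimization problem, and the threshold “models”, the four-pixel image and every function name are assumptions made up for illustration.

```python
def darken(image, factor):
    """Scale every pixel intensity (0.0 to 1.0) by `factor`."""
    return [pixel * factor for pixel in image]


def make_threshold_model(threshold):
    """Hypothetical stand-in for a trained network: decides by mean brightness."""
    def model(image):
        mean = sum(image) / len(image)
        return "stay_on_road" if mean >= threshold else "steer_right"
    return model


def find_disagreement(models, image, factors):
    """Return the first (factor, decisions) where the models disagree, else None."""
    for factor in factors:
        decisions = [model(darken(image, factor)) for model in models]
        if len(set(decisions)) > 1:
            return factor, decisions
    return None


# Three toy models that differ only in how much darkness they tolerate.
models = [make_threshold_model(t) for t in (0.30, 0.32, 0.45)]
image = [0.6, 0.5, 0.7, 0.6]
factors = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]

result = find_disagreement(models, image, factors)
print(result)  # (0.7, ['stay_on_road', 'stay_on_road', 'steer_right'])
```

All three toy models agree on the original image and on mildly darkened copies, and the third one breaks ranks once the image is darkened to 70 percent brightness, which is the kind of conflicting decision the approach is designed to surface.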
At the same time, DeepXplore also aimed to maximize neuron coverage in its testing by activating the maximum number of neurons and different neural network pathways. Such neuron coverage is based on a similar concept in traditional software testing called code coverage, Cao explains.
This process was able to activate 100 percent of network neurons, or about 30 percent more on average than either the random or adversarial testing methods previously used in deep learning algorithms.
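Neuron coverage is easiest to see with a toy calculation. The sketch below assumes a simple definition, that a neuron counts as covered if its activation exceeds a threshold for at least one test input, and the activation values are invented for illustration rather than taken from any real network.

```python
def neuron_coverage(activations_per_input, threshold=0.0):
    """Fraction of neurons whose activation exceeds `threshold` on at least one input.

    `activations_per_input` is a list with one entry per test input; each entry
    is a list holding one activation value per neuron.
    """
    if not activations_per_input:
        return 0.0
    num_neurons = len(activations_per_input[0])
    covered = set()
    for activations in activations_per_input:
        for index, value in enumerate(activations):
            if value > threshold:
                covered.add(index)
    return len(covered) / num_neurons


# Made-up activations for a four-neuron network.
random_tests = [[0.9, 0.0, 0.0, 0.2], [0.8, 0.0, 0.0, 0.1]]
targeted_tests = random_tests + [[0.0, 0.7, 0.0, 0.0], [0.0, 0.0, 0.6, 0.0]]

print(neuron_coverage(random_tests))    # 0.5
print(neuron_coverage(targeted_tests))  # 1.0
```

As with code coverage in ordinary software, the two random inputs here keep exercising the same two neurons, while inputs chosen to light up the untouched ones push coverage to 100 percent.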
Testing with 15 state-of-the-art neural networks on five different public datasets showed how DeepXplore could find thousands of previously undiscovered errors in a wide variety of deep learning applications. The test datasets included scenarios for self-driving car AI, automatic object recognition in online images, and automatic detection of malware masquerading as ordinary software.
While DeepXplore cannot yet guarantee that it has found every single possible bug in a system, nor is it ever likely to be able to make that claim, it’s a huge step forwards in an area that is increasingly crucial if we are ever going to realise the full potential of AI.