WHY THIS MATTERS IN BRIEF
As AI becomes more deeply embedded into our every day lives it is imperative that there are safeguards in place to make sure it’s safe.
Making sure Artificial Intelligence (AI) does what we want and behaves in predictable ways will be crucial as the technology becomes increasingly ubiquitous and embedded into the digital fabric of our global society. It’s an area frequently neglected in the race to develop products, and a little while ago Google tried to create a kill switch to terminate “rogue AI’s” then pitted AI’s against one another and discovered that they “fight one another and get aggressive.” Now DeepMind, Google’s famous AI outfit, has now outlined its research agenda to tackle the problem.
AI safety, as the field is becoming known, has been gaining prominence in recent years. That’s probably at least partly down to the overzealous warnings of an upcoming AI apocalypse from concerned pundits like Elon Musk and Stephen Hawking. But it’s also recognition of the fact that AI technology is quickly pervading all aspects of our lives, making decisions on everything from how self-driving vehicles behave, what movies we watch to whether we get a mortgage.
That’s why DeepMind hired a bevy of researchers who specialise in foreseeing the unforeseen consequences of the way we built AI back in 2016. And now the team has spelled out the three key domains they think require research if we’re going to build autonomous machines that do what we want.
In a new blog designed to provide updates on the team’s work, they introduce the ideas of “Specification, Robustness, and Assurance,” which they say will act as the cornerstones of their future research.
Specification involves making sure AI systems do what their operator intends; robustness means a system can cope with changes to its environment, such as the ones DeepMind threw at one of its AI’s recently to try to corrupt it; and assurance involves our ability to understand what systems are doing and how to control them.
A classic thought experiment designed to illustrate how we could lose control of an AI system can help illustrate the problem of specification.
Philosopher Nick Bostrom’s posited a hypothetical machine charged with making as many paperclips as possible. Because the creators fail to add what they might assume are obvious additional goals like not harming people, the AI wipes out humanity so we can’t switch it off before turning all matter in the universe into paperclips.
Obviously the example is extreme, but it shows how a poorly specified AI goal can lead to unexpected and disastrous outcomes. And properly codifying the desires of the designer is no easy feat either, often there are no neat ways to encompass both the explicit and implicit goals in ways that are understandable to the machine and don’t leave room for ambiguities, meaning we often rely on incomplete approximations.
The researchers note recent research by OpenAI in which an AI was trained to play a boat-racing game called CoastRunners. The game rewards players for hitting targets laid out along the race route. The AI worked out that it could get a higher score by repeatedly knocking over regenerating targets rather than actually completing the course. The blog post includes a link to a spreadsheet detailing scores of such examples.
Another key concern for AI designers is making their creation robust to the unpredictability of the real world. Despite their superhuman abilities on certain tasks, most cutting-edge AI systems are remarkably brittle. They tend to be trained on highly curated datasets so can fail when they’re faced with unfamiliar inputs. This can happen by accident or by design, and researchers have come up with a variety of ways to trick image recognition algorithms into misclassifying things, including thinking a 3D printed tortoise was actually a gun.
Building systems that can deal with every possible encounter may not be feasible, so a big part of making AIs more robust may be getting them to avoid risks and ensuring they can recover from errors, or that they have fail safes to ensure errors don’t lead to catastrophic failure.
And finally, we need to have ways to make sure we can tell whether an AI is performing the way we expect it to. A key part of assurance is being able to effectively monitor systems and interpret what they’re doing. If, for example, we’re basing medical treatments or sentencing decisions on the output of an AI, we’d like to see the reasoning. That’s a major outstanding problem for popular deep learning approaches, which, despite work from DARPA, MIT, and Nvidia are largely indecipherable black boxes – boxes with potential bugs that car drive self-driving cars off of cliffs…
The other half of assurance is the ability to intervene if a machine isn’t behaving the way we’d like. But designing a reliable off switch, as Google found out with their kill switch, is tough because most learning systems have a strong incentive to prevent anyone from interfering with their goals.
The authors don’t pretend to have all the answers, but they hope the framework they’ve come up with can help guide others working on AI safety. And while it may be some time before AI is truly in a position to do us harm, hopefully early efforts like these will mean it’s built on a solid foundation that ensures it is aligned with our goals, and not the goals of the “immortal AI dictator” that Elon Musk recently preached about.