WHY THIS MATTERS IN BRIEF
Today’s AIs are force-fed training data, but companies around the world are trying to create AIs that learn for themselves, which will revolutionise the field.
When playing a video game, what motivates you to carry on? This question is perhaps too broad to yield a single answer, but if you had to sum up why you accept that next quest, jump into a new level, or cave and play just one more turn, the simplest explanation might be “curiosity”: the urge to see what happens next. And as it turns out, curiosity is also a very effective motivator when teaching Artificial Intelligence (AI) to play and learn about video games.
Becoming skilled at Montezuma’s Revenge is not a milestone equivalent to AIs mastering Go or Dota 2, but it’s still a notable advance. When Google-owned DeepMind published its seminal 2015 paper explaining how it beat a number of Atari games using deep learning, Montezuma’s Revenge was the only game on which it scored 0 percent. And for DeepMind, arguably the world leader in AI development, whose AIs are beginning to literally “make their own knowledge” and are even knocking on the door of Artificial General Intelligence (AGI), that’s a massive #Fail.
The reason for the game’s difficulty is a mismatch between the way it plays and the way AI agents learn – which also reveals a blind spot in machine learning’s current view of the world.
Usually, AI agents rely on a training method called reinforcement learning to master video games. In this paradigm, agents are dropped into a virtual world, rewarded for some outcomes, such as increasing their score, and penalised for others, such as losing a life. The agent starts out playing the game randomly, but over time learns to improve its strategy via trial and error. Reinforcement learning is often thought of as a key method for building smarter robots, so naturally it’s the technique that’s used by many of the companies developing advanced AI, from the US to China.
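To make the trial-and-error loop concrete, here is a minimal sketch, assuming a toy five-cell corridor rather than a real video game, and using tabular Q-learning, one common reinforcement learning method. The corridor, the reward of 1.0 and every parameter value are illustrative assumptions and are not taken from DeepMind’s or OpenAI’s work.

```python
import random

N_STATES = 5          # hypothetical corridor, cells 0..4; the only reward is in the last cell
LEFT, RIGHT = 0, 1

def step(state, action):
    """Move one cell; only reaching the final cell pays out."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # estimated value of each action in each cell
    for _ in range(episodes):
        state = 0
        for _ in range(100):                    # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice([LEFT, RIGHT])      # trial: try something at random
            else:
                best = max(q[state])                    # otherwise use what has been learned
                action = rng.choice([a for a in (LEFT, RIGHT) if q[state][a] == best])
            nxt, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])  # error: nudge the estimate
            state = nxt
            if done:
                break
    return q

q_table = train()
# Preferred action in each non-final cell after training.
policy = [max((LEFT, RIGHT), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

The agent begins by wandering at random because every estimate starts at zero, and the learned policy ends up heading right in every cell. Note that the reward here is only four steps away; the further away it sits, the longer random wandering takes to stumble on it.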
The problem with Montezuma’s Revenge, though, is that it doesn’t provide regular rewards for the AI agent, which makes reinforcement learning a much harder way to teach it. It’s a “puzzle platformer” where players have to explore an underground pyramid, dodging traps and enemies while collecting keys that eventually unlock doors and special items.
If you were training an AI agent to beat the game in the traditional way, without resorting to “curiosity,” you could reward it for staying alive and collecting keys, but how do you teach it to save certain keys for certain items, and use those items to overcome traps and complete the level? The answer, it turns out, is curiosity.
In OpenAI’s research, their agent was rewarded not just for leaping over pits of spikes, but for exploring new parts of the pyramid. This led to better-than-human performance, with the agent achieving a mean score of 10,000 over nine runs, compared to an average human score of 4,000. In one run, it even completed the first of the game’s nine levels.
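The idea of paying an agent for exploring can be sketched in a few lines. The example below is a minimal illustration, assuming a simple visit-counting bonus that shrinks each time a room is revisited; this is a stand-in chosen for clarity and is not the method OpenAI used. The class name, room labels and bonus_scale parameter are all hypothetical.

```python
from collections import Counter
from math import sqrt

class CuriousReward:
    """Adds an exploration bonus to the game's own reward (illustrative only)."""

    def __init__(self, bonus_scale=1.0):
        self.visits = Counter()         # how often each room has been seen
        self.bonus_scale = bonus_scale  # assumed weighting between game score and curiosity

    def __call__(self, room, game_reward):
        self.visits[room] += 1
        novelty = 1.0 / sqrt(self.visits[room])   # large for new rooms, fades with repeats
        return game_reward + self.bonus_scale * novelty

reward = CuriousReward()
first_visit = reward("room_1", game_reward=0.0)
second_visit = reward("room_1", game_reward=0.0)
new_room = reward("room_2", game_reward=0.0)
```

Even when the game itself pays nothing, the agent still earns something for walking into a room it has not seen before, and a little less each time it returns to a familiar one.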
“There’s definitely still a lot of work to do,” said OpenAI’s Harrison Edwards. “But what we have at the moment is a system that can explore lots of rooms, get lots of rewards, and occasionally get past the first level.” He added that the game’s other levels are similar to the first, so playing through the whole thing “is just a matter of time.”
OpenAI is far from the first lab to try this approach, and AI researchers have been leveraging the concept of “curiosity” as motivation for decades. They’ve also applied it to Montezuma’s Revenge before, though never so successfully without teaching AI to learn from human examples.
However, while the general theory here is well established, building specific solutions is still challenging. For example, prediction-based curiosity is only useful when learning to play certain types of games. It works for titles like Mario, where there are big levels to explore, full of never-before-seen bosses and enemies. But in simpler games like Pong, AI agents prefer to play long rallies rather than actually beat their opponents, perhaps because winning the game is more predictable than following the path of the ball.
Another issue is the “Noisy TV problem,” in which AI agents that have been programmed to seek out new experiences get addicted to random patterns, like a TV tuned to static noise. This is because these agents’ sense of what is “interesting” and “new” comes from their ability to predict the future. Before they take a certain action, they try to predict what the game will look like afterwards. If they guess correctly, chances are they’ve seen this part of the game before; if they guess wrong, they have probably found something new and are rewarded for it. This mechanism is known as “prediction error.”
But because static noise is unpredictable, the result is that any AI agent confronted with such a TV, or a similarly unpredictable stimulus, becomes mesmerised. OpenAI compares the problem to human gamblers who are addicted to slot machines, unable to tear themselves away because they don’t know what’s going to happen next.
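The Noisy TV trap is easy to reproduce in miniature. The sketch below is a toy under stated assumptions: the “screen” is a single number, and the agent’s predictor is just a running average of what it has seen. The ForwardModel class and average_bonus helper are hypothetical names invented for this illustration.

```python
import random

class ForwardModel:
    """Guesses the next observation; the squared error of the guess is the curiosity bonus."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.guess = 0.0

    def curiosity(self, next_obs):
        error = (next_obs - self.guess) ** 2
        # Learn from the mistake so the same observation is less surprising next time.
        self.guess += self.learning_rate * (next_obs - self.guess)
        return error

def average_bonus(observe, steps=200, seed=0):
    """Average curiosity bonus over the last 50 steps of watching one screen."""
    rng = random.Random(seed)
    model = ForwardModel()
    errors = [model.curiosity(observe(rng)) for _ in range(steps)]
    return sum(errors[-50:]) / 50

static_room = average_bonus(lambda rng: 1.0)       # the screen never changes
noisy_tv = average_bonus(lambda rng: rng.random())  # pure static
```

In the unchanging room the bonus decays to practically nothing, so the agent gets bored and moves on. In front of the static, the prediction never improves, the bonus never runs dry, and the agent has no reason to leave.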
This new research from OpenAI sidesteps this issue by changing what the AI tries to predict. The exact methodology, called Random Network Distillation, is complex, but Edwards and his colleague Yuri Burda compare it to hiding a secret for the AI to find in every screen of the game. That secret is random and meaningless, something like “What is the colour in the top left of the screen?” suggests Edwards, but it motivates the agent to explore without leaving it vulnerable to the Noisy TV trap.
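The “hidden secret” idea can be sketched as follows. This is a heavily simplified illustration, not OpenAI’s implementation: a fixed, randomly initialised function turns each screen into a meaningless code (the secret), and a second, trainable function tries to guess that code, with the size of its error serving as the exploration bonus. Here both functions are single linear layers standing in for neural networks, screens are short lists of numbers, and all names and sizes are assumptions made for the example.

```python
import random

def random_layer(n_out, n_in, rng):
    """A single linear layer with random weights, standing in for a network."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def apply_layer(layer, obs):
    return [sum(w * x for w, x in zip(row, obs)) for row in layer]

class RandomNetworkDistillation:
    def __init__(self, obs_size, code_size=3, learning_rate=0.1, seed=0):
        rng = random.Random(seed)
        self.target = random_layer(code_size, obs_size, rng)     # the fixed "secret", never trained
        self.predictor = random_layer(code_size, obs_size, rng)  # trained to guess the secret
        self.learning_rate = learning_rate

    def bonus(self, obs):
        """Exploration bonus: how badly the predictor guesses the secret for this screen."""
        secret = apply_layer(self.target, obs)
        guess = apply_layer(self.predictor, obs)
        return sum((g - s) ** 2 for g, s in zip(guess, secret))

    def update(self, obs):
        """One gradient step pulling the predictor towards the secret for this screen."""
        secret = apply_layer(self.target, obs)
        guess = apply_layer(self.predictor, obs)
        for row, g, s in zip(self.predictor, guess, secret):
            for i, x in enumerate(obs):
                row[i] -= self.learning_rate * 2.0 * (g - s) * x

rnd = RandomNetworkDistillation(obs_size=4)
visited_room = [1.0, 0.0, 0.0, 0.0]   # toy "screens"
new_room = [0.0, 1.0, 0.0, 0.0]

bonus_first_visit = rnd.bonus(visited_room)
for _ in range(50):
    rnd.update(visited_room)
bonus_after_visits = rnd.bonus(visited_room)
bonus_new_room = rnd.bonus(new_room)
```

Because the secret depends only on the current screen and never changes, the agent’s guess about a screen it has already studied can always be improved, which is why the bonus for the visited room collapses while the unseen room stays rewarding. The design also keeps each step cheap: computing the bonus takes one pass through each of the two functions.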
Just as importantly, this motivator doesn’t require a lot of computation. These reinforcement learning methods rely on huge amounts of data to train AI agents. OpenAI’s bot, for example, had to play Montezuma’s Revenge for the real-time equivalent of three years, so every step of the journey needs to be as quick as possible.
Arthur Juliani, a software engineer at Unity and machine learning expert, says this is what makes OpenAI’s work impressive.
“The method they use is really quite simple and therefore surprisingly effective,” said Juliani. “It is actually much simpler than other methods of exploration which have been applied to the game in the past and [which have] not led to nearly as impressive results.”
Juliani says that given the similarities between different levels in Montezuma’s Revenge, OpenAI’s work is “essentially equivalent” to solving the game, but he adds that “the fact that they aren’t able to consistently beat the first level means that there is still some of an open challenge left.”
He also wonders whether their approach will work in 3D games, where visual features are more subtle and a first-person view occludes much of the world.
“In scenarios where exploration is required, but the differences between parts of the environment are more subtle, the method may not perform as well,” says Juliani.
But why do we need curious AI in the first place? What good does it do us, apart from providing humorous parallels to our human tendency to get ensnared by random patterns? The big reason is that curiosity helps computers learn on their own.
Most machine learning approaches deployed today can be split into two camps. In the first, machines learn by looking at piles of data, working out patterns they can apply to similar problems, and in the second, they’re dropped into an environment and rewarded for achieving certain outcomes using reinforcement learning.
Both of these approaches are effective at specific tasks, but they also require a lot of human labour, either labelling training data or designing reward functions for virtual environments. By giving AI systems an intrinsic incentive to explore for exploration’s sake, some of this work is eliminated and humans spend less time holding their AI agents’ hands, metaphorically speaking.
OpenAI’s Edwards and Burda say that this sort of curiosity-driven learning system is much better for building computer programs that have to operate in the real world. After all, in reality, as in Montezuma’s Revenge, immediate rewards are often scarce, and we need to work, learn, and explore for long periods of time before we get anything in return. Curiosity helps us keep going, and maybe it can help AI keep going too, almost in the same way that a curious child will keep trying and exploring new things until they achieve, well, whatever it is children are trying to achieve.