An AI learned to surf and use tools after playing 500 million games of hide and seek


By Matthew Griffin | Intelligence and the Senses | 26th September 2019

WHY THIS MATTERS IN BRIEF

AI researchers applied the principles of evolution to train their AI, and witnessed a breakthrough that saw the AIs learn to use the objects around them as tools.


In the early days of life on Earth, biological organisms were exceedingly simple. They were microscopic unicellular creatures with little to no ability to coordinate – a little like me still to be frank, especially after I’ve been travelling. Yet billions of years of evolution through competition and natural selection led to the complex life forms we have today – as well as complex human intelligence.

Researchers at OpenAI, the San Francisco-based for-profit AI research lab, are now testing a hypothesis: if you could mimic that kind of competition in a virtual world, would it also give rise to much more sophisticated artificial intelligence? In my opinion it's a very interesting hypothesis, and one that could have some ginormous implications not just for AI development but also for the world at large.

The experiment builds on two existing ideas in the field. The first is multi-agent learning, the idea of placing multiple algorithms in competition or coordination to provoke emergent behaviours. I've discussed this before: Google, for example, got its AIs to fight, with worrying results, and Microsoft got its pigs to co-operate. The second is reinforcement learning, the machine learning technique in which an agent learns to achieve a goal through trial and error, and which was popularised by Google DeepMind's AlphaGo AI when it beat the world's top human Go players a number of years ago.

In a recently released paper, OpenAI has now revealed the initial results of its findings. Through playing a simple game of hide and seek hundreds of millions of times, two opposing teams of AI agents developed complex hiding and seeking strategies that involved tool use and collaboration. The research also offers insight into OpenAI's dominant research strategy of dramatically scaling existing AI techniques to see what properties emerge, a strategy that Microsoft's $1Bn investment in the lab will no doubt help boost substantially.

To create the game, the researchers designed a simulated virtual environment that consisted of an enclosed space with various objects like blocks, ramps, and mobile and immobile barricades. The agents themselves were controlled by reinforcement-learning algorithms. For each game, the agents were split into two teams: hiders were rewarded or penalized for avoiding or failing to avoid the seekers, respectively; seekers were also rewarded or penalized for finding or failing to find the hiders. As in a classic game of hide and seek, hiders were also given a few seconds’ head start. The researchers gave the agents no other instructions.
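The reward setup described above is simple enough to sketch in a few lines. The Python below is a minimal, hypothetical illustration, not OpenAI's code: it assumes a zero-sum scheme in which, once the hiders' head start (`prep_steps`) is over, hiders score +1 per timestep while no hider is visible and -1 otherwise, and the seekers receive the opposite. The names `StepRewards`, `team_rewards` and `episode_returns`, and the exact +1/-1 values, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRewards:
    hider: float
    seeker: float


def team_rewards(step: int, prep_steps: int, any_hider_seen: bool) -> StepRewards:
    """Per-timestep team rewards for a hypothetical hide-and-seek game.

    Assumption: hiders score +1 when no hider is visible to any seeker and
    -1 otherwise, and seekers score the negation (a zero-sum game).
    """
    # Head start: nobody is rewarded while the hiders are still preparing.
    if step < prep_steps:
        return StepRewards(hider=0.0, seeker=0.0)
    hider = -1.0 if any_hider_seen else 1.0
    return StepRewards(hider=hider, seeker=-hider)


def episode_returns(seen_per_step: list[bool], prep_steps: int) -> StepRewards:
    """Sum the per-step rewards over one game, given whether a hider was seen at each step."""
    hider_total = 0.0
    seeker_total = 0.0
    for step, seen in enumerate(seen_per_step):
        r = team_rewards(step, prep_steps, seen)
        hider_total += r.hider
        seeker_total += r.seeker
    return StepRewards(hider=hider_total, seeker=seeker_total)


if __name__ == "__main__":
    # Three head-start steps, then the hiders stay hidden for two steps
    # before a seeker spots one on the final step.
    seen = [False, False, False, False, False, True]
    print(episode_returns(seen, prep_steps=3))  # StepRewards(hider=1.0, seeker=-1.0)
```

Because the two teams' rewards always cancel out in this sketch, any gain one side makes is a direct loss for the other, which is the kind of pressure that forces each team to adapt to the other's latest strategy.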

Over the course of hundreds of millions of rounds of game play, the agents developed several strategies and counter-strategies. During the first phase, the hiders and seekers developed very crude avoiding and chasing strategies by moving around the space without manipulating any of the objects.

But around the 25 million game mark the play became more sophisticated. The hiders learned to move and lock the boxes and barricades in the environment to build forts around themselves so the seekers couldn't see them. Not only that, but they also developed coordination strategies, such as passing each other objects, in order to speed up the fort-building process.

After 25 million rounds of hide-and-seek, the hiders learned to build forts to win the game.

Courtesy: OpenAI

By around the 75 million round mark, however, the seekers had discovered a counter-strategy, moving the process into phase three. They learned to move a ramp next to the hiders' fort and use it to climb over the walls. By phase four, 85 million rounds in, the hiders had learned to lock the ramps in place before building their fort. For reference, neither team could unlock an object that had been locked by the opposing team.

After 75 million games the seekers learned to climb over the fort’s walls using a ramp.

Courtesy: OpenAI
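The locking rule is what made this arms race possible, since a team could deny the other side an object simply by locking it first. A minimal sketch of that rule is below; the `GameObject` class, its methods and the team names are hypothetical and are not taken from OpenAI's environment code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GameObject:
    """A box or ramp that either team can move or lock (hypothetical model)."""
    name: str
    locked_by: Optional[str] = None  # name of the team holding the lock, if any

    def can_move(self) -> bool:
        # A locked object is fixed in place.
        return self.locked_by is None

    def lock(self, team: str) -> bool:
        """Lock the object for `team`; fails if it is already locked."""
        if self.locked_by is not None:
            return False
        self.locked_by = team
        return True

    def unlock(self, team: str) -> bool:
        """Only the team that applied the lock may release it."""
        if self.locked_by != team:
            return False
        self.locked_by = None
        return True


if __name__ == "__main__":
    ramp = GameObject("ramp")
    ramp.lock("hiders")            # hiders lock the ramp before building their fort
    print(ramp.unlock("seekers"))  # False: seekers cannot release the hiders' lock
    print(ramp.can_move())         # False: the ramp stays where the hiders left it
```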

At first, the researchers at OpenAI believed that this was the last phase of game play, but finally, at the 380 million game mark, two more strategies emerged. The seekers once again developed a strategy to break into the hiders’ fort by using a locked ramp to climb onto an unlocked box, then “surf” their way on top of the box to the fort and over its walls. In the final phase, the hiders once again learned to lock all the ramps and boxes in place before building their fort.

After 380 million rounds of training, the seekers learned to “box surf” to climb over the fort’s walls.

Courtesy: OpenAI

The researchers believe that these initial results demonstrate a promising new method for evolving even more complex AIs than we have today, something that could go some way to helping us realise Artificial General Intelligence (AGI), AI that is as capable as humans across a broad range of tasks.

“We didn’t tell the hiders or the seekers to run near a box or interact with it,” says Bowen Baker, one of the authors of the paper. “But through multiagent competition, they created new tasks for each other such that the other team had to adapt.”

This study is typical of OpenAI's approach to AI research. Though the lab has also developed novel techniques of its own, it has primarily made a name for itself by dramatically scaling existing ones. GPT-2, the lab's infamous language model, for example, borrowed heavily from earlier work, including Google's Transformer architecture and OpenAI's own original GPT model; OpenAI's primary innovation was a feat of engineering backed by vast computational resources.

In a way, this study reaffirms the value of testing the limits of existing technologies at scale. The team also plans to continue with this strategy. The researchers say that the first round of experiments didn’t even come close to reaching the limits of the computational resources they could throw at the problem.

“We want people to imagine what would happen if you induced this kind of competition in a much more complex environment,” Baker says. “The behaviors they learn might actually be able to eventually solve some problems that we maybe don’t know how to solve already.”

And all this, as I alluded to earlier, is just one reason why I think that, of all the AI research out there, this could be a game changer. So let's wait and see, but not for too long, because these AIs are evolving fast, very fast, like 3D printed meal and microwave dinner fast. Wow!

Matthew Griffin / About Author

Matthew Griffin is a multi-award-winning Futurist and expert in Disruption and Innovation, Geopolitics, Leadership, and Technology, who NASA have described as a "walking encyclopaedia of the future" and a "futurist Polymath." 15-time best-selling author of the "Codex of the Future" series, Matthew is the Founder and Futurist in Chief of the 311 Institute, a global Futures and Deep Futures advisory firm working with royal households, world leaders, G7, G20, and G77 governments, NGOs, and multi-national mid and mega cap firms to help them explore, shape, and lead the next 50 years of business and society.

An award-winning YouTube creator with over a million followers, with an unrivalled global reach and impact, Matthew is a highly sought-after international keynote speaker, lecturer, and mentor who collaborates with global leaders through the United Nations Alliance of Civilizations (UNAOC) and United Nations General Assembly (UNGA) to shape pivotal initiatives such as the UN’s AI for Humanity program, the United Nations Conference of the Parties (UN COP), and the World Economic Forum in Davos.

As the former Global Head of Cloud, National Security, and Enterprise Sales for companies including Atos, Dell-EMC, and IBM, Matthew has a proven track record of building multi-billion dollar business units and turning failing divisions into market leaders. His ability to identify, analyse, and communicate the implications of hundreds of emerging technologies and trends is unparalleled, and his insights are trusted by many of the world’s most respected organisations, including ABB, Accenture, Adidas, AON, ARM, BCG, Centrica, Citi, Coca-Cola, Dentons, Deloitte, Dow Jones, EY, Google, KPMG, Lego, Legal & General, LinkedIn, Microsoft, PepsiCo, Qualcomm, RWE, Samsung, Siemens AG and Siemens Energy, T-Mobile, UBS, VISA, Walmart, Workday, Worldpay and many others.

Regularly featured in the global media including the AP, BBC, Bloomberg, CNBC, Discovery, Forbes, Khaleej Times, Telegraph, TIME, ViacomCBS, WIRED, and the WSJ, Matthew's mission is to help organisations create a fair and sustainable future whose benefits are shared by everyone irrespective of their ability, background, or circumstances.