OpenAI thrashes DeepMind using an AI from the 1980s


By Matthew Griffin Intelligence and the Senses 13th April 2017

WHY THIS MATTERS IN BRIEF

Decades-old AI algorithms are increasingly demonstrating that, with some fine-tuning, they can thrash today’s best systems – and people are taking notice

Artificial intelligence (AI) researchers have a long history of going back in time to explore old ideas, and now researchers at OpenAI, which is backed by Elon Musk, have revisited “neuroevolution,” a field that has been around since the 1980s, and achieved state-of-the-art results.

The group, which was led by OpenAI’s research director Ilya Sutskever, explored the use of a set of algorithms called “evolution strategies,” which are aimed at solving “optimisation” problems. Optimisation problems are just what they sound like: take something that needs optimising, such as your route to work, a flight plan, or even a healthcare treatment, and find the best version of it.

On an abstract level, the technique the team used works by letting successful algorithms pass their characteristics on to future generations – in short, each successive generation gets better and better at whatever task it has been assigned. Coming back to the present day, the researchers reworked these algorithms so they’d work better with today’s deep neural networks and run better on large-scale distributed computing systems.

To validate the new system’s effectiveness they set the algorithms to work on a series of challenges that are seen as benchmarks for reinforcement learning – the technique behind many of Google DeepMind’s most impressive feats, which range from teaching its AIs to learn as fast as humans and giving them human-like memory, through to creating new Artificial General Intelligence (AGI) architectures, teaching them to dream, and annihilating online Go players, to name but a few.

One of the challenges was to train the algorithm to play a variety of Atari computer games, and the other was to get it to learn how to control a virtual humanoid walker in a physics engine.

First the algorithm started with a random policy – the set of rules that govern how the system should behave to achieve a high score. It then created several hundred copies of the policy, each with some random variation, and tested them on the game. These policies were then mixed back together again, but with greater weight given to the ones that got the highest scores. The team repeated the process until it came up with a policy that played the game well.
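The loop described above can be sketched in a few lines of Python. This is only a toy illustration, not OpenAI’s code: the "policy" is a plain list of numbers, and `toy_score`, `target`, and every parameter value (`population`, `sigma`, `step`, `generations`) are assumptions made up for the example, standing in for a real game and a real neural network.

```python
import random

def evolve(score, num_params, population=200, sigma=0.1, step=0.1,
           generations=300, seed=0):
    """Toy evolution-strategies loop: perturb, score, recombine by score."""
    rng = random.Random(seed)
    # Start from a random policy (here just a list of numbers).
    policy = [rng.gauss(0.0, 1.0) for _ in range(num_params)]
    for _ in range(generations):
        # Create noisy copies of the current policy and score each one.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(num_params)]
                  for _ in range(population)]
        scores = [score([p + sigma * n for p, n in zip(policy, noise)])
                  for noise in noises]
        # Normalise scores so better-than-average copies get positive weight.
        mean = sum(scores) / population
        spread = (sum((s - mean) ** 2 for s in scores) / population) ** 0.5
        if spread == 0.0:
            continue  # all copies scored the same; nothing to learn from
        weights = [(s - mean) / spread for s in scores]
        # Mix the variations back together, weighted by how well they scored.
        for i in range(num_params):
            direction = sum(w * n[i] for w, n in zip(weights, noises)) / population
            policy[i] += step * direction
    return policy

# Hypothetical stand-in for a game score: highest when policy == target.
target = [0.5, -1.0, 2.0]

def toy_score(policy):
    return -sum((p - t) ** 2 for p, t in zip(policy, target))

best = evolve(toy_score, num_params=3)
print([round(b, 2) for b in best])
```

Note that the loop only ever calls `score`; it never needs to know how the score is computed, which is why no gradient information is required.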

In just an hour of training on the Atari challenge the algorithm achieved a level of mastery that took DeepMind’s reinforcement learning system a whole day to learn, and on the walking problem it took just 10 minutes, compared to DeepMind’s 10 hours.
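As a quick sanity check on those figures, the wall-clock speedups work out as follows. The sketch assumes "a whole day" means 24 hours, and it compares elapsed time only, not the amount of hardware each system used.

```python
# Training times quoted in the article, in minutes.
atari_es, atari_rl = 60, 24 * 60      # one hour vs. "a whole day" (assumed 24 h)
walker_es, walker_rl = 10, 10 * 60    # 10 minutes vs. 10 hours

atari_speedup = atari_rl / atari_es
walker_speedup = walker_rl / walker_es
print(f"Atari: {atari_speedup:.0f}x, walker: {walker_speedup:.0f}x")
# prints "Atari: 24x, walker: 60x"
```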

One of the keys to this dramatic performance improvement was the fact that the new system was superb at processing workloads in parallel. To solve the walking simulation, for example, the system spread its computations over 1,440 CPU cores, while in the Atari challenge it used 720.

This was possible because the system only required limited communication between the various “worker” algorithms testing the candidate policies, whereas scaled-up reinforcement learning algorithms like DeepMind’s have to communicate a lot more. Additionally, the new system didn’t need to use “backpropagation,” a common neural network learning technique that effectively compares the network’s output with the desired output and then feeds the resulting error back through the network to help optimise it.
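One way to keep communication this low, sketched below, is for each worker to send back only a random seed and the score it achieved; every other worker can then regenerate the same random variation from the seed instead of receiving the full set of parameters. This is a simplified single-process illustration under that assumption, and the names `perturbation`, `worker_evaluate`, `apply_reports`, and `toy_score` are hypothetical, not taken from OpenAI’s code.

```python
import random

def perturbation(seed, num_params):
    """Rebuild a worker's random variation from its seed alone."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_params)]

def worker_evaluate(policy, seed, sigma, score):
    """One worker: perturb the shared policy, play, report a single number."""
    noise = perturbation(seed, len(policy))
    candidate = [p + sigma * n for p, n in zip(policy, noise)]
    return seed, score(candidate)  # the only data that would cross the network

def apply_reports(policy, reports, step):
    """Each worker can run this identically, so all copies stay in sync."""
    scores = [s for _, s in reports]
    mean = sum(scores) / len(scores)
    updated = list(policy)
    for seed, s in reports:
        noise = perturbation(seed, len(policy))
        for i, n in enumerate(noise):
            # Variations that beat the average pull the policy toward them.
            updated[i] += step * (s - mean) * n / len(reports)
    return updated

def toy_score(policy):  # hypothetical stand-in for a game score
    return -sum(p * p for p in policy)

policy = [1.0, -2.0, 0.5, 3.0]
reports = [worker_evaluate(policy, seed, 0.1, toy_score) for seed in range(8)]
new_policy = apply_reports(policy, reports, step=0.05)
```

Here each report is a pair of numbers regardless of how many parameters the policy has, which is the property that lets this style of algorithm spread across many CPU cores.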

When combined, this helped make the new system’s code shorter and the algorithm three to four times faster. But the approach has its limitations. These kinds of algorithms are usually compared based on their data efficiency – the number of iterations required to achieve a specific score in a game – and on this metric the OpenAI approach did worse than the traditional reinforcement learning approaches, even though it carried out those iterations much quicker.

For supervised learning problems such as image classification and speech recognition, for example, it was up to 1,000 times slower than approaches that use backpropagation. And that’s bad.

Nevertheless, the work demonstrated promising new applications for evolutionary approaches that were once thought obsolete, and OpenAI isn’t the only group investigating them: Google has also been experimenting with older algorithms. So while I don’t know about dogs, it certainly looks like you can teach old algorithms new tricks.
