OpenAI thrashes DeepMind using an AI from the 1980s



  • Decades-old AI algorithms are increasingly demonstrating that, with some fine-tuning, they can thrash today’s best systems, and people are taking notice


Artificial intelligence (AI) researchers have a long history of going back in time to explore old ideas, and now researchers at OpenAI, which is backed by Elon Musk, have revisited “Neuroevolution,” a field that has been around since the 1980s, and achieved state-of-the-art results.

The group, which was led by OpenAI’s research director Ilya Sutskever, explored the use of a set of algorithms called “Evolution strategies,” which are aimed at solving “optimisation” problems. Optimisation problems are just what they sound like: take something that needs optimising, such as your route to work, a flight plan, or even a healthcare treatment, and optimise it.

On an abstract level, the technique the team used works by letting successful algorithms pass their characteristics on to future generations, so each successive generation gets better and better at whatever task it has been assigned. Bringing these ideas into the present day, the researchers reworked the algorithms so they would work better with today’s deep neural networks and run better on large-scale distributed computing systems.
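To make the idea of “passing characteristics on” a bit more concrete, here’s a minimal sketch of a classic evolutionary loop in Python. It’s a generic, textbook-style illustration rather than OpenAI’s code: the toy fitness function, the `TARGET` vector and parameters such as `pop_size` and `sigma` are all made up for the example. Each generation, the fittest candidates survive and the rest of the population is refilled with randomly mutated copies of them.

```python
import random

# Hypothetical toy task (not from the OpenAI work): find a vector close to
# TARGET. Fitness is higher (closer to zero) the nearer a candidate gets.
TARGET = [0.5, -1.0, 2.0]

def fitness(candidate):
    return -sum((c - t) ** 2 for c, t in zip(candidate, TARGET))

def mutate(parent, rng, sigma=0.1):
    # A child inherits its parent's values plus a little random variation.
    return [p + rng.gauss(0.0, sigma) for p in parent]

def evolve(generations=200, pop_size=50, n_parents=10, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-3, 3) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: only the fittest candidates get to pass their traits on.
        population.sort(key=fitness, reverse=True)
        parents = population[:n_parents]
        # Next generation: keep the parents and fill the remaining slots
        # with mutated copies of randomly chosen parents.
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - n_parents)]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print([round(x, 2) for x in best], round(fitness(best), 4))
```

Because the parents are carried over unchanged, the best score can never get worse from one generation to the next; the random mutations are what let it keep improving.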

To validate the new system’s effectiveness, they set the algorithms to work on a series of challenges that are seen as benchmarks for reinforcement learning, the technique behind many of Google DeepMind’s most impressive feats. Those range from teaching its AIs to learn as fast as humans and giving them human-like memory, through to creating new Artificial General Intelligence (AGI) architectures, teaching them to dream and annihilating online Go players, to name but a few.

One of the challenges was to train the algorithm to play a variety of Atari computer games, and the other was to get it to learn how to control a virtual humanoid walker in a physics engine.

First, the algorithm started with a random policy, the set of rules that govern how the system should behave in order to achieve a high score. It then created several hundred copies of that policy, each with some random variation, and tested them on the game. These policies were then mixed back together again, with greater weight given to the ones that got the highest scores. The team repeated the process until it came up with a policy that played the game well.
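The loop described above can be sketched roughly like this. Treat it as an illustrative approximation, not OpenAI’s implementation: the `score` function is a hypothetical stand-in for actually playing a game, and the population size, noise scale `sigma`, learning rate `lr` and the score-normalisation step are assumptions chosen so that the toy example converges.

```python
import random
import statistics

def score(policy):
    # Hypothetical stand-in for "play the game and report the score".
    # A real setup would run an Atari or physics-engine episode here.
    target = [1.0, -2.0, 0.5, 3.0]
    return -sum((p - t) ** 2 for p, t in zip(policy, target))

def evolution_strategy(dim=4, population=200, sigma=0.1, lr=0.01,
                       iterations=300, seed=0):
    rng = random.Random(seed)
    # Start from a random policy (here just a vector of numbers).
    policy = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iterations):
        # 1. Make several hundred randomly varied copies of the policy.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(population)]
        scores = [score([p + sigma * n for p, n in zip(policy, noise)])
                  for noise in noises]
        # 2. Normalise the scores: above-average copies get positive
        #    weight, below-average copies get negative weight.
        mean = statistics.fmean(scores)
        std = statistics.pstdev(scores) or 1.0
        weights = [(s - mean) / std for s in scores]
        # 3. "Mix" the copies back together: nudge the policy towards the
        #    variations that scored well, in proportion to their weight.
        for j in range(dim):
            step = sum(w * noise[j] for w, noise in zip(weights, noises))
            policy[j] += lr / (population * sigma) * step
    return policy

final_policy = evolution_strategy()
print([round(p, 2) for p in final_policy], round(score(final_policy), 4))
```

Normalising the scores is one common way of giving “greater weight” to the best copies; ranking the copies by score, rather than using the raw values, is another.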

In just an hour of training on the Atari challenge, the algorithm achieved a level of mastery that took DeepMind’s reinforcement learning system a whole day to learn, and on the walking problem it took just 10 minutes, compared to DeepMind’s 10 hours.

One of the keys to this dramatic performance improvement was the fact that the new system was superb at processing workloads in parallel. To solve the walking simulation, for example, the system spread its computations over 1,440 CPU cores, while in the Atari challenge it used 720.

This was possible because the system only required limited communication between the various “worker” algorithms testing the candidate policies, whereas reinforcement learning algorithms like DeepMind’s have to communicate a lot more as they scale up. Additionally, the new system didn’t need to use “backpropagation,” a common neural network learning technique that compares the network’s output with the desired output and then feeds the resulting error back through the network to help optimise it.
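Here’s a minimal sketch of why so little communication is needed, assuming (as OpenAI described in the paper behind this work) that every worker shares the same random seeds. Each worker only has to report a single number, its score, because everyone can regenerate everyone else’s random variation from its seed. Notice, too, that the update is built purely from scores and random noise, with no backpropagation anywhere. The helper names (`worker`, `es_step`, `noise_from_seed`) and the toy `score` function are hypothetical, and the “workers” simply run in a loop here rather than on separate CPU cores.

```python
import random

DIM = 4       # number of policy parameters in this toy example
SIGMA = 0.1   # size of the random variations (assumed)
LR = 0.05     # learning rate (assumed)

def score(policy):
    # Hypothetical toy objective standing in for an episode's total reward.
    target = [1.0, -2.0, 0.5, 3.0]
    return -sum((p - t) ** 2 for p, t in zip(policy, target))

def noise_from_seed(seed):
    # Any worker can regenerate any other worker's variation from its
    # integer seed, so the noise vectors themselves are never sent around.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]

def worker(policy, seed):
    # Would run on one CPU core: perturb, evaluate, report ONE number.
    noise = noise_from_seed(seed)
    return score([p + SIGMA * n for p, n in zip(policy, noise)])

def es_step(policy, seeds):
    returns = [worker(policy, s) for s in seeds]  # only scalars are "sent"
    mean = sum(returns) / len(returns)
    # Every worker could compute this identical update locally from the
    # shared seeds plus the broadcast scores: no gradients, no backprop.
    update = [0.0] * DIM
    for s, r in zip(seeds, returns):
        for j, n in enumerate(noise_from_seed(s)):
            update[j] += (r - mean) * n
    return [p + LR / (len(seeds) * SIGMA) * u for p, u in zip(policy, update)]

policy = [0.0] * DIM
for generation in range(300):
    seeds = [generation * 1000 + i for i in range(100)]
    policy = es_step(policy, seeds)
print([round(p, 2) for p in policy])
```

With 100 workers, each generation exchanges 100 numbers, however many parameters the policy has, which is what makes spreading the work over hundreds of cores cheap.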

Combined, these features made the new system’s code shorter and the algorithm three to four times faster. But the approach has its limitations. These kinds of algorithms are usually compared on their data efficiency, the number of iterations required to achieve a specific score in a game, and by this metric the OpenAI approach did worse than traditional reinforcement learning approaches, even though it carried out those iterations much more quickly.

On supervised learning problems such as image classification and speech recognition, meanwhile, it was up to 1,000 times slower than approaches that use backpropagation. And that’s bad.
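One likely contributing factor, sketched below under simplifying assumptions, is that evolution strategies have to estimate the gradient from scores alone, whereas backpropagation computes it exactly. The toy uses a hypothetical linear objective whose true gradient is known, and measures how well an estimate built from 100 random perturbations lines up with it (cosine similarity, where 1.0 is a perfect match). It illustrates the trend only; it is not a reproduction of the 1,000-times figure, and the function names are made up for the example.

```python
import math
import random

def es_gradient_estimate(grad, n_samples, sigma, rng):
    # Assumes a linear objective F(theta) = grad . theta, so its exact
    # gradient (what backpropagation would return in one pass) is `grad`.
    dim = len(grad)
    estimate = [0.0] * dim
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # For a linear F, F(theta + sigma*eps) - F(theta) = sigma*(grad . eps);
        # ES only ever observes this score difference, never the gradient.
        delta = sigma * sum(g * e for g, e in zip(grad, eps))
        for j in range(dim):
            estimate[j] += delta * eps[j] / (n_samples * sigma)
    return estimate

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

rng = random.Random(0)
alignment = {}
for dim in (10, 1000):
    true_grad = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    est = es_gradient_estimate(true_grad, n_samples=100, sigma=0.1, rng=rng)
    alignment[dim] = cosine(est, true_grad)
    print(dim, round(alignment[dim], 2))
```

For this linear case the expected alignment is roughly 1 / sqrt(1 + (d + 1) / n) for d parameters and n perturbations, so the same budget of 100 perturbations gives about 0.95 for 10 parameters but only about 0.3 for 1,000. Image and speech models have far more parameters than that.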

Nevertheless, the work demonstrated promising new applications for evolutionary approaches that were once thought obsolete. OpenAI isn’t the only group investigating them either; Google has also been experimenting with older algorithms. So while I don’t know about dogs, it certainly looks like you can teach old algorithms new tricks.