WHY THIS MATTERS IN BRIEF
The development of an AGI that can master tens, hundreds, and eventually thousands or millions of subjects is the “holy grail” of AI, and it will change every corner of society.
One of the most significant Artificial Intelligence (AI) milestones in history was quietly ushered into being this summer. I am, of course, speaking about the quest for Artificial General Intelligence (AGI), probably the most sought-after goal in the entire field of computer science. With the introduction of the Impala architecture, DeepMind, the company behind AlphaGo and the self-learning AlphaZero, now has AGI firmly in its sights, and while many people predicted the first AGIs would emerge in or around 2035, we now know that date should be 2018. That is a staggering 17 years early, even if Impala is, by any interpretation, only a basic, first-generation AGI.
Firstly, let me define AGI, since the term has been used by different people to mean many different things, including the revolutionary "General AI" breakthrough realised earlier this year. Unlike today's so-called narrow AIs, which can only learn one thing very well, an AGI is a single intelligence, or algorithm, that can learn multiple tasks and exhibits "positive transfer" when doing so, sometimes called meta-learning. During meta-learning, the acquisition of one skill helps the learner pick up another new skill faster, just as we do when we are learning, because it applies some of its previous "know-how" to the new task. In other words, the system learns how to learn, and can generalise that ability to acquiring new skills, the way humans do. This has been the holy grail of AI for a long time.
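The easiest way to picture positive transfer is the transfer-learning trick used throughout deep learning: reuse what a network learned on one task as the starting point for another, so the second task needs far less training. The sketch below is a minimal, hypothetical illustration in PyTorch, not Impala's actual method (Impala trains a single network on many tasks at once); the function name and model structure are my own.

```python
# A loose illustration of "positive transfer": reuse the representation a network
# learned on task A as the starting point for task B, instead of learning from scratch.
# Hypothetical sketch; assumes task_a_model is an nn.Sequential ending in an nn.Linear.
import torch.nn as nn

def build_task_b_model(task_a_model: nn.Sequential, n_task_b_outputs: int) -> nn.Sequential:
    shared_features = task_a_model[:-1]            # keep the layers learned on task A
    for param in shared_features.parameters():
        param.requires_grad = False                # freeze the transferred "know-how"
    new_head = nn.Linear(task_a_model[-1].in_features, n_task_b_outputs)
    return nn.Sequential(shared_features, new_head)  # only the new head trains on task B
```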
Lifting the lid on DeepMind's revolutionary Impala AGI
As it exists today, AI shows little ability to transfer learning to new tasks. Typically, it must be retrained from scratch every time, although even the way AIs learn is changing as newer, more powerful systems begin to figure out how to evolve and self-learn, such as the ones from OpenAI and Baidu that achieved the "zero-shot learning" milestone last year. For instance, the same neural network that recommends Netflix shows to you cannot use that learning to suddenly start making meaningful grocery recommendations. Even these single-purpose "narrow" AIs can be impressive, such as IBM Watson or Google's self-driving car technology. But they are still a long way from an artificial general intelligence, which could conceivably unlock the kind of recursive self-improvement variously referred to as the "intelligence explosion" or the "Singularity," which many estimate will happen in the mid-2040s.
Those who thought the first AGIs would arrive only in the far and distant future would now be wise to think again. To be sure, DeepMind has made inroads into AGI before, releasing the world's first breakthrough blueprint for an AGI architecture in March last year, alongside its work on Psychlab and Differentiable Neural Computers. However, Impala is its largest and most successful effort to date, showcasing a single algorithm that can learn 30 different challenging tasks requiring various aspects of learning, memory, and navigation.
But enough preamble, let's look under the hood and see what makes Impala tick. First, Impala is based on reinforcement learning, an AI technique that has its origins in behaviourism. It parallels the way humans build up an intuition-based skill, such as learning to walk or ride a bicycle. Reinforcement learning has already been used for some amazing achievements, such as endowing an AI with emotions and mastering complex games like Go and poker, as the Libratus AI did recently when it whipped the world's top poker players.
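To make the idea concrete, here is a minimal sketch of reinforcement learning in its simplest, tabular form: an agent tries actions, receives rewards, and gradually refines its estimate of which action pays off best in each state. The `env` object, with its `reset`, `step`, and `actions` methods, is a hypothetical stand-in for any environment; Impala uses deep neural networks rather than a lookup table, but the trial-and-error loop is the same.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: learn action values purely from reward signals."""
    q = defaultdict(float)                        # (state, action) -> estimated return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            actions = env.actions(state)
            if random.random() < epsilon:         # explore occasionally...
                action = random.choice(actions)
            else:                                 # ...otherwise exploit what we know
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Nudge the estimate towards reward plus discounted future value.
            best_next = max((q[(next_state, a)] for a in env.actions(next_state)),
                            default=0.0)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```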
However, even these reinforcement learning algorithms could not transfer what they had learned about one task to acquiring a new one. To achieve this, DeepMind supercharged a reinforcement learning algorithm called A3C. In so-called actor-critic reinforcement learning, of which A3C is one variety, acting and learning are decoupled: one neural network, the critic, evaluates the actions chosen by the other, the actor, and together they drive the learning process. This was already the state of the art, but in Impala the many actors that gather experience inevitably lag slightly behind the central learner, so DeepMind added a new off-policy correction algorithm called V-trace to the mix, which made the learning more efficient and, crucially, better able to achieve positive transfer between tasks.
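For the technically curious, the heart of V-trace is easy to state: it builds value targets from trajectories generated by a slightly stale "behaviour" policy, clipping the importance ratios between the learner's current policy and that behaviour policy so the corrections stay stable. The sketch below computes V-trace targets for a single trajectory, following the formulas in DeepMind's Impala paper; the function and argument names are my own, and it ignores episode boundaries for brevity.

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value, behaviour_logp, target_logp,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets v_s for one trajectory of length T.

    rewards, values, behaviour_logp, target_logp: arrays of length T, where the
    log-probabilities are those of the actions actually taken under the behaviour
    (actor) policy and the target (learner) policy. bootstrap_value is the value
    estimate for the state after the final step.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    rhos = np.exp(np.asarray(target_logp) - np.asarray(behaviour_logp))  # pi / mu
    clipped_rhos = np.minimum(rho_bar, rhos)      # rho_t = min(rho_bar, pi/mu)
    clipped_cs = np.minimum(c_bar, rhos)          # c_t   = min(c_bar,  pi/mu)

    values_plus = np.append(values, bootstrap_value)
    # Temporal-difference terms: delta_t V = rho_t * (r_t + gamma V(x_{t+1}) - V(x_t))
    deltas = clipped_rhos * (rewards + gamma * values_plus[1:] - values_plus[:-1])

    # Accumulate the corrections backwards through the trajectory:
    # v_s - V(x_s) = delta_s V + gamma * c_s * (v_{s+1} - V(x_{s+1}))
    vs_minus_v = np.zeros(len(rewards))
    acc = 0.0
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + gamma * clipped_cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v                    # the V-trace targets v_s
```

The clipping is the whole trick: when the learner's policy has drifted far from the behaviour policy that generated the data, the capped ratios keep the updates from blowing up, which is what lets many actors feed one fast central learner.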
To be sure, DeepMind's AGI breakthrough does not herald the dawn of "conscious robots." But the company also recently announced new AI models that "give their AIs an imagination," something that might help these same Impala-based AGIs become even more effective learners.
More milestones keep falling, whether it is the rise of creative machines that compose their own pop music and innovate new products, or the world's first self-evolving robot, which uses AI to evolve its design and a 3D printer to print itself. The emergence of AGI will likely be the biggest of them all, and as DeepMind continues to develop and iterate its AGI, it may not be long before these powerful AGIs affect and influence every corner of society. This is potentially history in the making.