DeepMind's newest AI learns new skills by watching humans

By Matthew Griffin | Intelligence and the Senses | 13th December 2023

WHY THIS MATTERS IN BRIEF

We think that we need data to train AI, but data comes in many forms, and new AIs are being trained in many new ways that advance their capabilities and skills.

Teaching algorithms to mimic humans typically requires hundreds or thousands of examples. But a new Artificial Intelligence (AI) from Google DeepMind can pick up new skills from human demonstrators on the fly just by watching them, similar to what we saw a while ago with MIT's Baxter robot, which in that case used telepathy to learn new things from humans.

One of humanity’s greatest tricks is our ability to acquire knowledge rapidly and efficiently from each other. This kind of social learning, often referred to as cultural transmission, is what allows us to show a colleague how to use a new tool or teach our children nursery rhymes.

It’s no surprise that researchers have tried to replicate the process in machines. Imitation learning, in which AI watches a human complete a task and then tries to mimic their behaviour, has long been a popular approach for training robots. But even today’s most advanced deep learning algorithms typically need to see many examples before they can successfully copy their trainers.

When humans learn through imitation, they can often pick up new tasks after just a handful of demonstrations. Now, Google DeepMind researchers have taken a step toward rapid social learning in AI with agents that learn to navigate a virtual world from humans in real time.

“Our agents succeed at real-time imitation of a human in novel contexts without using any pre-collected human data,” the researchers write in a paper in Nature Communications. “We identify a surprisingly simple set of ingredients sufficient for generating cultural transmission.”

The researchers trained their agents in a specially designed simulator called GoalCycle3D. The simulator uses an algorithm to generate an almost endless number of different environments based on rules about how the simulation should operate and what aspects of it should vary.

In each environment, small blob-like AI agents must navigate uneven terrain and various obstacles to pass through a series of coloured spheres in a specific order. The bumpiness of the terrain, the density of obstacles, and the configuration of the spheres all vary between environments.
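To make the idea of rule-based environment generation concrete, here is a minimal Python sketch. It is not GoalCycle3D's actual generator: the `EnvConfig` fields, the value ranges, and the `generate_env` function are all hypothetical names chosen for illustration. It only shows how a single seed can deterministically produce one of a very large number of course layouts.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical description of one course (not the real GoalCycle3D schema)."""
    bumpiness: float          # 0.0 = flat terrain, 1.0 = very uneven
    obstacle_density: float   # fraction of the ground covered by obstacles
    sphere_positions: tuple   # (x, y) location of each coloured sphere
    sphere_order: tuple       # order in which the spheres must be visited


def generate_env(seed, num_spheres=3, world_size=20.0):
    """Build a course from a seed; the same seed always gives the same course."""
    rng = random.Random(seed)
    positions = tuple(
        (rng.uniform(0, world_size), rng.uniform(0, world_size))
        for _ in range(num_spheres)
    )
    order = list(range(num_spheres))
    rng.shuffle(order)  # the correct visiting order differs per course
    return EnvConfig(
        bumpiness=rng.uniform(0.0, 1.0),         # assumed range
        obstacle_density=rng.uniform(0.0, 0.5),  # assumed range
        sphere_positions=positions,
        sphere_order=tuple(order),
    )


if __name__ == "__main__":
    for seed in range(3):
        print(generate_env(seed))
```

Because each course is a pure function of its seed, a generator like this can hand the agent a fresh, unseen layout simply by picking a new seed.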

The agents are trained to navigate using reinforcement learning. They earn a reward for passing through the spheres in the correct order and use this signal to improve their performance over many trials. But in addition, the environments also feature an expert agent – which is either hard-coded or controlled by a human – that already knows the correct route through the course.
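The reward signal described here can be sketched in a few lines of Python. This is an illustrative assumption rather than the paper's exact reward: the class name `OrderedGoalReward`, the reward of 1.0, and the rule that a wrong sphere restarts the sequence are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OrderedGoalReward:
    """Rewards an agent for entering spheres in a fixed, repeating order."""
    order: list        # sphere ids in the correct visiting order (hypothetical)
    position: int = 0  # index into `order` of the sphere expected next

    def step(self, sphere_id):
        """Return the reward for this time step.

        `sphere_id` is the sphere the agent just entered, or None if it
        did not touch any sphere on this step.
        """
        if sphere_id is None:
            return 0.0
        if sphere_id == self.order[self.position]:
            # Correct sphere: advance, wrapping around so the cycle repeats.
            self.position = (self.position + 1) % len(self.order)
            return 1.0
        # Assumption: a wrong sphere earns nothing and restarts the sequence.
        self.position = 0
        return 0.0


if __name__ == "__main__":
    reward_fn = OrderedGoalReward(order=[2, 0, 1])
    visits = [2, None, 0, 1, 2, 1, 2]
    print([reward_fn.step(v) for v in visits])
```

An agent exploring alone has to discover the order by trial and error, which is why following an expert that already knows the route is such an effective shortcut.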

Over many training runs, the AI agents learn not only the fundamentals of how the environments operate, but also that the quickest way to solve each problem is to imitate the expert. To ensure the agents were learning to imitate rather than just memorizing the courses, the team trained them on one set of environments and then tested them on another. Crucially, after training, the team showed that their agents could imitate an expert and continue to follow the route even without the expert.

This required a few tweaks to standard reinforcement learning approaches.

The researchers made the algorithm focus on the expert by having it predict the location of the other agent. They also gave it a memory module. During training, the expert would drop in and out of environments, forcing the agent to memorize its actions for when it was no longer present. The AI also trained on a broad set of environments, which ensured it saw a wide range of possible tasks.
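Two of these tweaks, the expert dropping in and out and the extra objective of predicting the expert's location, can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the researchers' implementation: the function names, the probabilities, the squared-error loss, the choice to score predictions only while the expert is present, and the weighting of the two losses are all hypothetical.

```python
import random


def expert_presence_mask(num_steps, drop_prob=0.1, return_prob=0.1, seed=0):
    """Simulate an expert that randomly leaves and rejoins an episode.

    Returns a list of booleans, one per time step (True = expert present).
    The probabilities are illustrative assumptions.
    """
    rng = random.Random(seed)
    present, mask = True, []
    for _ in range(num_steps):
        if present and rng.random() < drop_prob:
            present = False
        elif not present and rng.random() < return_prob:
            present = True
        mask.append(present)
    return mask


def expert_prediction_loss(predicted, actual, mask):
    """Mean squared error between predicted and true expert positions.

    Assumption: only steps where the expert is present are scored.
    """
    errors = [
        sum((p - a) ** 2 for p, a in zip(pred, act))
        for pred, act, present in zip(predicted, actual, mask)
        if present
    ]
    return sum(errors) / len(errors) if errors else 0.0


def total_loss(rl_loss, prediction_loss, prediction_weight=0.5):
    """Combine the usual RL loss with the auxiliary prediction loss."""
    return rl_loss + prediction_weight * prediction_loss


if __name__ == "__main__":
    mask = expert_presence_mask(num_steps=20)
    print("expert present:", mask)
    aux = expert_prediction_loss([(0, 0), (1, 1)], [(1, 0), (5, 5)], [True, False])
    print("total loss:", total_loss(rl_loss=2.0, prediction_loss=aux))
```

The auxiliary term pushes the agent to pay attention to where the expert is, while the gaps in the presence mask are what make a memory module worthwhile: the agent is still rewarded for following the route after the expert has gone.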

It might be difficult to translate the approach to more practical domains though. A key limitation is that when the researchers tested if the AI could learn from human demonstrations, the expert agent was controlled by one person during all training runs. That makes it hard to know whether the agents could learn from a variety of people.

More pressingly, the ability to randomly alter the training environment would be difficult to recreate in the real world. And the underlying task was simple, requiring no fine motor control and occurring in highly controlled virtual environments.

Still, social learning progress in AI is welcome. If we’re to live in a world with intelligent machines, finding efficient and intuitive ways to share our experience and expertise with them will be crucial.
