Google DeepMind is teaching AI to play Diplomacy before taking on the real thing


By Matthew Griffin Intelligence and the Senses 10th December 2021

WHY THIS MATTERS IN BRIEF

AI might be getting a lot better at strategy, but sometimes diplomacy is called for, and that’s a skill that AI has yet to master …


Google DeepMind, one of the world’s most famous Artificial Intelligence (AI) outfits, has taught AI to master everything from Chess and Go to StarCraft, and inadvertently helped the US military build what could very well be one of the world’s first AI fighter pilots. Now it has turned its attention to another board game – Diplomacy. The research could have real world implications for how AI helps humans, from business leaders to everyday Joes, as well as other machines and AI powered politicians, negotiate, discover each other’s weaknesses, and reach agreement in the future.

Unlike Go, Diplomacy is a seven player game that requires a combination of competition and cooperation to win. On each turn players make their moves simultaneously, so they must reason about what others are reasoning about them, and so on …

“[Diplomacy] is a qualitatively different problem from something like Go or chess,” says Andrea Tacchetti, a computer scientist at DeepMind. In December, Tacchetti and collaborators presented a paper on their system at the NeurIPS conference. The system advances the state of the art and may point the way toward AI systems with real-world diplomatic skills. That is no doubt something the world’s first AI politician, developed in China, would find useful, whether it’s negotiating with strategic or commercial partners or simply scheduling its next team meeting.

Diplomacy is a strategy game played on a map of Europe divided into 75 provinces. Players build and mobilise military units to occupy provinces until someone controls a majority of supply centers. Each turn, players write down their moves, which are then executed simultaneously. They can attack or defend against opposing players’ units, or support opposing players’ attacks and defenses, building alliances. In the full version, players can negotiate. DeepMind tackled the simpler No-Press Diplomacy, devoid of explicit communication.

Historically, AI has played Diplomacy using hand-crafted strategies. In 2019, the Montreal research institute Mila beat the field with a system using deep learning. Its researchers trained a neural network they called DipNet to imitate humans, based on a dataset of 150,000 human games. DeepMind started with a version of DipNet and refined it using reinforcement learning, a kind of learning by trial-and-error.

Exploring the space of possibilities purely through trial-and-error would pose problems, though. The team calculated that a 20 move game can be played nearly 10⁸⁶⁸ ways – yes, that’s a 1 with 868 zeroes after it.

So they tweaked their reinforcement learning algorithm. During training, on each move, the system samples likely moves of opponents, calculates the move that works best on average across these scenarios, then trains the net to prefer this move. After training, it skips the sampling and just works from what its learning has taught it.
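The sample-then-average step described above can be sketched in a few lines of Python. This is only a toy illustration: the two-player game, the move names, the payoff numbers and the function name `sampled_best_response` are all made up here, and DeepMind’s real system works on full Diplomacy orders with a neural network supplying both the likely opponent moves and the value estimates.

```python
import random
from typing import Callable, Dict, Sequence

# Hypothetical toy payoffs: (my move, opponent move) -> my score.
TOY_PAYOFFS = {
    ("attack", "defend"): -1.0,
    ("attack", "advance"): 2.0,
    ("hold", "defend"): 0.0,
    ("hold", "advance"): -1.0,
}


def toy_payoff(my_move: str, opponent_move: str) -> float:
    """Look up the score for one pair of simultaneous moves."""
    return TOY_PAYOFFS[(my_move, opponent_move)]


def sampled_best_response(
    candidates: Sequence[str],
    opponent_policy: Dict[str, float],
    payoff: Callable[[str, str], float],
    n_samples: int = 200,
    seed: int = 0,
) -> str:
    """Pick the candidate move with the best average payoff over sampled opponent moves."""
    rng = random.Random(seed)
    moves = list(opponent_policy)
    weights = [opponent_policy[m] for m in moves]
    # Sample scenarios: what the opponent is likely to do this turn.
    scenarios = rng.choices(moves, weights=weights, k=n_samples)

    def average_value(move: str) -> float:
        return sum(payoff(move, s) for s in scenarios) / n_samples

    # The move that does best on average across the sampled scenarios.
    return max(candidates, key=average_value)


# An opponent assumed to defend most of the time.
best = sampled_best_response(["attack", "hold"], {"defend": 0.9, "advance": 0.1}, toy_payoff)
print(best)
```

During training, the move returned here would be the target that the net is nudged towards; once trained, the net would propose a move directly without the sampling loop.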

“The message of our paper is: we can make reinforcement learning work in such an environment,” Tacchetti says. One of their AI players versus six DipNets won 30 percent of the time, against the 14 percent (one in seven) expected by chance. One DipNet against six of theirs won only 3 percent of the time.

This April, Facebook will present a paper at the ICLR conference describing their own work on No-Press Diplomacy. They also built on a human-imitating network similar to DipNet. But instead of adding reinforcement learning, they added search – the technique of taking extra time to plan ahead and reason about what every player is likely to do next.

On each turn, SearchBot computes an equilibrium, a set of strategies, one per player, in which no player can do better by changing only its own strategy. To do this, SearchBot evaluates each potential strategy for a player by playing the game out a few turns, assuming everyone chooses subsequent moves based on the net’s top choice. A strategy consists not of a single best move but of a set of probabilities across 50 likely moves suggested by the net, so that the bot avoids being too predictable to opponents.
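To make the idea of an equilibrium over mixed strategies concrete, here is a small Python sketch that approximates one in a toy two-player, zero-sum game. It uses regret matching, a standard textbook method picked here for illustration; the article doesn’t say how SearchBot does it. The payoff matrix and the function name `regret_matching` are invented, and real Diplomacy has seven players and around 50 candidate moves each rather than two.

```python
from typing import List, Tuple


def regret_matching(
    payoff: List[List[float]], iterations: int = 20000
) -> Tuple[List[float], List[float]]:
    """Approximate an equilibrium of a zero-sum matrix game.

    payoff[i][j] is the row player's score; the column player gets the negative.
    Returns the average mixed strategy of each player.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols

    def strategy(regret: List[float]) -> List[float]:
        # Play each move in proportion to its positive regret.
        positive = [max(r, 0.0) for r in regret]
        total = sum(positive)
        if total <= 0.0:
            return [1.0 / len(regret)] * len(regret)
        return [p / total for p in positive]

    for _ in range(iterations):
        row, col = strategy(row_regret), strategy(col_regret)
        # Value of each pure move against the opponent's current mix.
        row_values = [sum(payoff[i][j] * col[j] for j in range(n_cols)) for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n_rows)) for j in range(n_cols)]
        row_expected = sum(row[i] * row_values[i] for i in range(n_rows))
        col_expected = sum(col[j] * col_values[j] for j in range(n_cols))
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_expected
            row_sum[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_expected
            col_sum[j] += col[j]

    return [s / iterations for s in row_sum], [s / iterations for s in col_sum]


# Hypothetical 2x2 game whose exact equilibrium has both players
# choosing their first move 40 percent of the time.
row_mix, col_mix = regret_matching([[2.0, -1.0], [-1.0, 1.0]])
print(row_mix, col_mix)
```

The output is a probability for each move rather than a single choice, which is the property that keeps a player from being predictable.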

Conducting such exploration during a real game slows SearchBot down, but allows it to beat DipNet by an even greater margin than DeepMind’s system does. SearchBot also played anonymously against humans on a Diplomacy website and ranked in the top 2 percent of players.

“This is the first bot that’s demonstrated to be competitive with humans,” says Adam Lerer, a computer scientist at Facebook and paper co-author. “I think the most important point is that search is often underestimated,” Lerer says. One of his Facebook collaborators, Noam Brown, implemented search in a superhuman poker bot. Brown says the most surprising finding was that their method could find equilibria, a computationally difficult task.

“I was really happy when I saw their paper,” Tacchetti says, “because of just how different their ideas were to ours, which means that there’s so much stuff that we can try still.” Lerer sees a future in combining reinforcement learning and search, which worked well for DeepMind’s AlphaGo.

Both teams found that their systems were not easily exploitable. Facebook, for example, invited two top human players to each play 35 straight games against SearchBot, probing for weaknesses. The humans won only 6 percent of the time. Both groups also found that their systems didn’t just compete, but also cooperated, sometimes supporting opponents.

“They get that in order to win, they have to work with others,” says Yoram Bachrach, from the DeepMind team.

That’s important, Bachrach, Lerer, and Tacchetti say, because games that combine competition and cooperation are much more realistic than purely competitive games like Go. Mixed motives occur in all realms of life: driving in traffic, negotiating contracts, and arranging times to Zoom.

How close are we to AI that can play Diplomacy with “press,” negotiating all the while using natural language?

“For Press Diplomacy, as well as other settings that mix cooperation and competition, you need progress,” Bachrach says, “in terms of theory of mind, how they can communicate with others about their preferences or goals or plans. And, one step further, you can look at the institutions of multiple agents that human society has. All of this work is super exciting, but these are early days.”
