Google DeepMind is teaching AI to play Diplomacy before taking on the real thing


AI might be getting a lot better at strategy, but sometimes diplomacy is called for, and that’s a skill AI has yet to master …

Google DeepMind, one of the world’s best-known Artificial Intelligence (AI) outfits, has already taught AI to master games from Chess and Go to StarCraft, and along the way inadvertently helped the US military build what could well be one of the world’s first AI fighter pilots. Now it has turned its attention to another board game, Diplomacy. The research could have real-world implications for how AI helps humans (from business leaders to everyday Joes), other machines, and even AI-powered politicians negotiate, discover each other’s weaknesses, and reach agreement in the future.

Unlike Go, Diplomacy is a seven-player game that requires a combination of competition and cooperation to win. On each turn players move simultaneously, so they must reason about what others are reasoning about them, and so on …

“[Diplomacy] is a qualitatively different problem from something like Go or chess,” says Andrea Tacchetti, a computer scientist at DeepMind. In December, Tacchetti and collaborators presented a paper on their system at the NeurIPS conference. The system advances the state of the art and may point the way toward AI systems with real-world diplomatic skills. That is no doubt something the world’s first AI politician, developed in China, would find useful, whether it’s negotiating with strategic or commercial partners or simply scheduling its next team meeting.

Diplomacy is a strategy game played on a map of Europe divided into 75 provinces. Players build and mobilise military units to occupy provinces until someone controls a majority of supply centers. Each turn, players write down their moves, which are then executed simultaneously. They can attack or defend against opposing players’ units, or support opposing players’ attacks and defenses, building alliances. In the full version, players can negotiate. DeepMind tackled the simpler No-Press Diplomacy, devoid of explicit communication.
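To make the point about supports concrete, here is a minimal sketch of how a single contested province might be resolved. It assumes a heavily simplified version of the standard rules: a unit's strength is 1 plus the number of units supporting it, tied attackers bounce off each other, and a lone strongest attacker must strictly beat the defender. Cut supports, convoys and head-to-head battles are ignored, and the function and unit names are hypothetical, purely for illustration.

```python
from typing import Dict, Optional


def resolve_province(holder: Optional[str], hold_support: int,
                     attacks: Dict[str, int]) -> Optional[str]:
    """Return which unit occupies the province after simultaneous moves.

    holder: unit currently in the province (None if the province is empty).
    hold_support: number of units supporting the holder to stay put.
    attacks: attacking unit -> number of units supporting that attack.
    """
    # A defending unit has strength 1 plus its supports; an empty province has 0.
    hold_strength = 1 + hold_support if holder is not None else 0
    if not attacks:
        return holder
    strengths = {unit: 1 + supports for unit, supports in attacks.items()}
    best = max(strengths.values())
    leaders = [unit for unit, s in strengths.items() if s == best]
    # Equally strong attackers bounce off each other (a "standoff").
    if len(leaders) > 1:
        return holder
    # A lone strongest attacker dislodges the defender only if strictly stronger.
    if best > hold_strength:
        return leaders[0]
    return holder


# One ally lending support is enough to break an unsupported defence.
print(resolve_province("F_Germany", 0, {"A_France": 1}))
```

Because any player's unit can supply that extra point of strength, supporting someone else's attack is how cooperation shows up in the orders themselves, even without negotiation.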

Historically, AI has played Diplomacy using hand-crafted strategies. In 2019, Mila, the Montreal research institute, beat the field with a system using deep learning. Its researchers trained a neural network they called DipNet to imitate humans, using a dataset of 150,000 human games. DeepMind started with a version of DipNet and refined it using reinforcement learning, a form of trial-and-error learning.

Exploring the space of possibilities purely through trial-and-error would pose problems, though. The researchers calculated that a 20-move game can be played nearly 10^868 ways; yes, that’s 1 followed by 868 zeroes.
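As a rough back-of-the-envelope illustration, assume (purely for simplicity, and not something the researchers claim) that each of the 20 moves offers the same number N of joint options across all seven players. The total then works out as:

```latex
N^{20} \approx 10^{868}
\quad\Longrightarrow\quad
N \approx 10^{868/20} = 10^{43.4}
```

That is on the order of 10^43 combinations to weigh on every single move, far too many to explore by trial and error alone.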

So they tweaked their reinforcement learning algorithm. During training, on each move, they sample likely moves of opponents, calculate the move that works best on average across these scenarios, then train their net to prefer this move. After training, the net skips the sampling and simply acts on what it has learned.
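The training-time step described above is essentially a sampled best response. The sketch below is a toy version, not DeepMind's code: it assumes a single opponent with a hypothetical policy (`toy_opponent`) and a made-up payoff table (`PAYOFF`), whereas the real system samples joint moves for six opponents and scores them with learned estimates. All scenarios are drawn once, so every candidate move is judged against the same sampled futures.

```python
import random
from typing import Callable, Dict, Sequence, Tuple


def sampled_best_response(
    candidates: Sequence[str],
    sample_opponents: Callable[[random.Random], str],
    value: Callable[[str, str], float],
    num_samples: int,
    rng: random.Random,
) -> str:
    """Return the candidate move with the best average value over sampled
    opponent behaviour."""
    # Draw the opponent scenarios once so all candidates share the same samples.
    scenarios = [sample_opponents(rng) for _ in range(num_samples)]

    def average_value(move: str) -> float:
        return sum(value(move, s) for s in scenarios) / len(scenarios)

    return max(candidates, key=average_value)


def toy_opponent(rng: random.Random) -> str:
    # Hypothetical opponent policy: attacks 80% of the time, otherwise holds.
    return "attack" if rng.random() < 0.8 else "hold"


# Hypothetical payoffs: (our move, opponent move) -> value to us.
PAYOFF: Dict[Tuple[str, str], float] = {
    ("hold", "attack"): 1.0, ("hold", "hold"): 0.0,
    ("attack", "attack"): -1.0, ("attack", "hold"): 2.0,
}

best = sampled_best_response(["hold", "attack"], toy_opponent,
                             lambda move, opp: PAYOFF[(move, opp)],
                             num_samples=1000, rng=random.Random(0))
print(best)
```

In a training loop, the move returned here would serve as the target the policy network is nudged toward; at play time only the trained network is queried, which is why the sampling cost disappears after training.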

“The message of our paper is: we can make reinforcement learning work in such an environment,” Tacchetti says. One of their AI players pitted against six DipNets won 30 percent of the time, where chance alone (one win in seven) would give about 14 percent. One DipNet against six of theirs won only 3 percent of the time.

This April, Facebook will present a paper at the ICLR conference describing their own work on No-Press Diplomacy. They also built on a human-imitating network similar to DipNet. But instead of adding reinforcement learning, they added search: the technique of taking extra time to plan ahead and reason about what every player is likely to do next.

On each turn, SearchBot computes an equilibrium, a strategy for each player that the player can’t improve by switching only its own strategy. To do this, SearchBot evaluates each potential strategy for a player by playing the game out a few turns, assuming everyone chooses subsequent moves based on the net’s top choice. A strategy consists not of a single best move but of a set of probabilities across 50 likely moves suggested by the net, to avoid being too predictable to opponents.
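The article does not say which algorithm SearchBot uses to find its equilibrium, so the sketch below swaps in regret matching, a standard no-regret method, on a toy two-player zero-sum matrix game. In SearchBot's setting the payoff entries would instead come from the short rollouts described above, over roughly 50 net-suggested moves for each of seven players; the game here, and the function names, are hypothetical. Each player repeatedly plays actions in proportion to how much it regrets not having played them, and the *average* strategy approaches an equilibrium. The `exploitability` helper checks the definition quoted above: how much the players could gain, in total, by each switching only their own strategy.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def regret_matching_strategy(regrets: List[float]) -> List[float]:
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def solve_zero_sum(A: Matrix, iterations: int) -> Tuple[List[float], List[float]]:
    """Approximate an equilibrium of the game where the row player receives
    A[i][j] and the column player receives -A[i][j]."""
    n_rows, n_cols = len(A), len(A[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        x = regret_matching_strategy(row_regret)
        y = regret_matching_strategy(col_regret)
        # Expected payoff of each pure action against the opponent's current mix.
        row_u = [sum(A[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
        col_u = [-sum(A[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols)]
        row_v = sum(x[i] * row_u[i] for i in range(n_rows))
        col_v = sum(y[j] * col_u[j] for j in range(n_cols))
        # Accumulate regrets and the running sum of strategies played.
        for i in range(n_rows):
            row_regret[i] += row_u[i] - row_v
            row_sum[i] += x[i]
        for j in range(n_cols):
            col_regret[j] += col_u[j] - col_v
            col_sum[j] += y[j]
    return [s / iterations for s in row_sum], [s / iterations for s in col_sum]


def exploitability(A: Matrix, x: List[float], y: List[float]) -> float:
    """Total gain available to the two players from unilateral best responses;
    zero at an exact equilibrium."""
    best_row = max(sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x)))
    worst_for_row = min(sum(A[i][j] * x[i] for i in range(len(x))) for j in range(len(y)))
    return best_row - worst_for_row


# Toy game whose unique equilibrium has both players choose action 0 with
# probability 0.4, so the mixed strategies are not simply uniform.
A = [[2.0, -1.0], [-1.0, 1.0]]
x, y = solve_zero_sum(A, iterations=10_000)
print(x, y, exploitability(A, x, y))
```

Note that regret matching's convergence guarantee to an equilibrium holds for two-player zero-sum games like this toy; seven-player Diplomacy offers no such guarantee, which is part of why Brown found it notable that their method could find equilibria.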

Conducting such exploration during a real game slows SearchBot down, but allows it to beat DipNet by an even greater margin than DeepMind’s system does. SearchBot also played anonymously against humans on a Diplomacy website and ranked in the top 2 percent of players.

“This is the first bot that’s demonstrated to be competitive with humans,” says Adam Lerer, a computer scientist at Facebook and paper co-author. “I think the most important point is that search is often underestimated,” Lerer says. One of his Facebook collaborators, Noam Brown, implemented search in a superhuman poker bot. Brown says the most surprising finding was that their method could find equilibria, a computationally difficult task.

“I was really happy when I saw their paper,” Tacchetti says, “because of just how different their ideas were to ours, which means that there’s so much stuff that we can try still.” Lerer sees a future in combining reinforcement learning and search, which worked well for DeepMind’s AlphaGo.

Both teams found that their systems were not easily exploitable. Facebook, for example, invited two top human players to each play 35 straight games against SearchBot, probing for weaknesses. The humans won only 6 percent of the time. Both groups also found that their systems didn’t just compete, but also cooperated, sometimes supporting opponents.

“They get that in order to win, they have to work with others,” says Yoram Bachrach, from the DeepMind team.

That’s important, Bachrach, Lerer, and Tacchetti say, because games that combine competition and cooperation are much more realistic than purely competitive games like Go. Mixed motives occur in all realms of life: driving in traffic, negotiating contracts, and arranging times to Zoom.

How close are we to AI that can play Diplomacy with “press,” negotiating all the while using natural language?

“For Press Diplomacy, as well as other settings that mix cooperation and competition, you need progress,” Bachrach says, “in terms of theory of mind, how they can communicate with others about their preferences or goals or plans. And, one step further, you can look at the institutions of multiple agents that human society has. All of this work is super exciting, but these are early days.”
