DeepMind's newest AI thrashes human gamers at Stratego

0 3

By Matthew Griffin Intelligence and the Senses 23rd December 2022

WHY THIS MATTERS IN BRIEF

We all make decisions even in the absence of data and in order for AI to “take on the real world” it has to aswell but so far that’s been tricky.

Love the Exponential Future? Join our XPotential Community, future proof yourself with courses from XPotential University, connect, watch a keynote, read our codexes, or browse my blog.

Artificial Intelligence (AI) hates uncertainty, yet to navigate our unpredictable world it needs to learn to make choices with imperfect information – as we do every single day. And this is just one reason why when AI’s have been able to cope adequately with uncertainty quite often the US military has conscripted them to take part in war game simulations …

This week though DeepMind just took a stab at solving this issue. The trick was to interweave game theory into an algorithmic strategy loosely based on the human brain called deep reinforcement learning. The result, DeepNash, toppled human experts in a highly strategic board game called Stratego. A notoriously difficult game for AI, Stratego requires multiple strengths of human wit: long-term thinking, bluffing, and strategizing, all without knowing your opponent’s pieces on the board.

The Future of Artificial Intelligence by keynote Matthew Griffin

“Unlike chess and Go, Stratego is a game of imperfect information: players cannot directly observe the identities of their opponent’s pieces,” DeepMind wrote in a blog post. With DeepNash, “game-playing AI systems have advanced to a new frontier.”

It’s not all fun and games though. AI systems that can easily manoeuvre the randomness of our world and adjust their “behaviour” accordingly could one day handle real-world problems with limited information, such as optimizing traffic flow to reduce travel time and, hopefully, quenching road rage as self-driving cars become ever more present.

“If you’re making a self-driving car, you don’t want to assume that all the other drivers on the road are perfectly rational, and going to behave optimally,” said Dr. Noam Brown at Meta AI, who wasn’t involved in the research.

DeepNash’s triumph comes hot on the heels of another AI advance this month, where an algorithm learned to play Diplomacy – a game that requires negotiation and cooperation to win. As AI gains more flexible reasoning, becomes more generalized, and learns to navigate social situations, it may also spark insights into our own brains’ neural processes and cognition.

In terms of complexity, Stratego is a completely different beast compared to chess, Go, or poker – all games that AI has previously mastered. The game is essentially capture the flag. Each side has 40 pieces they can place at any position on the board. Each piece has a different name and numerical rank, such as “marshal,” “general,” “scout,” or “spy.” Higher ranking pieces can capture lower ones. The goal is to eliminate the opposition and capture their flag.

Stratego is especially challenging for AI because players can’t see the location of their opponents’ pieces, both during initial setup and throughout gameplay. Unlike chess or Go, in which each piece and movement is in view, Stratego is a game with limited information. Players must “balance all possible outcomes” any time they make a decision, the authors explained.

This level of uncertainty is partly why Stratego has stumped AI for ages. Even the most successful game-play algorithms, such as AlphaGo and AlphaZero, rely on complete information. Stratego, in contrast, has a touch of Texas Hold ’em, a poker game DeepMind previously conquered with an algorithm. But that strategy faltered for Stratego, largely because of the length of game, which unlike poker, normally encompasses hundreds of moves.

The number of potential game plays is mind-blowing. Chess has one starting position. Stratego has over 10⁶⁶ possible starting positions – far more than all the stars in the universe. Stratego’s game tree, the sum of all potential moves in the game, totals a staggering 10⁵³⁵.

“The sheer complexity of the number of possible outcomes in Stratego means algorithms that perform well on perfect-information games, and even those that work for poker, don’t work,” said study author Dr. Julien Perolat at DeepMind. The challenge is “what excited us,” he said.

Stratego’s complexity means that the usual strategy for searching gameplay moves is out of the question. Dubbed the Monte Carlo tree search, a “stalwart approach to AI-based gaming,” the technique plots out potential routes – like branches on a tree – that could result in victory.

Instead, the magic touch for DeepNash came from the mathematician John Nash, portrayed in the film A Beautiful Mind. A pioneer in game theory, Nash won the Nobel Prize for his work for the Nash equilibrium. Put simply, in each game, players can tap into a set of strategies followed by everyone, so that no single player gains anything by changing their own strategy. In Statego, this brings about a zero-sum game: any gain a player makes results in a loss for their opponent.

Because of Stratego’s complexity, DeepNash took a model-free approach to their algorithm. Here, the AI isn’t trying to precisely model its opponent’s behaviour. Like a baby, it has a blank slate, of sorts, to learn. This set-up is particularly useful in early stages of gameplay, “when DeepNash knows little about its opponent’s pieces,” making predictions “difficult, if not impossible,” the authors said.

The team then used deep reinforcement learning to power DeepNash, with the goal of finding the game’s Nash equilibrium. It’s a match made in heaven: reinforcement learning helps decide the best next move at every step of the game, while DeepNash provides an overall learning strategy. To evaluate the system, the team also engineered a “tutor” using knowledge from the game to filter out obvious mistakes that likely wouldn’t make real-world sense.

As a first learning step, DeepNash played against itself in 5.5 billion games, a popular approach in AI training dubbed self-play. When one side wins, the AI gets awarded, and its current artificial neural network parameters are strengthened. The other side – the same AI – receives a penalty to dampen its neural network strength. It’s like rehearsing a speech to yourself in front of a mirror. Over time, you figure out mistakes and perform better. In DeepNash’s case, it drifts towards a Nash equilibrium for best gameplay.

The team tested the algorithm against other elite Stratego bots, some of which won the Computer Stratego World Championship. DeepNash squashed its opponents with a win rate of roughly 97 percent. When unleashed against Gravon – an online platform for human players – DeepNash trounced its human opponents. After over two weeks of matches against Gravon’s players in April this year, DeepNash rose to third place in all ranked matches since 2002.

It shows that bootstrapping human play data to AI isn’t needed for DeepNash to reach human-level performance – and beat it.

The AI also exhibited some intriguing behaviour with the initial setup and during gameplay. For example, rather than settling on a particular “optimized” starting position, DeepNash constantly shifted the pieces around to prevent its opponent from spotting patterns over time. During gameplay, the AI bounced between seemingly senseless moves – such as sacrificing high-ranking pieces – to locate the opponent’s even higher-ranking pieces upon counterattack.

DeepNash can also bluff. In one play, the AI moved a low-ranking piece as if it were a high-ranking one, luring the human opponent to chase after the piece with its high-ranking colonel. The AI sacrificed the pawn, but in turn, lured the opponent’s valuable spy piece into an ambush.

Although DeepNash was developed for Stratego, it’s generalizable to the real-world. The core method can potentially instruct AI to better tackle our unpredictable future using limited information – from crowd and traffic control to analyzing market turmoil, perhaps even war games.

“In creating a generalizable AI system that’s robust in the face of uncertainty, we hope to bring the problem-solving capabilities of AI further into our inherently unpredictable world,” the team said.

Matthew Griffin / About Author

Matthew Griffin is a multi-award winning Futurist and expert in Disruption and Innovation, Geopolitics, Leadership, and Technology, who NASA have described as a "walking encyclopaedia of the future" and a "futurist Polymath." 15-time best selling author of the "Codex of the Future" series, Matthew is the Founder and Futurist in Chief of the 311 Institute, a global Futures and Deep Futures advisory firm working with royal households, world leaders, G7, G20, and G77 governments, NGOs, and multi-national mid and mega cap firms to help them explore, shape, and lead the next 50 years of business and society.

An award-winning YouTube creator with over a million followers, with an unrivalled global reach and impact, Matthew is a highly sought-after international keynote speaker, lecturer, and mentor who collaborates with global leaders through the United Nations Alliance of Civilizations (UNAOC) and United Nations General Assembly (UNGA) to shape pivotal initiatives such as the UN’s AI for Humanity program, the United Nations Conference of the Parties (UN COP), and the World Economic Forum in Davos.

As the former Global Head of Cloud, National Security, and Enterprise Sales for companies including Atos, Dell-EMC, and IBM, Matthew has a proven track record of building multi-billion dollar business units and turning failing divisions into market leaders. His ability to identify, analyse, and communicate the implications of hundreds of emerging technologies and trends is unparalleled, and his insights are trusted by many of the world’s most respected organisations, including ABB, Accenture, Adidas, AON, ARM, BCG, Centrica, Citi, Coca-Cola, Dentons, Deloitte, Dow Jones, EY, Google, KPMG, Lego, Legal & General, LinkedIn, Microsoft, PepsiCo, Qualcomm, RWE, Samsung, Siemens AG and Siemens Energy, T-Mobile, UBS, VISA, Walmart, Workday, Worldpay and many others.

Regularly featured in the global media including the AP, BBC, Bloomberg, CNBC, Discovery, Forbes, Khaleej Times, Telegraph, TIME, ViacomCBS, WIRED, and the WSJ, Matthews mission is to help organisations create a fair and sustainable future whose benefits are shared by everyone irrespective of their ability, background, or circumstances.