When DeepMind, Google’s AI research outfit, set out to demonstrate its latest breakthrough, it had to confront an added twist: how do you set your robot free to play games on the internet without anyone realising they’re competing against it?
The company caused a stir when it announced that its AlphaGo AI had beaten a world-class player at the ancient Asian boardgame Go. A few months later, it beat the world number one player.
But for the deeply strategic real-time war game StarCraft II, it had a different goal: to reach “grandmaster” standard – putting it in the top 200 players worldwide – on the game’s public servers, building its ranking the same way any human player would. That meant being matched with a steadily improving cadre of human players, and winning against them consistently enough to be promoted into those lofty ranks.
StarCraft may seem like an odd next step for a team that has previously taken on chess and Go, but the game has some qualities that make it interesting to researchers. It’s real-time, with millions upon millions of possible actions each second, and a vastly more complex roster of units than the six piece types of chess. Most importantly, it features hidden information: for the first few minutes of the game, it’s impossible to even see what your opponent is doing, let alone work out what they’re planning.
That means strategies have to be flexible enough to account for surprises, and need to incorporate mind-games as well: feints and ambushes are possible in a way they aren’t in chess. And, of course, there’s an advantage to working in a community where even the best players in the world can be found playing each other online, ranked according to a very public algorithm, with a ton of data flying around.
Players were told the new AI, dubbed AlphaStar, would be online, and were given the option of opting in to play against it. But to ensure it achieved its rank fairly, it had to play its games anonymously, so that opponents didn’t spend more effort trying to trick it or break it than they did trying to win.
“There was a bit of a meme where people started asking ‘are you AlphaStar’ to others,” said DeepMind’s David Silver, one of the company’s lead researchers and a lead author on the Nature paper announcing the StarCraft II result, with a laugh. “We had the policy to just not chat – other than wishing people good luck, and then ‘good game’ at the end.”
Staying silent during the game is not uncommon. But the need to remain anonymous also turned the experience from a test of raw skill into a sort of “Turing test for video games”, said Silver’s colleague Oriol Vinyals. “AlphaStar needed to play like a good human, not like a superhuman.”
That meant taking a different approach from previous StarCraft AIs, which tended to lean on abilities that only a computer could have. In a game where human competitors track their “actions per minute” – how many times they use their mouse or keyboard – a professional-level player may hit three or four hundred, while some AIs acted thousands or tens of thousands of times in the same sixty-second period. Other AIs were given near omniscience, with all the information available across the entire map plugged into their systems at once.
“We really wanted to have an interface that we believe was reasonable from a capability standpoint,” says Vinyals. “So we added this notion of a camera view, which is very crucial for players to control where in the map they’re actually focusing on, and we also reduced the peak actions per minute, to 22 actions in a span of five seconds.” In other words, the AI is forced to play much more like a human, only able to have a portion of the game world “on screen” at any one time, and clicking a reasonably limited number of times.
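The cap Vinyals describes – 22 actions in any five-second span – behaves like a sliding-window rate limiter. As a rough sketch (this is an illustration of the idea, not DeepMind’s actual interface code; the class and method names here are invented), it might look like this:

```python
from collections import deque

class ActionThrottle:
    """Illustrative sliding-window limiter: at most `limit` actions in any
    `window`-second span. The 22-per-5-seconds figures come from the article;
    everything else is a hypothetical sketch."""

    def __init__(self, limit=22, window=5.0):
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # times of recent allowed actions

    def try_act(self, now):
        # Discard timestamps that have slid out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True   # action allowed
        return False      # over the cap: the action must wait

throttle = ActionThrottle()
# 30 attempted actions within a single second: only the first 22 get through.
allowed = sum(throttle.try_act(t * 0.03) for t in range(30))
```

The effect is that bursts beyond human-plausible speed are simply dropped, while the agent can act freely again once the window moves on.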
All of which is moot if the AI gives itself away by, well, playing like a robot. Luckily, it doesn’t – quite. In the first series of matches played publicly, in January, AlphaStar did exhibit one slightly mechanistic behaviour, falling prey to an almost cartoonish tactic where its opponent, the human player MaNa, moved a unit into and out of its field of view, changing its behaviour each time. Picture Winnie the Pooh following his own footsteps in the snow, and you’d get an idea of what the approach looked like. Hardly elegant, but it worked for MaNa to eke out the only win the humans scored over those first 11 matches.
More interestingly, the AI – which was initially trained through imitation learning, watching how human players performed to learn the basics of the game – did develop its own understanding of the best tactical play, occasionally differing from the generally accepted practice among pros. The intricacies are a bit specialist (playing as Protoss, for instance, AlphaStar moved its probes to a second nexus earlier in the game than was seen as efficient), but they reinforce the idea that simply teaching an AI to perform a task to human level can improve our understanding of the task itself.
“AlphaStar has been an amazing experience,” Vinyals says. “Not because we beat most humans – I mean, we’ve beaten 99.8%, let’s not forget the 0.2% that are quite incredible. But it’s more like that we were able to see what some limitations might be, to inspire research that will come, you know, hopefully in the next few months or years and decades. Picking harder and harder problems and trying to be very good at them has been clearly the way so far.”