Lee Sedol faced AlphaGo in 2016
The first time that AlphaGo revealed its full power, it prompted a visceral reaction. Lee Sedol, the world’s greatest player of the ancient Chinese board game Go, had grown visibly agitated at the artificial intelligence’s prowess. The hushed crowd in downtown Seoul, South Korea, could barely contain its gasps. It was quickly dawning on Lee, and the tens of millions watching at home, that this AI was different to those that had come before.
It wasn’t just beating Lee; it was doing so with an almost human-like aptitude. “AlphaGo actually does have an intuition,” Google co-founder Sergey Brin told New Scientist in 2016, shortly after AlphaGo went 3-0 up. “It makes beautiful moves. It even creates more beautiful moves than most of us could think of.”
The series ended with Google DeepMind’s AlphaGo system winning 4-1. Lee said he was “in shock”.
It is now a decade since this defining moment for AlphaGo and AI at large. Marvelling at AI is a commonplace experience with the success of large language models like ChatGPT. AlphaGo was, in many ways, our first glimpse at what was to come. Ten years on, what is the legacy of AlphaGo and has the technology lived up to its potential?
“Large language models are now quite different in some ways from AlphaGo, but there’s actually an underlying technological thread that really hasn’t changed,” says Chris Maddison at the University of Toronto, who was part of the original AlphaGo team.
That underlying technology is neural networks – mathematical structures inspired by the brain and written into code. Historically, creating a game-playing machine would involve a human writing down the rules it should follow in different situations. With a neural network, the machine learns by itself.
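To make the contrast concrete, here is a minimal sketch in Python (the features and training data are hypothetical, not DeepMind’s code): rather than a programmer writing down rules for which move is best, a tiny one-layer network learns weights for board features from labelled examples.

```python
import random

random.seed(0)

FEATURES = 3  # hypothetical features of a candidate move, e.g. territory, liberties, threats

def score(weights, features):
    """Linear score for a move: higher means the network prefers it."""
    return sum(w * f for w, f in zip(weights, features))

def train(examples, steps=2000, lr=0.01):
    """Nudge the weights so moves labelled +1 score above moves labelled -1."""
    weights = [0.0] * FEATURES
    for _ in range(steps):
        features, label = random.choice(examples)
        error = label - score(weights, features)
        weights = [w + lr * error * f for w, f in zip(weights, features)]
    return weights

# Made-up training data: the first feature is what matters here,
# but nobody tells the program that - it discovers it from the examples.
examples = [([1.0, 0.2, 0.1], 1.0), ([0.0, 0.9, 0.4], -1.0),
            ([0.8, 0.1, 0.7], 1.0), ([0.1, 0.8, 0.2], -1.0)]
w = train(examples)
```

No rule about the first feature is ever written down, yet after training the network scores moves that are strong on it above moves that are not.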
But even with a neural network, cracking Go was a tall order. The ancient Chinese game, which sees two players placing black and white stones to gain territory on a 19-by-19 board, allows for 10¹⁷¹ possible positions. By comparison, there are only about 10⁸⁰ atoms in the entire observable universe.
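A quick back-of-envelope check of that gap, using the figures above:

```python
positions = 10**171  # rough count of possible Go positions, as cited above
atoms = 10**80       # rough count of atoms in the observable universe
ratio = positions // atoms
# There are 10**91 times more Go positions than atoms in the universe.
```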
The breakthrough came from Maddison and his colleagues trying to recreate the intuition of a human player by training a neural network to predict the strongest next move, based on millions of moves from real games. Human players, of course, wouldn’t need to play so many games to build up their intuition, but they also never could – a distinct advantage for AI.
AlphaGo also wasn’t restricted to learning from human players; it could play millions of games against itself to hone its skills. “By learning through these games, it could discover new knowledge and could go beyond human-level players,” says Pushmeet Kohli at Google DeepMind.
The final system that beat Lee was more complex than Maddison’s early models but the overarching message was simple: neural networks worked. “AlphaGo definitively showed that neural nets can do pattern recognition better than humans. They can essentially have intuition that surpasses humans,” says Noam Brown at OpenAI.
Other alphas
So what happened next? After AlphaGo, Google DeepMind and other AI researchers set about applying that fundamental lesson to real-world problems in fields such as mathematics and biology. One of the most striking examples was AlphaFold, an AI that could predict how proteins look in three-dimensional space from their chemical make-up far better than any human-designed program – and which won the team behind it the Nobel prize in chemistry.
More recently, another neural network-based AI, AlphaProof, performed at a gold medal-level in the International Mathematical Olympiad, a prestigious maths test for students, stunning mathematicians. “Not only can you get this beyond-human-level intelligence in a game, but you can get that experience in important scientific applications,” says Kohli.
The logic behind both the AlphaGo-style of AI and that used for large language models (LLMs) like ChatGPT is similar. The first step, called pretraining, involves feeding a neural network a large amount of human data, such as complete Go games, or the entire internet in the case of an LLM. The second step, called post-training, then sees the network improve through a technique called reinforcement learning, which shows an AI what success looks like and lets it figure out how to achieve it.
For AlphaGo, this meant letting it play against itself millions of times until it found out the best winning strategies. For AlphaFold, it was about telling the AI what a successfully folded protein looked like and letting it figure out the rules. For ChatGPT, it’s telling the model which answers people like better, a process called reinforcement learning from human feedback, or giving it a solution to a defined problem, such as in maths or coding, and letting it work out how best to “reason” towards a solution by feeding its output back to itself, akin to how humans think out loud.
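The self-play loop can be sketched with a much simpler game standing in for Go – here Nim, where players alternately take one to three stones and whoever takes the last stone wins. This is a toy illustration, not AlphaGo’s actual algorithm: a win-rate table stands in for the neural network, and the only training signal is who won each game.

```python
import random

random.seed(1)
N = 12  # starting pile size

# Learned "policy": per (pile, move), how often that move led to a win in self-play.
wins = {(pile, move): [0, 0]  # [wins, plays]
        for pile in range(1, N + 1) for move in (1, 2, 3) if move <= pile}

def pick(pile, explore):
    moves = [m for m in (1, 2, 3) if m <= pile]
    if explore and random.random() < 0.2:
        return random.choice(moves)  # occasionally try something new
    # Otherwise play the move with the best win rate seen so far.
    return max(moves, key=lambda m: wins[(pile, m)][0] / (wins[(pile, m)][1] or 1))

for _ in range(20000):  # self-play games
    pile, history, player = N, [], 0
    while pile > 0:
        move = pick(pile, explore=True)
        history.append((player, pile, move))
        pile -= move
        player = 1 - player
    winner = 1 - player  # the player who took the last stone wins
    for p, s, m in history:  # feed the game's result back into the table
        wins[(s, m)][1] += 1
        wins[(s, m)][0] += p == winner
```

Nobody tells the program Nim’s known winning strategy – leave your opponent a multiple of four stones – but over thousands of self-play games the moves that do so accumulate better win rates than the moves that don’t.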
But this comes with drawbacks too. Neural networks are, in many ways, a black box. Despite efforts to find out how they work, many of them are too large and complex to understand at a basic level.
When AlphaGo made its now-famous move 37, spectators initially thought the AI had gone mad; only as the game progressed did it become clear the move was a strategic masterstroke. Yet Google DeepMind’s engineers couldn’t ask AlphaGo why it had played it. The move could just as easily have been a mistake – and we would have been none the wiser either way.
“These models will come up with answers and we will not know whether they are genius insights or hallucinations,” says Kohli. “We are still all actively working on trying to resolve those sorts of questions.”
A large part of AlphaGo’s achievement rested on two conditions: abundant data to initially feed the model and a clear definition of success. It makes sense, then, that the areas where AI is having the most success today are fields where both conditions also hold, says Maddison, such as mathematics and programming, where it is easy to define, and verify, what is correct or incorrect. “The similarities between these approaches are telling us something, and it’s telling us what are the raw necessary ingredients for progress.”