AlphaGo Zero: self-taught AI plays better than humans

On October 18th, DeepMind, a Google-owned company, released its new artificial intelligence program: AlphaGo Zero. The team that worked on AlphaGo Zero published a paper in Nature yesterday, explaining why this new model represents such a huge step forward for AI research.

AlphaGo: the first version

Last year, DeepMind came out with its first version of AlphaGo. The team’s aim was to create an AI program that would play the game of Go better than any human being ever could. The first AlphaGo was already a success: in a five-game match, it defeated Lee Sedol, one of the world’s best Go players, four games to one.

The event was a milestone in the history of AI, as many experts had expected that a machine able to beat a top professional at this game was still years away.

The rules of Go are rather simple: two players take turns placing black or white “stones” on the empty intersection points of a 19x19 board. The objective for each player is to surround the opponent's stones with his or her own in order to “capture” them. Once a group of stones is captured, it is removed from the board. The game ends when neither player wishes to make a further move.
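The capture rule can be made concrete with a short sketch. The Python below is only an illustration and is not taken from AlphaGo: it assumes a board stored as a list of strings ("B", "W", and "." for an empty point), and the helper names group_and_liberties and is_captured are hypothetical. A group counts as captured when none of its stones touches an empty point.

```python
def group_and_liberties(board, row, col):
    """Return (stones, liberties) for the group containing board[row][col]."""
    color = board[row][col]
    if color == ".":
        raise ValueError("no stone at that point")
    size = len(board)
    stones, liberties = set(), set()
    stack = [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in stones:
            continue
        stones.add((r, c))
        # Look at the four orthogonal neighbours of this stone.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == ".":
                    liberties.add((nr, nc))   # empty neighbour
                elif board[nr][nc] == color:
                    stack.append((nr, nc))    # same-coloured stone, same group
    return stones, liberties


def is_captured(board, row, col):
    """A group with no empty neighbouring points is captured."""
    return not group_and_liberties(board, row, col)[1]


# Toy 5x5 position (a real board is 19x19).
board = [
    ".B...",
    "BWB..",
    ".B...",
    ".....",
    "...W.",
]
print(is_captured(board, 1, 1))  # white stone surrounded by four black stones
print(is_captured(board, 4, 3))  # white stone on the edge with empty neighbours
```

The flood fill walks through connected stones of one colour, so the same check works for a single stone and for a large group.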

The difficulty of Go, from an AI standpoint, is that the 19x19 board allows for 361 different first moves, followed by another 360 in response, and so on.

Overall, the total number of possible arrangements of the board is on the order of 10^170, more than the total number of atoms in the observable universe. Because of this complexity, it was thought that only humans could master the game, an assumption that has now proved incorrect.
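These magnitudes can be checked with a few lines of integer arithmetic. The sketch below is only a sanity check: it computes 3^361, a simple upper bound that treats every point as empty, black, or white and therefore also counts illegal positions. The figure of roughly 10^80 atoms is a commonly quoted estimate and is an assumption here, not a number from the article.

```python
POINTS = 19 * 19  # intersections on the board

# Number of ways to play the first move and the reply to it.
first_two_moves = POINTS * (POINTS - 1)

# Each point is empty, black, or white, so 3**POINTS bounds the number of
# board arrangements from above (it includes illegal positions as well).
upper_bound = 3 ** POINTS
digits = len(str(upper_bound))

# Rough, commonly quoted estimate of atoms in the observable universe
# (an assumption for this comparison).
ATOMS = 10 ** 80

print(POINTS)
print(first_two_moves)
print(digits)
print(upper_bound > ATOMS)
```

The upper bound has 173 digits, so it sits a little above the 10^170 figure for arrangements and far above the atom estimate.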

The first AlphaGo was taught to play Go using thousands of games played by human experts in the past. The program extracted the tactics and strategies those players employed and, once it had mastered them, improved further by playing millions of games against itself. The first phase is called supervised learning, because the machine is instructed on the basis of human examples; the self-play phase is a form of reinforcement learning. Supervised learning is limited, however, since it relies on the availability of large data sets that show the machine how to perform the required task.

Furthermore, the AI is subject to human limits, since its learning is bounded by existing human knowledge.

AlphaGo Zero: another step forward

AlphaGo Zero shows great improvements with respect to all its predecessors. The most important difference is that it actually taught itself how to play Go, without any input from human experts. But how was this possible? At the beginning, AlphaGo Zero was instructed only on the rules of the game: it was a so-called “tabula rasa”. Subsequently, the program would start to play games against increasingly competitive versions of itself. The AI program would award itself points for a win and deduct points for a loss, thereby learning from its own mistakes.
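The idea of learning from wins and losses alone can be illustrated on a much smaller game. The sketch below is not AlphaGo Zero's algorithm, which combines a deep neural network with tree search; it is a simple lookup-table learner for a toy take-away game (remove one to three stones from a pile, and whoever takes the last stone wins). All names and parameter values (train, best_move, epsilon, step) are hypothetical choices made for this illustration.

```python
import random


def legal_moves(pile):
    return [n for n in (1, 2, 3) if n <= pile]


def train(episodes=20000, max_pile=10, epsilon=0.2, step=0.05, seed=0):
    rng = random.Random(seed)
    value = {}  # (pile, move) -> estimated outcome for the player who moved

    def choose(pile):
        moves = legal_moves(pile)
        if rng.random() < epsilon:
            return rng.choice(moves)  # occasional random exploration
        return max(moves, key=lambda m: value.get((pile, m), 0.0))

    for _ in range(episodes):
        pile = rng.randint(1, max_pile)
        history = [[], []]  # (pile, move) pairs played by player 0 and player 1
        player = 0
        while pile > 0:
            move = choose(pile)
            history[player].append((pile, move))
            pile -= move
            if pile == 0:
                winner = player  # taking the last stone wins
            player = 1 - player
        # Award +1 to every move of the winner and -1 to every move of the loser.
        for p in (0, 1):
            reward = 1.0 if p == winner else -1.0
            for key in history[p]:
                old = value.get(key, 0.0)
                value[key] = old + step * (reward - old)
    return value


def best_move(value, pile):
    return max(legal_moves(pile), key=lambda m: value.get((pile, m), 0.0))


value = train()
for pile in (5, 6, 7):
    print(pile, best_move(value, pile))
```

Both sides share the same table, so every improvement the learner makes also strengthens its opponent, which is the sense in which it plays increasingly competitive versions of itself. In this toy game the known winning strategy is to leave the opponent a multiple of four stones, which gives a reference point for judging what the table has learned.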

At first, AlphaGo Zero placed its stones on the board more or less at random. However, after just one day, it had reached the level of an advanced human professional. After two days, AlphaGo Zero had surpassed the first version of the program, AlphaGo, and after 40 days of training and about 30 million games, it was ready to defeat the strongest Go player until then: AlphaGo Master, an earlier version of the program.

Watching AlphaGo Zero teach itself to play Go was compelling. The AI program covered centuries of human history in slightly more than a month and then surpassed human abilities. In fact, AlphaGo Zero first rediscovered tactics and strategies humans had developed over thousands of years: for example, it played joseki, the established corner sequences that are also taught to human players.

Most astonishingly, however, after further training the computer discarded some of those tactics in favour of others of its own invention, previously unknown to human players. Towards the end, AlphaGo Zero was playing in a distinctly non-human style.

AlphaGo Zero is unique not only in its playing ability; it has other considerable advantages as well. First of all, it learns much faster than any of its predecessors. One reason is the use of a single neural network rather than two separate ones. In addition, instead of playing out many possible continuations of a position to the end of the game, the program relies on this network to predict the winner. This makes the algorithm both stronger and more efficient.
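A single network that serves both purposes can be pictured as one shared body with two outputs: a probability for each possible move and a single number predicting the winner. The sketch below is a deliberately tiny stand-in written in plain Python. The real system is a deep convolutional network; the class name TwoHeadedNet, the layer sizes, the board encoding, and the random, untrained weights here are all assumptions made for illustration.

```python
import math
import random


def make_layer(rng, n_in, n_out):
    # One row of weights per output unit, with small random values.
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def apply_layer(weights, inputs):
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


class TwoHeadedNet:
    def __init__(self, n_points, n_hidden=16, seed=0):
        rng = random.Random(seed)
        self.trunk = make_layer(rng, n_points, n_hidden)            # shared layer
        self.policy_head = make_layer(rng, n_hidden, n_points + 1)  # +1 for "pass"
        self.value_head = make_layer(rng, n_hidden, 1)

    def forward(self, board):
        # Shared computation used by both heads (ReLU activation).
        hidden = [max(0.0, h) for h in apply_layer(self.trunk, board)]
        # Policy head: softmax over all moves.
        logits = apply_layer(self.policy_head, hidden)
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        policy = [e / total for e in exps]
        # Value head: predicted outcome between -1 (loss) and +1 (win).
        value = math.tanh(apply_layer(self.value_head, hidden)[0])
        return policy, value


net = TwoHeadedNet(n_points=9)         # toy 3x3 board; a real board has 361 points
board = [0, 1, 0, -1, 0, 0, 0, 0, 0]   # 1 = own stone, -1 = opponent, 0 = empty
policy, value = net.forward(board)
print(len(policy), round(sum(policy), 6), -1.0 <= value <= 1.0)
```

Because both heads read the same hidden layer, the work of interpreting the board is done once per position rather than once per network.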

Another consequence of the use of a single network is that the required hardware is ten times less expensive than that used by the previous versions (it still costs US$25 million, by the way).

Practical consequences

Of course, the biggest accomplishment of the DeepMind team was to create a self-taught program. An AI that does not require human guidance could be used to tackle problems that we, as humans, do not yet understand. The implications of this discovery are, therefore, hard to foresee. AI advancements are still far from making human intervention unnecessary, but AI could become a valuable facilitator and assistant for biological brains.

The most promising fields of application are currently biology and chemistry.

For example, AI could help us understand the way proteins fold, which could lead to great improvements in drug discovery. But only imagination really sets a limit to the possibilities of artificial intelligence.
