    Disclaimer: This article appeared in the April 2016 issue of Novasia Magazine. It provides an introduction to some of the basic concepts used in programming AI. To be clear, this is pop-science level material; more experienced readers will get more out of the sources listed at the bottom of the page.


    How to build your own AI and win at Go.

    by Cesare Scartozzi

    DeepMind’s remarkable wins at Go over Lee Se-dol have marked an important milestone in the history of artificial intelligence (AI) and humankind. AI is here, and it is already outsmarting us. Its potential has never been as tangible as it is today, and we should start thinking about the revolution it will soon bring to our lives and societies.

    Over the course of this article, in order to better understand AI, I will describe the theory and procedures that can be used to build an AI capable of playing Go. In doing so, I hope to give you some direction on how to build your own AI and beat Lee Se-dol at Go. Since the procedures I am going to explain are rather simple, let us call the AI capable of executing them SimpleMind.

    The first step in building an AI is understanding what intelligence is. In practical terms, we can define intelligence as the capability to acquire and apply knowledge and skills. Thus, an AI is a machine or piece of software that exhibits this capability. It does not have to be sentient or self-aware; it just needs to be able to learn and process knowledge, act with autonomy, and develop skills. Indeed, most AIs currently in commercial use tend to be very task-oriented and excel only at specific games or functions, like speech recognition or fraud detection for digital payments. Likewise, SimpleMind will do only one thing: play the game of Go.

    Nevertheless, building an AI that is good at playing Go is a daunting task. Go is not yet a solved game. In a solved game, assuming that both players play perfectly, we can predict the outcome from any position. For example, in Connect Four we can predict that if the first player drops a piece in the central column and plays perfectly, they are assured victory. Go, by contrast, is vastly more complicated than Connect Four and, while still a perfect-information game, it cannot be solved with any existing computational hardware. The complexity of Go lies in its simple rules and 19 x 19 grid, which together create an exceptionally large number of possible moves and countermoves. To put that in perspective, the number of possible games of Go is greater than the number of atoms in the observable universe, which is estimated to be about 10^80.
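
    To get a feel for that scale, here is a quick back-of-the-envelope calculation in Python. The figures of roughly 250 legal moves per turn and roughly 150 moves per game are commonly cited estimates, not exact values:

```python
import math

# Rough, commonly cited estimates for Go (assumptions, not exact values):
# about 250 legal moves available per turn, over a game of about 150 moves.
branching_factor = 250
game_length = 150

# log10 of the number of possible move sequences: 250^150
magnitude = game_length * math.log10(branching_factor)
print(f"~10^{magnitude:.0f} possible games")  # ~10^360

# Compare with the estimated number of atoms in the observable universe.
print("atoms in the universe: ~10^80")
```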

    Since we cannot calculate all possible strategies, we need a more heuristic approach to winning at Go. We need to program an AI capable of autonomously developing its own moves and strategies. We can do so by following the teachings of the computer scientist John H. Holland and building SimpleMind around three core procedures: a performance system, credit assignment, and rule discovery.

    The first procedure is the simplest and most essential of the three. In coding SimpleMind, we want to establish a simple syntax of stimulus-response rules, also known as ‘IF-THEN’ rules. Their function is to acquire knowledge and execute an action or trigger another rule. For instance, when you press the home button of your smartphone, the phone uses the detector ‘IF the button is pressed’ to execute the effector ‘THEN go to the home menu.’ A rule can also be more precise and use multiple detectors and effectors at once, for example: ‘IF the button is pressed’ and ‘IF it is held for 3 seconds,’ ‘THEN launch Siri.’ In a solved game like Connect Four, IF-THEN rules would be enough for SimpleMind to win, because in response to any given move by the other player our AI would know which effectors to execute. In Go, however, we cannot compute all possible stimulus-response rules.
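
    To make the idea concrete, here is a minimal sketch of stimulus-response rules in Python. The smartphone detectors and effectors are hypothetical illustrations, not a real phone API:

```python
# A rule pairs a detector (IF ...) with an effector (THEN ...).
# More specific rules are listed first so they get checked first.
rules = [
    (lambda s: s["button_pressed"] and s["held_seconds"] >= 3,
     lambda: print("launch Siri")),
    (lambda s: s["button_pressed"],
     lambda: print("go to the home menu")),
]

def respond(state):
    """Fire the effector of the first rule whose detector matches."""
    for detector, effector in rules:
        if detector(state):
            effector()
            return

respond({"button_pressed": True, "held_seconds": 0})  # go to the home menu
respond({"button_pressed": True, "held_seconds": 3})  # launch Siri
```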

    Credit assignment helps us circumvent this problem. Instead of calculating all rules, we limit SimpleMind to using only the most effective ones. To understand which rules are effective, we assign a payoff to each rule based on the outcome of the action it carries out. This procedure is easy to implement when the game provides direct rewards and payoffs. For instance, if we play Space Invaders, every action SimpleMind takes results in the destruction of a certain number of enemy spaceships. Consequently, the rules that destroy more enemies accumulate higher credit than those that do not. Unfortunately, Go works differently, due to what is known as the credit assignment problem. A professional Go game can last up to six hours and run to more than 300 moves, yet the game provides only a single final payoff: victory or defeat. This makes credit assignment challenging. What payoff should we give to move number 21 or move number 176? Even if we have won a game, not all of our moves were necessarily effective and correct. How do we distinguish between those that were and those that were not?
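
    In a game with immediate rewards like Space Invaders, credit assignment can be as simple as adding each action’s payoff to the rule that produced it. A minimal sketch, with invented rule names and reward values:

```python
from collections import defaultdict

# Credit accumulated by each rule, starting at zero.
credit = defaultdict(float)

def reward(rule_name, payoff):
    """Add the observed payoff to the rule that produced it."""
    credit[rule_name] += payoff

# In Space Invaders the payoff is immediate: ships destroyed per action.
reward("shoot_while_dodging", 3)   # destroyed 3 ships
reward("shoot_standing_still", 1)  # destroyed 1 ship
reward("shoot_while_dodging", 2)   # destroyed 2 more

best = max(credit, key=credit.get)
print(best, credit[best])  # shoot_while_dodging 5.0
```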

    We can use a bucket brigade algorithm to strengthen the credit of rules that belong to a chain of actions ending in a good payoff. In simple terms, the algorithm creates a cascading effect in which a strong rule has to give some of its strength to the useful rules that preceded it. This way, move number 21 will be rewarded with a high payoff if it is part of the chain of moves that leads to victory.
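
    Here is a minimal sketch of that backward-flowing payment, assuming each rule passes a fixed fraction of its strength to its predecessor; the 0.1 fraction and the move names are arbitrary choices for illustration:

```python
def bucket_brigade(chain, final_payoff, strength, fraction=0.1):
    """Propagate a final payoff back along a chain of fired rules.

    chain: rule names in the order they fired; strength: name -> credit.
    """
    strength[chain[-1]] += final_payoff  # the last rule earns the payoff
    # Each rule pays a fraction of its strength to the rule before it.
    for i in range(len(chain) - 1, 0, -1):
        payment = fraction * strength[chain[i]]
        strength[chain[i]] -= payment
        strength[chain[i - 1]] += payment

strength = {"move_21": 0.0, "move_175": 0.0, "move_176": 0.0}
bucket_brigade(["move_21", "move_175", "move_176"], 10.0, strength)
print(strength)  # credit trickles backward toward move_21 over repeated games
```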

    At this stage of development, SimpleMind would already be able to play Go using the two procedures of the performance system and credit assignment. If we showed it many different matches, it could assign credit to each move and play using the most effective moves it has observed. This would probably be enough to beat most amateur players. However, beating Lee Se-dol requires more sophistication. As a 9-dan professional Go player, Lee has spent his entire life playing and studying the masters. Not only does he know all the strong moves and tactics, but he has also developed intuition and skills that our AI would not possess.
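
    Putting the two procedures together, playing amounts to picking, among the legal moves in the current position, the one with the highest accumulated credit. A minimal sketch, with placeholder move names and credit values:

```python
def choose_move(legal_moves, credit):
    """Greedy performance system: play the highest-credit legal move."""
    return max(legal_moves, key=lambda move: credit.get(move, 0.0))

credit = {"corner_3_3": 4.2, "center_10_10": 1.7}  # learned from matches
print(choose_move(["corner_3_3", "center_10_10", "edge_1_5"], credit))
```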

    Fortunately, we still have an ace to play: rule discovery. With this procedure we want SimpleMind to create new moves and tactics that mimic intuition and creativity. We want it to generate innovative, strong tactics that Lee has never seen before. In other words, SimpleMind will have to create and discover new rules. We have two options for programming rule discovery. The first consists of asking the AI to create random rules and test them in game. This procedure can work for simple games, but its randomness makes it inefficient, and it performs especially poorly in Go because of the immense number of possible random tactics. SimpleMind would spend a huge amount of time and computational power to find only a very few good moves.
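
    A sketch of this brute-force option, with a placeholder evaluation standing in for testing a rule in actual games, shows why it is so wasteful:

```python
import random

def random_rule():
    """A random candidate: play at a random intersection of the 19x19 board."""
    return (random.randint(0, 18), random.randint(0, 18))

def evaluate(rule):
    """Placeholder for testing a rule in simulated games; returns a score."""
    return random.random()

# Keep only the rare candidates that score well: most of the work is wasted.
good_rules = [r for r in (random_rule() for _ in range(10_000))
              if evaluate(r) > 0.99]
print(len(good_rules), "good rules out of 10,000 random candidates")
```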

    The second option is to focus on creating new ‘plausible’ rules. Instead of generating random rules, we want to spend SimpleMind’s resources on rules that we expect to be effective. We can do so with a procedure inspired by the history of science and by evolution. Technological innovation, just like biological evolution, is a non-linear historical process in which innovations are built on cumulative experience and then tested in the environment. Eventually, the innovations with high fitness, like opposable thumbs, stick around and serve as a base for greater innovations and discoveries. Thus, to find plausibly good moves at Go, we can emulate evolution by programming a so-called genetic algorithm into SimpleMind. As the name suggests, this algorithm seeks to discover new rules by mimicking chromosomal crossover. It selects pairs of high-credit moves and crosses them over to create an offspring move. The child move is then evaluated and, if it proves strong, crossed over with another rule. By iterating this procedure, SimpleMind can discover new moves and strategies with high potential. Using those moves, our AI would be able to cope with Lee’s attacking tactics and even surprise him with unexpected moves. Finally, SimpleMind would be ready to retrieve information, process it, and create new knowledge. It would be ready to beat Lee Se-dol; all it needs is a bit of time to compute the best strategies.
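
    Here is a minimal sketch of that crossover step, assuming rules are encoded as fixed-length sequences of genes; the bit-string encoding, the fitness function, and the parameters are illustrative assumptions, not part of the article:

```python
import random

def crossover(parent_a, parent_b):
    """Chromosomal crossover: splice two high-credit rules into a child."""
    cut = random.randint(1, len(parent_a) - 1)
    return parent_a[:cut] + parent_b[cut:]

def evolve(population, fitness, generations=50, keep=10):
    """Keep the fittest rules and breed offspring from random pairs of them."""
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:keep]
        children = [crossover(*random.sample(parents, 2))
                    for _ in range(len(population) - keep)]
        population = parents + children
    return max(population, key=fitness)

# Toy example: rules are bit strings; fitness counts 1s (a stand-in for credit).
pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
print(evolve(pop, fitness=sum))
```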

    The last lesson I want to share about SimpleMind is also a disclaimer. AI is not easy to code, especially if we are not computer scientists, but the procedures and theories behind it are intuitive and fascinating. After all, studying AI means posing questions to ourselves: how does our brain work, how do we think, and how do we build knowledge? And this kind of inquiry is one that we can all pursue, regardless of our coding skills.


    References:

    Holland, John H. Hidden Order: How Adaptation Builds Complexity. New York: Addison-Wesley, 1995.

    Boccara, Nino. Modeling Complex Systems. New York: Springer, 2004.

    Zhong, Ning, Jiming Liu, Jinglong Wu, Yiyu Yao, Shengfu Lu, and Kuncheng Li, eds. Web Intelligence Meets Brain Informatics: First WICI International Workshop, WImBI 2006, Beijing, China, December 15-16, 2006, Revised Selected and Invited Papers. Lecture Notes in Computer Science. Berlin: Springer, 2007.