Lee Sedol vs. AlphaGo [link to live feed]

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

mar
Posts: 2559
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by mar »

whereagles wrote:AlphaGo was given an honorary 9th Dan certificate. Cute :D
Absolutely! Overall the event was amazing. I'm also under the impression that people in the go world can not only behave but also show respect to each other.
A lot to learn from them.
towforce
Posts: 11589
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by towforce »

Laskos wrote:
Laskos wrote:
towforce wrote: My experience of playing human chess masters is that I think I'm doing better than I expected, then suddenly a win for the opponent emerges.

If Crazy Stone genuinely had a good evaluation, it would be able to beat human opponents. Maybe it is weak at evaluating the "frameworks" that will eventually become territory?
It might also be related to some deeper tactics. From what I saw, these "weak" (much stronger than me anyway) engines also lose large fights to strong humans, so it's not clear to me whether the general assessment of the position is to blame for their weakness.
AlphaGo lost to a tesuji, so it seems I was about right. The evaluation can hardly help in unique long-line fights; MCTS seems to be to blame. Let's see in the 5th game whether it is a systematic weakness.
I have been thinking further about game four, and I have 2 further ideas I'd like to hear some feedback on, please:

1. AlphaGo's intelligence isn't very "generalised" - so when positions arise that don't suit its expertise, it underperforms compared to a human of similar strength

2. the team have focused on getting an advantage and holding it. What they weren't aware of is that in honing that skill, they were making the program very poor in losing positions. In winning positions, their program plays very well - but in losing positions, it should switch to an entirely different strategy (almost a completely different program) whose aim is nothing less than to create absolute bloody mayhem!
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
Your brain doesn't work the way you want, so train it!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Laskos »

towforce wrote:
Laskos wrote:
Laskos wrote:
towforce wrote: My experience of playing human chess masters is that I think I'm doing better than I expected, then suddenly a win for the opponent emerges.

If Crazy Stone genuinely had a good evaluation, it would be able to beat human opponents. Maybe it is weak at evaluating the "frameworks" that will eventually become territory?
It might also be related to some deeper tactics. From what I saw, these "weak" (much stronger than me anyway) engines also lose large fights to strong humans, so it's not clear to me whether the general assessment of the position is to blame for their weakness.
AlphaGo lost to a tesuji, so it seems I was about right. The evaluation can hardly help in unique long-line fights; MCTS seems to be to blame. Let's see in the 5th game whether it is a systematic weakness.
I have been thinking further about game four, and I have 2 further ideas I'd like to hear some feedback on, please:

1. AlphaGo's intelligence isn't very "generalised" - so when positions arise that don't suit its expertise, it underperforms compared to a human of similar strength
I wouldn't say that. AlphaGo would play well in most positions, even weird but quiet ones. It will approximate them just fine to a pattern. But this backfires when approximations don't work: in sequences of unique moves it suffers. In fact, I would be curious to see how AlphaGo would perform on Go problems compared to reasonably strong humans (not even top professionals). In both games 4 and 5, AlphaGo miscalculated races to capture 10-12 "plies" deep, sequences of unique moves. In these races, pattern matching and approximations are not very useful; one has to have a better search. It was interesting to see that at the points where AlphaGo stumbled, Crazy Stone (also MCTS-based) stumbles badly too. In game 5, AlphaGo miscalculated a race to capture in the lower right part of the board, a thing even strong amateur players wouldn't do. Crazy Stone does the same; here is its evaluation of the whole game:
[Image: Crazy Stone's move-by-move evaluation graph of the whole game]
White and Black moves 24-28, where a sequence of unique tactical moves is required, are completely misevaluated. Crazy Stone thinks that White gained a large advantage, while in fact it's an important tactical loss for White, almost game-changing. Also, observe from the graph that Crazy Stone completely fails to notice the fights which occurred later, and which were potentially game-changing too. AlphaGo is much better, but I bet it too failed to see the often game-changing nature of local fights, races to capture, and invasions. And it's probably due to the inadequacy of MCTS.
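To make it concrete what kind of reading a race to capture demands, here is a toy sketch in Python (the reduction to pure outside liberties, with no shared liberties, no ko and no approach moves, is my own simplification): the winner follows from exhaustively reading out the forced liberty-filling sequence, not from any pattern.

Code: Select all

def to_move_wins(my_libs, opp_libs):
    """Toy capturing race: outside liberties only, no shared liberties,
    no ko, no approach moves. Each turn the side to move fills one of
    the opponent's liberties; whoever fills the last one captures. The
    answer comes from reading the forced sequence to the very end --
    exactly the kind of unique-move line where pattern matching has
    nothing to grab onto."""
    if opp_libs == 1:
        return True                       # fill the last liberty: capture
    return not to_move_wins(opp_libs - 1, my_libs)

# The read-out recovers the classic rule: the side to move wins the
# race if and only if it has at least as many liberties as the opponent.
print(to_move_wins(5, 5))   # True
print(to_move_wins(4, 5))   # False

A real semeai reader also has to handle shared liberties, approach moves and eyes, which blows the tree up quickly; the point is only that the answer is a deep forced line, not a shape.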
2. the team have focused on getting an advantage and holding it. What they weren't aware of is that in honing that skill, they were making the program very poor in losing positions. In winning positions, their program plays very well - but in losing positions, it should switch to an entirely different strategy (almost a completely different program) whose aim is nothing less than to create absolute bloody mayhem!
I think this is easily corrected. It's probably not hard to make AlphaGo a bit weaker but more human in its goal: to capture as much territory as it can instead of purely maximizing its probability of winning. Also, when losing, it could go into some swindle mode and fool around by fighting for every local point, invading and such. That's doable. I am not sure how point 1) will be solved.
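Something like this toy move picker is what I have in mind (Python; Move, winprob, variance, the threshold and all the numbers are invented for the illustration, this is of course not the real AlphaGo interface):

Code: Select all

from dataclasses import dataclass

@dataclass
class Move:
    name: str
    winprob: float   # estimated probability of winning after the move
    variance: float  # spread of outcomes: high = messy, fighty position

SWINDLE_THRESHOLD = 0.15   # below this win probability, stop playing "safe"

def pick_move(moves):
    best = max(moves, key=lambda m: m.winprob)
    if best.winprob >= SWINDLE_THRESHOLD:
        return best                # normal mode: maximize win probability
    # swindle mode: the game is lost with best play from both sides, so
    # prefer complicated, high-variance fights that give the opponent
    # the most chances to go wrong
    return max(moves, key=lambda m: m.winprob + 0.5 * m.variance)

moves = [Move("solid endgame", winprob=0.08, variance=0.5),
         Move("deep invasion", winprob=0.06, variance=4.0)]
print(pick_move(moves).name)   # -> deep invasion, once clearly losing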
Uri Blass
Posts: 10314
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Uri Blass »

towforce wrote:
Laskos wrote:
Laskos wrote:
towforce wrote: My experience of playing human chess masters is that I think I'm doing better than I expected, then suddenly a win for the opponent emerges.

If Crazy Stone genuinely had a good evaluation, it would be able to beat human opponents. Maybe it is weak at evaluating the "frameworks" that will eventually become territory?
It might also be related to some deeper tactics. From what I saw, these "weak" (much stronger than me anyway) engines also lose large fights to strong humans, so it's not clear to me whether the general assessment of the position is to blame for their weakness.
AlphaGo lost to a tesuji, so it seems I was about right. The evaluation can hardly help in unique long-line fights; MCTS seems to be to blame. Let's see in the 5th game whether it is a systematic weakness.
I have been thinking further about game four, and I have 2 further ideas I'd like to hear some feedback on, please:

1. AlphaGo's intelligence isn't very "generalised" - so when positions arise that don't suit its expertise, it underperforms compared to a human of similar strength

2. the team have focused on getting an advantage and holding it. What they weren't aware of is that in honing that skill, they were making the program very poor in losing positions. In winning positions, their program plays very well - but in losing positions, it should switch to an entirely different strategy (almost a completely different program) whose aim is nothing less than to create absolute bloody mayhem!
I do not know much about go, but I read that AlphaGo probably had a losing position and won the last game, so I disagree that AlphaGo is very poor in losing positions.
Uri Blass
Posts: 10314
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Uri Blass »

<snipped>
Laskos wrote:
I wouldn't say that. AlphaGo would play well in most positions, even weird but quiet ones. It will approximate them just fine to a pattern. But this backfires when approximations don't work: in sequences of unique moves it suffers. In fact, I would be curious to see how AlphaGo would perform on Go problems compared to reasonably strong humans (not even top professionals). In both games 4 and 5, AlphaGo miscalculated races to capture 10-12 "plies" deep, sequences of unique moves. In these races, pattern matching and approximations are not very useful; one has to have a better search.
The interesting question is whether humans saw more plies ahead than AlphaGo, or whether AlphaGo did not correctly evaluate the position after the forced moves.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Laskos »

Uri Blass wrote:<snipped>
Laskos wrote:
I wouldn't say that. AlphaGo would play well in most positions, even weird but quiet ones. It will approximate them just fine to a pattern. But this backfires when approximations don't work: in sequences of unique moves it suffers. In fact, I would be curious to see how AlphaGo would perform on Go problems compared to reasonably strong humans (not even top professionals). In both games 4 and 5, AlphaGo miscalculated races to capture 10-12 "plies" deep, sequences of unique moves. In these races, pattern matching and approximations are not very useful; one has to have a better search.
The interesting question is whether humans saw more plies ahead than AlphaGo, or whether AlphaGo did not correctly evaluate the position after the forced moves.
MC rollouts go pretty deep, but lack good pruning. It seems that not only the policy network is easy to fool tactically, but the value one too. During game 4, although AlphaGo's mistake was on move 79, the evaluation started to see the important loss only at move 87, 8 plies later. I am not sure what happened in game 5, but I bet it was the same: it entered a losing race evaluating it as winning. I don't know if it's possible to correct for unique sequences of moves purely with networks; it seems a better search is required. It also seems that the situation in Go is the opposite of that in Chess: AlphaGo would perform badly compared to humans on Go life-and-death problems, but better on quiet moves that incrementally improve its chances. In Chess, engines are usually much better than humans at deep tactical winners, but (maybe) not that good at positional, quiet moves. Maybe AlphaGo can be improved by using some sort of test suite of Go problems.
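One plausible mechanism, as I understand the paper: move selection adds an exploration bonus u(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)) weighted by the policy network's prior P, so a tesuji the policy net considers near-impossible is effectively pruned and its refutation never gets read out. A toy calculation (c_puct and all the figures are made up):

Code: Select all

import math

def puct_score(q, prior, parent_visits, visits, c_puct=5.0):
    # q: mean value of the move so far; prior: policy-net probability.
    # This is the PUCT-style selection bonus described in the AlphaGo
    # paper; the constant and the numbers below are invented.
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)

# Two candidate replies after 1000 simulations at the parent node:
normal = puct_score(q=0.48, prior=0.30, parent_visits=1000, visits=600)
tesuji = puct_score(q=0.55, prior=0.0001, parent_visits=1000, visits=3)
print(round(normal, 3), round(tesuji, 3))   # 0.559 0.554

# The tesuji scores worse despite the better mean value: its tiny prior
# kills the exploration bonus, so the line stays almost unvisited and
# its refutation is never read out.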
Isaac
Posts: 265
Joined: Sat Feb 22, 2014 8:37 pm

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Isaac »

Uri Blass wrote:<snipped>
Laskos wrote:
I wouldn't say that. AlphaGo would play well in most positions, even weird but quiet ones. It will approximate them just fine to a pattern. But this backfires when approximations don't work: in sequences of unique moves it suffers. In fact, I would be curious to see how AlphaGo would perform on Go problems compared to reasonably strong humans (not even top professionals). In both games 4 and 5, AlphaGo miscalculated races to capture 10-12 "plies" deep, sequences of unique moves. In these races, pattern matching and approximations are not very useful; one has to have a better search.
The interesting question is whether humans saw more plies ahead than AlphaGo, or whether AlphaGo did not correctly evaluate the position after the forced moves.
Before AlphaGo, Monte Carlo implementations read until the very last move of the game (even from move 1 of the game). With AlphaGo they changed this: they truncated the plies reached, replacing the last ply with an evaluation, like in computer chess, I believe. But still, the number of plies reached is, I guess, closer to 20 than to 8 on average. If I remember correctly, this information was given in the AlphaGo paper.
I had read before that Monte Carlo programs have difficulties with semeai (capturing races). For some reason, they don't seem to count liberties well. Maybe this weakness is also present in AlphaGo.
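For reference, the mixing I remember from the paper looks like this: a leaf is scored as V(s_L) = (1 - lam) * v_theta(s_L) + lam * z_L, where v_theta is the value network, z_L is the outcome of a fast rollout played to the end of the game, and lam = 0.5. A minimal sketch (Python; the stub return values are toy numbers, only the formula and lam come from the paper):

Code: Select all

def value_network(state):
    # stand-in for v_theta(s): a position evaluation in [-1, 1]
    return 0.2

def fast_rollout(state):
    # stand-in: play the fast rollout policy to the very end of the
    # game and return the result, +1 (win) or -1 (loss)
    return -1.0

def leaf_value(state, lam=0.5):
    # 50-50 mix of value network and rollout outcome, as in the paper
    return (1 - lam) * value_network(state) + lam * fast_rollout(state)

print(leaf_value(None))   # -0.4: the rollout drags down an optimistic eval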
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Laskos »

Isaac wrote:
Uri Blass wrote:<snipped>
Laskos wrote:
I wouldn't say that. AlphaGo would play well in most positions, even weird but quiet ones. It will approximate them just fine to a pattern. But this backfires when approximations don't work: in sequences of unique moves it suffers. In fact, I would be curious to see how AlphaGo would perform on Go problems compared to reasonably strong humans (not even top professionals). In both games 4 and 5, AlphaGo miscalculated races to capture 10-12 "plies" deep, sequences of unique moves. In these races, pattern matching and approximations are not very useful; one has to have a better search.
The interesting question is whether humans saw more plies ahead than AlphaGo, or whether AlphaGo did not correctly evaluate the position after the forced moves.
Before AlphaGo, Monte Carlo implementations read until the very last move of the game (even from move 1 of the game). With AlphaGo they changed this: they truncated the plies reached, replacing the last ply with an evaluation, like in computer chess, I believe. But still, the number of plies reached is, I guess, closer to 20 than to 8 on average. If I remember correctly, this information was given in the AlphaGo paper.
I had read before that Monte Carlo programs have difficulties with semeai (capturing races). For some reason, they don't seem to count liberties well. Maybe this weakness is also present in AlphaGo.
In fact, my guess is that the major improvement with AlphaGo may be in tactics; globally, MCTS/UCT engines were already pretty good. I don't know how the clustering is done, but pattern matching after the clustering is probably the most efficient locally, with the policy and value networks guiding the search. So, the improvement in tactics could come both from pruning (more important, but harder) and from better eval (easier with ML, but less rewarding).
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Daniel Shawul »

I think this is easily corrected. It's probably not hard to make AlphaGo a bit weaker but more human in its goal: to capture as much territory as it can instead of purely maximizing its probability of winning. Also, when losing, it could go into some swindle mode and fool around by fighting for every local point, invading and such. That's doable. I am not sure how point 1) will be solved.
They use a 50-50 mix of winning chance (Monte Carlo simulations) and value network (evaluation). One would think that using only the value network (100%) should solve the weak play in losing positions. But I am not sure about it, because the value network is adjusted to maximize winning probability with self-play, and also because of the way it was originally constructed from human games with supervised learning. On the other hand, my simple Go program, which uses alpha-beta+LMR, uses a territory/influence evaluation method that solves a PDE (a sort of heat map) over the board. This would solve the weak-play problem, because when you have big influence (even with dead stones) the program thinks you are always winning -- unlike a Monte Carlo evaluation that could expose it as being a poor position. Therefore, using an evaluation like that when in a losing position (according to MC simulations) may help.
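Roughly the idea in a Python sketch (board size, constants and thresholds are toy values; my engine's actual code differs): stones are clamped sources of +1/-1, a few relaxation sweeps diffuse their influence, and the sign of the resulting field says who owns each empty point.

Code: Select all

N = 9  # demo board size; 0 = empty, +1 = black stone, -1 = white stone

def influence(board, sweeps=50, alpha=0.5):
    field = [[float(board[y][x]) for x in range(N)] for y in range(N)]
    for _ in range(sweeps):
        nxt = [[0.0] * N for _ in range(N)]
        for y in range(N):
            for x in range(N):
                if board[y][x] != 0:
                    nxt[y][x] = float(board[y][x])   # stones stay clamped
                    continue
                nbrs = [field[j][i]
                        for i, j in ((x-1, y), (x+1, y), (x, y-1), (x, y+1))
                        if 0 <= i < N and 0 <= j < N]
                # relax toward the neighbour average: one diffusion step
                nxt[y][x] = (1 - alpha) * field[y][x] + alpha * sum(nbrs) / len(nbrs)
        field = nxt
    return field

def territory_estimate(board, margin=0.05):
    f = influence(board)
    return sum(1 if f[y][x] > margin else -1 if f[y][x] < -margin else 0
               for y in range(N) for x in range(N) if board[y][x] == 0)

board = [[0] * N for _ in range(N)]
board[2][2] = 1     # one black stone
board[6][6] = -1    # one white stone, placed symmetrically
print(territory_estimate(board))   # ~0: influence splits the board evenly

Note that the sketch treats every stone as alive, which is exactly the failure mode I mentioned: the field keeps "believing" in big influence even when Monte Carlo simulations would already call the position lost.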
Before AlphaGo, Monte Carlo implementations read until the very last move of the game (even from move 1 of the game). With AlphaGo they changed this: they truncated the plies reached, replacing the last ply with an evaluation, like in computer chess, I believe. But still, the number of plies reached is, I guess, closer to 20 than to 8 on average. If I remember correctly, this information was given in the AlphaGo paper.
I had read before that Monte Carlo programs have difficulties with semeai (capturing races). For some reason, they don't seem to count liberties well. Maybe this weakness is also present in AlphaGo.
Even their complicated value network was not good enough to completely discard the Monte Carlo simulations; otherwise, they could have used alpha-beta. If they did that, there would be no chance to amend the misevaluation of some patterns at runtime using Monte Carlo simulations. I think we already saw in game 4 where the value network misevaluated a tesuji pattern.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Laskos »

Daniel Shawul wrote:
I think this is easily corrected. It's probably not hard to make AlphaGo a bit weaker but more human in its goal: to capture as much territory as it can instead of purely maximizing its probability of winning. Also, when losing, it could go into some swindle mode and fool around by fighting for every local point, invading and such. That's doable. I am not sure how point 1) will be solved.
They use a 50-50 mix of winning chance (Monte Carlo simulations) and value network (evaluation). One would think that using only the value network (100%) should solve the weak play in losing positions. But I am not sure about it, because the value network is adjusted to maximize winning probability with self-play, and also because of the way it was originally constructed from human games with supervised learning. On the other hand, my simple Go program, which uses alpha-beta+LMR, uses a territory/influence evaluation method that solves a PDE (a sort of heat map) over the board. This would solve the weak-play problem, because when you have big influence (even with dead stones) the program thinks you are always winning -- unlike a Monte Carlo evaluation that could expose it as being a poor position. Therefore, using an evaluation like that when in a losing position (according to MC simulations) may help.
Before AlphaGo, Monte Carlo implementations read until the very last move of the game (even from move 1 of the game). With AlphaGo they changed this: they truncated the plies reached, replacing the last ply with an evaluation, like in computer chess, I believe. But still, the number of plies reached is, I guess, closer to 20 than to 8 on average. If I remember correctly, this information was given in the AlphaGo paper.
I had read before that Monte Carlo programs have difficulties with semeai (capturing races). For some reason, they don't seem to count liberties well. Maybe this weakness is also present in AlphaGo.
Even their complicated value network was not good enough to completely discard the Monte Carlo simulations; otherwise, they could have used alpha-beta. If they did that, there would be no chance to amend the misevaluation of some patterns at runtime using Monte Carlo simulations. I think we already saw in game 4 where the value network misevaluated a tesuji pattern.
How hard is it, in your view, to improve AlphaGo tactically? My guess is that the path would be: better eval -> better localization -> better pruning heuristics. That would still introduce some misses, and maybe they are already doing it. After thinking a bit yesterday, I realized that the major improvement compared to Crazy Stone is that, based on better local eval, AlphaGo is better _tactically_, but still not good enough to avoid embarrassments like those in games 4 and 5.