Lee Sedol vs. AlphaGo [link to live feed]

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

mar
Posts: 2559
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by mar »

whereagles wrote:AlphaGo was given an honorary 9th Dan certificate. Cute :D
Absolutely! Overall the event was amazing. I'm also under the impression that people in the go world can not only behave but also show respect to each other.
A lot to learn from them.
towforce
Posts: 11589
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by towforce »

Laskos wrote:
Laskos wrote:
towforce wrote: My experience of playing human chess masters is that I think I'm doing better than I expected, then suddenly a win for the opponent emerges.

If Crazy Stone genuinely had a good evaluation, it would be able to beat human opponents. Maybe it is weak at evaluating the "frameworks" that will eventually become territory?
It might also be related to some deeper tactics. From what I saw, these "weak" (much stronger than me anyway) engines also lose large fights to strong humans, so it's not clear to me whether the general assessment of the position is to blame for their weakness.
AlphaGo lost to a tesuji, so it seems I was about right. The evaluation can hardly help in unique long-line fights; MCTS seems to be to blame. Let's see in the 5th game whether it is a systematic weakness.
I have been thinking further about game four, and I have 2 further ideas I'd like to hear some feedback on, please:

1. AlphaGo's intelligence isn't very "generalised" - so when positions arise that don't suit its expertise, it underperforms compared to a human of similar strength

2. the team have focused on getting an advantage and holding it. What they weren't aware of is that in honing that skill, they were making the program very poor in losing positions. In winning positions, their program plays very well - but in losing positions, it should switch to an entirely different strategy (almost a completely different program) whose aim is nothing less than to create absolute bloody mayhem!
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
Your brain doesn't work the way you want, so train it!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Laskos »

towforce wrote:
Laskos wrote:
Laskos wrote:
towforce wrote: My experience of playing human chess masters is that I think I'm doing better than I expected, then suddenly a win for the opponent emerges.

If Crazy Stone genuinely had a good evaluation, it would be able to beat human opponents. Maybe it is weak at evaluating the "frameworks" that will eventually become territory?
It might also be related to some deeper tactics. From what I saw, these "weak" (much stronger than me anyway) engines also lose large fights to strong humans, so it's not clear to me whether the general assessment of the position is to blame for their weakness.
AlphaGo lost to a tesuji, so it seems I was about right. The evaluation can hardly help in unique long-line fights; MCTS seems to be to blame. Let's see in the 5th game whether it is a systematic weakness.
I have been thinking further about game four, and I have 2 further ideas I'd like to hear some feedback on, please:

1. AlphaGo's intelligence isn't very "generalised" - so when positions arise that don't suit its expertise, it underperforms compared to a human of similar strength
I wouldn't say that. AlphaGo would play well in most positions, even weird but quiet ones. It will approximate them just fine to a pattern. But this backfires when approximations don't work: in sequences of unique moves it suffers. In fact, I would be curious to see how AlphaGo would perform on Go problems compared to reasonably strong humans (not even top professionals). In both games 4 and 5, AlphaGo miscalculated races to capture 10-12 "plies" deep, sequences of unique moves. In these races, pattern matching and approximations are not very useful; one has to have a better search. It was interesting to see that at the points where AlphaGo stumbled, Crazy Stone (also MCTS-based) stumbles badly too. In game 5, AlphaGo miscalculated a race to capture in the lower right part of the board, a thing even strong amateur players wouldn't do. Crazy Stone does the same; here is its evaluation of the whole game:
[Image: Crazy Stone's move-by-move evaluation graph of the whole game]
White and Black moves 24-28, where a sequence of unique tactical moves is required, are completely misevaluated. Crazy Stone thinks that White gained a large advantage, while in fact it's an important tactical loss for White, almost game-changing. Also, observe from the graph that Crazy Stone completely fails to notice the fights which occurred later, and which were potentially game-changing too. AlphaGo is much better, but I bet it too failed to see the often game-changing nature of local fights, races to capture, and invasions. And it's probably due to the inadequacy of MCTS.
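To make it concrete what kind of reading a race to capture demands, here is a toy sketch in Python (the reduction to pure outside liberties, with no shared liberties, no ko and no approach moves, is my own simplification): the winner follows from exhaustively reading out the forced liberty-filling sequence, not from any pattern.

Code: Select all

def to_move_wins(my_libs, opp_libs):
    """Toy capturing race: outside liberties only, no shared liberties,
    no ko, no approach moves. Each turn the side to move fills one of
    the opponent's liberties; whoever fills the last one captures. The
    answer comes from reading the forced sequence to the very end --
    exactly the kind of unique-move line where pattern matching has
    nothing to grab onto."""
    if opp_libs == 1:
        return True                       # fill the last liberty: capture
    return not to_move_wins(opp_libs - 1, my_libs)

# The read-out recovers the classic rule: the side to move wins the
# race if and only if it has at least as many liberties as the opponent.
print(to_move_wins(5, 5))   # True
print(to_move_wins(4, 5))   # False

A real semeai reader also has to handle shared liberties, approach moves and eyes, which blows the tree up quickly; the point is only that the answer is a deep forced line, not a shape.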
2. the team have focused on getting an advantage and holding it. What they weren't aware of is that in honing that skill, they were making the program very poor in losing positions. In winning positions, their program plays very well - but in losing positions, it should switch to an entirely different strategy (almost a completely different program) whose aim is nothing less than to create absolute bloody mayhem!
I think this is easily corrected. It's probably not hard to make AlphaGo a bit weaker but more human in its goal: to capture as much territory as it can instead of purely maximizing its probability of winning. Also, when losing, it could go into some swindle mode and fool around by fighting for every local point, invading and such. That's doable. I am not sure how point 1) will be solved.
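Something like this toy move picker is what I have in mind (Python; Move, winprob, variance, the threshold and all the numbers are invented for the illustration, this is of course not the real AlphaGo interface):

Code: Select all

from dataclasses import dataclass

@dataclass
class Move:
    name: str
    winprob: float   # estimated probability of winning after the move
    variance: float  # spread of outcomes: high = messy, fighty position

SWINDLE_THRESHOLD = 0.15   # below this win probability, stop playing "safe"

def pick_move(moves):
    best = max(moves, key=lambda m: m.winprob)
    if best.winprob >= SWINDLE_THRESHOLD:
        return best                # normal mode: maximize win probability
    # swindle mode: the game is lost with best play from both sides, so
    # prefer complicated, high-variance fights that give the opponent
    # the most chances to go wrong
    return max(moves, key=lambda m: m.winprob + 0.5 * m.variance)

moves = [Move("solid endgame", winprob=0.08, variance=0.5),
         Move("deep invasion", winprob=0.06, variance=4.0)]
print(pick_move(moves).name)   # -> deep invasion, once clearly losing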
Uri Blass
Posts: 10314
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Uri Blass »

towforce wrote:
Laskos wrote:
Laskos wrote:
towforce wrote: My experience of playing human chess masters is that I think I'm doing better than I expected, then suddenly a win for the opponent emerges.

If Crazy Stone genuinely had a good evaluation, it would be able to beat human opponents. Maybe it is weak at evaluating the "frameworks" that will eventually become territory?
It might also be related to some deeper tactics. From what I saw, these "weak" (much stronger than me anyway) engines also lose large fights to strong humans, so it's not clear to me whether the general assessment of the position is to blame for their weakness.
AlphaGo lost to a tesuji, so it seems I was about right. The evaluation can hardly help in unique long-line fights; MCTS seems to be to blame. Let's see in the 5th game whether it is a systematic weakness.
I have been thinking further about game four, and I have 2 further ideas I'd like to hear some feedback on, please:

1. AlphaGo's intelligence isn't very "generalised" - so when positions arise that don't suit its expertise, it underperforms compared to a human of similar strength

2. the team have focused on getting an advantage and holding it. What they weren't aware of is that in honing that skill, they were making the program very poor in losing positions. In winning positions, their program plays very well - but in losing positions, it should switch to an entirely different strategy (almost a completely different program) whose aim is nothing less than to create absolute bloody mayhem!
I do not know much about go, but I read that AlphaGo probably had a losing position and won the last game, so I disagree that AlphaGo is very poor in losing positions.
Uri Blass
Posts: 10314
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Uri Blass »

<snipped>
Laskos wrote:
I wouldn't say that. AlphaGo would play well in most positions, even weird but quiet ones. It will approximate them just fine to a pattern. But this backfires when approximations don't work: in sequences of unique moves it suffers. In fact, I would be curious to see how AlphaGo would perform on Go problems compared to reasonably strong humans (not even top professionals). In both games 4 and 5, AlphaGo miscalculated races to capture 10-12 "plies" deep, sequences of unique moves. In these races, pattern matching and approximations are not very useful; one has to have a better search.
The interesting question is whether humans saw more plies ahead than AlphaGo, or whether AlphaGo did not correctly evaluate the position after the forced moves.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Laskos »

Uri Blass wrote:<snipped>
Laskos wrote:
I wouldn't say that. AlphaGo would play well in most positions, even weird but quiet ones. It will approximate them just fine to a pattern. But this backfires when approximations don't work: in sequences of unique moves it suffers. In fact, I would be curious to see how AlphaGo would perform on Go problems compared to reasonably strong humans (not even top professionals). In both games 4 and 5, AlphaGo miscalculated races to capture 10-12 "plies" deep, sequences of unique moves. In these races, pattern matching and approximations are not very useful; one has to have a better search.
The interesting question is whether humans saw more plies ahead than AlphaGo, or whether AlphaGo did not correctly evaluate the position after the forced moves.
MC rollouts go pretty deep, but lack good pruning. It seems that not only the policy network is easy to fool tactically, but the value one too. During game 4, although AlphaGo's mistake was on move 79, the evaluation started to see the important loss only at move 87, 8 plies later. I am not sure what happened in game 5, but I bet it was the same: it entered a losing race evaluating it as winning. I don't know if it's possible to correct for unique sequences of moves purely with networks; it seems a better search is required. It also seems that the situation in Go is the opposite of that in Chess: AlphaGo would perform badly compared to humans on Go life-and-death problems, but better on quiet moves that incrementally improve its chances. In Chess, engines are usually much better than humans at deep tactical winners, but (maybe) not that good at positional, quiet moves. Maybe AlphaGo can be improved by using some sort of test suite of Go problems.
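One plausible mechanism, as I understand the paper: move selection adds an exploration bonus u(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)) weighted by the policy network's prior P, so a tesuji the policy net considers near-impossible is effectively pruned and its refutation never gets read out. A toy calculation (c_puct and all the figures are made up):

Code: Select all

import math

def puct_score(q, prior, parent_visits, visits, c_puct=5.0):
    # q: mean value of the move so far; prior: policy-net probability.
    # This is the PUCT-style selection bonus described in the AlphaGo
    # paper; the constant and the numbers below are invented.
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)

# Two candidate replies after 1000 simulations at the parent node:
normal = puct_score(q=0.48, prior=0.30, parent_visits=1000, visits=600)
tesuji = puct_score(q=0.55, prior=0.0001, parent_visits=1000, visits=3)
print(round(normal, 3), round(tesuji, 3))   # 0.559 0.554

# The tesuji scores worse despite the better mean value: its tiny prior
# kills the exploration bonus, so the line stays almost unvisited and
# its refutation is never read out.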
Isaac
Posts: 265
Joined: Sat Feb 22, 2014 8:37 pm

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Isaac »

Uri Blass wrote:<snipped>
Laskos wrote:
I wouldn't say that. AlphaGo would play well in most positions, even weird but quiet ones. It will approximate them just fine to a pattern. But this backfires when approximations don't work: in sequences of unique moves it suffers. In fact, I would be curious to see how AlphaGo would perform on Go problems compared to reasonably strong humans (not even top professionals). In both games 4 and 5, AlphaGo miscalculated races to capture 10-12 "plies" deep, sequences of unique moves. In these races, pattern matching and approximations are not very useful; one has to have a better search.
The interesting question is whether humans saw more plies ahead than AlphaGo, or whether AlphaGo did not correctly evaluate the position after the forced moves.
Before AlphaGo, Monte Carlo implementations read until the very last move of the game (even from move 1 of the game). With AlphaGo they changed this: they truncated the plies reached, replacing the last ply with an evaluation, like in computer chess, I believe. But still, the number of plies reached is, I guess, closer to 20 than to 8 on average. If I remember correctly, this information was given in the AlphaGo paper.
I had read before that Monte Carlo programs have difficulties with semeai (capturing races). For some reason, they don't seem to count liberties well. Maybe this weakness is also present in AlphaGo.
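For reference, the mixing I remember from the paper looks like this: a leaf is scored as V(s_L) = (1 - lam) * v_theta(s_L) + lam * z_L, where v_theta is the value network, z_L is the outcome of a fast rollout played to the end of the game, and lam = 0.5. A minimal sketch (Python; the stub return values are toy numbers, only the formula and lam come from the paper):

Code: Select all

def value_network(state):
    # stand-in for v_theta(s): a position evaluation in [-1, 1]
    return 0.2

def fast_rollout(state):
    # stand-in: play the fast rollout policy to the very end of the
    # game and return the result, +1 (win) or -1 (loss)
    return -1.0

def leaf_value(state, lam=0.5):
    # 50-50 mix of value network and rollout outcome, as in the paper
    return (1 - lam) * value_network(state) + lam * fast_rollout(state)

print(leaf_value(None))   # -0.4: the rollout drags down an optimistic eval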
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Laskos »

Isaac wrote:
Uri Blass wrote:<snipped>
Laskos wrote:
I wouldn't say that. AlphaGo would play well in most positions, even weird but quiet ones. It will approximate them just fine to a pattern. But this backfires when approximations don't work: in sequences of unique moves it suffers. In fact, I would be curious to see how AlphaGo would perform on Go problems compared to reasonably strong humans (not even top professionals). In both games 4 and 5, AlphaGo miscalculated races to capture 10-12 "plies" deep, sequences of unique moves. In these races, pattern matching and approximations are not very useful; one has to have a better search.
The interesting question is whether humans saw more plies ahead than AlphaGo, or whether AlphaGo did not correctly evaluate the position after the forced moves.
Before AlphaGo, Monte Carlo implementations read until the very last move of the game (even from move 1 of the game). With AlphaGo they changed this: they truncated the plies reached, replacing the last ply with an evaluation, like in computer chess, I believe. But still, the number of plies reached is, I guess, closer to 20 than to 8 on average. If I remember correctly, this information was given in the AlphaGo paper.
I had read before that Monte Carlo programs have difficulties with semeai (capturing races). For some reason, they don't seem to count liberties well. Maybe this weakness is also present in AlphaGo.
In fact, my guess is that the major improvement with AlphaGo may be in tactics; globally, MCTS/UCT engines were already pretty good. I don't know how the clustering is done, but pattern matching after the clustering is probably the most efficient locally, with the policy and value networks guiding the search. So, the improvement in tactics could come both from pruning (more important, but harder) and from better eval (easier with ML, but less rewarding).
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Daniel Shawul »

I think this is easily corrected. It's probably not hard to make AlphaGo a bit weaker but more human in its goal: to capture as much territory as it can instead of purely maximizing its probability of winning. Also, when losing, it could go into some swindle mode and fool around by fighting for every local point, invading and such. That's doable. I am not sure how point 1) will be solved.
They use a 50-50 mix of winning chance (Monte Carlo simulations) and value network (evaluation). One would think that using only the value network (100%) should solve the weak play in losing positions. But I am not sure about it, because the value network is adjusted to maximize winning probability with self-play, and also because of the way it was originally constructed from human games with supervised learning. On the other hand, my simple Go program, which uses alpha-beta+LMR, uses a territory/influence evaluation method that solves a PDE (a sort of heat map) over the board. This would solve the weak-play problem, because when you have big influence (even with dead stones) the program thinks you are always winning -- unlike a Monte Carlo evaluation that could expose it as being a poor position. Therefore, using an evaluation like that when in a losing position (according to MC simulations) may help.
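Roughly the idea in a Python sketch (board size, constants and thresholds are toy values; my engine's actual code differs): stones are clamped sources of +1/-1, a few relaxation sweeps diffuse their influence, and the sign of the resulting field says who owns each empty point.

Code: Select all

N = 9  # demo board size; 0 = empty, +1 = black stone, -1 = white stone

def influence(board, sweeps=50, alpha=0.5):
    field = [[float(board[y][x]) for x in range(N)] for y in range(N)]
    for _ in range(sweeps):
        nxt = [[0.0] * N for _ in range(N)]
        for y in range(N):
            for x in range(N):
                if board[y][x] != 0:
                    nxt[y][x] = float(board[y][x])   # stones stay clamped
                    continue
                nbrs = [field[j][i]
                        for i, j in ((x-1, y), (x+1, y), (x, y-1), (x, y+1))
                        if 0 <= i < N and 0 <= j < N]
                # relax toward the neighbour average: one diffusion step
                nxt[y][x] = (1 - alpha) * field[y][x] + alpha * sum(nbrs) / len(nbrs)
        field = nxt
    return field

def territory_estimate(board, margin=0.05):
    f = influence(board)
    return sum(1 if f[y][x] > margin else -1 if f[y][x] < -margin else 0
               for y in range(N) for x in range(N) if board[y][x] == 0)

board = [[0] * N for _ in range(N)]
board[2][2] = 1     # one black stone
board[6][6] = -1    # one white stone, placed symmetrically
print(territory_estimate(board))   # ~0: influence splits the board evenly

Note that the sketch treats every stone as alive, which is exactly the failure mode I mentioned: the field keeps "believing" in big influence even when Monte Carlo simulations would already call the position lost.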
Before AlphaGo, Monte Carlo implementations read until the very last move of the game (even from move 1 of the game). With AlphaGo they changed this: they truncated the plies reached, replacing the last ply with an evaluation, like in computer chess, I believe. But still, the number of plies reached is, I guess, closer to 20 than to 8 on average. If I remember correctly, this information was given in the AlphaGo paper.
I had read before that Monte Carlo programs have difficulties with semeai (capturing races). For some reason, they don't seem to count liberties well. Maybe this weakness is also present in AlphaGo.
Even their complicated value network was not good enough to completely discard the Monte Carlo simulations; otherwise, they could have used alpha-beta. If they did that, there would be no chance to amend the misevaluation of some patterns at runtime using Monte Carlo simulations. I think we already saw in game 4 where the value network misevaluated a tesuji pattern.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Lee Sedol vs. AlphaGo [link to live feed]

Post by Laskos »

Daniel Shawul wrote:
I think this is easily corrected. It's probably not hard to make AlphaGo a bit weaker but more human in its goal: to capture as much territory as it can instead of purely maximizing its probability of winning. Also, when losing, it could go into some swindle mode and fool around by fighting for every local point, invading and such. That's doable. I am not sure how point 1) will be solved.
They use a 50-50 mix of winning chance (Monte Carlo simulations) and value network (evaluation). One would think that using only the value network (100%) should solve the weak play in losing positions. But I am not sure about it, because the value network is adjusted to maximize winning probability with self-play, and also because of the way it was originally constructed from human games with supervised learning. On the other hand, my simple Go program, which uses alpha-beta+LMR, uses a territory/influence evaluation method that solves a PDE (a sort of heat map) over the board. This would solve the weak-play problem, because when you have big influence (even with dead stones) the program thinks you are always winning -- unlike a Monte Carlo evaluation that could expose it as being a poor position. Therefore, using an evaluation like that when in a losing position (according to MC simulations) may help.
Before AlphaGo, Monte Carlo implementations read until the very last move of the game (even from move 1 of the game). With AlphaGo they changed this: they truncated the plies reached, replacing the last ply with an evaluation, like in computer chess, I believe. But still, the number of plies reached is, I guess, closer to 20 than to 8 on average. If I remember correctly, this information was given in the AlphaGo paper.
I had read before that Monte Carlo programs have difficulties with semeai (capturing races). For some reason, they don't seem to count liberties well. Maybe this weakness is also present in AlphaGo.
Even their complicated value network was not good enough to completely discard the Monte Carlo simulations; otherwise, they could have used alpha-beta. If they did that, there would be no chance to amend the misevaluation of some patterns at runtime using Monte Carlo simulations. I think we already saw in game 4 where the value network misevaluated a tesuji pattern.
How hard is it, in your view, to improve AlphaGo tactically? My guess is that the path would be: better eval -> better localization -> better pruning heuristics. That would still introduce some misses, and maybe they are already doing it. After thinking a bit yesterday, I realized that the major improvement compared to Crazy Stone is that, based on better local eval, AlphaGo is better _tactically_, but still not good enough to avoid embarrassments like those in games 4 and 5.