Announcing lczero

gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Announcing lczero

Post by gladius »

Uri Blass wrote:
pferd wrote:This seems like a very interesting project.

I am playing some 5 minute games against it right now and it moves instantly every single time. Is this the expected behaviour?
Looking at the games it seems that something is wrong.

Google reached a level above GM strength.
I understand that they had a hardware advantage, but still
I think it should be possible to get at least above the 2000 level by similar methods, and the level I see, with simply stupid mistakes like losing pieces, is not even close.

Beating the old version by a result that is not close to 100-0 is too slow progress.

Google got something that gets better if you give it more time.
If I understand correctly, lczero does not get better with more time, because it plays immediately, so the time control is not relevant.
The whole point of this initial supervised learning run is to validate that the search is working well. It's still very early in the learning phases. Google played many orders of magnitude more games to get the results they did.

Things are looking good so far, but yes, it's still making huge blunders. We will see if it learns its way out of them though :).
Damir
Posts: 2801
Joined: Mon Feb 11, 2008 3:53 pm
Location: Denmark
Full name: Damir Desevac

Re: Announcing lczero

Post by Damir »

Gary,

Are you going to set up lczero in Fishtest so people can test it there? :) :)
pferd
Posts: 134
Joined: Thu Jul 24, 2014 2:49 pm

Re: Announcing lczero

Post by pferd »

gladius wrote:
pferd wrote:This seems like a very interesting project.

I am playing some 5 minute games against it right now and it moves instantly every single time. Is this the expected behaviour?
This is because the default number of playouts is set to 800, and there is no time management.

You can increase the number of playouts by passing the -p argument on the command line, like -p20000, and the engine will think for longer, and play stronger.
Thanks. That helped improve the playing strength quite a bit. Using -p20000 --noponder made the engine think for approximately 5s/move on my machine.

It still offers its queen every once in a while though :D
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Announcing lczero

Post by Daniel Shawul »

pferd wrote:
gladius wrote:
pferd wrote:This seems like a very interesting project.

I am playing some 5 minute games against it right now and it moves instantly every single time. Is this the expected behaviour?
This is because the default number of playouts is set to 800, and there is no time management.

You can increase the number of playouts by passing the -p argument on the command line, like -p20000, and the engine will think for longer, and play stronger.
Thanks. That helped improve the playing strength quite a bit. Using -p20000 --noponder made the engine think for approximately 5s/move on my machine.

It still offers its queen every once in a while though :D
This is the inherent tactical weakness MCTS has -- and I am very eager to see how the NN could help eliminate it. Not offering your queen requires only 2-ply lookahead knowledge, which the MCTS will solve with more simulations, but resolving tactics 4 to 7 plies deep needs a tremendous number of simulations compared to a brute forcer (maybe 100x more effort is required). Lczero will not be competitive with alpha-beta engines until this issue is solved; let's see if more training or a bigger network will help in this regard. Using a regular quiescence search instead of the NN, ScorpioMCTS does not blunder pieces anymore, but it can still fail on tactics 7 plies or deeper -- it is about 100 elo stronger than TSCP.

Daniel
User avatar
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Announcing lczero

Post by hgm »

In the eighties, when there was no hope of doing a brute-force search of more than 4 ply, I used an extension scheme to see deeper tactics, one that tried to follow the lines consisting of logical combinations of moves. That is, it would extend all captures that were new compared to the previous position with that side to move (e.g. recaptures, moves over the square evacuated on the previous ply ('pins') and the ply before that ('discovered attacks')) by a full ply, and similar non-captures by a quarter ply. This worked pretty well: sacrifices that were pointless did not enable any follow-up moves, and thus could be refuted with small trees. But for sacrifices that did enable other stuff, that other stuff would get extended, and if it indeed led to some return on investment, you would see it.
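
The extension rule itself is simple; a minimal sketch (the move-classification helpers it relies on are the real work, and the names here are only illustrative):

[code]
# Extension amounts in quarter plies, per the scheme described above.
# "New follow-up" means the move was not available (or not sensible) in
# the previous position with the same side to move: a recapture, or a
# move over a square evacuated one or two plies earlier (pin /
# discovered-attack follow-ups).
def extension_qply(is_capture: bool, is_new_followup: bool) -> int:
    if is_new_followup and is_capture:
        return 4   # new captures: extend a full ply
    if is_new_followup:
        return 1   # similar non-captures: extend a quarter ply
    return 0       # everything else: no extension
[/code]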

It seems to me that the MCTS move selection would need something similar. Note that the AlphaZero NN doesn't only get the current position as input, but also a few positions before it. This is necessary to recognize repetitions. But it can also use them to learn which logical follow-ups often (but not always) give good results. E.g. capturing a protected Bishop with a Rook is a bad idea. But when the protector is soft-pinned (i.e. there is a follow-up capture over the square evacuated by the recapture) it should be considered with high priority. An NN should not have much of a problem recognizing such patterns and recommending the move combinations.
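
For concreteness, that history input can be pictured roughly like this (a sketch only; the plane counts and zero-padding are illustrative, not the exact AlphaZero or lczero encoding):

[code]
import numpy as np

T, PLANES, N = 8, 14, 8   # history length, planes per position, board size

def make_input(history):
    """history: list of (PLANES, N, N) float32 arrays, oldest first."""
    stack = list(history[-T:])
    while len(stack) < T:                  # pad missing history with zeros
        stack.insert(0, np.zeros((PLANES, N, N), dtype=np.float32))
    return np.concatenate(stack, axis=0)   # (T*PLANES, 8, 8) network input
[/code]

With the last few positions visible, the net can in principle pick up "this capture only became possible because of the previous move" patterns.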
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Announcing lczero

Post by Daniel Shawul »

Do you think an MCTS engine which can do a fixed-depth search (say a 7-ply lookahead) at every ply can become competitive with a selective alpha-beta engine that can look 20 plies ahead from the root? This is the only way I see it beating an alpha-beta engine, as it seems to me that an MCTS engine cannot become tactically stronger at the same rate as the alpha-beta engine. Sure, it can begin to resolve shallow tactics with more simulation time, but the deeper tactics (4 to 7 ply) require a lot more to recognize.

Here is an example game where I increased the number of simulations to p=15000 for lczero. It takes about 60 seconds per move on my Intel HD graphics card (I guess very slow). So many blunders in there, e.g. look at the ending, where it fails to find a mate-in-2. Note that the only difference between ScorpioMCTS and Lczero is that I use qsearch instead of the NN -- which allows me to do far more playouts/sec than lczero and hence resolve more shallow tactics by growing the tree faster. So far the policy NN is not helping in resolving tactics.

[pgn]
[Event "Tour10"]
[Site "daniel-Satellite-C855"]
[Date "2018.01.17"]
[Round "1"]
[White "ScorpioMCTS"]
[Black "Lczero"]
[Result "1-0"]
[BlackElo "2200"]
[Time "03:11:16"]
[WhiteElo "2200"]
[TimeControl "0+60"]
[SetUp "1"]
[FEN "rq2kb1r/p2p1ppp/1pb1pn2/8/2P1P3/P1N5/1PQ1BPPP/R1B1K2R w KQkq - 0 1"]
[Termination "normal"]
[PlyCount "19"]
[WhiteType "program"]
[BlackType "program"]

1. Bg5 {(Bc1-g5 Qb8-e5 f2-f4 Qe5-d4 Ra1-d1 Qd4-e3 Qc2-d3 Qe3xd3 Be2xd3
Ra8-c8 Rh1-f1 h7-h6 Bg5xf6 g7xf6 Nc3-b5 Bc6xb5 c4xb5 Rh8-g8 g2-g3 Bf8-c5
Bd3-e2 Ke8-e7 Rd1-d3 e6-e5 f4xe5 f6xe5 Be2-h5 f7-f6 Rd3-f3 Rg8-g5 g3-g4
Rc8-f8 Rf3-c3 Bc5-d4 Rc3-c2 Rg5-g8 Ke1-e2 Rf8-c8 Rf1-c1 Rc8xc2 Rc1xc2
Ke7-d6 Ke2-f3 Kd6-e6 h2-h3 Ke6-e7 Kf3-g3 Rg8-g5 Kg3-f3 Rg5-g8) -0.11/50 60}
Bxa3 2. Rxa3 {(Ra1xa3 Qb8-e5 Bg5xf6 Qe5xf6 Ke1-g1 Ke8-g8 Rf1-a1 a7-a5
Be2-f3 Qf6-d8 Nc3-e2 Qd8-c7 Ne2-d4 Ra8-c8 Ra1-d1 f7-f5 Nd4xc6 d7xc6 c4-c5
b6xc5 Qc2xc5 Rc8-b8 Qc5-c4 Qc7-e5 e4xf5 Qe5xf5 Qc4-c3 Rf8-d8 Rd1-e1 Qf5-f6
Ra3xa5 Qf6xc3 b2xc3 Rb8-b2 Bf3xc6 Rd8-d2 Ra5-a8) -0.56/37 60} h6 3. Bxf6
{(Bg5xf6 g7xf6 Be2-f3 Qb8-f4 Nc3-e2 Qf4-e5 Ke1-g1 Ke8-g8 Rf1-d1 d7-d6
Ne2-d4 Rf8-c8 Nd4xc6 Rc8xc6 Bf3-e2 Kg8-h8 h2-h4 a7-a6 Rd1-a1 Rc6-c8 Ra3xa6
Ra8xa6 Ra1xa6) -0.60/23 60} gxf6 4. Bf3 {(Be2-f3 Qb8-f4 Nc3-e2 Qf4-e5
Ke1-g1 Ke8-g8 Rf1-d1 a7-a6 Ne2-d4 Rf8-d8 Qc2-d2 Qe5-g5 Qd2-b4 Qg5-c5 Qb4-c3
Bc6-b7 Qc3-d2 Qc5-g5 Qd2-b4 Qg5-c5 Qb4xc5 b6xc5 Nd4-e2) -0.59/23 60} a5 5.
O-O {(Ke1-g1 Qb8-f4 Nc3-e2 Qf4-h4 Rf1-d1 Rh8-g8 Ne2-g3 Rg8-g5 Qc2-c3 Rg5-c5
Qc3-d3 Ke8-f8 Ra3-b3 Bc6-a4 Qd3-d6 Kf8-g8 Rb3xb6 Ba4xd1 Qd6xc5 Bd1xf3 g2xf3
Qh4-g5 Qc5xg5 f6xg5) -0.61/24 60} O-O 6. Qd2 {(Qc2-d2 Kg8-h7 Qd2-e3 Qb8-c7
Ra3-b3 Ra8-b8 Rf1-d1 Rf8-g8 Bf3-h5 d7-d6 Qe3-f4 Rg8-g5 Bh5-f3 a5-a4 Rb3-b4
Kh7-g8 Qf4xf6) -0.65/17 60} e5 7. Qxh6 {(Qd2xh6 f6-f5 e4xf5 f7-f6 Bf3-d5
Bc6xd5 Qh6-g6 Kg8-h8 Nc3xd5 Rf8-f7 Ra3-h3) -0.96/11 60} b5 8. Bd1 {(Bf3-d1
Qb8-b6 Nc3-d5 Qb6xf2 Kg1xf2 b5xc4 Nd5xf6) -2.40/7 60} bxc4 9. Nd5 {(Nc3-d5
Bc6-b5 Nd5xf6) -10.00/3 60} Bxd5 10. Rg3# {(Ra3-g3) -10.00/1 59} 1-0
[/pgn]
User avatar
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Announcing lczero

Post by hgm »

I have no experience with MCTS, but since you seem to need so many nodes to see so little, I wonder if it makes proper use of beta cutoffs. E.g. consider a hypothetical situation where you have the choice between two moves, one somewhat better than the other, and after the weaker move it doesn't matter much what you play for many moves (i.e. a no-progress situation). The UCT algorithm would channel more playouts into the better move, but if the difference is only small, it would be forced to keep exploring the weaker move as well, to make sure its upper confidence bound stays below that of the best move.

Now if I understand MCTS correctly, all visits of the weaker move would homogeneously sample the sub-tree behind it, as there is no value difference there. So it will converge to a minimax tree. That seems very wrong, as it would be good enough to have a refutation tree. If all these visits refrained from exploring any other moves as long as the upper confidence bound of the one move they explore stays below beta (for which you could take the lower confidence bound of the good initial move), the search would just keep exploiting that move. MCTS-UCT is modeled on the k-armed-bandit problem (applying it at every node), where you want to make sure you exploit the best-paying bandit. But here it should really just try to find a bandit that is good enough.
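
To make the suggestion concrete, selection at such a node could look something like this (a minimal sketch of my reading of it, with an illustrative node structure and exploration constant, not anything lczero actually does):

[code]
import math
from dataclasses import dataclass

C = 1.4  # illustrative UCB exploration constant

@dataclass
class Child:
    mean: float    # average result so far, from the moving side's view
    visits: int    # > 0 (unvisited children handled separately)

def bounds(child, parent_visits):
    r = C * math.sqrt(math.log(parent_visits) / child.visits)
    return child.mean - r, child.mean + r          # (lower, upper)

def select(children, parent_visits, beta):
    # Current refutation candidate: the most-visited reply so far.
    refutation = max(children, key=lambda c: c.visits)
    _, upper = bounds(refutation, parent_visits)
    if upper < beta:
        # One refutation below beta suffices: keep exploiting it
        # instead of homogeneously sampling its siblings.
        return refutation
    # Otherwise fall back to plain UCB selection.
    return max(children, key=lambda c: bounds(c, parent_visits)[1])
[/code]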

Usually the k-armed-bandit problem is formulated with arms that have a benign distribution, like Gaussians with fixed standard deviation, only differing in expectation value. Unfortunately, random sampling of a Chess tree is not like that at all. It is more like dealing with bandits where most of the contribution to the average comes from a few outcomes with very small probability but huge payout. You really have no idea of the average until you have enough visits to representatively sample the unlikely events. Likewise, in Chess you will only get a meaningful value when you have enough visits to discover the tactics that define the main line, and start exploiting that.
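
A toy illustration of that difference (numbers purely illustrative): two arms with the same expected payout, where a small sample of the heavy-tailed one almost always reports zero.

[code]
import random

def arm_benign():
    return 1.0                                        # always pays 1, mean = 1

def arm_heavy_tail():
    return 100.0 if random.random() < 0.01 else 0.0   # rare huge payout, mean = 1

n = 20
print(sum(arm_benign() for _ in range(n)) / n)        # always 1.0
print(sum(arm_heavy_tail() for _ in range(n)) / n)    # usually 0.0, occasionally 5.0
[/code]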
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Announcing lczero

Post by Daniel Shawul »

Usually the k-armed-bandit problem is formulated with arms that have a benign distribution, like Gaussians with fixed standard deviation, only differing in expectation value. Unfortunately, random sampling of a Chess tree is not like that at all. It is more like dealing with bandits where most of the contribution to the average comes from a few outcomes with very small probability but huge payout. You really have no idea of the average until you have enough visits to representatively sample the unlikely events. Likewise, in Chess you will only get a meaningful value when you have enough visits to discover the tactics that define the main line, and start exploiting that.
That is why I do not use averaging as the backup operator for MCTS in chess -- I return the minimax value of the best bandit instead. This gives a more stable search, and I think it is better tactically as well. A typical example is a position where you are threatening a mate-in-1 but the opponent has one or two escape moves that keep the score at 0. With averaging as the backup operator, the engine would hugely prefer these positions, as it can get mate scores from all the other, non-escape moves. Since mate scores are very high, even one of them ruins the actual score. My engine with this backup operator exhibits a wildly varying score from move to move. However, with minimax as the backup operator, the engine keeps its score at 0, and the score remains stable as in a standard alpha-beta search. Thus, the averaging backup operator is preferred as the better planning method for games dominated by strategy, and the minimax operator for tactical games like chess. Note that the tactical gain I get from using the minimax operator, though significant, is minuscule compared to that from expanding the tree rapidly on every visit.
I have no idea how they removed the tactical blindness from A0 -- I am currently stuck at TSCP + 100 elo with the MCTS + qsearch approach.
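
For reference, the two backup rules in question can be sketched like this (heavily simplified bookkeeping, names purely illustrative of what I described above):

[code]
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    visits: int = 0
    total: float = 0.0    # sum of results, from this node's side to move
    children: List["Node"] = field(default_factory=list)

    @property
    def mean(self):
        return self.total / max(self.visits, 1)

def value_average(node: Node) -> float:
    # Plain MCTS backup: the node's value is the mean of all simulation
    # results through it.  One spurious mate score among the replies
    # drags the whole average, hence the wildly swinging scores.
    return node.mean

def value_minimax(node: Node) -> float:
    # Minimax backup: take the negamax value of the best reply instead,
    # so the single escape move that keeps the score at 0 dominates,
    # just as in an alpha-beta search.
    if not node.children:
        return node.mean
    return max(-value_minimax(c) for c in node.children)
[/code]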
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Announcing lczero

Post by Michel »

Daniel Shawul wrote:
Usually the k-armed-bandit problem is formulated with arms that have a benign distribution, like Gaussians with fixed standard deviation, only differing in expectation value. Unfortunately, random sampling of a Chess tree is not like that at all. It is more like dealing with bandits where most of the contribution to the average comes from a few outcomes with very small probability but huge payout. You really have no idea of the average until you have enough visits to representatively sample the unlikely events. Likewise, in Chess you will only get a meaningful value when you have enough visits to discover the tactics that define the main line, and start exploiting that.
That is why I do not use averaging as the backup operator for MCTS in chess -- I return the minimax value of the best bandit instead. This gives a more stable search, and I think it is better tactically as well. A typical example is a position where you are threatening a mate-in-1 but the opponent has one or two escape moves that keep the score at 0. With averaging as the backup operator, the engine would hugely prefer these positions, as it can get mate scores from all the other, non-escape moves. Since mate scores are very high, even one of them ruins the actual score. My engine with this backup operator exhibits a wildly varying score from move to move. However, with minimax as the backup operator, the engine keeps its score at 0, and the score remains stable as in a standard alpha-beta search. Thus, the averaging backup operator is preferred as the better planning method for games dominated by strategy, and the minimax operator for tactical games like chess. Note that the tactical gain I get from using the minimax operator, though significant, is minuscule compared to that from expanding the tree rapidly on every visit.
I have no idea how they removed the tactical blindness from A0 -- I am currently stuck at TSCP + 100 elo with the MCTS + qsearch approach.

You are obviously much more of an expert than I am. But I guess the idea is that an escapable mate threat should not be scored as zero (the minimax score), since it still has significant value as a threat (restricting the opponent), and moreover there could be nearby lines where the near-mate is actually a mate.

BTW, I assume that before applying the averaging operator you first convert scores to a winning probability? It seems obvious but I just wanted to make sure.
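
(For anyone following along, a common conversion is a logistic mapping from centipawns to a win probability, along these lines; the scale constant here is purely illustrative and not necessarily what Scorpio uses.)

[code]
def score_to_winprob(cp: float, scale: float = 400.0) -> float:
    # Logistic centipawn -> win-probability mapping.
    # score_to_winprob(0) == 0.5; score_to_winprob(200) ~= 0.76
    return 1.0 / (1.0 + 10.0 ** (-cp / scale))
[/code]
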
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Announcing lczero

Post by gladius »

Daniel Shawul wrote:Do you think an MCTS engine which can do a fixed-depth search (say a 7-ply lookahead) at every ply can become competitive with a selective alpha-beta engine that can look 20 plies ahead from the root? This is the only way I see it beating an alpha-beta engine, as it seems to me that an MCTS engine cannot become tactically stronger at the same rate as the alpha-beta engine. Sure, it can begin to resolve shallow tactics with more simulation time, but the deeper tactics (4 to 7 ply) require a lot more to recognize.

Here is an example game where I increased the number of simulations to p=15000 for lczero. It takes about 60 seconds per move on my Intel HD graphics card (I guess very slow). So many blunders in there, e.g. look at the ending, where it fails to find a mate-in-2. Note that the only difference between ScorpioMCTS and Lczero is that I use qsearch instead of the NN -- which allows me to do far more playouts/sec than lczero and hence resolve more shallow tactics by growing the tree faster. So far the policy NN is not helping in resolving tactics.

[pgn]
[Event "Tour10"]
[Site "daniel-Satellite-C855"]
[Date "2018.01.17"]
[Round "1"]
[White "ScorpioMCTS"]
[Black "Lczero"]
[Result "1-0"]
[BlackElo "2200"]
[Time "03:11:16"]
[WhiteElo "2200"]
[TimeControl "0+60"]
[SetUp "1"]
[FEN "rq2kb1r/p2p1ppp/1pb1pn2/8/2P1P3/P1N5/1PQ1BPPP/R1B1K2R w KQkq - 0 1"]
[Termination "normal"]
[PlyCount "19"]
[WhiteType "program"]
[BlackType "program"]

1. Bg5 {(Bc1-g5 Qb8-e5 f2-f4 Qe5-d4 Ra1-d1 Qd4-e3 Qc2-d3 Qe3xd3 Be2xd3
Ra8-c8 Rh1-f1 h7-h6 Bg5xf6 g7xf6 Nc3-b5 Bc6xb5 c4xb5 Rh8-g8 g2-g3 Bf8-c5
Bd3-e2 Ke8-e7 Rd1-d3 e6-e5 f4xe5 f6xe5 Be2-h5 f7-f6 Rd3-f3 Rg8-g5 g3-g4
Rc8-f8 Rf3-c3 Bc5-d4 Rc3-c2 Rg5-g8 Ke1-e2 Rf8-c8 Rf1-c1 Rc8xc2 Rc1xc2
Ke7-d6 Ke2-f3 Kd6-e6 h2-h3 Ke6-e7 Kf3-g3 Rg8-g5 Kg3-f3 Rg5-g8) -0.11/50 60}
Bxa3 2. Rxa3 {(Ra1xa3 Qb8-e5 Bg5xf6 Qe5xf6 Ke1-g1 Ke8-g8 Rf1-a1 a7-a5
Be2-f3 Qf6-d8 Nc3-e2 Qd8-c7 Ne2-d4 Ra8-c8 Ra1-d1 f7-f5 Nd4xc6 d7xc6 c4-c5
b6xc5 Qc2xc5 Rc8-b8 Qc5-c4 Qc7-e5 e4xf5 Qe5xf5 Qc4-c3 Rf8-d8 Rd1-e1 Qf5-f6
Ra3xa5 Qf6xc3 b2xc3 Rb8-b2 Bf3xc6 Rd8-d2 Ra5-a8) -0.56/37 60} h6 3. Bxf6
{(Bg5xf6 g7xf6 Be2-f3 Qb8-f4 Nc3-e2 Qf4-e5 Ke1-g1 Ke8-g8 Rf1-d1 d7-d6
Ne2-d4 Rf8-c8 Nd4xc6 Rc8xc6 Bf3-e2 Kg8-h8 h2-h4 a7-a6 Rd1-a1 Rc6-c8 Ra3xa6
Ra8xa6 Ra1xa6) -0.60/23 60} gxf6 4. Bf3 {(Be2-f3 Qb8-f4 Nc3-e2 Qf4-e5
Ke1-g1 Ke8-g8 Rf1-d1 a7-a6 Ne2-d4 Rf8-d8 Qc2-d2 Qe5-g5 Qd2-b4 Qg5-c5 Qb4-c3
Bc6-b7 Qc3-d2 Qc5-g5 Qd2-b4 Qg5-c5 Qb4xc5 b6xc5 Nd4-e2) -0.59/23 60} a5 5.
O-O {(Ke1-g1 Qb8-f4 Nc3-e2 Qf4-h4 Rf1-d1 Rh8-g8 Ne2-g3 Rg8-g5 Qc2-c3 Rg5-c5
Qc3-d3 Ke8-f8 Ra3-b3 Bc6-a4 Qd3-d6 Kf8-g8 Rb3xb6 Ba4xd1 Qd6xc5 Bd1xf3 g2xf3
Qh4-g5 Qc5xg5 f6xg5) -0.61/24 60} O-O 6. Qd2 {(Qc2-d2 Kg8-h7 Qd2-e3 Qb8-c7
Ra3-b3 Ra8-b8 Rf1-d1 Rf8-g8 Bf3-h5 d7-d6 Qe3-f4 Rg8-g5 Bh5-f3 a5-a4 Rb3-b4
Kh7-g8 Qf4xf6) -0.65/17 60} e5 7. Qxh6 {(Qd2xh6 f6-f5 e4xf5 f7-f6 Bf3-d5
Bc6xd5 Qh6-g6 Kg8-h8 Nc3xd5 Rf8-f7 Ra3-h3) -0.96/11 60} b5 8. Bd1 {(Bf3-d1
Qb8-b6 Nc3-d5 Qb6xf2 Kg1xf2 b5xc4 Nd5xf6) -2.40/7 60} bxc4 9. Nd5 {(Nc3-d5
Bc6-b5 Nd5xf6) -10.00/3 60} Bxd5 10. Rg3# {(Ra3-g3) -10.00/1 59} 1-0
[/pgn]
Agreed, the policy network is still nowhere close to resolving significant tactics. My current thinking is that this is due to the training on SF games: it hasn't had a chance to experience bad moves, and so hasn't learned to refute them.

For your setup though, the CPU version would probably be far faster (just comment out #define USE_OPENCL in config.h). Those Intel GPUs are really bad for NN evals.