
LCzero sacs a knight for nothing

Posted: Thu Apr 19, 2018 8:21 pm
by Daniel Shawul
[D]3r1bk1/1p3ppp/2p2p2/2Pq4/1P1Pr3/3R1NP1/2Q2P1P/3R2K1 w - - 5 24

Here LC0 played Ne5 on TCEC's 43-core hardware! Note that this blunder is probably not due to a bug, as it would be in most other engines; rather, the algorithm is working as intended and can produce such tactical blunders even on this massive hardware.

Are you telling me that this is not a problem for L0 or A0, and that it can be solved with a bigger net and more training!?

I suspect the averaging of scores is responsible for this blunder. When a position has only a few good moves and the policy network fails to pick them, these things can happen.
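To illustrate what the averaging can do (a toy sketch in Python, not LC0's actual backup code, and with made-up values): if one reply among several plausible-looking ones refutes a move, a minimax backup propagates the refutation immediately, while an averaged backup keeps the move's score looking healthy until the search concentrates enough visits on the refuting line.

[code]
# Toy comparison of averaged (MCTS-style) vs. minimax backups.
# A candidate move allows 10 replies: 9 look fine for the mover,
# 1 refutes the move. Values are from the mover's point of view.
reply_values = [0.6] * 9 + [-0.9]  # the lone refutation

averaged = sum(reply_values) / len(reply_values)  # MCTS-style mean backup
minimax = min(reply_values)  # the opponent simply plays the refutation

print(f"averaged backup: {averaged:+.2f}")  # +0.45 -- the move still looks good
print(f"minimax backup:  {minimax:+.2f}")   # -0.90 -- the move is refuted
[/code]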

Re: LCzero sacs a knight for nothing

Posted: Thu Apr 19, 2018 9:11 pm
by Michel
Duplicate. Cannot delete for some reason.

Re: LCzero sacs a knight for nothing

Posted: Thu Apr 19, 2018 9:13 pm
by Michel
Are you telling me that this is not a problem for L0 or A0, and that it can be solved with a bigger net and more training!?
The purpose of the experiment is to find out... We are only at the beginning.

Seriously. I think you should give LC0 some time. I know you have shown that MCTS is a disaster in Scorpio. But LC0 has a very different type of evaluation function.

Re: LCzero sacs a knight for nothing

Posted: Thu Apr 19, 2018 9:33 pm
by Dann Corbit
Since the new neural net setup (0.7) and the participation of Google Colab, it has started to take off again.

In fact, the steep, linear, upward slope at this Elo level looks like an exponential learning rate.

Re: LCzero sacs a knight for nothing

Posted: Thu Apr 19, 2018 9:34 pm
by Daniel Shawul
Michel wrote:
Are you telling me that this is not a problem for L0 or A0, and that it can be solved with a bigger net and more training!?
The purpose of the experiment is to find out... We are only at the beginning.

Seriously. I think you should give LC0 some time. I know you have shown that MCTS is a disaster in Scorpio. But LC0 has a very different type of evaluation function.
That is a one-ply tactic right there! This tactical problem is not going to go away anytime soon. Even if you train and train to cover the 5-ply tactics, there will come the 8-ply ones, then the 15-ply ones, etc... I cannot imagine how it would ever cover the 15-ply trap, for example. AlphaZero did it with 4 TPUs, which is something like a 180x hardware advantage over what Stockfish used; and then probably by cherry-picking the results, removing games where it made total blunders like L0 did here. What makes it different from a massive acceleration of a very slow NN eval with specialty hardware, like Deep Blue did with FPGAs?

Your statement about MCTS being a disaster in Scorpio is so general and uninformed that I suggest you look at the results here. The current MCTS version is actually as good as the standard Scorpio... and not only on massive hardware, but on just 1 CPU core. You would need a time control of 1 year + 1 month to show that with LCZero. Clearly with more hardware it will start to perform better in tactics, but stop this nonsense about the policy network solving tactics...

Re: LCzero sacs a knight for nothing

Posted: Thu Apr 19, 2018 9:40 pm
by gladius
Daniel Shawul wrote:
Michel wrote:
Are you telling me that this is not a problem for L0 or A0, and that it can be solved with a bigger net and more training!?
The purpose of the experiment is to find out... We are only at the beginning.

Seriously. I think you should give LC0 some time. I know you have shown that MCTS is a disaster in Scorpio. But LC0 has a very different type of evaluation function.
That is a one-ply tactic right there! This tactical problem is not going to go away anytime soon. Even if you train and train to cover the 5-ply tactics, there will come the 8-ply ones, then the 15-ply ones, etc... I cannot imagine how it would ever cover the 15-ply trap, for example. AlphaZero did it with 4 TPUs, which is something like a 180x hardware advantage over what Stockfish used; and then probably by cherry-picking the results, removing games where it made total blunders like L0 did here. What makes it different from a massive acceleration of a very slow NN eval with specialty hardware, like Deep Blue did with FPGAs?

Your statement about MCTS being a disaster in Scorpio is so general and uninformed that I suggest you look at the results here. The current MCTS version is actually as good as the standard Scorpio... and not only on massive hardware, but on just 1 CPU core. You would need a time control of 1 year + 1 month to show that with LCZero. Clearly with more hardware it will start to perform better in tactics, but stop this nonsense about the policy network solving tactics...
But the entire process is designed to have it solve tactics. The policies are trained to match the output of an 800-node search, so it is being trained to take tactics into account. Even modern chess evaluation functions do this (e.g. huge penalties for a queen under threat, and restricting queen mobility to "safe" squares).
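(For concreteness, a minimal sketch of that training target, with made-up visit counts: in the AlphaZero setup the policy head is trained toward the visit-count distribution of the small self-play search, so whatever tactics those ~800 nodes resolve get folded into the prior.)

[code]
import numpy as np

def policy_target(visit_counts, tau=1.0):
    """AlphaZero-style policy target: the visit-count distribution of the
    ~800-node self-play search, normalized and sharpened by temperature tau."""
    counts = np.asarray(visit_counts, dtype=np.float64) ** (1.0 / tau)
    return counts / counts.sum()

# e.g. 800 simulations spread over four candidate moves
print(policy_target([500, 200, 80, 20]))  # [0.625 0.25  0.1   0.025]
[/code]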

Don't you think that the network can learn to predict tactics?

This is totally separate from the HW question as well. I don't really care about the HW issue unless it's not feasible to run it, and it is feasible and commonly available with GPUs, so why not play with a totally different way of doing things? This is what you do with your engines after all :).

Re: LCzero sacs a knight for nothing

Posted: Thu Apr 19, 2018 9:41 pm
by Michel
That is a one-ply tactic right there!
It is not really one ply, is it? LC0 did not see the rook check on e1, which, to be honest, I also had not seen at first sight.

I guess the policy head will have to learn about checks that deflect a defender.
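The tactic, as far as I can reconstruct it from the diagram, is 24.Ne5 fxe5, when 25.dxe5 fails to 25...Re1+! 26.Rxe1 Qxd3, the check deflecting the d1 rook from the defense of d3. A quick sanity check with python-chess (assuming the package is installed; the line is my reconstruction, not taken from the game):

[code]
import chess  # pip install python-chess

fen = "3r1bk1/1p3ppp/2p2p2/2Pq4/1P1Pr3/3R1NP1/2Q2P1P/3R2K1 w - - 5 24"
board = chess.Board(fen)

# Play out the presumed refutation; push_san raises ValueError on an
# illegal move, so reaching the end confirms the line is at least legal.
for san in ["Ne5", "fxe5", "dxe5", "Re1+", "Rxe1", "Qxd3"]:
    board.push_san(san)

# Once the queens come off on d3, White is down a piece for a pawn.
print(board.fen())
[/code]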

Re: LCzero sacs a knight for nothing

Posted: Thu Apr 19, 2018 9:54 pm
by Daniel Shawul
gladius wrote:
Daniel Shawul wrote:
Michel wrote:
Are you telling me that this is not a problem for L0 or A0, and that it can be solved with a bigger net and more training!?
The purpose of the experiment is to find out... We are only at the beginning.

Seriously. I think you should give LC0 some time. I know you have shown that MCTS is a disaster in Scorpio. But LC0 has a very different type of evaluation function.
That is a one-ply tactic right there! This tactical problem is not going to go away anytime soon. Even if you train and train to cover the 5-ply tactics, there will come the 8-ply ones, then the 15-ply ones, etc... I cannot imagine how it would ever cover the 15-ply trap, for example. AlphaZero did it with 4 TPUs, which is something like a 180x hardware advantage over what Stockfish used; and then probably by cherry-picking the results, removing games where it made total blunders like L0 did here. What makes it different from a massive acceleration of a very slow NN eval with specialty hardware, like Deep Blue did with FPGAs?

Your statement about MCTS being a disaster in Scorpio is so general and uninformed that I suggest you look at the results here. The current MCTS version is actually as good as the standard Scorpio... and not only on massive hardware, but on just 1 CPU core. You would need a time control of 1 year + 1 month to show that with LCZero. Clearly with more hardware it will start to perform better in tactics, but stop this nonsense about the policy network solving tactics...
But the entire process is designed to have it solve tactics. The policies are trained to match the output of an 800-node search, so it is being trained to take tactics into account. Even modern chess evaluation functions do this (e.g. huge penalties for a queen under threat, and restricting queen mobility to "safe" squares).

Don't you think that the network can learn to predict tactics?

This is totally separate from the HW question as well. I don't really care about the HW issue unless it's not feasible to run it, and it is feasible and commonly available with GPUs, so why not play with a totally different way of doing things? This is what you do with your engines after all :).
Gary, first off, I hope you don't take my posts to be a negative voice against the L0 project. In fact, I like it a lot, since it can prove once and for all how AlphaZero "did it"...

As I posted elsewhere, the policy network may be able to identify things like "don't put your piece where it can be captured" or "move your piece away so that it won't be captured". I don't see it solving precise tactics even at qsearch level. It can only learn those general rules...

Let's then assume it has learned the above kind of rules and has a good policy network. The problem is that a trap, by definition, is something that looks bad but turns out to be good when searched to x plies. So whether the policy network is good or bad, it is not going to help you much -- well, it had better be good at least to look decent, but a tactical engine will find its tactical weakness anyway. This is because the policy network's rules are static, unlike alpha-beta engines, which analyze these tactics dynamically!
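The "static rules" objection can be made concrete with the PUCT selection rule that LC0 inherits from AlphaZero: exploration is weighted by the prior, so a move the policy head dislikes, and whose payoff only shows up x plies deep, gets starved of visits. A toy sketch (the constants and values are made up):

[code]
import math

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """AlphaZero/LC0-style selection term: Q + U, with U scaled by the prior."""
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

# A trap, by the above definition, looks bad at shallow depth: its first few
# visits return poor values, and its prior is tiny because the policy head
# dislikes it. The quiet move looks mildly good and has a large prior.
parent = 800
quiet = puct_score(q=+0.10, prior=0.60, parent_visits=parent, child_visits=400)
trap = puct_score(q=-0.30, prior=0.01, parent_visits=parent, child_visits=10)
print(f"quiet: {quiet:+.3f}  trap: {trap:+.3f}")
# quiet: +0.163  trap: -0.261 -> the trap line is effectively never re-searched
[/code]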

LC0 is missing some tactics in almost every game at TCEC.

Re: LCzero sacs a knight for nothing

Posted: Thu Apr 19, 2018 10:01 pm
by Karlo Bala
gladius wrote:
Don't you think that the network can learn to predict tactics?
1. Feed-forward NN - maybe, very shallow tactics
2. Recurrent NN - one day perhaps, but not today, not tomorrow,...

Re: LCzero sacs a knight for nothing

Posted: Thu Apr 19, 2018 10:39 pm
by gladius
Daniel Shawul wrote:
gladius wrote:
Daniel Shawul wrote:
Michel wrote:
Are you telling me that this is not a problem for L0 or A0, and that it can be solved with a bigger net and more training!?
The purpose of the experiment is to find out... We are only at the beginning.

Seriously. I think you should give LC0 some time. I know you have shown that MCTS is a disaster in Scorpio. But LC0 has a very different type of evaluation function.
That is a one-ply tactic right there! This tactical problem is not going to go away anytime soon. Even if you train and train to cover the 5-ply tactics, there will come the 8-ply ones, then the 15-ply ones, etc... I cannot imagine how it would ever cover the 15-ply trap, for example. AlphaZero did it with 4 TPUs, which is something like a 180x hardware advantage over what Stockfish used; and then probably by cherry-picking the results, removing games where it made total blunders like L0 did here. What makes it different from a massive acceleration of a very slow NN eval with specialty hardware, like Deep Blue did with FPGAs?

Your statement about MCTS being a disaster in Scorpio is so general and uninformed that I suggest you look at the results here. The current MCTS version is actually as good as the standard Scorpio... and not only on massive hardware, but on just 1 CPU core. You would need a time control of 1 year + 1 month to show that with LCZero. Clearly with more hardware it will start to perform better in tactics, but stop this nonsense about the policy network solving tactics...
But the entire process is designed to have it solve tactics. The policies are trained to match the output of an 800-node search, so it is being trained to take tactics into account. Even modern chess evaluation functions do this (e.g. huge penalties for a queen under threat, and restricting queen mobility to "safe" squares).

Don't you think that the network can learn to predict tactics?

This is totally separate from the HW question as well. I don't really care about the HW issue unless it's not feasible to run it, and it is feasible and commonly available with GPUs, so why not play with a totally different way of doing things? This is what you do with your engines after all :).
Gary, first off, I hope you don't take my posts to be a negative voice against the L0 project. In fact, I like it a lot, since it can prove once and for all how AlphaZero "did it"...

As I posted elsewhere, the policy network may be able to identify things like "don't put your piece where it can be captured" or "move your piece away so that it won't be captured". I don't see it solving precise tactics even at qsearch level. It can only learn those general rules...

Let's then assume it has learned the above kind of rules and has a good policy network. The problem is that a trap, by definition, is something that looks bad but turns out to be good when searched to x plies. So whether the policy network is good or bad, it is not going to help you much -- well, it had better be good at least to look decent, but a tactical engine will find its tactical weakness anyway. This is because the policy network's rules are static, unlike alpha-beta engines, which analyze these tactics dynamically!

LC0 is missing some tactics in almost every game at TCEC.
Not at all - I think you've raised some very interesting points! MCTS averaging does seem fundamentally mismatched to chess. That's why I was so amazed A0 actually worked.

Once it starts getting a lot better, I think it will be pretty fascinating to do raw network evaluation of tactical positions and see how it does :). At the very least, we can then start playing with additional inputs/structure and see if we can make it better.
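One way to try that today (a sketch; it assumes an lc0 binary with a loaded network is on the PATH, and uses a 1-node search as a stand-in for the raw policy output, since with a single node there is no tree search to correct the prior):

[code]
import chess
import chess.engine

# Ask lc0 for a 1-node "search" on the blunder position: the move returned
# is then essentially the policy head's top choice, with no search to fix it.
fen = "3r1bk1/1p3ppp/2p2p2/2Pq4/1P1Pr3/3R1NP1/2Q2P1P/3R2K1 w - - 5 24"
board = chess.Board(fen)

with chess.engine.SimpleEngine.popen_uci("lc0") as engine:
    result = engine.play(board, chess.engine.Limit(nodes=1))
    print("raw-network move:", board.san(result.move))
[/code]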