[D]3r1bk1/1p3ppp/2p2p2/2Pq4/1P1Pr3/3R1NP1/2Q2P1P/3R2K1 w - - 5 24
Here LC0 played Ne5 on TCEC's 43-core hardware! Note that this blunder is probably not due to a bug, as it would be in most other engines; rather, the algorithm is working as intended and can produce such tactical blunders even on this massive hardware.
Are you telling me that this is not a problem for L0 or A0, and that it can be solved with bigger net and more training !?
I suspect the averaging of scores is responsible for this blunder. When a position has a few good moves and the policy network fails to pick them, these things can happen.
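To make the averaging point concrete, here is a toy sketch (not LCZero's actual code; the values and visit counts are made up) of how an averaged backup can hide a single refutation that a minimax backup would surface:

```python
# Toy illustration (not LCZero code): MCTS backs up the *average* of
# simulation values, while alpha-beta backs up the *minimax* value.
# A single refuting reply can be drowned out by many harmless ones.

def mcts_value(child_values, child_visits):
    """Visit-weighted average backup, as in MCTS/PUCT."""
    total = sum(v * n for v, n in zip(child_values, child_visits))
    return total / sum(child_visits)

def minimax_value(child_values):
    """Alpha-beta style backup: the opponent picks the best reply."""
    return min(child_values)  # values from the mover's point of view

# Suppose 9 quiet replies score +0.3 for the mover, but one tactical
# refutation scores -0.9 and received only a handful of visits
# because the policy prior disliked it:
values = [0.3] * 9 + [-0.9]
visits = [100] * 9 + [5]

print(round(mcts_value(values, visits), 3))  # 0.293: looks fine
print(minimax_value(values))                 # -0.9: actually losing
```

With few visits on the refutation, the averaged score barely moves, so the move still looks playable to the search.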
LCzero sacs a knight for nothing
Moderators: hgm, Rebel, chrisw
- Daniel Shawul
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
- Michel
- Posts: 2273
- Joined: Mon Sep 29, 2008 1:50 am
Re: LCzero sacs a knight for nothing
Duplicate. Cannot delete for some reason.
Last edited by Michel on Thu Apr 19, 2018 9:15 pm, edited 1 time in total.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
- Michel
- Posts: 2273
- Joined: Mon Sep 29, 2008 1:50 am
Re: LCzero sacs a knight for nothing
Daniel Shawul wrote:
Are you telling me that this is not a problem for L0 or A0, and that it can be solved with bigger net and more training !?

The purpose of the experiment is to find out... We are only at the beginning.
Seriously, I think you should give LC0 some time. I know you have shown that MCTS is a disaster in Scorpio. But LC0 has a very different type of evaluation function.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
- Posts: 12562
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: LCzero sacs a knight for nothing
Since the new neural net setup (0.7) and the participation of Google Colab, it has started to take off again. In fact, the steep, linear upward slope at this Elo level looks like an exponential learning rate.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
- Daniel Shawul
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: LCzero sacs a knight for nothing
Michel wrote:
The purpose of the experiment is to find out... We are only at the beginning. Seriously, I think you should give LC0 some time. I know you have shown that MCTS is a disaster in Scorpio. But LC0 has a very different type of evaluation function.

That is a one-ply tactic right there! This tactical problem is not going to go away anytime soon. Even if you train and train to cover for 5-ply tactics, there will come the 8-ply ones and the 15-ply ones, etc... I cannot imagine how it would ever cover the 15-ply trap, for example. AlphaZero did it with 4 TPUs, which is something like a 180x hardware advantage over what Stockfish used; and then probably cherry-picking the results, removing games where it makes total blunders like L0 did. What makes it different from a massive acceleration of a very slow NN eval with specialty hardware, like DeepBlue did with FPGAs?

Your statement about MCTS being a disaster in Scorpio is so general and uninformed that I suggest you look at the results here. The current MCTS version is actually as good as the standard Scorpio... and not only on massive hardware, but on just 1 CPU core. You would need a time control of 1 year + 1 month to show that with LCZero. Clearly, with more hardware it will start to perform better in tactics, but stop this nonsense about the policy network solving tactics...
- Posts: 568
- Joined: Tue Dec 12, 2006 10:10 am
- Full name: Gary Linscott
Re: LCzero sacs a knight for nothing
Daniel Shawul wrote:
That is a one-ply tactic right there! This tactical problem is not going to go away anytime soon. [...]

But the entire process is designed to have it solve tactics. The policies are trained to match the output of an 800-node search, so it's being trained to take the tactics into account. Even modern chess evaluation functions do this (with, e.g., huge penalties for a queen under threat, and restricting queen mobility to "safe" squares).

Don't you think that the network can learn to predict tactics?

This is totally separate from the HW question as well. I don't really care about the HW issue unless it's not feasible to run it, and it is feasible and commonly available with GPUs, so why not play with a totally different way of doing things? This is what you do with your engines, after all.
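For reference, the "trained to match an 800-node search" step in the AlphaZero recipe means the policy target is the normalized root visit-count distribution; a minimal sketch (the move names and counts here are made up):

```python
# Sketch of the AlphaZero-style policy target: after a small search
# (800 playouts in the paper), the root visit counts are normalized
# with a temperature tau into the distribution the policy net is
# trained to match: pi(a) proportional to N(a)^(1/tau).

def policy_target(visit_counts, tau=1.0):
    """Turn root visit counts {move: N} into a training distribution."""
    powered = {m: n ** (1.0 / tau) for m, n in visit_counts.items()}
    total = sum(powered.values())
    return {m: p / total for m, p in powered.items()}

# Hypothetical root visit counts after an 800-playout search:
root_visits = {"Ne5": 500, "Rde1": 200, "Qb3": 100}
pi = policy_target(root_visits)
print(pi["Ne5"])  # 0.625
```

Because the target is the search's visit distribution rather than the raw network output, whatever tactics the 800-node search does find leak into the policy head during training, which is the crux of this disagreement.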
- Michel
- Posts: 2273
- Joined: Mon Sep 29, 2008 1:50 am
Re: LCzero sacs a knight for nothing
Daniel Shawul wrote:
That is a one-ply tactic right there!

It is not really one ply, is it? LC0 did not see the rook check on e1, which, to be honest, I also had not seen at first sight. I guess the policy head will have to learn about checks that deflect a defender.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
- Daniel Shawul
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: LCzero sacs a knight for nothing
gladius wrote:
But the entire process is designed to have it solve tactics. The policies are trained to match the output of an 800-node search, so it's being trained to take the tactics into account. [...] Don't you think that the network can learn to predict tactics?

Gary, first off, I hope you don't take my posts as a negative voice on the L0 project. In fact, I like it a lot, so that we can prove once and for all how AlphaZero "did it"...

As I posted elsewhere, the policy network may be able to identify things like "don't put your piece where it can be captured", or "move your piece away so that it won't be captured". I don't see it solving precise tactics even at qsearch level. It can only learn those general rules...

Let's then assume it learned the above kind of rules and has a good policy network. The problem is that a trap, by definition, is something that looks bad but will turn out to be good if searched to x plies. So whether the policy network is good or bad, it is not going to help you much -- well, it had better be good at least to look decent, but a tactical engine will find its tactical weaknesses anyway. This is because its policy network rules are static, unlike alpha-beta engines, which analyze these tactics dynamically!

Almost every game, LC0 is missing some tactics in TCEC.
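The "static rules" objection can be made concrete with the PUCT selection rule used by AlphaZero/LC0-style searches: the exploration bonus is proportional to the policy prior, so a trap line the net dislikes is starved of visits. A sketch with made-up numbers (the function name and constants are illustrative, not LC0's actual parameters):

```python
import math

# U term of PUCT child selection: c * P(a) * sqrt(N_parent) / (1 + N_child).
# A move's exploration bonus scales linearly with its policy prior P(a),
# so a move the policy head assigns a tiny prior gets explored far less.

def exploration_bonus(prior, parent_visits, child_visits, c_puct=1.5):
    """PUCT exploration term for one child of the current node."""
    return c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

# At equal visit counts, a 1% prior trap refutation gets 40x less
# exploration than a 40% prior "natural" move:
u_liked = exploration_bonus(prior=0.40, parent_visits=800, child_visits=10)
u_trap  = exploration_bonus(prior=0.01, parent_visits=800, child_visits=10)
print(round(u_liked / u_trap))  # 40
```

This is why a weak prior on the refutation behaves like a static rule during any single search: only further training (or many more playouts) corrects it.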
- Posts: 373
- Joined: Wed Mar 22, 2006 10:17 am
- Location: Novi Sad, Serbia
- Full name: Karlo Balla
Re: LCzero sacs a knight for nothing
gladius wrote:
Don't you think that the network can learn to predict tactics?

1. Feed-forward NN - maybe, very shallow tactics
2. Recurrent NN - one day perhaps, but not today, not tomorrow,...
Best Regards,
Karlo Balla Jr.
- Posts: 568
- Joined: Tue Dec 12, 2006 10:10 am
- Full name: Gary Linscott
Re: LCzero sacs a knight for nothing
Daniel Shawul wrote:
Gary, first off, I hope you don't take my posts as a negative voice on the L0 project. In fact, I like it a lot... [...] Almost every game, LC0 is missing some tactics in TCEC.

Not at all - I think you've raised some very interesting points! MCTS averaging does seem fundamentally mismatched to chess. That's why I was so amazed A0 actually worked.

Once it starts getting a lot better, I think it will be pretty fascinating to do raw network evaluation of tactical positions and see how it does. At the very least, we can then start playing with additional inputs/structure and see if we can make it better.