chrisw wrote: ↑Sun Jul 19, 2020 4:44 pm
Ok, thanks, first assessment is a bit disappointing. There's no winning game by either side that's over before move 80. Median game length is 111 ply for SF_NNUE and 118 ply for SF_dev, but actually it's longer than that because you're starting from FENs, and cutechess is counting from the FEN position; maybe those FENs are 16 ply in? So median game length is 125-135 ply. Took a look at the five shortest NNUE wins: they last for at least 82 ply and all were endings. No fireworks, nothing exciting.
Critically, I'm getting the initial impression that NNUE is not doing anything AZ-ish to Stockfish, and that implies the technique simply squashes Stockfish into a slightly (or markedly, who knows) more effective version of the same thing. Gradual grind out into the ending. Nothing new or superior, just more of the same. Disappointing, I was hoping the NN was going to have found some "new" knowledge, but, initially, it looks like not.
I'm also using it to analyze the games I'm playing at ICCF, and it seems to understand that endgames are winning clearly before other engines do. The endgame static evaluation of Stockfish is not very evolved, in the sense that there are not many endgame-specific positional eval parameters, so I think NNUE had more to gain there.
Also, the NNUE net organization, where the parameters are the piece positions relative to the king positions, heavily favors endgame evaluation, as it is well known that king positions are paramount in endgames. Change the net organization and you will probably favor another phase or type of play.
Yes, basically I agree, although I want to profile more games to be sure. The two together (SF and NNUE) typically went rapidly into exchanging pieces, with the NNUE emerging with the better endgame, and winning. My (initial) thoughts are around the idea that NNUE has a good handle on pawn structure and the likelihood of converting it into a win. SF-dev seemed unduly ready, early on, to give up a pawn in exchange for piece activity, but then went along with piece exchanges, negating the purpose.
Laskos, what was your contempt set to? Because I noticed games (ending positions) where SF was slightly worse, but was still the side, after a l o n g shuffle with no pawn moves and no captures, to break the shuffle with a non-reversing move, when it could have just taken the draw.
Still, the main point, for me, is that from what I've seen so far, they are not producing interesting chess.
No, SF_dev was Contempt=0. I have no idea what contempt the NNUE SF has.
With another 1000 data points (MikeB's 10000 games), median game length of that batch is 110 ply, plus maybe another 10 for the moves to the FEN starting position, call it 120 ply median. Indicating again a lot of long drawn-out games going to endings and not too much in the way of opening or middle game fireworks.
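For anyone wanting to repeat the arithmetic, here is a minimal sketch of the correction: take the median of the ply counts as reported from the FEN and add the assumed opening offset. The function name and the sample ply counts are made up for illustration; they are not taken from the actual match files.

```python
from statistics import median

def median_game_length(plies_from_fen, opening_offset_ply):
    """Median game length in ply, counted from the initial position.

    plies_from_fen: ply counts as reported from the FEN start position.
    opening_offset_ply: assumed number of plies already played in the FEN.
    """
    return median(plies_from_fen) + opening_offset_ply

# Hypothetical ply counts (not real match data).
sample = [82, 96, 104, 110, 110, 121, 133, 150, 171]

# The median of the sample is 110; with a 10-ply offset that gives 120.
print(median_game_length(sample, 10))
```

The same correction applied to the earlier figures (111 and 118 ply with a guessed 16-ply offset) gives 127 and 134, which is where the 125-135 range comes from.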
If I contrast to my maniac (which is a lot weaker of course), median game length against ELO matching opposition is 89 ply mine to win, 111 ply the opponent to win, with a lot of early fireworks. I guess it does depend what the goal is, but NNUE, so far, training on SF-alikes doesn't look like it makes for exciting chess.
Cdani, I'm sure you're correct that training with different net architecture (away from the one-hot sparse piece encode) might change the engine-style, but the problem there will be how to get the NNUE advantage of only computing the changes in the input layer with a more complex input layer, or more complexity in the higher layers.
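To make the "only computing the changes in the input layer" point concrete, here is a toy sketch of a king-relative one-hot encoding with an incrementally updated first-layer accumulator. Everything in it is an assumption for illustration: the index layout is a simplified HalfKP-style scheme, the hidden size is tiny, and `weight` is a deterministic stand-in for trained weights. It is not Stockfish's actual indexing or code.

```python
# Simplified king-relative ("HalfKP"-style) input encoding with an
# incrementally updated first-layer accumulator. The index layout, the
# hidden size and the weights are all made up for illustration.

NUM_SQUARES = 64
NUM_PIECE_KINDS = 10   # 5 non-king piece types x 2 colours
HIDDEN = 8             # toy size; real nets use far more

def feature_index(king_sq, piece_kind, piece_sq):
    """One-hot index for (own king square, piece kind, piece square)."""
    return (king_sq * NUM_PIECE_KINDS + piece_kind) * NUM_SQUARES + piece_sq

def weight(feature, j):
    """Deterministic stand-in for a trained weight matrix entry."""
    return (feature * 31 + j * 17) % 13 - 6

def refresh(king_sq, pieces):
    """Full recomputation: sum the weight column of every active feature."""
    acc = [0] * HIDDEN
    for kind, sq in pieces:
        f = feature_index(king_sq, kind, sq)
        for j in range(HIDDEN):
            acc[j] += weight(f, j)
    return acc

def update(acc, king_sq, removed, added):
    """Incremental update: touch only the features that changed."""
    new_acc = list(acc)
    for sign, changes in ((-1, removed), (+1, added)):
        for kind, sq in changes:
            f = feature_index(king_sq, kind, sq)
            for j in range(HIDDEN):
                new_acc[j] += sign * weight(f, j)
    return new_acc

king = 4                                # own king square (e1)
before = [(0, 12), (1, 6), (5, 52)]     # (piece kind, square) pairs
after = [(0, 28), (1, 6), (5, 52)]      # the kind-0 piece moved 12 -> 28

acc = refresh(king, before)
acc = update(acc, king, removed=[(0, 12)], added=[(0, 28)])
assert acc == refresh(king, after)      # same result, far less work
```

In this scheme a quiet move changes two features, so the update costs two column operations instead of one per piece on the board. A king move changes every feature for that side, so it needs a full `refresh`. A denser or more complex input layer would lose that cheap update, which is the trade-off raised above.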
NNUE has no contempt.
Contempt is located inside eval, and since it doesn't use SF's eval it doesn't have contempt.
Maybe it has some learned contempt from SF games, but I don't think contempt with depth 8 and depth 12 games is too gamechanging.
Depth 8 only has a few % drawrate either way, and way too many blunders happen for contempt to play a big role.
Ditto for depth 12 to a lesser degree.
Apart from the point about the net structure, you have to consider A0/LC0 Reinforcement Learning vs. NNUE Supervised Learning. I am pretty sure that with NNUE RL you will see more fireworks vs. SF's handcrafted eval.
Raphexon wrote: ↑Mon Jul 20, 2020 12:10 pm
NNUE has no contempt.
Contempt is located inside eval, and since it doesn't use SF's eval it doesn't have contempt.
Maybe it has some learned contempt from SF games, but I don't think contempt with depth 8 and depth 12 games is too gamechanging.
Depth 8 only has a few % drawrate either way, and way too many blunders happen for contempt to play a big role.
Ditto for depth 12 to a lesser degree.
Good point. So to find out, we would have to map NNUE's output scores and look for a positive skew.
Tried on the opening position: no visible skew, or possibly only a very small one.
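A rough sketch of how such a skew check could be done over more than one position: collect the net's scores (from the side to move's point of view) on a set of positions believed to be balanced, then look at the mean and the skewness of the distribution. The score lists below are invented numbers, not engine output, and the choice of the Fisher-Pearson coefficient is just one reasonable option.

```python
from statistics import mean

def skewness(xs):
    """Fisher-Pearson moment coefficient of skewness (0 for symmetric data)."""
    m = mean(xs)
    m2 = mean([(x - m) ** 2 for x in xs])   # second central moment
    m3 = mean([(x - m) ** 3 for x in xs])   # third central moment
    return m3 / m2 ** 1.5 if m2 else 0.0

# Hypothetical centipawn scores, side-to-move perspective (not real data).
symmetric_scores = [-30, -10, 0, 10, 30]
leaning_scores = [-5, 0, 5, 10, 60]

for name, scores in (("symmetric", symmetric_scores), ("leaning", leaning_scores)):
    print(name, "mean:", mean(scores), "skew:", round(skewness(scores), 3))
```

A mean well above zero or a clearly positive skew on balanced positions would hint at some learned optimism for the side to move; values near zero would match the "no visible skew" observation.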
Ha! I realised I got decoyed away from my point! Which was that SF-dev appeared to be playing as if it had some contempt set, because it seemed to be the one that stopped the usual long shuffling sequences and lost the draw, in several games. Anyway, Laskos says not, so the theory bites the dust.
I haven't really worked out SF's contempt algorithm (yet). It does something called "dynamic contempt", I think at the root, depending on something or other from the searches. Is it possible SF is self-adjusting over and above the user settings?