Can the sardine! NNUE clobbers SF.


chrisw
Posts: 4556
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Can the sardine! NNUE clobbers SF.

Post by chrisw »

Laskos wrote: Mon Jul 20, 2020 9:40 am
chrisw wrote: Mon Jul 20, 2020 9:09 am
cdani wrote: Mon Jul 20, 2020 7:45 am
chrisw wrote: Sun Jul 19, 2020 4:44 pm Ok, thanks, first assessment is a bit disappointing. There's no winning game by either side that's over before move 80. Median game length is 111 ply for SF_NNUE and 118 ply for SF_dev, but actually it's longer than that because you're starting from FENs, and cutechess is counting from the FEN position; maybe those FENs are 16 ply in? So median game length is 125-135 ply. Took a look at the five shortest NNUE wins: they last for at least 82 ply and all were endings. No fireworks, nothing exciting.
Critically, I'm getting the initial impression the NNUE is not doing anything AZ-ish to Stockfish, and that implies simply the technique squashes Stockfish into a slightly (or markedly, who knows) more effective version of the same thing. Gradual grind out into the ending. Nothing new or superior, just more of the same. Disappointing, I was hoping the NN was going to have found some "new" knowledge, but, initially, looks like not.
I'm also using it to analyze the games I'm playing at ICCF, and it seems to understand that endgames are winning clearly earlier than other engines do. The endgame static evaluation of Stockfish is not very evolved, in the sense that there are not many endgame-specific positional eval parameters, so I think NNUE had more to gain there.
Also, the NNUE net organization, where the parameters are the piece positions relative to the king positions, heavily favors endgame evaluation, as it is well known that king positions are paramount in endgames. Change the net organization and you will probably favor another phase or type of play.
Yes, basically I agree, although I want to profile more games to be sure. The two together (SF and NNUE) typically went rapidly into exchanging pieces, with the NNUE emerging with the better endgame, and winning. My (initial) thoughts are around the idea that NNUE has a good handle on pawn structure and the likelihood of converting it into a win. SF-dev seemed unduly ready, early on, to give up a pawn in exchange for piece activity, but then went along with piece exchanges, negating the purpose.

Laskos, what was your contempt set to? Because I noticed games (ending positions) where SF was slightly worse, but was still the side that, after a l o n g shuffle with no pawn moves and no captures, broke the shuffle with a non-reversible move, when it could have just taken the draw.

Still, the main point, for me, is that from what I've seen so far, they are not producing interesting chess.
No, SF_dev was Contempt=0. I have no idea what contempt the NNUE SF has.
With another 1000 data points (MikeB's 10000 games), the median game length of that batch is 110 ply, plus maybe another 10 for the moves up to the FEN starting position; call it a 120 ply median. That again indicates a lot of long, drawn-out games going to endings and not too much in the way of opening or middlegame fireworks.
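The correction above is simple arithmetic, and a minimal sketch of it might look like this. The function name and the sample ply counts are hypothetical; only the 110 ply median and the roughly 10 ply opening offset come from the post.

```python
from statistics import median

def median_game_length(ply_counts, opening_ply=0):
    """Median game length in ply, plus the plies played before the FEN start."""
    if not ply_counts:
        raise ValueError("need at least one game")
    return median(ply_counts) + opening_ply

# Hypothetical ply counts, as reported when counting from the FEN position.
reported = [96, 104, 110, 118, 131]

raw = median_game_length(reported)                       # counted from the FEN
corrected = median_game_length(reported, opening_ply=10)  # offset for the opening
print(raw, corrected)
```

The offset is only a guess at how deep the starting FENs are, so the corrected figure is an estimate, not a measurement.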

If I contrast that with my maniac (which is a lot weaker, of course), its median game length against Elo-matched opposition is 89 ply when it wins and 111 ply when the opponent wins, with a lot of early fireworks. I guess it does depend on what the goal is, but NNUE trained on SF-alikes doesn't, so far, look like it makes for exciting chess.

Cdani, I'm sure you're correct that training with a different net architecture (away from the one-hot sparse piece encoding) might change the engine style, but the problem there will be how to keep the NNUE advantage of only computing the changes in the input layer when the input layer is more complex, or when there is more complexity in the higher layers.
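To make the "only computing the changes in the input layer" point concrete, here is a toy sketch, assuming a simplified king-relative one-hot encoding. The index layout, the layer width, the piece codes and the random integer weights are all illustrative assumptions, not the actual NNUE net format.

```python
import random

HIDDEN = 4        # toy first-layer width; real nets are far wider
N_PIECES = 10     # non-king piece kinds: 5 types x 2 colours
N_FEATURES = 64 * N_PIECES * 64

def feature_index(king_sq, piece, sq):
    """Simplified one-hot index for (own king square, piece kind, piece square)."""
    return (king_sq * N_PIECES + piece) * 64 + sq

rng = random.Random(0)  # fixed seed so the toy weights are reproducible
WEIGHTS = [[rng.randint(-8, 8) for _ in range(HIDDEN)] for _ in range(N_FEATURES)]

def full_refresh(active):
    """Recompute the first-layer sums from scratch: one weight row per active feature."""
    acc = [0] * HIDDEN
    for f in active:
        for j in range(HIDDEN):
            acc[j] += WEIGHTS[f][j]
    return acc

def incremental_update(acc, removed, added):
    """Return updated sums, touching only the rows of features that changed."""
    new_acc = list(acc)
    for f in removed:
        for j in range(HIDDEN):
            new_acc[j] -= WEIGHTS[f][j]
    for f in added:
        for j in range(HIDDEN):
            new_acc[j] += WEIGHTS[f][j]
    return new_acc

# Hypothetical piece codes; king on square 4, a knight moves from square 6 to 21.
KING, PAWN, KNIGHT = 4, 0, 1
before = {feature_index(KING, PAWN, 12), feature_index(KING, KNIGHT, 6)}
after = {feature_index(KING, PAWN, 12), feature_index(KING, KNIGHT, 21)}

acc = full_refresh(before)
acc = incremental_update(acc, removed=before - after, added=after - before)
assert acc == full_refresh(after)  # same result, two rows touched instead of all
```

The trick works because the first layer is a plain sum over a sparse one-hot input, so a quiet move changes two rows. In this kind of encoding a king move changes the index of every feature for that side and forces a full refresh, and a denser or non-linear input stage would lose the shortcut, which is the difficulty raised above.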
Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: Can the sardine! NNUE clobbers SF.

Post by Raphexon »

NNUE has no contempt.
Contempt is located inside eval, and since it doesn't use SF's eval it doesn't have contempt.

Maybe it has some learned contempt from SF games, but I don't think contempt with depth 8 and depth 12 games is too gamechanging.
Depth 8 only has a few % drawrate either way, and way too many blunders happen for contempt to play a big role.
Ditto for depth 12 to a lesser degree.
smatovic
Posts: 2991
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Can the sardine! NNUE clobbers SF.

Post by smatovic »

chrisw wrote: Mon Jul 20, 2020 10:58 am
Laskos wrote: Mon Jul 20, 2020 9:40 am
chrisw wrote: Mon Jul 20, 2020 9:09 am
cdani wrote: Mon Jul 20, 2020 7:45 am
chrisw wrote: Sun Jul 19, 2020 4:44 pm Ok, thanks, first assessment is a bit disappointing. There's no winning game by either side that's over before move 80. Median game length is 111 ply for SF_NNUE and 118 ply for SF_dev, but actually it's longer than that because you're starting from FENs, and cutechess is counting from the FEN position; maybe those FENs are 16 ply in? So median game length is 125-135 ply. Took a look at the five shortest NNUE wins: they last for at least 82 ply and all were endings. No fireworks, nothing exciting.
Critically, I'm getting the initial impression the NNUE is not doing anything AZ-ish to Stockfish, and that implies simply the technique squashes Stockfish into a slightly (or markedly, who knows) more effective version of the same thing. Gradual grind out into the ending. Nothing new or superior, just more of the same. Disappointing, I was hoping the NN was going to have found some "new" knowledge, but, initially, looks like not.
I'm also using it to analyze the games I'm playing at ICCF, and it seems to understand that endgames are winning clearly earlier than other engines do. The endgame static evaluation of Stockfish is not very evolved, in the sense that there are not many endgame-specific positional eval parameters, so I think NNUE had more to gain there.
Also, the NNUE net organization, where the parameters are the piece positions relative to the king positions, heavily favors endgame evaluation, as it is well known that king positions are paramount in endgames. Change the net organization and you will probably favor another phase or type of play.
Yes, basically I agree, although I want to profile more games to be sure. The two together (SF and NNUE) typically went rapidly into exchanging pieces, with the NNUE emerging with the better endgame, and winning. My (initial) thoughts are around the idea that NNUE has a good handle on pawn structure and the likelihood of converting it into a win. SF-dev seemed unduly ready, early on, to give up a pawn in exchange for piece activity, but then went along with piece exchanges, negating the purpose.

Laskos, what was your contempt set to? Because I noticed games (ending positions) where SF was slightly worse, but was still the side that, after a l o n g shuffle with no pawn moves and no captures, broke the shuffle with a non-reversible move, when it could have just taken the draw.

Still, the main point, for me, is that from what I've seen so far, they are not producing interesting chess.
No, SF_dev was Contempt=0. I have no idea what contempt the NNUE SF has.
With another 1000 data points (MikeB's 10000 games), the median game length of that batch is 110 ply, plus maybe another 10 for the moves up to the FEN starting position; call it a 120 ply median. That again indicates a lot of long, drawn-out games going to endings and not too much in the way of opening or middlegame fireworks.

If I contrast that with my maniac (which is a lot weaker, of course), its median game length against Elo-matched opposition is 89 ply when it wins and 111 ply when the opponent wins, with a lot of early fireworks. I guess it does depend on what the goal is, but NNUE trained on SF-alikes doesn't, so far, look like it makes for exciting chess.

Cdani, I'm sure you're correct that training with a different net architecture (away from the one-hot sparse piece encoding) might change the engine style, but the problem there will be how to keep the NNUE advantage of only computing the changes in the input layer when the input layer is more complex, or when there is more complexity in the higher layers.
Apart from the point about the net structure, you have to consider A0/LC0 reinforcement learning vs. NNUE supervised learning. I am pretty sure that with NNUE RL you will see more fireworks against SF's handcrafted eval.

--
Srdja
FormazChar
Posts: 7
Joined: Sat Apr 11, 2020 11:32 am
Full name: Mikael Johnsson

Re: Can the sardine! NNUE clobbers SF.

Post by FormazChar »

Yeah, I think it is important to remember how early this is and how primitive the net training is so far.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Can the sardine! NNUE clobbers SF.

Post by Laskos »

cdani wrote: Mon Jul 20, 2020 7:45 am
chrisw wrote: Sun Jul 19, 2020 4:44 pm Ok, thanks, first assessment is a bit disappointing. There's no winning game by either side that's over before move 80. Median game length is 111 ply for SF_NNUE and 118 ply for SF_dev, but actually it's longer than that because you're starting from FENs, and cutechess is counting from the FEN position; maybe those FENs are 16 ply in? So median game length is 125-135 ply. Took a look at the five shortest NNUE wins: they last for at least 82 ply and all were endings. No fireworks, nothing exciting.
Critically, I'm getting the initial impression the NNUE is not doing anything AZ-ish to Stockfish, and that implies simply the technique squashes Stockfish into a slightly (or markedly, who knows) more effective version of the same thing. Gradual grind out into the ending. Nothing new or superior, just more of the same. Disappointing, I was hoping the NN was going to have found some "new" knowledge, but, initially, looks like not.
I'm also using it to analyze the games I'm playing at ICCF, and it seems to understand that endgames are winning clearly earlier than other engines do. The endgame static evaluation of Stockfish is not very evolved, in the sense that there are not many endgame-specific positional eval parameters, so I think NNUE had more to gain there.
Also, the NNUE net organization, where the parameters are the piece positions relative to the king positions, heavily favors endgame evaluation, as it is well known that king positions are paramount in endgames. Change the net organization and you will probably favor another phase or type of play.
I am getting on my positional test suites:

4 threads
1s/position

Suite          NNUE GK   SF_dev
Openings_200   121/200   126/200
Midgames_236   163/236   146/236

I don't have many endgame suites, but positionally the results seem to confirm a lesser emphasis on openings, unlike Lc0 zero nets.
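For easier comparison, the suite scores above can be turned into solve rates. This is only a small convenience sketch; the dictionary layout and function name are my own, the numbers are the ones posted.

```python
# Solved/total counts exactly as posted (4 threads, 1s/position).
results = {
    "Openings_200": {"NNUE GK": (121, 200), "SF_dev": (126, 200)},
    "Midgames_236": {"NNUE GK": (163, 236), "SF_dev": (146, 236)},
}

def solve_rate(solved, total):
    """Percentage of suite positions solved."""
    return 100.0 * solved / total

for suite, engines in results.items():
    for engine, (solved, total) in engines.items():
        print(f"{suite:13s} {engine:8s} {solved}/{total} = {solve_rate(solved, total):.1f}%")
```

With 200 to 236 positions per suite, a gap of a few positions is within what one would expect from noise, so the opening difference in particular should be read cautiously.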
chrisw
Posts: 4556
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Can the sardine! NNUE clobbers SF.

Post by chrisw »

Raphexon wrote: Mon Jul 20, 2020 12:10 pm NNUE has no contempt.
Contempt is located inside eval, and since it doesn't use SF's eval it doesn't have contempt.

Maybe it has some learned contempt from SF games, but I don't think contempt with depth 8 and depth 12 games is too gamechanging.
Depth 8 only has a few % drawrate either way, and way too many blunders happen for contempt to play a big role.
Ditto for depth 12 to a lesser degree.
Good point. So to find out, we would have to map the NNUE output scores and look for a positive skew.
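One way such a check could be sketched: collect the net's scores over a set of positions and look at the mean and the skewness of the distribution. The score list below is made up for illustration, and the choice of the population moment coefficient as the skew measure is an assumption, not something anyone in the thread specified.

```python
from statistics import mean, pstdev

def skewness(scores):
    """Population moment coefficient of skewness; 0 for a symmetric distribution."""
    m = mean(scores)
    s = pstdev(scores)
    if s == 0:
        return 0.0
    return sum((x - m) ** 3 for x in scores) / (len(scores) * s ** 3)

# Hypothetical centipawn scores from the side to move's point of view.
scores_cp = [-35, -12, -4, 0, 3, 9, 15, 28, 61]

print(f"mean = {mean(scores_cp):.1f} cp, skewness = {skewness(scores_cp):.2f}")
```

A positive mean or skew on a balanced set of positions would hint at a learned optimism, but it would need positions that are balanced to begin with, otherwise the skew just reflects the test set.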
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Can the sardine! NNUE clobbers SF.

Post by Laskos »

chrisw wrote: Mon Jul 20, 2020 4:05 pm
Raphexon wrote: Mon Jul 20, 2020 12:10 pm NNUE has no contempt.
Contempt is located inside eval, and since it doesn't use SF's eval it doesn't have contempt.

Maybe it has some learned contempt from SF games, but I don't think contempt with depth 8 and depth 12 games is too gamechanging.
Depth 8 only has a few % drawrate either way, and way too many blunders happen for contempt to play a big role.
Ditto for depth 12 to a lesser degree.
Good point. So to find out, we would have to map the NNUE output scores and look for a positive skew.
Tried on the opening position: no visible skew, or possibly only a very small one.
chrisw
Posts: 4556
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Can the sardine! NNUE clobbers SF.

Post by chrisw »

Raphexon wrote: Mon Jul 20, 2020 12:10 pm NNUE has no contempt.
Contempt is located inside eval, and since it doesn't use SF's eval it doesn't have contempt.

Maybe it has some learned contempt from SF games, but I don't think contempt with depth 8 and depth 12 games is too gamechanging.
Depth 8 only has a few % drawrate either way, and way too many blunders happen for contempt to play a big role.
Ditto for depth 12 to a lesser degree.
Ha! I realised I got decoyed away from my point! Which was that SF-dev appeared to be playing as if it had some contempt set, because it seemed to be the one that stopped the usual long shuffling sequences and lost the draw, in several games. Anyway, Laskos says not, so that theory bites the dust.
I haven't really worked out SF's contempt algorithm (yet); it does something called “dynamic contempt”, I think at root, depending on something or other from the searches. Is it possible SF is self-adjusting over and above user settings?
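To illustrate what a self-adjusting contempt could look like, here is a purely hypothetical sketch, assuming the adjustment is made at the root from the previous search score. The function name, the shape of the formula and both constants are invented for illustration and are not taken from Stockfish's source.

```python
def dynamic_contempt(base_cp, previous_score_cp, gain=100, softness=150):
    """Hypothetical score-dependent contempt in centipawns.

    base_cp is the user setting; the second term is bounded by +/- gain
    and follows the sign of the previous root score.
    """
    return base_cp + gain * previous_score_cp / (abs(previous_score_cp) + softness)

# With the user setting at 0, the effective value is still non-zero
# whenever the previous score is non-zero.
for prev in (-150, 0, 150):
    print(prev, dynamic_contempt(0, prev))
```

Under this particular shape, a user setting of 0 would not switch the effect off. Note, though, that a slightly worse side would get a negative value and lean toward the draw, so a term like this would not by itself explain the shuffle-breaking described above.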
Nay Lin Tun
Posts: 710
Joined: Mon Jan 16, 2012 6:34 am

Re: Can the sardine! NNUE clobbers SF.

Post by Nay Lin Tun »

@Laskos, can you share your opening test suite?
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Can the sardine! NNUE clobbers SF.

Post by Laskos »

Nay Lin Tun wrote: Mon Jul 20, 2020 5:40 pm @Laskos, can you share your opening test suite?

Here is a link to various 3-mover opening suites, grouped according to their imbalance (03 is 30cp).

http://s000.tinyupload.com/?file_id=057 ... 1119263171