Ok, thanks, first assessment is a bit disappointing. There's no winning game by either side that's over before move 80, median game length is 111 ply for SF_NNUE and 118 ply for SF_dev, but actually it's longer than that because you're starting from FENs, and cutechess is counting from the FEN position, maybe those FENs are 16 ply in? So median game length is 125-135 ply. Took a look at the five shortest NNUE wins, they last for at least 82 ply and all were endings. No fireworks, nothing exciting.
Critically, I'm getting the initial impression the NNUE is not doing anything AZ-ish to Stockfish, and that implies simply the technique squashes Stockfish into a slightly (or markedly, who knows) more effective version of the same thing. Gradual grind out into the ending. Nothing new or superior, just more of the same. Disappointing, I was hoping the NN was going to have found some "new" knowledge, but, initially, looks like not.
Thanks Chris for the assessment. The openings are 6-pliers.
That's a median plycount of 117 NN, 125 SF and no wins at all before 88 ply. Of course, only 100 games. But.
Add to that that I used adjudication in Cutechess-Cli!!!
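For anyone redoing the numbers, the correction is just a shift: cutechess counts plies from the start position it was given, so the opening length has to be added back. A minimal sketch, assuming you already have the per-game ply counts as a list (the function name and the sample numbers are made up for illustration, not taken from the match):

```python
from statistics import median

def median_total_plies(game_plies, opening_plies):
    """Median game length counted from the initial position.

    game_plies    -- plies as reported per game, counted from the FEN/opening exit
    opening_plies -- plies already played before the engines took over
    """
    return median(game_plies) + opening_plies

# Hypothetical ply counts, only to show the shift by a 6-ply opening.
print(median_total_plies([100, 111, 120], 6))
```

With 6-ply openings the shift is small; with 16-ply FENs it would be correspondingly larger.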
Yes, I was expecting and noticed anyway. Games were clearly being terminated.
I'm using this one, basically 'cos Ed wrote the batch files for me!
-draw movenumber=160 movecount=3 score=100 -resign movecount=5 score=500
The opening book might play a part, I'm using Ed's 32000.pgn set at 12 plies with duplicates pre-removed. Duplicates are a constant problem, they sneak in everywhere.
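To make it concrete how those adjudication settings cut games short, here is a rough sketch of my reading of the two options: a draw is called once the game has reached the given move number and both engines have reported scores within `score` centipawns of zero for `movecount` moves in a row; a resignation is called when one engine has reported a score at or below minus `score` for `movecount` moves in a row. This is a simplification and an assumption on my part, not the actual cutechess-cli code, and the `adjudicate` function and its input format are hypothetical.

```python
def adjudicate(scores, draw_movenumber=160, draw_movecount=3, draw_score=100,
               resign_movecount=5, resign_score=500):
    """Return (move_number, verdict) for the first adjudication, or None.

    scores -- one (white_cp, black_cp) pair per full move, each score in
              centipawns from that engine's own point of view.
    """
    for n in range(1, len(scores) + 1):
        history = scores[:n]
        if n >= resign_movecount:
            tail = history[-resign_movecount:]
            # An engine that has seen itself losing badly for long enough resigns.
            if all(w <= -resign_score for w, _ in tail):
                return n, "white resigns"
            if all(b <= -resign_score for _, b in tail):
                return n, "black resigns"
        if n >= draw_movenumber and n >= draw_movecount:
            tail = history[-draw_movecount:]
            # Both engines close to zero for long enough, late enough: draw.
            if all(abs(w) <= draw_score and abs(b) <= draw_score for w, b in tail):
                return n, "draw"
    return None

# Ten quiet moves, then White reports -6.00 five moves running.
print(adjudicate([(20, -20)] * 10 + [(-600, 600)] * 5))
```

With these settings a lost game is stopped five moves after the evaluation collapses, which is part of why the recorded ply counts understate how long the games would have run.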
Ferdy, what's the matter with some of your matrix numbers? There are hardly any 2 strong engines showing less than 25% similarity at 100ms on one core. Your matrix contains lots of below 25% values, especially with SF and SF NNUE. The similarities usually range from percentages in the 30s for very unrelated engines (see "Shredder 6" in my matrix) to percentages in the 60s for very related engines (see SF_11 and SF_dev or 2 SF_dev). Your matrix numbers are simply weird and often way too low, therefore the clustering is maybe meaningless.
The original sim sends a command to the engine via go depth 50, then sends stop and collects the bestmove returned. Mine sends go movetime <time>, then waits for the engine to send its bestmove and collects it. Perhaps this could be the difference.
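Whichever way the bestmove is collected, I take a matrix entry to be the percentage of test positions on which two engines return the same move. A minimal sketch under that assumption (the `similarity` helper and the sample moves are hypothetical, not the tool's actual code):

```python
def similarity(moves_a, moves_b):
    """Percentage of positions where both engines chose the same bestmove.

    moves_a, moves_b -- bestmoves in UCI notation, one per test position,
                        in the same position order for both engines.
    """
    if not moves_a or len(moves_a) != len(moves_b):
        raise ValueError("need two non-empty move lists of equal length")
    same = sum(a == b for a, b in zip(moves_a, moves_b))
    return 100.0 * same / len(moves_a)

# Four made-up positions, three matching moves.
print(similarity(["e2e4", "d2d4", "g1f3", "c2c4"],
                 ["e2e4", "d2d4", "b1c3", "c2c4"]))
```

If one method gives the engines systematically less effective thinking time than the other, lower agreement across the whole matrix would not be surprising.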
I am often using EPDs of my 3-movers (6 plies) from a large human games (Elo 2200+) database. Different sorts of suites, often imbalanced opening suites. I am removing duplicates from EPD files using EPDTools.
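For those without EPDTools, a rough idea of what duplicate removal on an EPD file amounts to: two records are the same position if their first four fields (piece placement, side to move, castling rights, en passant square) match, whatever opcodes follow. This is only a sketch under that assumption, not what EPDTools actually does, and `dedupe_epd` is a made-up name.

```python
def dedupe_epd(lines):
    """Keep the first EPD record for each distinct position, in input order."""
    seen = set()
    unique = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        key = " ".join(fields[:4])  # the position itself, ignoring opcodes
        if key not in seen:
            seen.add(key)
            unique.append(line.strip())
    return unique

records = [
    'rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - id "a";',
    'rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - id "b";',
    'rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR b KQkq - id "c";',
]
print(len(dedupe_epd(records)))
```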
Now I have 2 NNUE nets in the cluster, the FiNN02 384x2-32-32 net is even less related to anything compared to the KG 256x2-32-32 net. They seem to be close in strength. They aren't even closely related between themselves, 2 NNUE nets.
Laskos wrote: ↑Sun Jul 19, 2020 6:08 pm
Now I have 2 NNUE nets in the cluster, the FiNN02 384x2-32-32 net is even less related to anything compared to the KG 256x2-32-32 net. They seem to be close in strength. They aren't even closely related between themselves, 2 NNUE nets.
Isn't that awesome?
Totally different from anything else while also adding a ton of elo over the previous strongest (CPU) engine.
chrisw wrote: ↑Sun Jul 19, 2020 4:44 pm
Ok, thanks, first assessment is a bit disappointing. There's no winning game by either side that's over before move 80, median game length is 111 ply for SF_NNUE and 118 ply for SF_dev, but actually it's longer than that because you're starting from FENs, and cutechess is counting from the FEN position, maybe those FENs are 16 ply in? So median game length is 125-135 ply. Took a look at the five shortest NNUE wins, they last for at least 82 ply and all were endings. No fireworks, nothing exciting.
Critically, I'm getting the initial impression the NNUE is not doing anything AZ-ish to Stockfish, and that implies simply the technique squashes Stockfish into a slightly (or markedly, who knows) more effective version of the same thing. Gradual grind out into the ending. Nothing new or superior, just more of the same. Disappointing, I was hoping the NN was going to have found some "new" knowledge, but, initially, looks like not.
I'm also using it to analyze the games I'm playing at ICCF, and it seems to be able to understand that endgames are winning clearly before other engines do. The endgame static evaluation of Stockfish is not evolved, in the sense that there are not many specific positional eval parameters for the endgame, so I think NNUE had more to gain there.
Also, the NNUE net organization, where the parameters are the piece positions relative to the king positions, heavily favors endgame evaluation, as it is well known that the king positions are paramount in endgames. Change the net organization and you will probably favor another phase or type of play.
Yes, basically I agree, although I want to profile more games to be sure. The two together (SF and NNUE) typically went rapidly into exchanging pieces, with the NNUE emerging with the better endgame, and winning. My thoughts (initial) are around the idea that NNUE has a good handle on pawn structure and the likelihood of converting into a win. SF-dev seemed unduly ready, early on, to give up a pawn in exchange for piece activity, but then went along with piece exchanges, negating the purpose.
Laskos, what was your contempt set to? Because I noticed games (ending positions) where SF was slightly worse, but was still the side that, after a l o n g shuffle with no pawn moves and no captures, broke the shuffle with a non-reversible move, when it could have just taken the draw.
Still, the main point, for me, is that from what I've seen so far, they are not producing interesting chess.
No, SF_dev was Contempt=0. I have no idea what contempt the NNUE SF has.
smatovic wrote: ↑Sun Jul 19, 2020 9:08 am
Maybe a depth 1 match between LC0 and NNUE will be useful, to get an idea of how the networks perform against each other, and of what importance the whole search is, or alike.
--
Srdja
No, the search is still SF. At depth=1 I compared SF NNUE to SF_dev, and SF NNUE is significantly stronger at depth=1:
depth=1
Score of SF_NNUE vs SF_dev: 655 - 265 - 80 [0.695] 1000
Elo difference: 143.1 +/- 22.3, LOS: 100.0 %, DrawRatio: 8.0 %
Finished match
So, the net eval helps a lot at depth=1.
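As a cross-check of the figures above, the score fraction, draw ratio and Elo difference follow from the win/loss/draw counts with the usual logistic formula. A small sketch (the function name is mine; the error margin and LOS are left out because I'm not certain of the exact formulas cutechess uses for them):

```python
import math

def elo_from_wld(wins, losses, draws):
    """Return (score fraction, draw ratio, Elo difference) from match counts."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    draw_ratio = draws / games
    # Logistic Elo model: score = 1 / (1 + 10 ** (-elo / 400))
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return score, draw_ratio, elo

score, draw_ratio, elo = elo_from_wld(655, 265, 80)
print(f"score {score:.3f}, draws {draw_ratio:.1%}, Elo {elo:+.1f}")
```

For 655 - 265 - 80 this gives a score of 0.695 and about +143 Elo, in line with the cutechess output.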
As chess is more search related than evaluation related, NNUE loses part of its big static eval advantage as more games are decided by what happens in search. Also, it loses more by being slower, as it searches a little less deep.
If someone somehow tunes the static eval of regular Stockfish to imitate the NNUE static eval, this will probably overcome NNUE's current advantage of maybe 30 Elo and gain more than that.
Also, I suppose there is something to be gained by tuning the search parameters of NNUE to take advantage of the better static eval, so it's able to visit fewer unneeded nodes.
Also, some NN tuning of search parameters can probably net a nice gain.
All easier said than done, of course.