Ferdy, what's the matter with some of your matrix numbers? There are hardly any two strong engines showing less than 25% similarity at 100ms on one core. Your matrix contains lots of below-25% values, especially with SF and SF NNUE. The similarities usually range from percentages in the 30s for very unrelated engines (see "Shredder 6" in my matrix) to percentages in the 60s for very related engines (see SF_11 and SF_dev, or two SF_dev). Your matrix numbers are simply weird and often way too low, so the clustering may be meaningless.
Is this any different than Rybka using millions of super fast games to tune the evaluation many years back? Obviously with GPU hardware this can be done dramatically more efficiently!
From the matrix, it is apparent that SF NNUE GK is approximately as close to SF_dev as to Lc0, similarly distanced with respect to them, and is not very similar to anything.
Both the correlation and the distance methods of clustering give the same clustering, shown here:
Which seems to show that SF NNUE is a bit closer to Lc0 than to SF_dev, but not very close to anything.
In fact a pretty remarkable achievement. SF NNUE_GK is not closely similar to anything, is as close to Lc0 as it is to SF, if not closer, an original engine which now is functioning better without the "slowmover" parameter, and beating anything in its way in head-to-head encounters.
I'd be interested to analyse the PGNs if you can work out a way to send them.
It was played in LittleBlitzer instead of Cutechess-Cli for better control of the engines' speed, depth, causes of losses, draws etc. I don't keep the PGNs in LittleBlitzer; they are cryptic there anyway and not in the usual PGN standard.
That's a shame. With a bunch of PGNs I'd probably have been able to work out what was happening, and why and how. Otherwise, it's just all magic, some black box network.
Ok, I just played a match in Cutechess-Cli, 100 games at 15'' + 0.25'', between SF NNUE GK and SF_dev.
The result is here:
15'' + 0.25''
Score of SF_NNUE vs SF_dev: 31 - 17 - 52 [0.570] 100
... SF_NNUE playing White: 26 - 0 - 24 [0.760] 50
... SF_NNUE playing Black: 5 - 17 - 28 [0.380] 50
... White vs Black: 43 - 5 - 52 [0.690] 100
Elo difference: 49.0 +/- 47.4, LOS: 97.8 %, DrawRatio: 52.0 %
Finished match
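For anyone who wants to check the summary line by hand, here is a minimal Python sketch. It assumes the usual logistic Elo formula, a 95% normal-approximation interval on the mean score for the "+/-" figure, and the common erf-based LOS formula; these are assumptions about how the numbers are derived, not a quote of Cutechess-Cli's code, although they do reproduce the figures printed above for 31 - 17 - 52. The function name `match_stats` is made up for this example.

```python
import math

def match_stats(wins, losses, draws):
    """Elo difference, error margin, LOS and draw ratio from a W-L-D record.

    Assumed formulas (not taken from any engine-testing tool's source).
    Breaks down if the score is exactly 0 or 1 (log of zero / division by zero).
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n

    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + losses * (0.0 - score) ** 2
           + draws * (0.5 - score) ** 2) / n
    std_err = math.sqrt(var / n)

    def to_elo(p):
        # Logistic Elo model: expected score p <-> rating difference.
        return -400.0 * math.log10(1.0 / p - 1.0)

    z95 = 1.959964  # two-sided 95% quantile of the standard normal
    elo = to_elo(score)
    margin = (to_elo(score + z95 * std_err) - to_elo(score - z95 * std_err)) / 2.0

    # Likelihood of superiority: draws carry no information here.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

    return {"score": score, "elo": elo, "margin": margin,
            "los": los, "draw_ratio": draws / n}

stats = match_stats(31, 17, 52)
print("Elo %.1f +/- %.1f, LOS %.1f %%, draws %.1f %%" % (
    stats["elo"], stats["margin"], 100 * stats["los"], 100 * stats["draw_ratio"]))
```

The wide margin is the point: with only 100 games the interval barely excludes zero, so the colour split (0.760 as White, 0.380 as Black) should not be over-read either.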
Ok, thanks, first assessment is a bit disappointing. There's no win by either side that's over before ply 80. Median game length is 111 ply for SF_NNUE and 118 ply for SF_dev, but actually it's longer than that because you're starting from FENs and cutechess is counting from the FEN position; maybe those FENs are 16 ply in? So median game length is 125-135 ply. Took a look at the five shortest NNUE wins: they last for at least 82 ply and all were endings. No fireworks, nothing exciting.
Critically, I'm getting the initial impression the NNUE is not doing anything AZ-ish to Stockfish, and that implies simply the technique squashes Stockfish into a slightly (or markedly, who knows) more effective version of the same thing. Gradual grind out into the ending. Nothing new or superior, just more of the same. Disappointing, I was hoping the NN was going to have found some "new" knowledge, but, initially, looks like not.
Thanks Chris for the assessment. The openings are 6-pliers.
That's a median ply count of 117 for NN and 125 for SF, and no wins at all before 88 ply. Of course, it's only 100 games. But.
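The ply counting can be repeated on any PGN with a few lines of Python. This is a rough sketch that assumes every game carries `White`, `Black`, `Result` and `PlyCount` tags and that `PlyCount` is counted from the start position of the game, so the 6 opening plies are added back by hand. The sample PGN below is a made-up placeholder, not the match discussed here, and `decisive_ply_counts` is a name invented for the example.

```python
import re
from statistics import median

# Matches PGN tag pairs such as [Result "1-0"].
TAG_RE = re.compile(r'\[(\w+)\s+"([^"]*)"\]')

def decisive_ply_counts(pgn_text, opening_plies=0):
    """Return {winner name: [game length in plies, ...]} for decisive games."""
    lengths = {}
    # Each game starts with an [Event ...] tag; the text before the first one is skipped.
    for chunk in pgn_text.split("[Event ")[1:]:
        tags = dict(TAG_RE.findall(chunk))
        if "PlyCount" not in tags:
            continue  # cannot measure this game without the tag
        result = tags.get("Result")
        if result == "1-0":
            winner = tags.get("White")
        elif result == "0-1":
            winner = tags.get("Black")
        else:
            continue  # draws and unfinished games are ignored
        lengths.setdefault(winner, []).append(int(tags["PlyCount"]) + opening_plies)
    return lengths

# Placeholder data for illustration only (movetext omitted).
SAMPLE_PGN = """
[Event "?"]
[White "SF_NNUE"]
[Black "SF_dev"]
[Result "1-0"]
[PlyCount "82"]

1-0

[Event "?"]
[White "SF_dev"]
[Black "SF_NNUE"]
[Result "1/2-1/2"]
[PlyCount "140"]

1/2-1/2

[Event "?"]
[White "SF_dev"]
[Black "SF_NNUE"]
[Result "0-1"]
[PlyCount "120"]

0-1

[Event "?"]
[White "SF_dev"]
[Black "SF_NNUE"]
[Result "1-0"]
[PlyCount "118"]

1-0
"""

by_winner = decisive_ply_counts(SAMPLE_PGN, opening_plies=6)
for name, plies in sorted(by_winner.items()):
    print(name, "wins:", len(plies), "median ply:", median(plies), "shortest:", min(plies))
```

Grouping by winner rather than by colour matches the way the medians are quoted above (one figure for NN wins, one for SF wins).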