I made a similar comparison from the data in the ff2 folder using different metrics. The result is similar to table 4: as movetime increases, similarity increases. This comparison also involved ff2 and sf12; ff2 is more similar to sf13 than to sf12.
MinErr : The minimum absolute score difference between score1 and score2.
Or minimum from all abs(score1 - score2)
MaxErr : The maximum absolute score difference between score1 and score2.
Or maximum from all abs(score1 - score2)
ErrSqStdev: The sample standard deviation of the error square.
Or sqrt(sum((errsq_i - mean) * (errsq_i - mean))/(N-1))
where: N=PosTried, errsq_i = score1-score2 @ posnum i
A higher value means the errsq are more spread out and similarity is weaker.
RMS : The Root Mean Square or sqrt(Sum(error*error)/PosTried)
where: error = score1 - score2
PosTried : The number of positions that are actually compared.
When engine score is above 500 or below -500
that position is not included in the score comparison.
MvSimPct : Move similarity Percentage, if engine1 and engine2 moves are the same
count it as similar. Or (100 * num_similar_move/TotalPos)
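The definitions above can be sketched in a few lines of Python. This is only an illustration, not the script that produced the tables. The record layout (score1, move1, score2, move2) and the function name are hypothetical. It also assumes that a position is skipped when either engine's score is outside +/-500, that TotalPos counts all positions before that filter, and that errsq_i is the squared error, as the metric name suggests. The where-clause writes errsq_i as score1-score2, so swap in the plain error if that is what was meant.

```python
import math

SCORE_LIMIT = 500  # cutoff taken from the PosTried definition


def compare_scores(records, limit=SCORE_LIMIT):
    """records: list of (score1, move1, score2, move2), one tuple per position.

    The tuple layout is a hypothetical one chosen for this sketch.
    """
    records = list(records)
    total_pos = len(records)

    # MvSimPct: same move from both engines, counted over all positions.
    same_moves = sum(1 for _, m1, _, m2 in records if m1 == m2)

    # Assumption: drop the position if either score is outside +/-limit.
    errors = [s1 - s2 for s1, _, s2, _ in records
              if abs(s1) <= limit and abs(s2) <= limit]
    n = len(errors)
    if n < 2:
        raise ValueError("need at least two compared positions")

    # Assumption: errsq_i is the squared error at position i.
    errsq = [e * e for e in errors]
    mean_sq = sum(errsq) / n

    return {
        "PosTried": n,
        "MinErr": min(abs(e) for e in errors),
        "MaxErr": max(abs(e) for e in errors),
        "RMS": math.sqrt(sum(errsq) / n),
        # Sample standard deviation: divide by (N - 1), not N.
        "ErrSqStdev": math.sqrt(sum((x - mean_sq) ** 2 for x in errsq) / (n - 1)),
        "MvSimPct": 100.0 * same_moves / total_pos,
    }


if __name__ == "__main__":
    sample = [
        (10, "e2e4", 30, "e2e4"),
        (-50, "d2d4", -20, "g1f3"),
        (100, "c2c4", 100, "c2c4"),
        (900, "e2e4", 880, "e2e4"),  # skipped for scores, still counted for moves
    ]
    for name, value in compare_scores(sample).items():
        print(name, value)
```

Note that the last sample position is excluded from the score metrics but still counts toward MvSimPct, because the formula divides by TotalPos rather than PosTried.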
Remarks:
1. nets are tested as much as possible with SF12 and SF13.
2. Nemorino and Orion 0.8 are the exceptions since they have a different file format.
3. The Orion 0.7 version has the exact SF12 net.
4. the folder ff2 contains the epd's of sf12, 13 and ff2 at 100ms, 250 and 500ms, made by someone else on a different pc.
5. epd's labelled with "sv" are the tested "sergio" nets.
6. epd's labelled with "sf" are Stockfish nets.
Under folder ff2:
FF2-100ms.epd is the output when ff2 engine uses the ff2 net?
Rebel wrote: ↑Wed Mar 10, 2021 9:29 pm
NNUE Research Project
March 10, 2021
It's generally known by now that similarity testing on moves does not work with NNUE nets. On this page we will try to research whether it is possible using other methods. One method is to calculate the Root-mean-square deviation (or RMS) of the scores instead of moves, as after all NNUE is a set of scores. We will present data and the source code for discussion.
Let's start at the beginning of NNUE in the summer of 2020, the starting point of the NNUE revolution, when the Stockfish team implemented the Sergio nets. Our first goal is to measure the stability of the RMS of Stockfish NNUE nets. From the Sergio nets we calculate the RMS of the very first 3 nets (July) and the last 3 (September) and compare the RMS with the final SF12 net, see table one. In table two the nets between SF12 and SF13 are compared plus 5 nets after the release of SF13.