Tested several NNUE nets with SF12 and Minic (depth=1) and it seems to me there is a lot of variation between nets, even between the last 7 Sergio nets.
What worries me about neural nets (also Lc0) is that they change your engine's playing style without you being aware of it. Oh wait, that already happens when you only look at the cutechess results without ever replaying a game.
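For anyone who wants to reproduce this kind of depth=1 comparison, here is a rough python-chess sketch. The engine path, net file names and position list are placeholders, and I am assuming the engine exposes an EvalFile UCI option for loading a net, as the SF-NNUE builds do:

```python
# Minimal sketch: depth=1 move agreement between two NNUE nets.
# Assumes a UCI engine with an "EvalFile" option (as SF-NNUE builds have);
# the binary path, net files and FEN list below are placeholders.
import chess
import chess.engine

ENGINE = "./stockfish"                            # hypothetical engine binary
NETS = ["nn-sergio-1.nnue", "nn-sergio-2.nnue"]   # hypothetical net files
FENS = [chess.STARTING_FEN]                       # replace with a real test suite

def best_moves(net, fens):
    """Collect the depth=1 best move for every position with one net loaded."""
    with chess.engine.SimpleEngine.popen_uci(ENGINE) as eng:
        eng.configure({"EvalFile": net})
        return [eng.play(chess.Board(fen), chess.engine.Limit(depth=1)).move
                for fen in fens]

a, b = (best_moves(net, FENS) for net in NETS)
same = sum(x == y for x, y in zip(a, b))
print(f"move match: {same}/{len(FENS)} = {100 * same / len(FENS):.1f}%")
```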
90% of coding is debugging, the other 10% is writing bugs.
Raphexon wrote: Tue Sep 29, 2020 11:38 am
If anything, that shows you can just take a network, add some very minor modifications, and pass the similarity test with ease.
Could be interesting to test Vondele's net, since that's a Sergio net with the last layer SPSA-tuned.
Maybe depth=1 similarity simply doesn't work as well with NNUE...
There should be more tests at various depths and time controls too, to get a clearer picture of the simtest.
What do you suggest?
While Fire 7.1 scores 78% at depth=1, it is below 60% at 100ms.
It isn't simple.
90% of coding is debugging, the other 10% is writing bugs.
I did not know Fire 7.1 is NNUE...
We are in an NNUE thread, so what does Fire have to do here?
Ed, please read again what I wrote, especially sentence one, which is the basis for sentence two.
Ok, that's clearer. When search comes into play (note the Fire example), the same will happen with NNUE: huge swings in similarity, and the longer the time control, the more similar engines become. But I have already made a start with 100ms, 250ms, 500ms and 1000ms, and maybe even 4000ms, to see if my prediction also holds for NNUE.
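If it helps, this is roughly how I'd script such a sweep with python-chess; the engine binaries and position list are placeholders, and the movetime list just mirrors the values above:

```python
# Sketch of a movetime sweep for one engine pair; paths and FENs are placeholders.
import chess
import chess.engine

PAIR = ("./engine_a", "./engine_b")     # hypothetical binaries to compare
FENS = [chess.STARTING_FEN]             # replace with a real position suite
MOVETIMES = [0.1, 0.25, 0.5, 1.0, 4.0]  # seconds: 100ms ... 4000ms

for t in MOVETIMES:
    moves = []
    for path in PAIR:
        with chess.engine.SimpleEngine.popen_uci(path) as eng:
            moves.append([eng.play(chess.Board(fen),
                                   chess.engine.Limit(time=t)).move
                          for fen in FENS])
    same = sum(x == y for x, y in zip(*moves))
    print(f"{int(t * 1000):>5} ms: {100 * same / len(FENS):.1f}% match")
```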
90% of coding is debugging, the other 10% is writing bugs.
I wouldn't even test such long TCs, just a few timer ticks (e.g. 16ms, rounded up), so 20, 40, 50ms and whatever is slightly above N = X/16,
and depths 2-12 or so (not every depth is needed).
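If I read the 16ms suggestion right (the usual Windows timer tick), the test grid would look something like this; the tick value, the 20/40/50 inputs and the depth range come from the post, the helper name is my own:

```python
# Build the suggested test grid: movetimes rounded up to 16ms timer ticks,
# plus a thinned-out depth range (values taken from the post above).
TICK_MS = 16

def round_up_to_tick(ms, tick=TICK_MS):
    """Smallest multiple of the timer tick that is >= ms."""
    return -(-ms // tick) * tick    # ceiling division

movetimes = sorted({round_up_to_tick(ms) for ms in (20, 40, 50)})
depths = range(2, 13, 2)            # "depths 2-12 or so, not every depth"
print("movetimes (ms):", movetimes)  # -> [32, 48, 64]
print("depths:", list(depths))       # -> [2, 4, 6, 8, 10, 12]
```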
All tested engines in this report are of the alpha-beta type, so our proposed baseline is an alpha-beta baseline. When we test as many neural net engines as possible for our next report, we may well discover a different baseline figure for move variance, since neural net engines anecdotally evaluate positions differently to alpha-beta handcrafted evaluation functions.
To police CPU NN origins, you'll need to lower the thresholds. From a user's perspective, though, I'm just happy to see the greater variety, regardless of the baseline.
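On lowering the thresholds: once the pairwise match percentages are on the table, flagging pairs is just a cutoff scan. A minimal sketch, with invented numbers for illustration:

```python
# Flag engine pairs whose move-match percentage exceeds a chosen threshold.
# The scores and the 60% cutoff below are invented for illustration only.
from itertools import combinations

scores = {                      # hypothetical pairwise match percentages
    ("EngineA", "EngineB"): 78.0,
    ("EngineA", "EngineC"): 55.5,
    ("EngineB", "EngineC"): 61.2,
}
THRESHOLD = 60.0                # a lowered NN-era cutoff, per the post

engines = sorted({e for pair in scores for e in pair})
for a, b in combinations(engines, 2):
    pct = scores[(a, b)]
    flag = "SUSPECT" if pct > THRESHOLD else "ok"
    print(f"{a} vs {b}: {pct:.1f}% -> {flag}")
```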