SPCC: Testruns of StockFiNN 0.2 and Lc0 J90-40 finished

Discussion of computer chess matches and engine tournaments.

Moderators: hgm, Rebel, chrisw

pohl4711
Posts: 2434
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

SPCC: Testruns of StockFiNN 0.2 and Lc0 J90-40 finished

Post by pohl4711 »

Testrun of StockFiNN 0.2 (Stockfish NNUE with a bigger net by Josh) finished (Slow Mover was set to 60, because this was clearly the strongest setting in my pre-tests).
NN-testrun of Lc0 0.26.0 J90-40 (new 30x384 net by jhorthos) finished.

https://www.sp-cc.de

(Perhaps you have to clear your browser cache or reload the website.)
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: SPCC: Testruns of StockFiNN 0.2 and Lc0 J90-40 finished

Post by lkaufman »

pohl4711 wrote: Sat Jul 11, 2020 12:39 pm Testrun of StockFiNN 0.2 (Stockfish NNUE with a bigger net by Josh) finished (Slow Mover was set to 60, because this was clearly the strongest setting in my pre-tests).
NN-testrun of Lc0 0.26.0 J90-40 (new 30x384 net by jhorthos) finished.

https://www.sp-cc.de

(Perhaps you have to clear your browser cache or reload the website.)
Since your results for SFnn are about a hundred elo worse than results I and others have gotten in direct runs vs. SF, does your data suggest that it performs much worse vs. unrelated engines than it does against SF? That is the only obvious explanation I can think of.
Komodo rules!
pohl4711
Posts: 2434
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SPCC: Testruns of StockFiNN 0.2 and Lc0 J90-40 finished

Post by pohl4711 »

lkaufman wrote: Sat Jul 11, 2020 9:07 pm
Since your results for SFnn are about a hundred elo worse than results I and others have gotten in direct runs vs. SF, does your data suggest that it performs much worse vs. unrelated engines than it does against SF? That is the only obvious explanation I can think of.
All I can say is that StockFiNN 0.2 played all 5000 games without crashes or time losses, so the result is valid IMHO.
And I think the result is not bad. The project (net and binary) is only 2 or 3 weeks old, so a rating between Stockfish 9 and 10 is quite impressive.
And all my testing of Stockfish dev versions against unrelated engines is based on my belief that self-play testing is not as valid as testing against unrelated engines. Perhaps this StockFiNN result is proof of that belief...
This will definitely not be the last testrun of SF NNUE by me. We should wait and see what happens in the future...
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: SPCC: Testruns of StockFiNN 0.2 and Lc0 J90-40 finished

Post by lkaufman »

pohl4711 wrote: Sat Jul 11, 2020 9:27 pm
All I can say is that StockFiNN 0.2 played all 5000 games without crashes or time losses, so the result is valid IMHO.
And I think the result is not bad. The project (net and binary) is only 2 or 3 weeks old, so a rating between Stockfish 9 and 10 is quite impressive.
And all my testing of Stockfish dev versions against unrelated engines is based on my belief that self-play testing is not as valid as testing against unrelated engines. Perhaps this StockFiNN result is proof of that belief...
This will definitely not be the last testrun of SF NNUE by me. We should wait and see what happens in the future...
While I agree that self-play testing is less valid than testing against unrelated engines, SFNN vs. normal SF is hardly self-play: they do share a common search, but the evals are drastically different, perhaps more different than Stockfish is from the other engines you test it against. Another possible explanation is that your relatively high 180-to-1 base-to-increment ratio (3' + 1"), while fine for normal engines, may be a problem if SF's time management is sufficiently bad for SFnn. I'm starting a test vs. the latest SF at your time control to see if this is the problem.
Komodo rules!
MMarco
Posts: 195
Joined: Sun Apr 12, 2020 1:09 am
Full name: Marc-O Moisan-Plante

Re: SPCC: Testruns of StockFiNN 0.2 and Lc0 J90-40 finished

Post by MMarco »

The build and the CPU used can also have a large impact on the results, from what I've seen on Discord.

EDIT: The ratio of SFiNN's nps to Stockfish's can vary from about 80% (newer Zen 2 cores with AVX2) to as low as 30-40% on older CPUs. That can explain quite a large part of the Elo gap.

I would be interested to know how SFiNN benches on your computer relative to Stockfish, with, say, "bench 512 1 3000 default movetime", as that would reflect your match conditions.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: SPCC: Testruns of StockFiNN 0.2 and Lc0 J90-40 finished

Post by lkaufman »

pohl4711 wrote: Sat Jul 11, 2020 9:27 pm
All I can say is that StockFiNN 0.2 played all 5000 games without crashes or time losses, so the result is valid IMHO.
And I think the result is not bad. The project (net and binary) is only 2 or 3 weeks old, so a rating between Stockfish 9 and 10 is quite impressive.
And all my testing of Stockfish dev versions against unrelated engines is based on my belief that self-play testing is not as valid as testing against unrelated engines. Perhaps this StockFiNN result is proof of that belief...
This will definitely not be the last testrun of SF NNUE by me. We should wait and see what happens in the future...
It appears that the time control was not the key difference. At your 3' + 1" TC, I got a +15 Elo result in 192 games for SFnn2 vs. the Stockfish dev of July 6. Now I'm running SFnn2 vs. Komodo 14 at the same TC to see if the opponent matters much; I expect Komodo to lose badly, but your results suggest otherwise, so we'll see.
Komodo rules!
pohl4711
Posts: 2434
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SPCC: Testruns of StockFiNN 0.2 and Lc0 J90-40 finished

Post by pohl4711 »

MMarco wrote: Sun Jul 12, 2020 2:07 am The build and the CPU used can also have a large impact on the results, from what I've seen on Discord.

EDIT: The ratio of SFiNN's nps to Stockfish's can vary from about 80% (newer Zen 2 cores with AVX2) to as low as 30-40% on older CPUs. That can explain quite a large part of the Elo gap.

I would be interested to know how SFiNN benches on your computer relative to Stockfish, with, say, "bench 512 1 3000 default movetime", as that would reflect your match conditions.
I used the Intel compile of StockFiNN by Josh, because the test was done on an Intel CPU. On my machine (i7-6700HQ 2.6 GHz notebook, Skylake CPU), classical Stockfish (BMI2 compile) is around 2.2x faster than this compile. This is a pretty good value, because the net of StockFiNN is bigger than other nets, so the engine runs slower than Stockfish NNUE with the smaller nets.
MMarco
Posts: 195
Joined: Sun Apr 12, 2020 1:09 am
Full name: Marc-O Moisan-Plante

Re: SPCC: Testruns of StockFiNN 0.2 and Lc0 J90-40 finished

Post by MMarco »

2.2x faster means SFiNN gets about 45-46% of Stockfish's speed. Now I'm curious what Larry's numbers are.
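The 45-46% figure is simply the reciprocal of the 2.2x speed-up. A minimal sketch of that arithmetic, with hypothetical helper names chosen only for illustration:

```python
def fraction_from_speedup(speedup):
    """If the reference engine is `speedup` times faster, the other engine
    runs at 1/speedup of the reference speed."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


def relative_speed(nps_engine, nps_reference):
    """Fraction of the reference engine's nodes per second reached by an engine."""
    if nps_reference <= 0:
        raise ValueError("reference nps must be positive")
    return nps_engine / nps_reference


if __name__ == "__main__":
    # Classical SF measured about 2.2x faster than the SFiNN compile
    print(f"{fraction_from_speedup(2.2):.1%}")  # 45.5%
```

The same `relative_speed` helper applies when two bench or GUI nps readings are available instead of a speed-up factor.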

On my computer (Ryzen 7), using jjosh's binaries, I get around 40% of the speed of Stockfish_11_modern.

These are 3m+1s results against Lc0 running on my GTX 1660 Ti (Lc0 bench gives around 42,000 nps) with network 703350 (MLH off), using TCEC 18 openings.


   # PLAYER               :  RATING  ERROR  PLAYED    (%)   CFS    W    D    L   D(%)
   1 Stockfish 11 4CPU    :     0.0   37.7     100  51.00    64   30   42   28  42.00
   2 Lc0 26.0 + 703350    :    -8.6   23.3     200  53.75   100   66   83   51  41.50
   3 StockFiNN02 4CPU     :   -82.4   40.8     100  41.50   ---   21   41   38  41.00

White advantage = 162.33 +/- 21.69
Draw rate (equal opponents) = 51.95 % +/- 4.07
My sample is very small compared to yours, but SFiNN 0.2 scored about -80 Elo relative to SF 11, which is consistent with what you got. That is why I'm wondering about the CPUs and SFiNN's speed.

I also ran a test with nodestime, doubling SFiNN 0.2's node budget to see the impact of going from 40% to 80% of SF 11's speed. This time the test was on 1 CPU, and the opening set was World Championship games of Smyslov, Botvinnik and Tal (from the Noomen test sets). SF 11 gets around 2000k nps in Stockfish bench at 1000 ms, so these games were played much faster, though.
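A node-budget setup like this can be sketched as below. It is only a sketch under an explicit assumption: that Stockfish's `nodestime` UCI option is interpreted as nodes per millisecond. Under that assumption, emulating 40% and 80% of SF 11's ~2000k nps means budgets of 800 and 1600, matching the 800k and 1600k labels in the results. The helper names are hypothetical.

```python
def nodestime_for_ratio(reference_nps, speed_ratio):
    """Nodes-per-millisecond budget that emulates an engine running at
    `speed_ratio` times the reference engine's nodes per second.
    ASSUMPTION: 'nodestime' is read by the engine as nodes per millisecond."""
    if reference_nps <= 0 or speed_ratio <= 0:
        raise ValueError("reference_nps and speed_ratio must be positive")
    return round(reference_nps * speed_ratio / 1000)


def nodestime_option(value):
    """UCI line a GUI or script would send to set the budget."""
    return f"setoption name nodestime value {value}"


if __name__ == "__main__":
    sf11_nps = 2_000_000  # ~2000k nps for SF 11 in bench, as reported in the post
    for ratio in (0.4, 0.8):
        print(nodestime_option(nodestime_for_ratio(sf11_nps, ratio)))
```

Doubling the ratio doubles the budget, which is the 800k-to-1600k step being measured.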

Here are the results:


   # PLAYER               :  RATING  ERROR  PLAYED    (%)   CFS    W    D    L   D(%)
   1 StockFiNN02  1600k   :    27.7   19.9     222  53.83    97   48  143   31  64.41
   2 Stockfish 11 2000k   :     0.0   14.2     444  52.14   100   95  273   76  61.49
   3 StockFiNN02   800k   :   -59.1   19.6     222  41.89   ---   28  130   64  58.56

White advantage = 60.80 +/- 10.23
Draw rate (equal opponents) = 65.51 % +/- 2.45
So going from 40% to 80% of SF 11's speed (or from my Ryzen 7 to a Zen 2 core) would yield around 85-90 Elo here.
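For anyone who wants to sanity-check W/D/L lines like the ones above by hand, here is a minimal sketch that turns them into a logistic Elo estimate with a rough 95% interval. It is a plain normal approximation on one pairing, not what Ordo does (Ordo fits all players jointly), so its numbers will differ somewhat from the table; the function names are illustrative.

```python
import math


def score(wins, draws, losses):
    """Fraction of points scored: a win counts 1, a draw 0.5, a loss 0."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_from_score(s):
    """Logistic Elo difference corresponding to an expected score s."""
    if not 0.0 < s < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / s - 1.0)


def elo_with_margin(wins, draws, losses, z=1.96):
    """Elo estimate plus an approximate 95% interval.

    Uses the per-game variance of the score (which shrinks as the draw
    rate grows) and a normal approximation, then maps the score interval
    through the logistic curve.
    """
    n = wins + draws + losses
    s = score(wins, draws, losses)
    mean_sq = (wins + 0.25 * draws) / n  # E[x^2] with x in {1, 0.5, 0}
    se = math.sqrt((mean_sq - s * s) / n)
    return (elo_from_score(s),
            elo_from_score(s - z * se),
            elo_from_score(s + z * se))


if __name__ == "__main__":
    for label, wdl in [("StockFiNN02 1600k", (48, 143, 31)),
                       ("StockFiNN02  800k", (28, 130, 64))]:
        est, lo, hi = elo_with_margin(*wdl)
        print(f"{label}: {est:+.1f} Elo (approx. 95%: {lo:+.1f} .. {hi:+.1f})")
```

Against SF 11 this gives roughly +27 and -57 Elo for the two StockFiNN lines, in the same ballpark as Ordo's 27.7 and -59.1. The interval mapping breaks down for lopsided results (score near 0 or 1), which is one reason dedicated rating tools exist.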
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: SPCC: Testruns of StockFiNN 0.2 and Lc0 J90-40 finished

Post by lkaufman »

MMarco wrote: Sun Jul 12, 2020 4:10 pm 2.2x faster means SFiNN gets about 45-46% of Stockfish's speed. Now I'm curious what Larry's numbers are.
The bench test didn't work when I tried it in SFNN2, only in normal SF, but anyway, based on the NPS numbers shown by the GUI (Little Blitzer), SFnn2 gets 60% (almost exactly) of the nps of SF11 bmi2 on my 4.9 GHz 8-core i7. I got a crushing 70% score for SFnn vs. K14 in 500 games at 3' + 1" on one thread with the Hert low-draw book, right in line with what I would expect for an engine that beat the latest SF by about 15 Elo after 192 games and is leading SF11 by 36 Elo so far after 50 games. These results seem inconsistent with yours, but perhaps, if SFnn2 scales well, my possibly faster CPU is a partial explanation; the opening books may also be relevant. In particular, the use of TCEC 18 openings probably favors conventional SF over the NN version, since those positions are very different from the normal ones used for training.
Komodo rules!
jjoshua2
Posts: 99
Joined: Sat Mar 10, 2018 6:16 am

Re: SPCC: Testruns of StockFiNN 0.2 and Lc0 J90-40 finished

Post by jjoshua2 »

lkaufman wrote: Sun Jul 12, 2020 8:12 pm
The bench test didn't work when I tried it in SFNN2,
Simply type isready and it will load the net; then the bench command will work. So you cannot just run sf.exe bench from the command line; instead, load the exe, then the net, then run bench. Single-threaded bench, I got 82% of SF-dev's speed, but with 16 threads it is down to 55%. AVX2 instructions tend to overheat the chip and lower clocks, so if you test on a laptop this will be a much bigger deal than on my well-cooled PC, but it is intrinsic to the design and to chip TDP limits even in best-case scenarios. (Pure AVX2 loads are the hottest load possible, besides AVX-512 on select Xeons.)
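That workflow (start the engine, send isready so the net loads, then bench) can be scripted. Below is a minimal Python sketch that drives the engine over stdin/stdout instead of passing bench on the command line. Assumptions: the engine path is a placeholder, the bench arguments are the ones MMarco suggested (hash, threads, limit, position set, limit type), bench blocks the engine's input loop until it finishes (as in Stockfish), and the summary contains a line like `Nodes/second : N`, which Stockfish writes to stderr, so the sketch merges stderr into stdout.

```python
import re
import subprocess

# Matches the bench summary line, e.g. "Nodes/second    : 1234567"
NPS_RE = re.compile(r"Nodes/second\s*:\s*(\d+)")


def bench_command(hash_mb=512, threads=1, limit=3000,
                  positions="default", limit_type="movetime"):
    """Build a bench line: hash size, threads, limit, position set, limit type."""
    return f"bench {hash_mb} {threads} {limit} {positions} {limit_type}"


def parse_nps(output):
    """Return the Nodes/second value from bench output, or None if absent."""
    match = NPS_RE.search(output)
    return int(match.group(1)) if match else None


def run_bench(engine_path, command):
    """Start the engine, let 'isready' load the net, then run bench and parse nps."""
    proc = subprocess.Popen(
        [engine_path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # Stockfish prints the bench summary to stderr
        text=True,
    )

    def send(line):
        proc.stdin.write(line + "\n")
        proc.stdin.flush()

    send("isready")  # NNUE builds load the net before answering readyok
    for line in proc.stdout:
        if line.strip() == "readyok":
            break
    send(command)
    send("quit")  # assumed to be read only after bench returns
    proc.stdin.close()
    output = proc.stdout.read()
    proc.wait()
    return parse_nps(output)
```

Usage would be something like `run_bench("./stockfinn", bench_command())`, where `./stockfinn` is a placeholder path; running it once for each engine and dividing the two nps values gives the speed ratio discussed in this thread.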