SFNNue test, 3m + 1s

MMarco
Posts: 195
Joined: Sun Apr 12, 2020 1:09 am
Full name: Marc-O Moisan-Plante

SFNNue test, 3m + 1s

Post by MMarco »

I ran a quick test at 3m + 1s with recent SFNNue binaries and nets.

Options: SFNNue-384 net = FiNN 0.2 (from jjosh), SFNNue-256 net = GK 11-07 (from Gekkehenker).
SSE 4.1 binaries from Pleomati (update from 110720); SFNNue-384 with Slow Mover = 60.

Match Conditions: 3min + 1s Ponder Off, Hash=64, 5-man syzygy, Ryzen-7 3750h.
Book: Noomen WchMatches: Botvinnik-Smyslov(1954,57,58), Tal-Botvinnik(1959,60), 111 positions.
Adjudication: 5-men TB, draw movenumber=50 movecount=5 score=5, resign movecount=5 score=800.
Bench 64 1 3000ms: Stockfish 11_modern = 2249 knps, SFNNue-256 + GK 11-07 = 1212 knps (54%), SFNNue-384 + FiNN 0.2 = 975 knps (43%).
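
The percentages in the Bench line are each build's speed relative to Stockfish 11. A minimal sketch of that arithmetic, with the numbers copied from the line above (the name `relative_speed` is only illustrative and does not come from any tool):

```python
# Bench speeds in knps, copied from the Bench line in the post.
BENCH_KNPS = {
    "Stockfish 11_modern": 2249,
    "SFNNue-256 + GK 11-07": 1212,
    "SFNNue-384 + FiNN 0.2": 975,
}


def relative_speed(knps, baseline_knps):
    """Return a speed as a percentage of the baseline speed."""
    return 100.0 * knps / baseline_knps


baseline = BENCH_KNPS["Stockfish 11_modern"]
for name, knps in BENCH_KNPS.items():
    # Rounded to whole percent, as in the post.
    print(f"{name}: {relative_speed(knps, baseline):.0f}%")
```

So at equal time the NNUE builds search roughly half as many nodes as Stockfish 11, which is worth keeping in mind when reading the ratings.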

Comments: In my set-up, both SFNNues are slightly above Stockfish 11. SFNNue-384 + FiNN 0.2 did relatively better head to head against Stockfish 11 (52.3% versus 49.3% for SFNNue-256). The ratings happen to be very close to the CEGT 3m + 1s Ponder On rating list for SF-11, K-14 and H-6.03.

Games: https://gofile.io/d/nQZFDl

   # PLAYER                   :  RATING  ERROR  PLAYED   (%)   CFS    W    D    L  D(%)
   1 SFNNue-256 + GK 11-07    :    3497     15     666  62.0    73  203  420   43  63.1
   2 SFNNue-384 + FiNN 0.2    :    3489     14     666  61.0    57  195  422   49  63.4
   3 Stockfish 11             :    3487     14     444  49.2   100   44  349   51  78.6
   4 Komodo 14                :    3373     18     444  33.7    72   22  255  167  57.4
   5 Houdidit 6.03            :    3365     15     444  32.7   ---   26  238  180  53.6

White advantage = 37.18 +/- 5.98
Draw rate (equal opponents) = 72.90 % +/- 1.33
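
For readers checking the table, the (%) and D(%) columns follow directly from the W/D/L counts: a win counts one point, a draw half a point. A small sketch (function names are my own, not from the rating tool):

```python
def score_percent(wins, draws, losses):
    """Score in percent: wins count 1 point, draws count 0.5."""
    games = wins + draws + losses
    return 100.0 * (wins + 0.5 * draws) / games


def draw_percent(wins, draws, losses):
    """Share of drawn games in percent."""
    games = wins + draws + losses
    return 100.0 * draws / games


# W, D, L taken from two rows of the rating table above.
rows = {
    "SFNNue-256 + GK 11-07": (203, 420, 43),
    "Stockfish 11": (44, 349, 51),
}
for name, (w, d, l) in rows.items():
    print(f"{name}: {score_percent(w, d, l):.1f}% score, "
          f"{draw_percent(w, d, l):.1f}% draws")
```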
For comparison, CEGT rating list: http://www.cegt.net/3Plus1Rating/3plus1 ... liste.html

1	Stockfish 11.0 x64	3486	11	11	2800	81.6%	3212	34.5%
2	Komodo 14.0 x64		3371	10	10	2800	69.1%	3221	50.6%
3	Houdini 6.0 x64		3369	10	10	2800	68.9%	3221	47.9%
Head to head statistics:

1) SFNNue-256 + GK 11-07 3509.6 :    666 (+203,=420,-43),  62.0 %

   vs.                          :  games (   +,   =,  -),   (%) :    Diff,    SD, CFS (%)
   Stockfish 11                 :    222 (  21, 177, 24),  49.3 :    +9.6,  11.7,   79.5
   Komodo 14                    :    222 (  87, 128,  7),  68.0 :  +124.1,  13.0,  100.0
   Houdidit 6.03                :    222 (  95, 115, 12),  68.7 :  +132.2,  11.8,  100.0

2) SFNNue-384 + FiNN 0.2 3501.6 :    666 (+195,=422,-49),  61.0 %

   vs.                          :  games (   +,   =,  -),   (%) :    Diff,    SD, CFS (%)
   Stockfish 11                 :    222 (  30, 172, 20),  52.3 :    +1.6,   9.5,   56.5
   Komodo 14                    :    222 (  80, 127, 15),  64.6 :  +116.1,  12.9,  100.0
   Houdidit 6.03                :    222 (  85, 123, 14),  66.0 :  +124.2,  11.3,  100.0

3) Stockfish 11          3500.0 :    444 (+44,=349,-51),  49.2 %

   vs.                          :  games (  +,   =,  -),   (%) :    Diff,    SD, CFS (%)
   SFNNue-256 + GK 11-07        :    222 ( 24, 177, 21),  50.7 :    -9.6,  11.7,   20.5
   SFNNue-384 + FiNN 0.2        :    222 ( 20, 172, 30),  47.7 :    -1.6,   9.5,   43.5

4) Komodo 14             3385.5 :    444 (+22,=255,-167),  33.7 %

   vs.                          :  games (  +,   =,   -),   (%) :    Diff,    SD, CFS (%)
   SFNNue-256 + GK 11-07        :    222 (  7, 128,  87),  32.0 :  -124.1,  13.0,    0.0
   SFNNue-384 + FiNN 0.2        :    222 ( 15, 127,  80),  35.4 :  -116.1,  12.9,    0.0

5) Houdidit 6.03         3377.4 :    444 (+26,=238,-180),  32.7 %

   vs.                          :  games (  +,   =,   -),   (%) :    Diff,    SD, CFS (%)
   SFNNue-256 + GK 11-07        :    222 ( 12, 115,  95),  31.3 :  -132.2,  11.8,    0.0
   SFNNue-384 + FiNN 0.2        :    222 ( 14, 123,  85),  34.0 :  -124.2,  11.3,    0.0
EDIT: The Gekkehenker net used is dated 11-07.

Re: SFNNue test, 3m + 1s

Post by MMarco »

A quick inspection with SOMU reveals that SFNNue-384 + FiNN 0.2 played much faster than the other engines (2.63 s per move on average versus 3.1-3.3 s), so Slow Mover = 60 was probably detrimental to its performance. Apparently, Slow Mover = 60 is no longer needed with the 110720 binaries. I'll rerun the test with SM = 100 (the default) for SFNNue-384 + FiNN 0.2.

Engine                 Depth       Time   Games   Moves  Average Forfeit   MIDG   EARLY    ENDG    LATE
Houdidit 6.03          22.00   27:20:53    444    29647    3.32     0     20.53 | 20.57 | 22.52 | 28.22
SFNNue-256 + GK 11-07  26.49   37:26:08    666    42120    3.20     0     24.25 | 24.08 | 26.68 | 38.90
Komodo 14              25.80   25:15:22    444    28488    3.19     0     23.85 | 24.75 | 25.97 | 33.13
Stockfish 11           28.01   22:21:14    444    25896    3.11     0     25.72 | 25.65 | 28.31 | 40.47
SFNNue-384 + FiNN 0.2  25.68   30:30:12    666    41721    2.63     0     22.54 | 22.49 | 26.58 | 37.28
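
The Average column appears to be the total thinking time divided by the number of moves, in seconds. That is inferred from the numbers in the table rather than from any SOMU documentation, so treat the sketch below as an assumption-laden check (helper names are mine):

```python
def hms_to_seconds(hms):
    """Convert an 'H:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def average_seconds_per_move(total_time, moves):
    """Average thinking time per move in seconds."""
    return hms_to_seconds(total_time) / moves


# Time and Moves columns copied from the table above.
engines = {
    "Stockfish 11": ("22:21:14", 25896),
    "SFNNue-384 + FiNN 0.2": ("30:30:12", 41721),
}
for name, (total_time, moves) in engines.items():
    print(f"{name}: {average_seconds_per_move(total_time, moves):.2f} s/move")
```

With Slow Mover = 60 the 384 build averages about half a second less per move than Stockfish 11, which is the gap referred to in the post.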

Re: SFNNue test, 3m + 1s

Post by MMarco »

The tests are still ongoing, but it seems promising:

   # PLAYER                        :  RATING  ERROR  PLAYED   (%)   CFS    W    D    L  D(%)
   1 SFNNue-384 + FiNN 0.2 SM100   :    3495     21     312  64.4    50  110  182   20  58.3
   2 SFNNue-256 + GK 27-06         :    3495     22     222  64.4    90   76  134   12  60.4
   3 SFNNue-256 + GK 12-07         :    3476     14     666  62.0    76  203  420   43  63.1
   4 SFNNue-384 + FiNN 0.2 SM60    :    3468     15     666  61.0    53  195  422   49  63.4
   5 Stockfish 11                  :    3467     13     622  48.4   100   64  474   84  76.2
   6 Komodo 14                     :    3350     15     622  32.7    71   27  353  242  56.8
   7 Houdidit 6.03                 :    3344     15     622  31.9   ---   33  331  258  53.2

White advantage = 44.20 +/- 4.38
Draw rate (equal opponents) = 72.94 % +/- 1.38
Both the Gekkehenker (256) 27-06 net and jjosh's (384) StockFiNN 0.2 are now 25-30 Elo above SF-11.

Re: SFNNue test, 3m + 1s

Post by MMarco »

I added a new entry: LizardFish by dkappe, trained on Komodo eval data. For now it has played only against Stockfish 11, with a different binary (150720 from Nodchip). Although more recent, these binaries happen to be 10% slower on my computer than the Pleomati ones I was using up to now. I'll rerun these games too, to see whether LizardFish would gain some Elo with Pleomati's binaries.

3) LizardFish                 3508 :    222 (+32,=170,-20),  52.7 %

   vs.                              :  games (  +,   =,  -),   (%) :   Diff,   SD, CFS (%)
   Stockfish 11                     :    222 ( 32, 170, 20),  52.7 :    +19,   12,   93.9
I again adjusted the scale of the rating list to be comparable to the CEGT 3m + 1s Ponder On rating list.

   # PLAYER                        :  RATING  ERROR  PLAYED   (%)   CFS    W    D    L  D(%)
   1 SFNNue-384 + FiNN 0.2 SM100   :    3516     24    312   64.4    50  110  182   20  58.3
   2 SFNNue-256 + GK 27-06         :    3516     28    222   64.4    67   76  134   12  60.4
   3 SFNNue-256 + LizardFish       :    3508     24    222   52.7    75   32  170   20  76.6
   4 SFNNue-256 + GK 12-07         :    3497     16    666   62.0    76  203  420   43  63.1
   5 SFNNue-384 + FiNN 0.2 SM60    :    3489     15    666   61.0    54  195  422   49  63.4
   6 Stockfish 11                  :    3488     10    844   48.1   100   84  644  116  76.3
   7 Komodo 14                     :    3371     16    622   32.7    70   27  353  242  56.8
   8 Houdidit 6.03                 :    3365     17    622   31.9   ---   33  331  258  53.2

White advantage = 46.47 +/- 4.50
Draw rate (equal opponents) = 74.12 % +/- 1.20
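
Rescaling a rating list like this only shifts every rating by the same amount; the Elo differences between engines stay the same. A minimal sketch, assuming the adjustment is a single constant offset chosen so that Stockfish 11 lands on 3488 (how the rating tool actually picked the scale is not stated in the thread, and `rescale` is a hypothetical helper):

```python
def rescale(ratings, anchor, target):
    """Shift all ratings by one constant so that `anchor` gets `target`."""
    offset = target - ratings[anchor]
    return {name: rating + offset for name, rating in ratings.items()}


# Ratings from the previous post's table (before the rescale).
before = {
    "SFNNue-384 + FiNN 0.2 SM100": 3495,
    "SFNNue-256 + GK 27-06": 3495,
    "SFNNue-256 + GK 12-07": 3476,
    "SFNNue-384 + FiNN 0.2 SM60": 3468,
    "Stockfish 11": 3467,
    "Komodo 14": 3350,
    "Houdidit 6.03": 3344,
}

after = rescale(before, anchor="Stockfish 11", target=3488)
for name, rating in after.items():
    print(f"{name}: {rating}")
```

Under that assumption the shifted values agree with the ratings of the same engines in the table above.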

Re: SFNNue test, 3m + 1s

Post by MMarco »

Well, LizardFish is in fact Gekkehenker's 12-07 net (see viewtopic.php?f=2&t=74480&start=40#p852140). The good thing is that I had used a different binary (Nodchip 150720, here: https://github.com/nodchip/Stockfish/re ... 2020-07-15 ) to test "LizardFish", and that these proved superior to Pleomati's (CF=76%, but still), albeit slower. I'll use them from now on. I'll rerun StockFiNN 0.2 and the others with them, and (I hope) eventually post the results.

1) LizardFish                 3506 :    666 (+206,=429,-31),  63.1 %

   vs.                              :  games (   +,   =,  -),   (%) :   Diff,   SD, CFS (%)
   Stockfish 11                     :    222 (  32, 170, 20),  52.7 :    +19,   10,   96.5
   Komodo 14                        :    222 (  87, 130,  5),  68.5 :   +135,   11,  100.0
   Houdidit 6.03                    :    222 (  87, 129,  6),  68.2 :   +139,   12,  100.0

2) SFNNue-256 + GK 12-07      3497 :    666 (+203,=420,-43),  62.0 %

   vs.                              :  games (   +,   =,  -),   (%) :   Diff,   SD, CFS (%)
   Stockfish 11                     :    222 (  21, 177, 24),  49.3 :    +10,    9,   85.6
   Komodo 14                        :    222 (  87, 128,  7),  68.0 :   +126,   11,  100.0
   Houdidit 6.03                    :    222 (  95, 115, 12),  68.7 :   +131,   10,  100.0
Uri Blass
Posts: 10102
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: SFNNue test, 3m + 1s

Post by Uri Blass »

Thanks for your tests, but I see that the latest development versions of Stockfish that do not use NN are not in the test.

The Stockfish development version is already 25 Elo better than Stockfish 11 based on the last regression tests, so the result of 32:20 with 170 draws for LizardFish does not convince me that NN helps Stockfish.

In the following test, which is not of the latest development version, I see 10361 wins for the development version and only 5936 wins for Stockfish 11.
10361/5936 > 32/20
If NN helps Stockfish, I think we can expect better results against Stockfish 11.

https://tests.stockfishchess.org/tests/ ... 13834a975d
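
For anyone who wants to check the arithmetic in this comparison, here is a minimal Python sketch. It compares the two win/loss ratios and converts the LizardFish result (32 wins, 170 draws, 20 losses) into an Elo difference with the standard logistic formula. The function names are my own, and the draw count of the regression test is not quoted in the thread, so no Elo figure is computed for it.

```python
import math


def win_loss_ratio(wins, losses):
    """Ratio of wins to losses, ignoring draws."""
    return wins / losses


def elo_from_score(score):
    """Elo difference implied by a score fraction (0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


dev_ratio = win_loss_ratio(10361, 5936)   # SF dev vs Stockfish 11
nn_ratio = win_loss_ratio(32, 20)         # LizardFish vs Stockfish 11
print(f"dev: {dev_ratio:.2f}, LizardFish: {nn_ratio:.2f}")

# Score fraction of LizardFish: wins plus half the draws, over all games.
lizard_score = (32 + 170 / 2) / 222
print(f"LizardFish Elo vs SF11: {elo_from_score(lizard_score):+.1f}")
```

The Elo figure from the 222 games agrees with the +19 reported in the head-to-head output, but a win/loss ratio from 222 games carries far more noise than one from tens of thousands of games.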
Rebel
Posts: 6946
Joined: Thu Aug 18, 2011 12:04 pm

Re: SFNNue test, 3m + 1s

Post by Rebel »

Seems this round is for the AB engines.

Put your costly 2080 Ti on EBay :D
90% of coding is debugging, the other 10% is writing bugs.

Re: SFNNue test, 3m + 1s

Post by MMarco »

Uri Blass wrote: Fri Jul 17, 2020 7:57 am Thanks for your tests, but I see that the latest development versions of Stockfish that do not use NN are not in the test.

The Stockfish development version is already 25 Elo better than Stockfish 11 based on the last regression tests, so the result of 32:20 with 170 draws for LizardFish does not convince me that NN helps Stockfish.

In the following test, which is not of the latest development version, I see 10361 wins for the development version and only 5936 wins for Stockfish 11.
10361/5936 > 32/20
If NN helps Stockfish, I think we can expect better results against Stockfish 11.

https://tests.stockfishchess.org/tests/ ... 13834a975d
I'm curious too to see whether SFNNue can beat SF dev. The last commit used in the Nodchip 150720 binaries was "Introduce bad outpost penalty" (see https://github.com/nodchip/Stockfish/pull/45/commits ). When my test is over, I'll run the best of the SFNNues against SF dev at "Introduce bad outpost penalty".

Re: SFNNue test, 3m + 1s

Post by MMarco »

I'll start a gauntlet of SF 110720 vs the SFNNues very soon, as the tourney rerun with Nodchip's 150720 compiles is almost over:

   # PLAYER                   :  RATING  ERROR  PLAYED   (%)   CFS    W    D    L  D(%)
   1 SFNNue-256 + GK 27-06    :    3524     18    396   65.7    81  142  236   18  59.6
   2 SFNNue-384 + FiNN 0.2    :    3512     13    666   64.2    80  229  397   40  59.6
   3 SFNNue-256 + GK 12-07    :    3504     12    666   63.1    91  206  429   31  64.4
   4 Stockfish 11             :    3491     13    576   47.0   100   53  436   87  75.7
   5 Houdidit 6.03            :    3369     16    576   30.9    77   20  316  240  54.9
   6 Komodo 14                :    3359     17    576   29.7   ---   16  310  250  53.8

White advantage = 38.25 +/- 4.28
Draw rate (equal opponents) = 74.84 % +/- 1.38

Re: SFNNue test, 3m + 1s

Post by MMarco »

Wow, it ended almost as a tie between the top two SFNNues! FiNN 0.2 scored half a point more in 666 games!

   # PLAYER                   :  RATING  ERROR  PLAYED   (%)   CFS    W    D    L  D(%)
   1 SFNNue-384 + FiNN 0.2    :    3514     15     666  64.2    52  229  397   40  59.6
   2 SFNNue-256 + GK 27-06    :    3514     15     666  64.1    74  222  410   34  61.6
   3 SFNNue-256 + GK 12-07    :    3506     14     666  63.1    97  206  429   31  64.4
   4 Stockfish 11             :    3491     11     666  47.1   100   59  509   98  76.4
   5 Houdidit 6.03            :    3372     15     666  31.3    74   28  361  277  54.2
   6 Komodo 14                :    3363     17     666  30.2   ---   18  366  282  55.0

White advantage = 38.41 +/- 4.49
Draw rate (equal opponents) = 74.72 % +/- 1.20
Head-to-head statistics:

1) SFNNue-384 + FiNN 0.2 3514 :    666 (+229,=397,-40),  64.2 %

   vs.                         :  games (   +,   =,  -),   (%) :   Diff,   SD, CFS (%)
   Stockfish 11                :    222 (  35, 167, 20),  53.4 :    +24,   10,   99.2
   Houdidit 6.03               :    222 (  91, 119, 12),  67.8 :   +142,   11,  100.0
   Komodo 14                   :    222 ( 103, 111,  8),  71.4 :   +152,   11,  100.0

2) SFNNue-256 + GK 27-06 3514 :    666 (+222,=410,-34),  64.1 %

   vs.                         :  games (   +,   =,  -),   (%) :   Diff,   SD, CFS (%)
   Stockfish 11                :    222 (  31, 172, 19),  52.7 :    +23,   10,   98.6
   Houdidit 6.03               :    222 (  99, 113, 10),  70.0 :   +142,   12,  100.0
   Komodo 14                   :    222 (  92, 125,  5),  69.6 :   +151,   11,  100.0

3) SFNNue-256 + GK 12-07 3506 :    666 (+206,=429,-31),  63.1 %

   vs.                         :  games (   +,   =,  -),   (%) :   Diff,   SD, CFS (%)
   Stockfish 11                :    222 (  32, 170, 20),  52.7 :    +15,    8,   97.3
   Houdidit 6.03               :    222 (  87, 129,  6),  68.2 :   +134,   10,  100.0
   Komodo 14                   :    222 (  87, 130,  5),  68.5 :   +143,   13,  100.0

4) Stockfish 11          3491 :    666 (+59,=509,-98),  47.1 %

   vs.                         :  games (  +,   =,  -),   (%) :   Diff,   SD, CFS (%)
   SFNNue-384 + FiNN 0.2       :    222 ( 20, 167, 35),  46.6 :    -24,   10,    0.8
   SFNNue-256 + GK 27-06       :    222 ( 19, 172, 31),  47.3 :    -23,   10,    1.4
   SFNNue-256 + GK 12-07       :    222 ( 20, 170, 32),  47.3 :    -15,    8,    2.7

5) Houdidit 6.03         3372 :    666 (+28,=361,-277),  31.3 %

   vs.                         :  games (  +,   =,   -),   (%) :   Diff,   SD, CFS (%)
   SFNNue-384 + FiNN 0.2       :    222 ( 12, 119,  91),  32.2 :   -142,   11,    0.0
   SFNNue-256 + GK 27-06       :    222 ( 10, 113,  99),  30.0 :   -142,   12,    0.0
   SFNNue-256 + GK 12-07       :    222 (  6, 129,  87),  31.8 :   -134,   10,    0.0

6) Komodo 14             3363 :    666 (+18,=366,-282),  30.2 %

   vs.                         :  games (  +,   =,   -),   (%) :   Diff,   SD, CFS (%)
   SFNNue-384 + FiNN 0.2       :    222 (  8, 111, 103),  28.6 :   -152,   11,    0.0
   SFNNue-256 + GK 27-06       :    222 (  5, 125,  92),  30.4 :   -151,   11,    0.0
   SFNNue-256 + GK 12-07       :    222 (  5, 130,  87),  31.5 :   -143,   13,    0.0
Match Conditions: 3min + 1s Ponder Off, Hash=64, 5-man syzygy, Ryzen-7 3750h.
Book: Noomen WchMatches: Botvinnik-Smyslov(1954,57,58), Tal-Botvinnik(1959,60), 111 positions.
Adjudication: 5-men TB, draw movenumber=50 movecount=5 score=5, resign movecount=5 score=800.
Games: https://gofile.io/d/wUKSJn
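
The adjudication settings read like cutechess-cli style `-draw` / `-resign` parameters. Assuming that reading (the thread does not name the tournament manager), the rules can be sketched as below. The function names and the per-ply score lists are hypothetical, and tablebase adjudication is left out.

```python
def should_adjudicate_draw(scores_cp, full_moves_played,
                           movenumber=50, movecount=5, score=5):
    """Draw rule sketch.

    scores_cp: engine scores in centipawns, one per ply, both sides
    interleaved, most recent last. The game is called a draw once at
    least `movenumber` full moves were played and both engines stayed
    within `score` centipawns of zero for `movecount` consecutive moves
    each (approximated here as the last 2 * movecount plies).
    """
    if full_moves_played < movenumber:
        return False
    recent = scores_cp[-2 * movecount:]
    return len(recent) == 2 * movecount and all(abs(s) <= score for s in recent)


def should_resign(own_scores_cp, movecount=5, score=800):
    """Resign rule sketch.

    own_scores_cp: one engine's own scores in centipawns, most recent
    last. The engine is adjudicated lost once its score was at least
    `score` centipawns below zero for `movecount` consecutive moves.
    """
    recent = own_scores_cp[-movecount:]
    return len(recent) == movecount and all(s <= -score for s in recent)


print(should_adjudicate_draw([0, 3, -2, 1, 0, 0, 4, -5, 2, 0], 55))  # True
print(should_resign([-650, -820, -900, -950, -1100, -1300]))         # True
```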