Can the sardine! NNUE clobbers SF.


Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Can the sardine! NNUE clobbers SF.

Post by Laskos »

chrisw wrote: Sun Jul 19, 2020 5:30 pm
Laskos wrote: Sun Jul 19, 2020 5:25 pm
chrisw wrote: Sun Jul 19, 2020 4:44 pm
Laskos wrote: Sun Jul 19, 2020 3:09 pm

Ok, just played a match in Cutechess-Cli, 100 games at 15'' + 0.25'' between SF NNUE GK and SF_dev

The result is here:
15'' + 0.25''
Score of SF_NNUE vs SF_dev: 31 - 17 - 52 [0.570] 100
... SF_NNUE playing White: 26 - 0 - 24 [0.760] 50
... SF_NNUE playing Black: 5 - 17 - 28 [0.380] 50
... White vs Black: 43 - 5 - 52 [0.690] 100
Elo difference: 49.0 +/- 47.4, LOS: 97.8 %, DrawRatio: 52.0 %
Finished match

The PGN is here:
http://s000.tinyupload.com/?file_id=838 ... 7353082500
Ok, thanks, first assessment is a bit disappointing. There's no winning game by either side that's over before move 80; median game length is 111 ply for SF_NNUE and 118 ply for SF_dev, but it's actually longer than that, because you're starting from FENs and cutechess counts from the FEN position. Maybe those FENs are 16 ply in? So median game length is 125-135 ply. Took a look at the five shortest NNUE wins: they last for at least 82 ply and all were endings. No fireworks, nothing exciting.
Critically, I'm getting the initial impression that NNUE is not doing anything AZ-ish to Stockfish, which implies the technique simply squashes Stockfish into a slightly (or markedly, who knows) more effective version of the same thing: a gradual grind out into the ending. Nothing new or superior, just more of the same. Disappointing; I was hoping the NN would have found some "new" knowledge, but, initially, it looks like not.

Thanks Chris for the assessment. The openings are 6-pliers.
That's a median plycount of 117 for NN and 125 for SF, and no wins at all before 88 ply. Of course, only 100 games. But.
Add to that the fact that I used adjudication in Cutechess-Cli!!!

-draw movenumber=60 movecount=3 score=20 -resign movecount=3 score=900 -tb C:\syzygy5
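The Elo, error bar and LOS in the match summary can be recomputed from the W-L-D counts alone. Here is a minimal Python sketch; the formulas (logistic Elo, a normal-approximation 95 % interval on the mean score mapped to Elo, and the erf-based LOS) are my assumption about what cutechess-cli prints, chosen because they reproduce its numbers, not a copy of its source. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by a mean score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins: int, losses: int, draws: int, confidence: float = 0.95):
    """Return (elo, error_margin, los_percent) for a W-L-D match result."""
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + 0.5 * d  # mean score per game
    # Variance of a single game's score (1, 0 or 0.5) around the mean.
    var = w * (1 - mu) ** 2 + l * mu ** 2 + d * (0.5 - mu) ** 2
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 %
    dev = z * math.sqrt(var / n)  # half-width of the interval on the score
    # Map both ends of the score interval to Elo and take half the spread.
    error = (elo_from_score(mu + dev) - elo_from_score(mu - dev)) / 2
    # Likelihood of superiority: draws are ignored, only W and L matter.
    los = 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * (wins + losses))))
    return elo_from_score(mu), error, 100 * los


elo, err, los = match_stats(31, 17, 52)
print(f"Elo difference: {elo:.1f} +/- {err:.1f}, LOS: {los:.1f} %")
```

For 31-17-52 this prints the same 49.0 +/- 47.4 and 97.8 % LOS as the summary line, and for the depth=1 run later in the thread (655-265-80) it gives 143.1 +/- 22.3 with LOS 100.0 %. With only 100 games the interval is almost as wide as the estimate itself.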
chrisw
Posts: 4359
Joined: Tue Apr 03, 2012 4:28 pm

Re: Can the sardine! NNUE clobbers SF.

Post by chrisw »

Laskos wrote: Sun Jul 19, 2020 5:36 pm
Thanks Chris for the assessment. The openings are 6-pliers.
That's a median plycount of 117 for NN and 125 for SF, and no wins at all before 88 ply. Of course, only 100 games. But.
Add to that the fact that I used adjudication in Cutechess-Cli!!!

-draw movenumber=60 movecount=3 score=20 -resign movecount=3 score=900 -tb C:\syzygy5
Yes, I was expecting that and noticed anyway. Games were clearly being terminated.

I'm using this one, basically 'cos Ed wrote the batch files for me!

-draw movenumber=160 movecount=3 score=100 -resign movecount=5 score=500
The opening book might play a part too. I'm using Ed's 32000.pgn set at 12 plies, with duplicates pre-removed. Duplicates are a constant problem; they sneak in everywhere.
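To see why the two settings cut games off so differently, here is a simplified Python model of the two rules as cutechess-cli's help text describes them (draw: both engines' scores within SCORE cp of zero for COUNT consecutive moves, after at least NUMBER full moves; resign: one engine's score at least SCORE cp below zero for COUNT consecutive moves). The class and function names are hypothetical, this is not cutechess's actual code, and details such as how moves are counted from a FEN start may differ. The -tb (Syzygy) adjudication is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class DrawRule:
    movenumber: int  # minimum number of full moves played before a draw
    movecount: int   # consecutive moves both engines must stay in the window
    score: int       # window half-width in centipawns


@dataclass
class ResignRule:
    movecount: int   # consecutive moves one engine must stay below -score
    score: int       # loss threshold in centipawns


def adjudicate(evals: Sequence[int], draw: DrawRule,
               resign: ResignRule) -> Optional[Tuple[str, int]]:
    """Simplified model of score-based adjudication.

    evals[i] is the score (centipawns) reported by the engine that played
    ply i, from its own point of view; ply 0 is White's first move.
    Returns (verdict, ply) for the first ply where a rule fires, else None.
    """
    draw_run = 0          # consecutive plies with |score| <= draw.score
    resign_run = [0, 0]   # per side: consecutive own scores <= -resign.score
    for ply, cp in enumerate(evals):
        side = ply % 2    # 0 = White, 1 = Black
        resign_run[side] = resign_run[side] + 1 if cp <= -resign.score else 0
        if resign_run[side] >= resign.movecount:
            return ("white loses" if side == 0 else "black loses", ply)
        draw_run = draw_run + 1 if abs(cp) <= draw.score else 0
        full_moves_played = (ply + 1) // 2
        # Each engine must have reported movecount in-window scores in a row.
        if full_moves_played >= draw.movenumber and draw_run >= 2 * draw.movecount:
            return ("draw", ply)
    return None
```

In this model, 130 plies of near-zero scores are drawn at ply 119 (move 60) under Laskos's settings but never under Chris's, while a side stuck at -600 cp is resigned after five of its moves under Chris's rule but survives Laskos's 900 cp threshold.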
Ferdy
Posts: 4840
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Can the sardine! NNUE clobbers SF.

Post by Ferdy »

Laskos wrote: Sun Jul 19, 2020 10:15 am
Ferdy wrote: Sat Jul 18, 2020 10:27 am

sim version 3

  Key:

  1) Amoeba 2.8 (time: 200 ms scale: 1.0)
  2) Andscacs 0.95 (time: 200 ms scale: 1.0)
  3) Arasan 21.1 (time: 200 ms scale: 1.0)
  4) Atlas 3.91 (time: 200 ms scale: 1.0)
  5) Bobcat v8.0 (time: 200 ms scale: 1.0)
  6) Booot 6.3.1 (time: 200 ms scale: 1.0)
  7) Cheng 4.39 (time: 200 ms scale: 1.0)
  8) Demolito 2018-10-29 (time: 200 ms scale: 1.0)
  9) Ethereal 11.25 (time: 200 ms scale: 1.0)
 10) Fizbo 2 (time: 200 ms scale: 1.0)
 11) Fruit 2.1 (time: 200 ms scale: 1.0)
 12) Fruit reloaded 3.2.1 x64-pop (time: 200 ms scale: 1.0)
 13) GreKo 2018.08 (time: 200 ms scale: 1.0)
 14) Gull 3 x64 (time: 200 ms scale: 1.0)
 15) Hannibal 1.7 x64 (time: 200 ms scale: 1.0)
 16) iCE 3.0 v658 (time: 200 ms scale: 1.0)
 17) Laser 1.7 (time: 200 ms scale: 1.0)
 18) Lc0 v0.26.0 blas w591226 (time: 200 ms scale: 1.0)
 19) Nemorino 5.00 (time: 200 ms scale: 1.0)
 20) Pedone 1.9 (time: 200 ms scale: 1.0)
 21) Rodent 0.275 (time: 200 ms scale: 1.0)
 22) RofChade Version 2.0 (time: 200 ms scale: 1.0)
 23) Senpai 2.0 (time: 200 ms scale: 1.0)
 24) SFNNUE_2020-07-15_GK_06-27 (time: 10 ms scale: 1.0)
 25) SmarThink 1.98 (time: 200 ms scale: 1.0)
 26) Stockfish 10 (time: 20 ms scale: 1.0)
 27) Stockfish 10 r1 (time: 20 ms scale: 1.0)
 28) Stockfish 11 (time: 10 ms scale: 1.0)
 29) Stockfish 2020-07-11 (time: 10 ms scale: 1.0)
 30) Texel 1.07 (time: 200 ms scale: 1.0)
 31) The Baron 3.44 (time: 200 ms scale: 1.0)
 32) Toga II 4.01 (time: 200 ms scale: 1.0)
 33) Vajolet2 2.6.1 (time: 200 ms scale: 1.0)
 34) Wasp 3.50 (time: 200 ms scale: 1.0)
 35) Winter 0.5 (time: 200 ms scale: 1.0)
 36) Xiphos 0.5 (time: 200 ms scale: 1.0)
 37) zurichess neuchatel (time: 200 ms scale: 1.0)

         1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29    30    31    32    33    34    35    36    37
  1.  ----- 20.32 25.20 25.69 27.14 22.55 19.66 25.46 18.17 17.36 27.01 10.18 27.09 24.41 23.49 22.10 16.64 12.95 14.65 21.61 26.33 23.04 25.53 17.47 14.36 19.86 20.02 16.16 19.06 24.87 22.99 26.34 18.38 23.55 24.76 15.99 26.51
  2.  20.32 ----- 40.16 40.56 40.62 40.40 29.70 42.69 32.82 29.16 35.34 11.12 37.81 43.53 40.18 36.60 28.74 17.32 21.09 31.71 41.19 41.15 43.03 23.59 20.73 33.41 33.99 21.74 26.63 41.85 32.90 40.03 29.97 40.08 36.05 26.95 38.15
  3.  25.20 40.16 ----- 45.28 48.24 41.56 33.53 46.72 34.07 31.03 43.11 12.39 44.34 46.65 42.72 40.43 29.23 18.03 23.27 35.37 47.49 44.72 47.73 26.73 22.49 36.70 36.96 23.85 30.31 47.45 38.91 44.99 32.73 44.77 40.69 27.74 45.14
  4.  25.69 40.56 45.28 ----- 47.48 42.00 33.13 47.97 32.41 29.36 47.04 13.09 44.84 45.85 45.36 38.99 28.70 18.66 24.00 37.56 46.55 43.89 46.83 25.47 23.59 34.58 35.18 23.10 29.25 45.51 38.63 48.62 33.09 45.57 42.30 27.54 45.71
  5.  27.14 40.62 48.24 47.48 ----- 41.92 36.57 48.49 34.83 31.39 47.62 12.82 48.66 47.57 43.26 42.77 30.23 18.63 23.80 37.48 50.90 46.84 49.88 25.97 22.35 35.63 36.51 24.07 30.42 50.30 41.64 48.66 32.07 44.91 44.06 28.73 49.48
  6.  22.55 40.40 41.56 42.00 41.92 ----- 30.26 45.52 33.52 29.63 39.01 12.07 39.79 45.24 41.67 38.03 29.51 17.52 23.08 33.47 43.26 42.51 43.28 24.65 20.48 35.26 36.25 22.92 28.50 42.52 34.40 43.81 31.74 43.28 37.22 26.73 41.48
  7.  19.66 29.70 33.53 33.13 36.57 30.26 ----- 32.92 24.62 23.71 31.86 11.34 32.30 33.66 30.83 30.72 22.14 14.82 19.12 26.68 34.74 31.97 34.12 18.85 17.54 25.49 26.27 17.29 21.19 34.45 28.91 33.54 23.96 32.20 30.70 21.57 32.48
  8.  25.46 42.69 46.72 47.97 48.49 45.52 32.92 ----- 35.13 31.46 44.80 12.75 46.64 48.92 45.55 42.26 30.49 19.24 25.42 38.77 46.52 47.35 49.14 26.38 23.27 37.63 38.50 23.55 30.76 46.30 39.33 48.37 34.62 47.45 42.67 29.40 48.73
  9.  18.17 32.82 34.07 32.41 34.83 33.52 24.62 35.13 ----- 24.97 28.44 10.34 31.57 35.65 32.06 31.29 26.40 15.77 19.59 26.88 33.93 36.96 36.21 20.38 17.37 29.15 29.40 18.28 22.61 35.34 28.07 32.22 25.65 33.76 30.23 22.98 32.61
 10.  17.36 29.16 31.03 29.36 31.39 29.63 23.71 31.46 24.97 ----- 26.78  9.53 29.59 31.35 29.36 28.94 22.29 16.19 17.35 24.99 31.83 31.31 32.47 21.36 17.43 26.26 26.81 17.47 21.92 33.02 26.60 28.84 22.83 29.59 27.94 21.50 30.27
 11.  27.01 35.34 43.11 47.04 47.62 39.01 31.86 44.80 28.44 26.78 ----- 16.27 48.11 43.37 40.87 36.19 26.17 17.07 23.37 35.47 46.75 39.60 43.01 25.26 21.83 32.37 32.90 23.16 28.93 43.12 39.28 55.23 29.83 43.12 41.99 24.47 45.01
 12.  10.18 11.12 12.39 13.09 12.82 12.07 11.34 12.75 10.34  9.53 16.27 ----- 13.10 11.85 11.93 11.68 10.63  8.38 10.43 11.51 13.04 11.52 12.09  8.59  8.25 10.06 10.46  8.82  9.40 12.01 12.67 15.09 11.76 13.06 12.41 10.21 12.45
 13.  27.09 37.81 44.34 44.84 48.66 39.79 32.30 46.64 31.57 29.59 48.11 13.10 ----- 43.43 40.79 38.44 28.14 18.15 23.04 35.75 46.95 42.44 47.50 25.35 20.87 34.06 34.60 22.94 29.23 44.09 40.45 46.97 31.59 42.77 43.71 26.22 46.13
 14.  24.41 43.53 46.65 45.85 47.57 45.24 33.66 48.92 35.65 31.35 43.37 11.85 43.43 ----- 44.45 41.85 31.31 18.41 24.13 36.36 47.48 46.78 49.24 25.53 23.02 36.94 37.22 23.85 29.41 47.24 37.93 46.50 34.07 45.87 40.19 29.01 45.63
 15.  23.49 40.18 42.72 45.36 43.26 41.67 30.83 45.55 32.06 29.36 40.87 11.93 40.79 44.45 ----- 36.97 28.82 17.75 23.84 34.32 43.81 41.85 44.76 24.16 21.49 34.38 35.08 22.23 28.48 42.57 37.00 44.55 32.14 43.18 39.20 26.94 41.99
 16.  22.10 36.60 40.43 38.99 42.77 38.03 30.72 42.26 31.29 28.94 36.19 11.68 38.44 41.85 36.97 ----- 26.91 16.91 21.80 31.35 41.21 41.20 42.38 23.39 20.47 31.56 32.54 21.15 25.26 41.62 33.95 39.35 29.16 38.70 35.76 26.68 40.31
 17.  16.64 28.74 29.23 28.70 30.23 29.51 22.14 30.49 26.40 22.29 26.17 10.63 28.14 31.31 28.82 26.91 ----- 13.89 17.39 25.14 30.10 30.66 31.20 17.12 14.76 25.69 25.78 16.40 19.85 30.71 24.42 28.96 23.05 30.35 26.88 20.61 28.21
 18.  12.95 17.32 18.03 18.66 18.63 17.52 14.82 19.24 15.77 16.19 17.07  8.38 18.15 18.41 17.75 16.91 13.89 ----- 11.88 15.34 18.57 18.86 18.83 15.49 10.59 16.87 17.50 12.94 15.17 18.84 16.17 17.82 14.51 18.16 17.61 13.63 18.09
 19.  14.65 21.09 23.27 24.00 23.80 23.08 19.12 25.42 19.59 17.35 23.37 10.43 23.04 24.13 23.84 21.80 17.39 11.88 ----- 20.49 24.19 22.93 25.01 14.53 13.18 20.45 21.36 14.82 16.52 23.82 20.98 24.68 20.66 24.70 22.06 17.32 23.32
 20.  21.61 31.71 35.37 37.56 37.48 33.47 26.68 38.77 26.88 24.99 35.47 11.51 35.75 36.36 34.32 31.35 25.14 15.34 20.49 ----- 36.88 35.52 37.11 22.09 17.88 29.93 30.81 20.65 25.39 36.77 30.77 36.03 27.77 35.65 34.18 24.12 37.27
 21.  26.33 41.19 47.49 46.55 50.90 43.26 34.74 46.52 33.93 31.83 46.75 13.04 46.95 47.48 43.81 41.21 30.10 18.57 24.19 36.88 ----- 46.44 48.39 26.77 22.58 36.60 37.62 24.25 31.50 49.15 40.33 48.79 32.76 45.82 43.25 28.85 48.92
 22.  23.04 41.15 44.72 43.89 46.84 42.51 31.97 47.35 36.96 31.31 39.60 11.52 42.44 46.78 41.85 41.20 30.66 18.86 22.93 35.52 46.44 ----- 46.49 25.42 21.63 37.11 37.96 23.66 30.15 46.32 35.89 43.21 32.35 43.70 40.20 29.61 45.05
 23.  25.53 43.03 47.73 46.83 49.88 43.28 34.12 49.14 36.21 32.47 43.01 12.09 47.50 49.24 44.76 42.38 31.20 18.83 25.01 37.11 48.39 46.49 ----- 27.18 22.88 38.06 38.75 24.06 31.09 48.59 39.99 46.98 33.42 46.41 43.35 29.97 47.82
 24.  17.47 23.59 26.73 25.47 25.97 24.65 18.85 26.38 20.38 21.36 25.26  8.59 25.35 25.53 24.16 23.39 17.12 15.49 14.53 22.09 26.77 25.42 27.18 ----- 15.19 25.09 25.38 20.82 24.59 26.18 23.34 25.35 18.89 24.42 24.45 17.70 26.66
 25.  14.36 20.73 22.49 23.59 22.35 20.48 17.54 23.27 17.37 17.43 21.83  8.25 20.87 23.02 21.49 20.47 14.76 10.59 13.18 17.88 22.58 21.63 22.88 15.19 ----- 18.06 18.39 13.57 16.12 22.12 20.53 22.60 16.81 21.91 19.82 15.22 22.41
 26.  19.86 33.41 36.70 34.58 35.63 35.26 25.49 37.63 29.15 26.26 32.37 10.06 34.06 36.94 34.38 31.56 25.69 16.87 20.45 29.93 36.60 37.11 38.06 25.09 18.06 ----- 68.67 24.02 30.13 37.10 31.03 34.22 26.40 35.28 32.50 24.63 36.13
 27.  20.02 33.99 36.96 35.18 36.51 36.25 26.27 38.50 29.40 26.81 32.90 10.46 34.60 37.22 35.08 32.54 25.78 17.50 21.36 30.81 37.62 37.96 38.75 25.38 18.39 68.67 ----- 24.70 30.95 37.79 31.62 35.54 27.23 36.03 33.78 24.82 36.06
 28.  16.16 21.74 23.85 23.10 24.07 22.92 17.29 23.55 18.28 17.47 23.16  8.82 22.94 23.85 22.23 21.15 16.40 12.94 14.82 20.65 24.25 23.66 24.06 20.82 13.57 24.02 24.70 ----- 28.07 23.25 21.47 22.63 17.97 23.09 22.87 16.50 24.17
 29.  19.06 26.63 30.31 29.25 30.42 28.50 21.19 30.76 22.61 21.92 28.93  9.40 29.23 29.41 28.48 25.26 19.85 15.17 16.52 25.39 31.50 30.15 31.09 24.59 16.12 30.13 30.95 28.07 ----- 30.57 26.40 29.22 21.12 28.51 28.53 20.16 30.57
 30.  24.87 41.85 47.45 45.51 50.30 42.52 34.45 46.30 35.34 33.02 43.12 12.01 44.09 47.24 42.57 41.62 30.71 18.84 23.82 36.77 49.15 46.32 48.59 26.18 22.12 37.10 37.79 23.25 30.57 ----- 38.65 46.04 31.88 45.74 42.34 29.11 46.21
 31.  22.99 32.90 38.91 38.63 41.64 34.40 28.91 39.33 28.07 26.60 39.28 12.67 40.45 37.93 37.00 33.95 24.42 16.17 20.98 30.77 40.33 35.89 39.99 23.34 20.53 31.03 31.62 21.47 26.40 38.65 ----- 39.33 27.29 37.11 36.84 23.80 39.58
 32.  26.34 40.03 44.99 48.62 48.66 43.81 33.54 48.37 32.22 28.84 55.23 15.09 46.97 46.50 44.55 39.35 28.96 17.82 24.68 36.03 48.79 43.21 46.98 25.35 22.60 34.22 35.54 22.63 29.22 46.04 39.33 ----- 32.64 46.49 43.02 25.94 46.37
 33.  18.38 29.97 32.73 33.09 32.07 31.74 23.96 34.62 25.65 22.83 29.83 11.76 31.59 34.07 32.14 29.16 23.05 14.51 20.66 27.77 32.76 32.35 33.42 18.89 16.81 26.40 27.23 17.97 21.12 31.88 27.29 32.64 ----- 32.51 29.11 21.74 31.23
 34.  23.55 40.08 44.77 45.57 44.91 43.28 32.20 47.45 33.76 29.59 43.12 13.06 42.77 45.87 43.18 38.70 30.35 18.16 24.70 35.65 45.82 43.70 46.41 24.42 21.91 35.28 36.03 23.09 28.51 45.74 37.11 46.49 32.51 ----- 39.33 26.56 43.18
 35.  24.76 36.05 40.69 42.30 44.06 37.22 30.70 42.67 30.23 27.94 41.99 12.41 43.71 40.19 39.20 35.76 26.88 17.61 22.06 34.18 43.25 40.20 43.35 24.45 19.82 32.50 33.78 22.87 28.53 42.34 36.84 43.02 29.11 39.33 ----- 25.53 43.11
 36.  15.99 26.95 27.74 27.54 28.73 26.73 21.57 29.40 22.98 21.50 24.47 10.21 26.22 29.01 26.94 26.68 20.61 13.63 17.32 24.12 28.85 29.61 29.97 17.70 15.22 24.63 24.82 16.50 20.16 29.11 23.80 25.94 21.74 26.56 25.53 ----- 28.71
 37.  26.51 38.15 45.14 45.71 49.48 41.48 32.48 48.73 32.61 30.27 45.01 12.45 46.13 45.63 41.99 40.31 28.21 18.09 23.32 37.27 48.92 45.05 47.82 26.66 22.41 36.13 36.06 24.17 30.57 46.21 39.58 46.37 31.23 43.18 43.11 28.71 -----


Ferdy, what's the matter with some of your matrix numbers? Hardly any two strong engines show less than 25% similarity at 100 ms on one core. Your matrix contains lots of values below 25%, especially with SF and SF NNUE. The similarities usually range from the 30s (percent) for very unrelated engines (see "Shredder 6" in my matrix) to the 60s for very related engines (see SF_11 and SF_dev, or the 2 SF_dev builds). Your matrix numbers are simply weird and often way too low, so the clustering may be meaningless.

sim version 3

  Key:

  1) Shredder 6 (time: 100 ms  scale: 1)
  2) Lc0 SV_3010 (time: 100 ms  scale: 1.0)
  3) Lc0 T70 (time: 100 ms  scale: 1.0)
  4) SF_NNUE_bis_location (time: 100 ms  scale: 1)
  5) Stockfish 10 (time: 100 ms  scale: 1)
  6) Stockfish 11 (time: 100 ms  scale: 1)
  7) Stockfish 6 (time: 100 ms  scale: 1)
  8) Stockfish 8 (time: 100 ms  scale: 1)
  9) Stockfish dev (time: 100 ms  scale: 1)
 10) Stockfish dev Bis (time: 100 ms  scale: 1)
 11) Andscacs (time: 100 ms  scale: 1)
 12) Komodo 14 (time: 100 ms  scale: 1)
 13) Ethereal (time: 100 ms  scale: 1)

         1     2     3     4     5     6     7     8     9    10    11    12    13
  1.  ----- 36.00 38.76 38.53 40.05 39.80 42.40 40.52 39.04 39.41 40.76 40.64 40.67
  2.  36.00 ----- 67.27 55.32 52.15 52.82 48.94 50.72 53.88 53.34 49.54 49.90 49.07
  3.  38.76 67.27 ----- 55.83 52.62 52.61 49.78 51.07 52.62 52.18 49.73 50.44 50.30
  4.  38.53 55.32 55.83 ----- 55.12 55.38 51.52 54.33 55.80 55.24 52.05 52.11 51.76
  5.  40.05 52.15 52.62 55.12 ----- 64.79 56.43 61.67 63.57 63.51 56.05 55.83 56.25
  6.  39.80 52.82 52.61 55.38 64.79 ----- 55.52 60.31 66.04 65.53 55.55 55.90 54.95
  7.  42.40 48.94 49.78 51.52 56.43 55.52 ----- 60.96 54.79 54.86 56.02 54.90 54.05
  8.  40.52 50.72 51.07 54.33 61.67 60.31 60.96 ----- 59.03 58.62 58.55 56.19 55.45
  9.  39.04 53.88 52.62 55.80 63.57 66.04 54.79 59.03 ----- 66.06 54.88 54.60 53.96
 10.  39.41 53.34 52.18 55.24 63.51 65.53 54.86 58.62 66.06 ----- 54.22 54.90 54.47
 11.  40.76 49.54 49.73 52.05 56.05 55.55 56.02 58.55 54.88 54.22 ----- 53.91 53.44
 12.  40.64 49.90 50.44 52.11 55.83 55.90 54.90 56.19 54.60 54.90 53.91 ----- 53.86
 13.  40.67 49.07 50.30 51.76 56.25 54.95 54.05 55.45 53.96 54.47 53.44 53.86 -----

The original sim sends the engine "go depth 50", then sends "stop" and collects the bestmove returned. Mine sends "go movetime <time>", then waits for the engine to send its bestmove and collects it. Perhaps this could be the difference.
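As I understand the sim tool, each matrix entry is simply the percentage of test positions on which two engines pick the same best move. A minimal Python sketch of that arithmetic (hypothetical function names); it assumes every engine was queried on the same positions in the same order and says nothing about how each move was collected, whether via "go depth 50" plus "stop" or via "go movetime".

```python
from itertools import combinations
from typing import Dict, List, Tuple


def similarity_matrix(choices: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Percentage of positions on which each pair of engines chose the same move.

    choices maps engine name -> best move per test position, all lists in
    the same position order.
    """
    n_pos = len(next(iter(choices.values())))
    matrix = {}
    for a, b in combinations(choices, 2):
        same = sum(ma == mb for ma, mb in zip(choices[a], choices[b]))
        matrix[(a, b)] = matrix[(b, a)] = 100.0 * same / n_pos
    return matrix


def suspicious_pairs(matrix, floor=25.0):
    """Engine pairs below `floor` percent (Laskos's rule-of-thumb floor)."""
    return sorted({tuple(sorted(pair)) for pair, pct in matrix.items() if pct < floor})
```

Because the percentage depends only on the collected moves, a change in how bestmove is gathered (for example, an engine answering a fixed movetime with a shallower search) shows up directly as lower matching rates.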
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Can the sardine! NNUE clobbers SF.

Post by Laskos »

chrisw wrote: Sun Jul 19, 2020 5:44 pm
Yes, I was expecting and noticed anyway. Games were clearly being terminated.

I'm using this one, basically 'cos Ed wrote the batch files for me!

-draw movenumber=160 movecount=3 score=100 -resign movecount=5 score=500
The opening book might play a part too. I'm using Ed's 32000.pgn set at 12 plies, with duplicates pre-removed. Duplicates are a constant problem; they sneak in everywhere.
I often use EPDs of my 3-movers (6 plies) taken from a large database of human games (Elo 2200+), in different sorts of suites, often imbalanced opening suites. I remove duplicates from the EPD files using EPDTools.
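A minimal Python sketch of the kind of duplicate removal described here. It is not EPDTools (whose exact behaviour I haven't checked), and the function name is hypothetical. It assumes two records are duplicates when their first four EPD fields (piece placement, side to move, castling rights, en passant square) match, and it keeps the first occurrence.

```python
from typing import Iterable, List


def dedupe_epd(lines: Iterable[str]) -> List[str]:
    """Drop EPD records whose position repeats an earlier one.

    Only the first four fields form the key, so trailing opcodes such as
    'id' or 'c0' are ignored and transpositions reached by different move
    orders collapse into one record.
    """
    seen = set()
    unique = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        key = tuple(fields[:4])
        if key not in seen:
            seen.add(key)
            unique.append(line.rstrip("\n"))
    return unique
```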
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Can the sardine! NNUE clobbers SF.

Post by Laskos »

Now I have 2 NNUE nets in the cluster: the FiNN02 384x2-32-32 net is even less related to anything than the GK 256x2-32-32 net. They seem to be close in strength. The two NNUE nets aren't even closely related to each other.


[Image]
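The "256x2-32-32" naming describes layer widths: a feature transformer of width N applied to each side's features, the two halves concatenated, then two hidden layers of 32 and a single output. A Python sketch of the parameter arithmetic, assuming both nets use the HalfKP input set of the original Stockfish NNUE (41024 features per side) with one shared feature transformer. I have not checked FiNN02's actual inputs, so its count is illustrative; the names are hypothetical.

```python
HALFKP_INPUTS = 64 * (10 * 64 + 1)  # 41024 features per perspective (assumed)


def nnue_param_count(ft_width: int, hidden=(32, 32), inputs: int = HALFKP_INPUTS) -> int:
    """Weights plus biases of an 'Nx2-32-32' net under the assumptions above."""
    total = inputs * ft_width + ft_width  # feature transformer, shared by both sides
    prev = 2 * ft_width                   # both perspectives concatenated
    for h in hidden:
        total += prev * h + h             # dense layer weights + biases
        prev = h
    total += prev + 1                     # single scalar output
    return total
```

Under those assumptions the 256-wide net has 10,519,905 parameters and the 384-wide one 15,779,297, nearly all of them in the feature transformer.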
Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: Can the sardine! NNUE clobbers SF.

Post by Raphexon »

Laskos wrote: Sun Jul 19, 2020 6:08 pm
Now I have 2 NNUE nets in the cluster: the FiNN02 384x2-32-32 net is even less related to anything than the GK 256x2-32-32 net. They seem to be close in strength. The two NNUE nets aren't even closely related to each other.


[Image]
Isn't that awesome?
Totally different from anything else, while also adding a ton of Elo over the previous strongest (CPU) engine.
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Can the sardine! NNUE clobbers SF.

Post by cdani »

chrisw wrote: Sun Jul 19, 2020 4:44 pm
Ok, thanks, first assessment is a bit disappointing. There's no winning game by either side that's over before move 80; median game length is 111 ply for SF_NNUE and 118 ply for SF_dev, but it's actually longer than that, because you're starting from FENs and cutechess counts from the FEN position. Maybe those FENs are 16 ply in? So median game length is 125-135 ply. Took a look at the five shortest NNUE wins: they last for at least 82 ply and all were endings. No fireworks, nothing exciting.
Critically, I'm getting the initial impression that NNUE is not doing anything AZ-ish to Stockfish, which implies the technique simply squashes Stockfish into a slightly (or markedly, who knows) more effective version of the same thing: a gradual grind out into the ending. Nothing new or superior, just more of the same. Disappointing; I was hoping the NN would have found some "new" knowledge, but, initially, it looks like not.
I'm also analyzing the games I'm playing at ICCF with it, and it seems able to understand that endgames are winning clearly before other engines do. Stockfish's endgame static evaluation is not very evolved, in the sense that there are not many endgame-specific positional eval parameters, so I think NNUE had more to gain there.
Also, the NNUE net organization, where the inputs are the piece positions relative to the king position, heavily favors endgame evaluation, as it is well known that king position is paramount in endgames. Change the net organization and you will probably favor another phase or type of play.
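The "piece positions related to the king positions" layout is the HalfKP feature set: each input is a triple (own king square, non-king piece, piece square). A simplified Python sketch of the indexing, assuming 64 king squares times (10 piece kinds times 64 squares + 1) = 41024 inputs per perspective. The piece-kind ordering and board orientation here are my own choices and need not match the engine's source; the function name is hypothetical.

```python
PIECE_TYPES = "PNBRQ"        # kings are not inputs; they select the block
BLOCK = len(PIECE_TYPES) * 2 * 64 + 1  # 641 inputs per king square


def halfkp_index(king_sq: int, piece_type: str, own: bool, piece_sq: int) -> int:
    """Input index for one (king square, piece, square) feature.

    king_sq, piece_sq: 0..63 from the perspective side's point of view.
    own: True for the perspective side's piece, False for the opponent's.
    Slot 0 of each 641-wide block is left unused, mirroring the original layout.
    """
    kind = 2 * PIECE_TYPES.index(piece_type) + (0 if own else 1)  # 0..9
    return king_sq * BLOCK + 1 + kind * 64 + piece_sq
```

Because the king square selects the block, the same pawn on the same square activates a different input whenever the king moves, which is one way to see why king placement is so central to what the net learns.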
chrisw
Posts: 4359
Joined: Tue Apr 03, 2012 4:28 pm

Re: Can the sardine! NNUE clobbers SF.

Post by chrisw »

cdani wrote: Mon Jul 20, 2020 7:45 am
I'm also analyzing the games I'm playing at ICCF with it, and it seems able to understand that endgames are winning clearly before other engines do. Stockfish's endgame static evaluation is not very evolved, in the sense that there are not many endgame-specific positional eval parameters, so I think NNUE had more to gain there.
Also, the NNUE net organization, where the inputs are the piece positions relative to the king position, heavily favors endgame evaluation, as it is well known that king position is paramount in endgames. Change the net organization and you will probably favor another phase or type of play.
Yes, basically I agree, although I want to profile more games to be sure. The two together (SF and NNUE) typically went rapidly into exchanging pieces, with NNUE emerging with the better endgame and winning. My (initial) thoughts are around the idea that NNUE has a good handle on pawn structure and on the likelihood of converting it into a win. SF_dev seemed unduly ready, early on, to give up a pawn in exchange for piece activity, but then went along with piece exchanges, negating the purpose.

Laskos, what was your contempt set to? I noticed games (ending positions) where SF was slightly worse, but was still the side that, after a l o n g shuffle with no pawn moves and no captures, broke the shuffle with a non-reversible move, when it could have just taken the draw.

Still, the main point for me is that, from what I've seen so far, they are not producing interesting chess.
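The "shuffle" Chris describes is the fifty-move counter at work: only pawn moves and captures are irreversible and reset it, and a draw can be claimed once it reaches 100 plies. A minimal Python sketch of that bookkeeping; the (is_pawn_move, is_capture) flag input is a hypothetical simplification, since a real engine derives these flags from the move itself.

```python
from typing import Iterable, List, Tuple


def halfmove_clock(plies: Iterable[Tuple[bool, bool]]) -> List[int]:
    """Fifty-move-rule counter after each ply.

    plies: (is_pawn_move, is_capture) flags, one tuple per ply.
    Pawn moves and captures reset the counter to 0; any other move
    (a "shuffle") increments it.
    """
    clock = 0
    history = []
    for is_pawn_move, is_capture in plies:
        clock = 0 if (is_pawn_move or is_capture) else clock + 1
        history.append(clock)
    return history


def fifty_move_claimable(clock: int) -> bool:
    """A draw can be claimed after 50 quiet moves by each side (100 plies)."""
    return clock >= 100
```

A single pawn move or capture after a long shuffle throws away every accumulated ply, so breaking it from a slightly worse position gives up the drift toward a fifty-move draw.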
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Can the sardine! NNUE clobbers SF.

Post by Laskos »

chrisw wrote: Mon Jul 20, 2020 9:09 am
Yes, basically I agree, although I want to profile more games to be sure. The two together (SF and NNUE) typically went rapidly into exchanging pieces, with NNUE emerging with the better endgame and winning. My (initial) thoughts are around the idea that NNUE has a good handle on pawn structure and on the likelihood of converting it into a win. SF_dev seemed unduly ready, early on, to give up a pawn in exchange for piece activity, but then went along with piece exchanges, negating the purpose.

Laskos, what was your contempt set to? I noticed games (ending positions) where SF was slightly worse, but was still the side that, after a l o n g shuffle with no pawn moves and no captures, broke the shuffle with a non-reversible move, when it could have just taken the draw.

Still, the main point for me is that, from what I've seen so far, they are not producing interesting chess.
No, SF_dev was Contempt=0. I have no idea what contempt the NNUE SF has.
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Can the sardine! NNUE clobbers SF.

Post by cdani »

Laskos wrote: Sun Jul 19, 2020 9:28 am
smatovic wrote: Sun Jul 19, 2020 9:08 am
Maybe a depth 1 match between LC0 and NNUE would be useful, to get an idea of how the networks perform against each other, and of how important the whole search is, or the like.

--
Srdja
No, the search is still SF. At depth=1 I compared SF NNUE to SF_dev, and SF NNUE is significantly stronger at depth=1:

depth=1
Score of SF_NNUE vs SF_dev: 655 - 265 - 80 [0.695] 1000
Elo difference: 143.1 +/- 22.3, LOS: 100.0 %, DrawRatio: 8.0 %
Finished match

So, the net eval helps a lot at depth=1.

As chess is more search-related than evaluation-related, NNUE loses part of its big static-eval advantage as more games are decided by what happens in the search. It also loses some by being slower, as it searches a little less deep.
If someone somehow tunes the static eval of regular Stockfish to imitate NNUE's static eval, this will probably overcome NNUE's current advantage (maybe 30 Elo) and gain even more than that.
Also, I suppose there is something to be won by tuning NNUE's search parameters to take advantage of the better static eval, so it is able to visit fewer unneeded nodes.
Some NN-based tuning of search parameters could probably also net a nice gain.
All easier said than done, of course.
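The point that the slower NNUE "goes a little less deep" can be put in rough numbers. If the nodes needed grow by a factor b (the effective branching factor) per ply, then depth is roughly log(nodes) / log(b), and searching r times slower at the same time control costs about log(r) / log(b) plies. A Python sketch under that assumption; r and b would have to be measured, and any values plugged in here are purely illustrative, not measurements from this thread. The function name is hypothetical.

```python
import math


def depth_loss(speed_ratio: float, ebf: float) -> float:
    """Approximate plies lost when searching `speed_ratio` times slower.

    Assumes nodes ~ ebf ** depth, so dividing the node budget by
    speed_ratio removes log(speed_ratio) / log(ebf) plies.
    """
    return math.log(speed_ratio) / math.log(ebf)
```

For example, with an illustrative r = 2 and b = 2 the loss is exactly one ply; the smaller the effective branching factor, the more plies a given slowdown costs.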