Can the sardine! NNUE clobbers SF.

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Can the sardine! NNUE clobbers SF.

Post by Laskos »

Ferdy wrote: Sat Jul 18, 2020 10:27 am

sim version 3

  Key:

  1) Amoeba 2.8 (time: 200 ms scale: 1.0)
  2) Andscacs 0.95 (time: 200 ms scale: 1.0)
  3) Arasan 21.1 (time: 200 ms scale: 1.0)
  4) Atlas 3.91 (time: 200 ms scale: 1.0)
  5) Bobcat v8.0 (time: 200 ms scale: 1.0)
  6) Booot 6.3.1 (time: 200 ms scale: 1.0)
  7) Cheng 4.39 (time: 200 ms scale: 1.0)
  8) Demolito 2018-10-29 (time: 200 ms scale: 1.0)
  9) Ethereal 11.25 (time: 200 ms scale: 1.0)
 10) Fizbo 2 (time: 200 ms scale: 1.0)
 11) Fruit 2.1 (time: 200 ms scale: 1.0)
 12) Fruit reloaded 3.2.1 x64-pop (time: 200 ms scale: 1.0)
 13) GreKo 2018.08 (time: 200 ms scale: 1.0)
 14) Gull 3 x64 (time: 200 ms scale: 1.0)
 15) Hannibal 1.7 x64 (time: 200 ms scale: 1.0)
 16) iCE 3.0 v658 (time: 200 ms scale: 1.0)
 17) Laser 1.7 (time: 200 ms scale: 1.0)
 18) Lc0 v0.26.0 blas w591226 (time: 200 ms scale: 1.0)
 19) Nemorino 5.00 (time: 200 ms scale: 1.0)
 20) Pedone 1.9 (time: 200 ms scale: 1.0)
 21) Rodent 0.275 (time: 200 ms scale: 1.0)
 22) RofChade Version 2.0 (time: 200 ms scale: 1.0)
 23) Senpai 2.0 (time: 200 ms scale: 1.0)
 24) SFNNUE_2020-07-15_GK_06-27 (time: 10 ms scale: 1.0)
 25) SmarThink 1.98 (time: 200 ms scale: 1.0)
 26) Stockfish 10 (time: 20 ms scale: 1.0)
 27) Stockfish 10 r1 (time: 20 ms scale: 1.0)
 28) Stockfish 11 (time: 10 ms scale: 1.0)
 29) Stockfish 2020-07-11 (time: 10 ms scale: 1.0)
 30) Texel 1.07 (time: 200 ms scale: 1.0)
 31) The Baron 3.44 (time: 200 ms scale: 1.0)
 32) Toga II 4.01 (time: 200 ms scale: 1.0)
 33) Vajolet2 2.6.1 (time: 200 ms scale: 1.0)
 34) Wasp 3.50 (time: 200 ms scale: 1.0)
 35) Winter 0.5 (time: 200 ms scale: 1.0)
 36) Xiphos 0.5 (time: 200 ms scale: 1.0)
 37) zurichess neuchatel (time: 200 ms scale: 1.0)

         1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27    28    29    30    31    32    33    34    35    36    37
  1.  ----- 20.32 25.20 25.69 27.14 22.55 19.66 25.46 18.17 17.36 27.01 10.18 27.09 24.41 23.49 22.10 16.64 12.95 14.65 21.61 26.33 23.04 25.53 17.47 14.36 19.86 20.02 16.16 19.06 24.87 22.99 26.34 18.38 23.55 24.76 15.99 26.51
  2.  20.32 ----- 40.16 40.56 40.62 40.40 29.70 42.69 32.82 29.16 35.34 11.12 37.81 43.53 40.18 36.60 28.74 17.32 21.09 31.71 41.19 41.15 43.03 23.59 20.73 33.41 33.99 21.74 26.63 41.85 32.90 40.03 29.97 40.08 36.05 26.95 38.15
  3.  25.20 40.16 ----- 45.28 48.24 41.56 33.53 46.72 34.07 31.03 43.11 12.39 44.34 46.65 42.72 40.43 29.23 18.03 23.27 35.37 47.49 44.72 47.73 26.73 22.49 36.70 36.96 23.85 30.31 47.45 38.91 44.99 32.73 44.77 40.69 27.74 45.14
  4.  25.69 40.56 45.28 ----- 47.48 42.00 33.13 47.97 32.41 29.36 47.04 13.09 44.84 45.85 45.36 38.99 28.70 18.66 24.00 37.56 46.55 43.89 46.83 25.47 23.59 34.58 35.18 23.10 29.25 45.51 38.63 48.62 33.09 45.57 42.30 27.54 45.71
  5.  27.14 40.62 48.24 47.48 ----- 41.92 36.57 48.49 34.83 31.39 47.62 12.82 48.66 47.57 43.26 42.77 30.23 18.63 23.80 37.48 50.90 46.84 49.88 25.97 22.35 35.63 36.51 24.07 30.42 50.30 41.64 48.66 32.07 44.91 44.06 28.73 49.48
  6.  22.55 40.40 41.56 42.00 41.92 ----- 30.26 45.52 33.52 29.63 39.01 12.07 39.79 45.24 41.67 38.03 29.51 17.52 23.08 33.47 43.26 42.51 43.28 24.65 20.48 35.26 36.25 22.92 28.50 42.52 34.40 43.81 31.74 43.28 37.22 26.73 41.48
  7.  19.66 29.70 33.53 33.13 36.57 30.26 ----- 32.92 24.62 23.71 31.86 11.34 32.30 33.66 30.83 30.72 22.14 14.82 19.12 26.68 34.74 31.97 34.12 18.85 17.54 25.49 26.27 17.29 21.19 34.45 28.91 33.54 23.96 32.20 30.70 21.57 32.48
  8.  25.46 42.69 46.72 47.97 48.49 45.52 32.92 ----- 35.13 31.46 44.80 12.75 46.64 48.92 45.55 42.26 30.49 19.24 25.42 38.77 46.52 47.35 49.14 26.38 23.27 37.63 38.50 23.55 30.76 46.30 39.33 48.37 34.62 47.45 42.67 29.40 48.73
  9.  18.17 32.82 34.07 32.41 34.83 33.52 24.62 35.13 ----- 24.97 28.44 10.34 31.57 35.65 32.06 31.29 26.40 15.77 19.59 26.88 33.93 36.96 36.21 20.38 17.37 29.15 29.40 18.28 22.61 35.34 28.07 32.22 25.65 33.76 30.23 22.98 32.61
 10.  17.36 29.16 31.03 29.36 31.39 29.63 23.71 31.46 24.97 ----- 26.78  9.53 29.59 31.35 29.36 28.94 22.29 16.19 17.35 24.99 31.83 31.31 32.47 21.36 17.43 26.26 26.81 17.47 21.92 33.02 26.60 28.84 22.83 29.59 27.94 21.50 30.27
 11.  27.01 35.34 43.11 47.04 47.62 39.01 31.86 44.80 28.44 26.78 ----- 16.27 48.11 43.37 40.87 36.19 26.17 17.07 23.37 35.47 46.75 39.60 43.01 25.26 21.83 32.37 32.90 23.16 28.93 43.12 39.28 55.23 29.83 43.12 41.99 24.47 45.01
 12.  10.18 11.12 12.39 13.09 12.82 12.07 11.34 12.75 10.34  9.53 16.27 ----- 13.10 11.85 11.93 11.68 10.63  8.38 10.43 11.51 13.04 11.52 12.09  8.59  8.25 10.06 10.46  8.82  9.40 12.01 12.67 15.09 11.76 13.06 12.41 10.21 12.45
 13.  27.09 37.81 44.34 44.84 48.66 39.79 32.30 46.64 31.57 29.59 48.11 13.10 ----- 43.43 40.79 38.44 28.14 18.15 23.04 35.75 46.95 42.44 47.50 25.35 20.87 34.06 34.60 22.94 29.23 44.09 40.45 46.97 31.59 42.77 43.71 26.22 46.13
 14.  24.41 43.53 46.65 45.85 47.57 45.24 33.66 48.92 35.65 31.35 43.37 11.85 43.43 ----- 44.45 41.85 31.31 18.41 24.13 36.36 47.48 46.78 49.24 25.53 23.02 36.94 37.22 23.85 29.41 47.24 37.93 46.50 34.07 45.87 40.19 29.01 45.63
 15.  23.49 40.18 42.72 45.36 43.26 41.67 30.83 45.55 32.06 29.36 40.87 11.93 40.79 44.45 ----- 36.97 28.82 17.75 23.84 34.32 43.81 41.85 44.76 24.16 21.49 34.38 35.08 22.23 28.48 42.57 37.00 44.55 32.14 43.18 39.20 26.94 41.99
 16.  22.10 36.60 40.43 38.99 42.77 38.03 30.72 42.26 31.29 28.94 36.19 11.68 38.44 41.85 36.97 ----- 26.91 16.91 21.80 31.35 41.21 41.20 42.38 23.39 20.47 31.56 32.54 21.15 25.26 41.62 33.95 39.35 29.16 38.70 35.76 26.68 40.31
 17.  16.64 28.74 29.23 28.70 30.23 29.51 22.14 30.49 26.40 22.29 26.17 10.63 28.14 31.31 28.82 26.91 ----- 13.89 17.39 25.14 30.10 30.66 31.20 17.12 14.76 25.69 25.78 16.40 19.85 30.71 24.42 28.96 23.05 30.35 26.88 20.61 28.21
 18.  12.95 17.32 18.03 18.66 18.63 17.52 14.82 19.24 15.77 16.19 17.07  8.38 18.15 18.41 17.75 16.91 13.89 ----- 11.88 15.34 18.57 18.86 18.83 15.49 10.59 16.87 17.50 12.94 15.17 18.84 16.17 17.82 14.51 18.16 17.61 13.63 18.09
 19.  14.65 21.09 23.27 24.00 23.80 23.08 19.12 25.42 19.59 17.35 23.37 10.43 23.04 24.13 23.84 21.80 17.39 11.88 ----- 20.49 24.19 22.93 25.01 14.53 13.18 20.45 21.36 14.82 16.52 23.82 20.98 24.68 20.66 24.70 22.06 17.32 23.32
 20.  21.61 31.71 35.37 37.56 37.48 33.47 26.68 38.77 26.88 24.99 35.47 11.51 35.75 36.36 34.32 31.35 25.14 15.34 20.49 ----- 36.88 35.52 37.11 22.09 17.88 29.93 30.81 20.65 25.39 36.77 30.77 36.03 27.77 35.65 34.18 24.12 37.27
 21.  26.33 41.19 47.49 46.55 50.90 43.26 34.74 46.52 33.93 31.83 46.75 13.04 46.95 47.48 43.81 41.21 30.10 18.57 24.19 36.88 ----- 46.44 48.39 26.77 22.58 36.60 37.62 24.25 31.50 49.15 40.33 48.79 32.76 45.82 43.25 28.85 48.92
 22.  23.04 41.15 44.72 43.89 46.84 42.51 31.97 47.35 36.96 31.31 39.60 11.52 42.44 46.78 41.85 41.20 30.66 18.86 22.93 35.52 46.44 ----- 46.49 25.42 21.63 37.11 37.96 23.66 30.15 46.32 35.89 43.21 32.35 43.70 40.20 29.61 45.05
 23.  25.53 43.03 47.73 46.83 49.88 43.28 34.12 49.14 36.21 32.47 43.01 12.09 47.50 49.24 44.76 42.38 31.20 18.83 25.01 37.11 48.39 46.49 ----- 27.18 22.88 38.06 38.75 24.06 31.09 48.59 39.99 46.98 33.42 46.41 43.35 29.97 47.82
 24.  17.47 23.59 26.73 25.47 25.97 24.65 18.85 26.38 20.38 21.36 25.26  8.59 25.35 25.53 24.16 23.39 17.12 15.49 14.53 22.09 26.77 25.42 27.18 ----- 15.19 25.09 25.38 20.82 24.59 26.18 23.34 25.35 18.89 24.42 24.45 17.70 26.66
 25.  14.36 20.73 22.49 23.59 22.35 20.48 17.54 23.27 17.37 17.43 21.83  8.25 20.87 23.02 21.49 20.47 14.76 10.59 13.18 17.88 22.58 21.63 22.88 15.19 ----- 18.06 18.39 13.57 16.12 22.12 20.53 22.60 16.81 21.91 19.82 15.22 22.41
 26.  19.86 33.41 36.70 34.58 35.63 35.26 25.49 37.63 29.15 26.26 32.37 10.06 34.06 36.94 34.38 31.56 25.69 16.87 20.45 29.93 36.60 37.11 38.06 25.09 18.06 ----- 68.67 24.02 30.13 37.10 31.03 34.22 26.40 35.28 32.50 24.63 36.13
 27.  20.02 33.99 36.96 35.18 36.51 36.25 26.27 38.50 29.40 26.81 32.90 10.46 34.60 37.22 35.08 32.54 25.78 17.50 21.36 30.81 37.62 37.96 38.75 25.38 18.39 68.67 ----- 24.70 30.95 37.79 31.62 35.54 27.23 36.03 33.78 24.82 36.06
 28.  16.16 21.74 23.85 23.10 24.07 22.92 17.29 23.55 18.28 17.47 23.16  8.82 22.94 23.85 22.23 21.15 16.40 12.94 14.82 20.65 24.25 23.66 24.06 20.82 13.57 24.02 24.70 ----- 28.07 23.25 21.47 22.63 17.97 23.09 22.87 16.50 24.17
 29.  19.06 26.63 30.31 29.25 30.42 28.50 21.19 30.76 22.61 21.92 28.93  9.40 29.23 29.41 28.48 25.26 19.85 15.17 16.52 25.39 31.50 30.15 31.09 24.59 16.12 30.13 30.95 28.07 ----- 30.57 26.40 29.22 21.12 28.51 28.53 20.16 30.57
 30.  24.87 41.85 47.45 45.51 50.30 42.52 34.45 46.30 35.34 33.02 43.12 12.01 44.09 47.24 42.57 41.62 30.71 18.84 23.82 36.77 49.15 46.32 48.59 26.18 22.12 37.10 37.79 23.25 30.57 ----- 38.65 46.04 31.88 45.74 42.34 29.11 46.21
 31.  22.99 32.90 38.91 38.63 41.64 34.40 28.91 39.33 28.07 26.60 39.28 12.67 40.45 37.93 37.00 33.95 24.42 16.17 20.98 30.77 40.33 35.89 39.99 23.34 20.53 31.03 31.62 21.47 26.40 38.65 ----- 39.33 27.29 37.11 36.84 23.80 39.58
 32.  26.34 40.03 44.99 48.62 48.66 43.81 33.54 48.37 32.22 28.84 55.23 15.09 46.97 46.50 44.55 39.35 28.96 17.82 24.68 36.03 48.79 43.21 46.98 25.35 22.60 34.22 35.54 22.63 29.22 46.04 39.33 ----- 32.64 46.49 43.02 25.94 46.37
 33.  18.38 29.97 32.73 33.09 32.07 31.74 23.96 34.62 25.65 22.83 29.83 11.76 31.59 34.07 32.14 29.16 23.05 14.51 20.66 27.77 32.76 32.35 33.42 18.89 16.81 26.40 27.23 17.97 21.12 31.88 27.29 32.64 ----- 32.51 29.11 21.74 31.23
 34.  23.55 40.08 44.77 45.57 44.91 43.28 32.20 47.45 33.76 29.59 43.12 13.06 42.77 45.87 43.18 38.70 30.35 18.16 24.70 35.65 45.82 43.70 46.41 24.42 21.91 35.28 36.03 23.09 28.51 45.74 37.11 46.49 32.51 ----- 39.33 26.56 43.18
 35.  24.76 36.05 40.69 42.30 44.06 37.22 30.70 42.67 30.23 27.94 41.99 12.41 43.71 40.19 39.20 35.76 26.88 17.61 22.06 34.18 43.25 40.20 43.35 24.45 19.82 32.50 33.78 22.87 28.53 42.34 36.84 43.02 29.11 39.33 ----- 25.53 43.11
 36.  15.99 26.95 27.74 27.54 28.73 26.73 21.57 29.40 22.98 21.50 24.47 10.21 26.22 29.01 26.94 26.68 20.61 13.63 17.32 24.12 28.85 29.61 29.97 17.70 15.22 24.63 24.82 16.50 20.16 29.11 23.80 25.94 21.74 26.56 25.53 ----- 28.71
 37.  26.51 38.15 45.14 45.71 49.48 41.48 32.48 48.73 32.61 30.27 45.01 12.45 46.13 45.63 41.99 40.31 28.21 18.09 23.32 37.27 48.92 45.05 47.82 26.66 22.41 36.13 36.06 24.17 30.57 46.21 39.58 46.37 31.23 43.18 43.11 28.71 -----


Ferdy, what's the matter with some of your matrix numbers? Hardly any two strong engines show less than 25% similarity at 100 ms on one core. Your matrix contains lots of values below 25%, especially with SF and SF NNUE. The similarities usually range from percentages in the 30s for very unrelated engines (see "Shredder 6" in my matrix) to percentages in the 60s for very closely related engines (see SF_11 and SF_dev, or the two SF_dev builds). Your matrix numbers are simply weird and often way too low, so the clustering may be meaningless.

sim version 3

  Key:

  1) Shredder 6 (time: 100 ms  scale: 1)
  2) Lc0 SV_3010 (time: 100 ms  scale: 1.0)
  3) Lc0 T70 (time: 100 ms  scale: 1.0)
  4) SF_NNUE_bis_location (time: 100 ms  scale: 1)
  5) Stockfish 10 (time: 100 ms  scale: 1)
  6) Stockfish 11 (time: 100 ms  scale: 1)
  7) Stockfish 6 (time: 100 ms  scale: 1)
  8) Stockfish 8 (time: 100 ms  scale: 1)
  9) Stockfish dev (time: 100 ms  scale: 1)
 10) Stockfish dev Bis (time: 100 ms  scale: 1)
 11) Andscacs (time: 100 ms  scale: 1)
 12) Komodo 14 (time: 100 ms  scale: 1)
 13) Ethereal (time: 100 ms  scale: 1)

         1     2     3     4     5     6     7     8     9    10    11    12    13
  1.  ----- 36.00 38.76 38.53 40.05 39.80 42.40 40.52 39.04 39.41 40.76 40.64 40.67
  2.  36.00 ----- 67.27 55.32 52.15 52.82 48.94 50.72 53.88 53.34 49.54 49.90 49.07
  3.  38.76 67.27 ----- 55.83 52.62 52.61 49.78 51.07 52.62 52.18 49.73 50.44 50.30
  4.  38.53 55.32 55.83 ----- 55.12 55.38 51.52 54.33 55.80 55.24 52.05 52.11 51.76
  5.  40.05 52.15 52.62 55.12 ----- 64.79 56.43 61.67 63.57 63.51 56.05 55.83 56.25
  6.  39.80 52.82 52.61 55.38 64.79 ----- 55.52 60.31 66.04 65.53 55.55 55.90 54.95
  7.  42.40 48.94 49.78 51.52 56.43 55.52 ----- 60.96 54.79 54.86 56.02 54.90 54.05
  8.  40.52 50.72 51.07 54.33 61.67 60.31 60.96 ----- 59.03 58.62 58.55 56.19 55.45
  9.  39.04 53.88 52.62 55.80 63.57 66.04 54.79 59.03 ----- 66.06 54.88 54.60 53.96
 10.  39.41 53.34 52.18 55.24 63.51 65.53 54.86 58.62 66.06 ----- 54.22 54.90 54.47
 11.  40.76 49.54 49.73 52.05 56.05 55.55 56.02 58.55 54.88 54.22 ----- 53.91 53.44
 12.  40.64 49.90 50.44 52.11 55.83 55.90 54.90 56.19 54.60 54.90 53.91 ----- 53.86
 13.  40.67 49.07 50.30 51.76 56.25 54.95 54.05 55.45 53.96 54.47 53.44 53.86 -----
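
A quick way to sanity-check a pasted Sim matrix is to parse it and look at the range of the off-diagonal values. The following is only a minimal Python sketch, assuming the plain-text layout shown above. The excerpt is the top-left 4x4 corner of the matrix above, and `parse_sim_matrix` and `off_diagonal` are hypothetical helper names, not part of the Sim tool.

```python
# Excerpt: top-left 4x4 corner of the matrix above (engines 1 to 4).
SIM_TEXT = """
  1.  ----- 36.00 38.76 38.53
  2.  36.00 ----- 67.27 55.32
  3.  38.76 67.27 ----- 55.83
  4.  38.53 55.32 55.83 -----
"""


def parse_sim_matrix(text):
    """Parse rows like '  1.  ----- 36.00 ...' into lists (None on the diagonal)."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # A data row starts with its index followed by a dot, e.g. '1.'
        if not tokens or not tokens[0].endswith("."):
            continue
        if not tokens[0].rstrip(".").isdigit():
            continue
        rows.append([None if t.startswith("-") else float(t) for t in tokens[1:]])
    return rows


def off_diagonal(matrix):
    """Return every similarity value except the diagonal placeholders."""
    return [v for i, row in enumerate(matrix)
            for j, v in enumerate(row) if i != j and v is not None]


matrix = parse_sim_matrix(SIM_TEXT)
values = off_diagonal(matrix)
below_25 = sum(v < 25.0 for v in values)  # how many pairs fall under 25%
print(f"min={min(values):.2f} max={max(values):.2f} below 25%: {below_25}")
```

Running the same check on a full matrix shows at a glance whether it contains the kind of sub-25% values discussed above.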

M ANSARI
Posts: 3707
Joined: Thu Mar 16, 2006 7:10 pm

Re: Can the sardine! NNUE clobbers SF.

Post by M ANSARI »

Is this any different than Rybka using millions of super fast games to tune the evaluation many years back? Obviously with GPU hardware this can be done dramatically more efficiently!

Post by Laskos »

Laskos wrote: Sun Jul 19, 2020 8:48 am
MMarco wrote: Sat Jul 18, 2020 3:08 pm Very interesting.

I noticed that on the second plot SF 10 seems unrelated to any other engine. Is there a particular reason for this?

Thanks!
Look at his matrix, some weird numbers for Sim. I guess the dendrogram is nonsense too.

I managed to test the SF NNUE_GK_06-27 under Sim. It was a location issue: the executable has to be in the same folder as the Sim tool.

sim version 3

  Key:

  1) Shredder (time: 100 ms  scale: 1)
  2) Lc0 SV_3010 (time: 100 ms  scale: 1.0)
  3) Lc0 T70 (time: 100 ms  scale: 1.0)
  4) SF_NNUE_bis_location (time: 100 ms  scale: 1)
  5) Stockfish 10 (time: 100 ms  scale: 1)
  6) Stockfish 11 (time: 100 ms  scale: 1)
  7) Stockfish 6 (time: 100 ms  scale: 1)
  8) Stockfish 8 (time: 100 ms  scale: 1)
  9) Stockfish dev (time: 100 ms  scale: 1)
 10) Stockfish dev Bis (time: 100 ms  scale: 1)
 11) 1Andscacs (time: 100 ms  scale: 1)
 12) Komodo 14 (time: 100 ms  scale: 1)
 13) Ethereal (time: 100 ms  scale: 1)

         1     2     3     4     5     6     7     8     9    10    11    12    13
  1.  ----- 36.00 38.76 38.53 40.05 39.80 42.40 40.52 39.04 39.41 40.76 40.64 40.67
  2.  36.00 ----- 67.27 55.32 52.15 52.82 48.94 50.72 53.88 53.34 49.54 49.90 49.07
  3.  38.76 67.27 ----- 55.83 52.62 52.61 49.78 51.07 52.62 52.18 49.73 50.44 50.30
  4.  38.53 55.32 55.83 ----- 55.12 55.38 51.52 54.33 55.80 55.24 52.05 52.11 51.76
  5.  40.05 52.15 52.62 55.12 ----- 64.79 56.43 61.67 63.57 63.51 56.05 55.83 56.25
  6.  39.80 52.82 52.61 55.38 64.79 ----- 55.52 60.31 66.04 65.53 55.55 55.90 54.95
  7.  42.40 48.94 49.78 51.52 56.43 55.52 ----- 60.96 54.79 54.86 56.02 54.90 54.05
  8.  40.52 50.72 51.07 54.33 61.67 60.31 60.96 ----- 59.03 58.62 58.55 56.19 55.45
  9.  39.04 53.88 52.62 55.80 63.57 66.04 54.79 59.03 ----- 66.06 54.88 54.60 53.96
 10.  39.41 53.34 52.18 55.24 63.51 65.53 54.86 58.62 66.06 ----- 54.22 54.90 54.47
 11.  40.76 49.54 49.73 52.05 56.05 55.55 56.02 58.55 54.88 54.22 ----- 53.91 53.44
 12.  40.64 49.90 50.44 52.11 55.83 55.90 54.90 56.19 54.60 54.90 53.91 ----- 53.86
 13.  40.67 49.07 50.30 51.76 56.25 54.95 54.05 55.45 53.96 54.47 53.44 53.86 -----


From the matrix, it is apparent that SF NNUE GK is approximately as close to SF_dev as to Lc0 (about 55% similarity to each), and is not very similar to anything.
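
That reading of row 4 can be checked by ranking the other engines by their similarity to SF NNUE. The snippet below is a small Python sketch using only the numbers from row 4 of the matrix above; `mean_similarity` is a hypothetical helper introduced here for illustration.

```python
# Row 4 (SF_NNUE_bis_location) of the matrix above; None marks the diagonal.
engines = [
    "Shredder", "Lc0 SV_3010", "Lc0 T70", "SF_NNUE_bis_location",
    "Stockfish 10", "Stockfish 11", "Stockfish 6", "Stockfish 8",
    "Stockfish dev", "Stockfish dev Bis", "1Andscacs", "Komodo 14", "Ethereal",
]
row = [38.53, 55.32, 55.83, None, 55.12, 55.38, 51.52,
       54.33, 55.80, 55.24, 52.05, 52.11, 51.76]

# Sort the other engines from most to least similar to SF NNUE.
ranked = sorted(
    ((sim, name) for name, sim in zip(engines, row) if sim is not None),
    reverse=True,
)
for sim, name in ranked:
    print(f"{name:22s} {sim:5.2f}")


def mean_similarity(group):
    """Average similarity of SF NNUE to the named engines."""
    vals = [row[engines.index(name)] for name in group]
    return sum(vals) / len(vals)


lc0_mean = mean_similarity(["Lc0 SV_3010", "Lc0 T70"])
sf_dev_mean = mean_similarity(["Stockfish dev", "Stockfish dev Bis"])
print(f"mean vs Lc0: {lc0_mean:.3f}  mean vs SF dev: {sf_dev_mean:.3f}")
```

The two group means differ by only a few hundredths of a percentage point, so "equally distant" is a fair summary of this row.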


Both the correlation and distance methods for clustering give the same clustering, shown here:

Image

This seems to show that SF NNUE is a bit closer to Lc0 than to SF_dev, but not very close to anything.
In fact, a pretty remarkable achievement. SF NNUE_GK is not closely similar to anything and is as close to Lc0 as it is to SF, if not closer. It is an original engine which now functions better without the "slowmover" parameter, and it beats anything in its way in head-to-head encounters.

Games Completed = 100 of 100 (Avg game length = 58.256 sec)
Settings = RR/128MB/15000ms+250ms/M 1500cp for 3 moves, D 70 moves/EPD:C:\LittleBlitzer\3M_08_10.epd(395)
Time = 1488 sec elapsed, 0 sec remaining
1. Stockfish 110720 64 BMI2 41.5/100 13-30-57 (L: m=0 t=0 i=0 a=30) (D: r=7 i=7 f=0 s=0 a=43) (tpm=464.3 d=20.01 nps=1678031)
2. SF NNUE halfkp-256 090720 x64 bmi2 58.5/100 30-13-57 (L: m=1 t=0 i=0 a=12) (D: r=7 i=7 f=0 s=0 a=43) (tpm=457.9 d=18.52 nps=947839)

It seems to beat SF_dev heavily.
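
The two LittleBlitzer result lines above can be cross-checked mechanically. The following Python sketch assumes the line layout exactly as pasted (points/games, then wins-losses-draws, then a trailing `nps=` field). The regular expression and the name `parse_result` are illustrative assumptions, not a LittleBlitzer API.

```python
import re

LINES = [
    "1. Stockfish 110720 64 BMI2 41.5/100 13-30-57 (L: m=0 t=0 i=0 a=30) "
    "(D: r=7 i=7 f=0 s=0 a=43) (tpm=464.3 d=20.01 nps=1678031)",
    "2. SF NNUE halfkp-256 090720 x64 bmi2 58.5/100 30-13-57 (L: m=1 t=0 i=0 a=12) "
    "(D: r=7 i=7 f=0 s=0 a=43) (tpm=457.9 d=18.52 nps=947839)",
]

# Assumed layout: rank, name, points/games, W-L-D, ..., nps=<int>)
PATTERN = re.compile(
    r"^\d+\.\s+(?P<name>.+?)\s+(?P<pts>\d+(?:\.\d+)?)/(?P<games>\d+)\s+"
    r"(?P<w>\d+)-(?P<l>\d+)-(?P<d>\d+).*?nps=(?P<nps>\d+)\)"
)


def parse_result(line):
    """Extract name, points, W/L/D counts and nps from one result line."""
    m = PATTERN.match(line)
    if m is None:
        raise ValueError("unexpected result line: " + line)
    return {
        "name": m["name"],
        "points": float(m["pts"]),
        "games": int(m["games"]),
        "wins": int(m["w"]),
        "losses": int(m["l"]),
        "draws": int(m["d"]),
        "nps": int(m["nps"]),
    }


sf_dev, sf_nnue = (parse_result(line) for line in LINES)

# Points must equal wins plus half the draws.
for r in (sf_dev, sf_nnue):
    assert r["points"] == r["wins"] + 0.5 * r["draws"]

nps_ratio = sf_nnue["nps"] / sf_dev["nps"]
print(f"{sf_nnue['name']} searches at {nps_ratio:.2f}x the nps of {sf_dev['name']}")
```

Going by the pasted figures, the NNUE build scored 58.5% while searching a bit more than half as many nodes per second as SF_dev.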
chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: Can the sardine! NNUE clobbers SF.

Post by chrisw »

Laskos wrote: Sun Jul 19, 2020 10:49 am
It seems to beat SF_dev heavily.
I’d be interested to analyse the PGNs if you can work out a way to send them.

Post by Laskos »

chrisw wrote: Sun Jul 19, 2020 2:02 pm
I’d be interested to analyse the PGNs if you can work out a way to send them.
It was played in LittleBlitzer instead of Cutechess-Cli for better control of the engines' speed, depth, causes of losses, draws, etc. I don't keep the PGNs in LittleBlitzer; they are cryptic there anyway and not in the usual PGN standard.

Post by chrisw »

Laskos wrote: Sun Jul 19, 2020 2:11 pm
It was played in LittleBlitzer instead of Cutechess-Cli for better control of the engines' speed, depth, causes of losses, draws, etc. I don't keep the PGNs in LittleBlitzer; they are cryptic there anyway and not in the usual PGN standard.
That’s a shame. With a bunch of PGNs I’d probably have been able to work out what/why/how was happening. Otherwise, it’s just all magic, some black box network.

Post by Laskos »

chrisw wrote: Sun Jul 19, 2020 2:28 pm
Laskos wrote: Sun Jul 19, 2020 2:11 pm
chrisw wrote: Sun Jul 19, 2020 2:02 pm
Laskos wrote: Sun Jul 19, 2020 10:49 am
Laskos wrote: Sun Jul 19, 2020 8:48 am
MMarco wrote: Sat Jul 18, 2020 3:08 pm Very interesting.

I noticed that on the second plot SF 10 seems unrelated to any other engine. Is there a particular reason for this?

Thanks!
Look at his matrix, some weird numbers for Sim. I guess the dendrogram is nonsense too.

I managed to test under SIM the SF NNUE_GK_06-27, it was a location issue, one has to have the executable in the same folder as the Sim tool.

Code: Select all

sim version 3

  Key:

  1) Shredder (time: 100 ms  scale: 1)
  2) Lc0 SV_3010 (time: 100 ms  scale: 1.0)
  3) Lc0 T70 (time: 100 ms  scale: 1.0)
  4) SF_NNUE_bis_location (time: 100 ms  scale: 1)
  5) Stockfish 10 (time: 100 ms  scale: 1)
  6) Stockfish 11 (time: 100 ms  scale: 1)
  7) Stockfish 6 (time: 100 ms  scale: 1)
  8) Stockfish 8 (time: 100 ms  scale: 1)
  9) Stockfish dev (time: 100 ms  scale: 1)
 10) Stockfish dev Bis (time: 100 ms  scale: 1)
 11) 1Andscacs (time: 100 ms  scale: 1)
 12) Komodo 14 (time: 100 ms  scale: 1)
 13) Ethereal (time: 100 ms  scale: 1)

         1     2     3     4     5     6     7     8     9    10    11    12    13
  1.  ----- 36.00 38.76 38.53 40.05 39.80 42.40 40.52 39.04 39.41 40.76 40.64 40.67
  2.  36.00 ----- 67.27 55.32 52.15 52.82 48.94 50.72 53.88 53.34 49.54 49.90 49.07
  3.  38.76 67.27 ----- 55.83 52.62 52.61 49.78 51.07 52.62 52.18 49.73 50.44 50.30
  4.  38.53 55.32 55.83 ----- 55.12 55.38 51.52 54.33 55.80 55.24 52.05 52.11 51.76
  5.  40.05 52.15 52.62 55.12 ----- 64.79 56.43 61.67 63.57 63.51 56.05 55.83 56.25
  6.  39.80 52.82 52.61 55.38 64.79 ----- 55.52 60.31 66.04 65.53 55.55 55.90 54.95
  7.  42.40 48.94 49.78 51.52 56.43 55.52 ----- 60.96 54.79 54.86 56.02 54.90 54.05
  8.  40.52 50.72 51.07 54.33 61.67 60.31 60.96 ----- 59.03 58.62 58.55 56.19 55.45
  9.  39.04 53.88 52.62 55.80 63.57 66.04 54.79 59.03 ----- 66.06 54.88 54.60 53.96
 10.  39.41 53.34 52.18 55.24 63.51 65.53 54.86 58.62 66.06 ----- 54.22 54.90 54.47
 11.  40.76 49.54 49.73 52.05 56.05 55.55 56.02 58.55 54.88 54.22 ----- 53.91 53.44
 12.  40.64 49.90 50.44 52.11 55.83 55.90 54.90 56.19 54.60 54.90 53.91 ----- 53.86
 13.  40.67 49.07 50.30 51.76 56.25 54.95 54.05 55.45 53.96 54.47 53.44 53.86 -----


From the matrix, it is apparent that SF NNUE GK is approximately as close to SF_dev as to Lc0, similarly distanced with respect to them, and is not very similar to anything.


Both correlation and distance methods for cluster give the same clustering shown here:

Image

Which seems to show that SF NNUE is a bit closer to Lc0 than to SF_dev, but not very very close to anything.
In fact a pretty remarkable achievement. SF NNUE_GK is not closely similar to anything, is as close to Lc0 as it is to SF, if not closer, an original engine which now is functioning better without the "slowmover" parameter, and beating anything in its way in head-to-head encounters.

Games Completed = 100 of 100 (Avg game length = 58.256 sec)
Settings = RR/128MB/15000ms+250ms/M 1500cp for 3 moves, D 70 moves/EPD:C:\LittleBlitzer\3M_08_10.epd(395)
Time = 1488 sec elapsed, 0 sec remaining
1. Stockfish 110720 64 BMI2 41.5/100 13-30-57 (L: m=0 t=0 i=0 a=30) (D: r=7 i=7 f=0 s=0 a=43) (tpm=464.3 d=20.01 nps=1678031)
2. SF NNUE halfkp-256 090720 x64 bmi2 58.5/100 30-13-57 (L: m=1 t=0 i=0 a=12) (D: r=7 i=7 f=0 s=0 a=43) (tpm=457.9 d=18.52 nps=947839)

It seems to beat SF_dev heavily.
I'd be interested to analyse the PGNs if you can work out a way to send them?
It was played in LittleBlitzer instead of Cutechess-Cli for better control of engines' speed, depth, causes of losses, draws etc. I don't keep the PGNs from LittleBlitzer; they are cryptic there anyway and not in the usual PGN standard.
That's a shame. With a bunch of PGNs I'd probably have been able to work out what/why/how was happening. Otherwise, it's just all magic, some black box network.
Ok, just played a match in Cutechess-Cli, 100 games at 15'' + 0.25'' between SF NNUE GK and SF_dev.

The result is here:
15'' + 0.25''
Score of SF_NNUE vs SF_dev: 31 - 17 - 52 [0.570] 100
... SF_NNUE playing White: 26 - 0 - 24 [0.760] 50
... SF_NNUE playing Black: 5 - 17 - 28 [0.380] 50
... White vs Black: 43 - 5 - 52 [0.690] 100
Elo difference: 49.0 +/- 47.4, LOS: 97.8 %, DrawRatio: 52.0 %
Finished match

The PGN is here:
http://s000.tinyupload.com/?file_id=838 ... 7353082500
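
For anyone wanting to check the summary line from the raw W-L-D counts, here is a rough Python sketch. It assumes the usual logistic Elo model, a normal approximation with z = 1.96 for the error margin, and the common erf formula for LOS; whether Cutechess-Cli computes it exactly this way is an assumption, and the function names are made up for illustration.

```python
import math


def elo_from_score(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins, losses, draws, z=1.96):
    """Return (elo, margin, los) for a W-L-D result, seen from the first engine."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score around the mean (win=1, draw=0.5, loss=0).
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    stderr = math.sqrt(var / n)
    lo, hi = mean - z * stderr, mean + z * stderr
    # Half-width of the score interval, mapped onto the Elo scale.
    margin = (elo_from_score(hi) - elo_from_score(lo)) / 2.0
    # Likelihood of superiority: draws carry no information about who is better.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo_from_score(mean), margin, los


elo, margin, los = match_stats(31, 17, 52)
print(f"Elo difference: {elo:.1f} +/- {margin:.1f}, LOS: {100 * los:.1f} %")
```

With 31 - 17 - 52 this comes out at about 49.0 +/- 47.4 and 97.8 %, in line with the figures reported above. The sketch does not guard against a 0 % or 100 % score, where the logarithm is undefined.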
chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: Can the sardine! NNUE clobbers SF.

Post by chrisw »

Ok, thanks, first assessment is a bit disappointing. There's no winning game by either side that's over before move 80; median game length is 111 ply for SF_NNUE and 118 ply for SF_dev, but actually it's longer than that because you're starting from FENs and cutechess is counting from the FEN position. Maybe those FENs are 16 ply in? So median game length is 125-135 ply. Took a look at the five shortest NNUE wins: they last for at least 82 ply and all were endings. No fireworks, nothing exciting.
Critically, I'm getting the initial impression the NNUE is not doing anything AZ-ish to Stockfish, and that implies the technique simply squashes Stockfish into a slightly (or markedly, who knows) more effective version of the same thing. Gradual grind out into the ending. Nothing new or superior, just more of the same. Disappointing, I was hoping the NN was going to have found some "new" knowledge, but, initially, it looks like not.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Can the sardine! NNUE clobbers SF.

Post by Laskos »

Thanks Chris for the assessment. The openings are 6-pliers.
chrisw
Posts: 4315
Joined: Tue Apr 03, 2012 4:28 pm

Re: Can the sardine! NNUE clobbers SF.

Post by chrisw »

That's a median ply count of 117 for NN, 125 for SF, and no wins at all before 88 ply. Of course, only 100 games. But.