Stockfish NNUE Doctoring

Ferdy

Post by Ferdy

I added NNUE to Deuterium using the probing code from Daniel/CFish and the SF evaluation file nn-62ef826d1a6d.

The result at TC 60s+1s is total annihilation. The 2820 in "Deuterium v2019.2.37.73 2820" is its 60s FGRL rating.

Code: Select all

Score of Deuterium v2021.1.38.27 nnue vs Deuterium v2019.2.37.73 2820: 154 - 7 - 39 [0.868]
...      Deuterium v2021.1.38.27 nnue playing White: 79 - 4 - 17  [0.875] 100
...      Deuterium v2021.1.38.27 nnue playing Black: 75 - 3 - 22  [0.860] 100
...      White vs Black: 82 - 79 - 39  [0.507] 200
Elo difference: 326.4 +/- 55.0, LOS: 100.0 %, DrawRatio: 19.5 %
And here is the result from a small-sample gauntlet against other engines.

Code: Select all

Rank Name                          Elo     +/-   Games    Wins  Losses   Draws   Points   Score    Draw 
   0 Deuterium v2021.1.38.27 nnue   73      53      96      37      17      42     58.0   60.4%   43.8% 
   1 Ethereal 12.75 3271            44      96      24       7       4      13     13.5   56.3%   54.2% 
   2 Pedone 3.0 3140               -44      96      24       4       7      13     10.5   43.8%   54.2% 
   3 Wasp 4.50 3033               -120     101      24       2      10      12      8.0   33.3%   50.0% 
   4 Vajolet2 2.8.0 2930          -191     159      24       4      16       4      6.0   25.0%   16.7% 

The search parameters may be off by now, so I tried to optimize a few of them with Lakas at a low games-per-budget setting, using the oneplusone optimizer, which reevaluates promising parameter values as tuning progresses.
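For readers unfamiliar with the idea, here is a minimal sketch of a (1+1) evolution strategy with Gaussian mutation. It is not Lakas's or any library's implementation, and it leaves out the noise handling that reevaluates promising values. The starting values are the init params from this post; `TARGET`, `SIGMA` and `toy_loss` are hypothetical stand-ins for the real loss, which would come from playing games.

```python
import random

# Starting values copied from the post.
INIT = {'FutilityMargin': 60, 'LmrFactor': 100,
        'QsearchFutilityMargin': 100, 'WinEvalDepth': 7}
# Hypothetical optimum and mutation step sizes, for illustration only.
TARGET = {'FutilityMargin': 90, 'LmrFactor': 95,
          'QsearchFutilityMargin': 160, 'WinEvalDepth': 5}
SIGMA = {'FutilityMargin': 10, 'LmrFactor': 5,
         'QsearchFutilityMargin': 20, 'WinEvalDepth': 1}


def toy_loss(params):
    """Hypothetical noise-free loss: scaled squared distance to TARGET."""
    return sum(((params[k] - TARGET[k]) / SIGMA[k]) ** 2 for k in params)


def one_plus_one(loss, init, sigma, budget, seed=0):
    """Keep one parent; each budget, try one Gaussian-mutated child."""
    rng = random.Random(seed)
    parent = dict(init)
    parent_loss = loss(parent)
    for _ in range(budget):
        # Mutate every parameter and round back to an integer value.
        child = {k: round(v + rng.gauss(0, sigma[k])) for k, v in parent.items()}
        child_loss = loss(child)
        # The child replaces the parent only if it is at least as good.
        if child_loss <= parent_loss:
            parent, parent_loss = child, child_loss
    return parent, parent_loss


if __name__ == "__main__":
    best, best_loss = one_plus_one(toy_loss, INIT, SIGMA, budget=100)
    print(best, best_loss)
```

With a noisy loss such as a 32-game match, a lucky child can replace a better parent, which is why reevaluating promising values matters in real tuning.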

Basic tuning conditions:

Code: Select all

2021-01-26 19:58:18,108 | INFO  | total budget: 100
2021-01-26 19:58:18,110 | INFO  | games_per_budget: 32
2021-01-26 19:58:18,110 | INFO  | tuning match move control: base_time_sec: 5, inc_time_sec: 0.1, depth=1000
2021-01-26 19:58:18,112 | INFO  | parameter dimension: 4
2021-01-26 19:58:18,112 | INFO  | optimizer: oneplusone, noise_handling: ('optimistic', 0.01), mutation: gaussian, crossover: False
Parameters to be optimized:

Code: Select all

init param: {'FutilityMargin': 60, 'LmrFactor': 100, 'QsearchFutilityMargin': 100, 'WinEvalDepth': 7}
After 62 budgets it favors the following param values, and it has kept reevaluating them since as early as budget 35, then from 36 to 50, 52 to 59, and 61 to 62.

Code: Select all

2021-01-26 20:59:05,000 | INFO  | budget: 35
2021-01-26 20:59:05,000 | INFO  | recommended param: {'FutilityMargin': 97, 'LmrFactor': 98, 'QsearchFutilityMargin': 175, 'WinEvalDepth': 6}
2021-01-26 20:59:05,000 | INFO  | best param: {'FutilityMargin': 96, 'LmrFactor': 100, 'QsearchFutilityMargin': 184, 'WinEvalDepth': 5}
2021-01-26 20:59:05,000 | INFO  | init param: {'FutilityMargin': 60, 'LmrFactor': 100, 'QsearchFutilityMargin': 100, 'WinEvalDepth': 7}
2021-01-26 20:59:05,001 | INFO  | recommended vs init
2021-01-26 21:00:46,655 | INFO  | actual result: 0.56250 @32 games, minimized result: 0.43750, point of view: recommended
2021-01-26 21:00:46,655 | INFO  | best loss: 0.375

...

Code: Select all

2021-01-26 21:45:44,929 | INFO  | budget: 62
2021-01-26 21:45:44,929 | INFO  | recommended param: {'FutilityMargin': 97, 'LmrFactor': 98, 'QsearchFutilityMargin': 175, 'WinEvalDepth': 6}
2021-01-26 21:45:44,929 | INFO  | best param: {'FutilityMargin': 97, 'LmrFactor': 98, 'QsearchFutilityMargin': 175, 'WinEvalDepth': 6}
2021-01-26 21:45:44,929 | INFO  | init param: {'FutilityMargin': 60, 'LmrFactor': 100, 'QsearchFutilityMargin': 100, 'WinEvalDepth': 7}
2021-01-26 21:45:44,930 | INFO  | recommended vs init
2021-01-26 21:47:21,610 | INFO  | actual result: 0.54688 @32 games, minimized result: 0.45312, point of view: recommended
2021-01-26 21:47:21,611 | INFO  | best loss: 0.375
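The "actual result" and "minimized result" values in these logs are consistent with the loss being 1 minus the match score from the recommended side's point of view. A small sketch of that relation follows. This is an inference from the logged numbers, not taken from Lakas's source, and the point totals (18 and 17.5 out of 32) are back-computed from the logged scores.

```python
def minimized_result(points, games):
    """Return (actual score, loss), assuming loss = 1 - score.

    The relation is inferred from the log lines, not from Lakas's code.
    """
    actual = points / games
    return actual, 1.0 - actual


if __name__ == "__main__":
    # Budget 35: 0.56250 @32 games would be 18 points out of 32.
    print(minimized_result(18, 32))
    # Budget 62: 0.54688 @32 games would be 17.5 points out of 32.
    print(minimized_result(17.5, 32))
```

Under the same assumption, the logged best loss of 0.375 would correspond to a score of 0.625 in a 32-game match.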
I played 1k games at TC 5s+100ms as verification, using the recommended (tuned) values against the default (init) values. The tuned values won the match by about 13 Elo.

Code: Select all

Score of tuned vs default: 220 - 183 - 597  [0.518] 1000
...      tuned playing White: 135 - 70 - 295  [0.565] 500
...      tuned playing Black: 85 - 113 - 302  [0.472] 500
...      White vs Black: 248 - 155 - 597  [0.546] 1000
Elo difference: 12.9 +/- 13.7, LOS: 96.7 %, DrawRatio: 59.7 %
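For anyone who wants to check the summary lines by hand, here is a rough sketch of how the Elo difference, error margin, LOS and draw ratio can be derived from the win/loss/draw counts. It assumes the usual logistic Elo model and a normal approximation for the 95% interval. The function names are my own, and I am not claiming this is exactly what the match tool does internally.

```python
import math
from statistics import NormalDist


def elo_from_score(score):
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins, losses, draws):
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + losses * (0.0 - score) ** 2
           + draws * (0.5 - score) ** 2) / n
    stdev = math.sqrt(var / n)  # standard error of the mean score
    z = NormalDist().inv_cdf(0.975)  # two-sided 95% interval
    lo, hi = score - z * stdev, score + z * stdev
    margin = (elo_from_score(hi) - elo_from_score(lo)) / 2.0
    # Likelihood of superiority: draws carry no information here.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return {"score": score, "elo": elo_from_score(score),
            "margin": margin, "los": los, "draw_ratio": draws / n}


if __name__ == "__main__":
    print(match_stats(154, 7, 39))    # nnue vs 2019 version at 60s+1s
    print(match_stats(220, 183, 597)) # tuned vs default at 5s+100ms
```

Under these assumptions both matches come out close to the reported figures, roughly 326 Elo for the first and roughly 13 Elo for the second.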
Now let's see how your engine gets healthier :)