Tuning Search Parameters

Discussion of chess software programming and technical issues.

Moderator: Ras

jtwright
Posts: 48
Joined: Wed Sep 22, 2021 9:20 pm
Full name: Jeremy Wright

Tuning Search Parameters

Post by jtwright »

As an engine matures, improvement becomes less about acquiring new techniques and more about honing what you already have. For a lot of aspects of search, there are many constants/margins that can be tweaked. Examples (a rough sketch of where such constants live in a search follows below):

- The margin on Reverse Futility Pruning
- The reduction factor on LMR
- The number of moves to look at before engaging in LMP
- etc. etc.
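For illustration, constants of this kind typically sit in the search's move loop, something like the following (hypothetical names and values, not any particular engine's code):

Code: Select all

import math

# Hypothetical tunable constants; every engine has its own names and values.
LMP_BASE = 3          # quiet moves to try at depth 1 before late-move pruning
LMP_DEPTH_FACTOR = 2  # extra moves allowed per ply of remaining depth
LMR_DIVISOR = 2.0     # larger divisor -> milder late-move reductions

def lmp_threshold(depth):
    # Skip the remaining quiet moves once this many have been searched.
    return LMP_BASE + LMP_DEPTH_FACTOR * depth * depth

def lmr_reduction(depth, move_number):
    # A common log-based reduction formula; the divisor is the tunable part.
    return int(math.log(depth) * math.log(move_number) / LMR_DIVISOR)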

To date, I've mostly dealt with these values by wild guessing followed by a lot of self play using cutechess or endian, but I feel this is somewhat inefficient. I know that fishtest exists for SF and there are multiple engines on the main OpenBench network, and that can help a lot with checking the effectiveness of a change, but as a solo developer with a laptop and a desktop, I was curious if people knew ways to improve my process. So I really have two questions:

- Are there good principles/methods for informing a change to the constants used in search and its techniques?
- What are efficient methods of testing a change once it is made?


For example, I once tried a whole elaborate automated search parameter tuning scheme where I used a ton of ultrafast games between the Mantissas, extracted about 400 blunders, confirmed via SF, Komodo, and Ethereal agreeing, and then tried to make a kind of puzzle suite. It turned out to be a bit of a bust, as it seemed trying to fit to a few hundred positions doesn't necessarily lead to great play in the general case, but it's possible I was simply clumsy in execution.
jdart
Posts: 4398
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Tuning Search Parameters

Post by jdart »

There are some automated methods for this, for example SPSA (see https://link.springer.com/content/pdf/1 ... 6888-8.pdf), CLOP (https://www.chessprogramming.org/CLOP), or black-box methods such as the NOMAD optimizer (https://sourceforge.net/projects/nomad-bb-opt/).

I have had relatively poor results from these, however. CLOP, for example, can be very slow to converge.

Testing with large numbers of games is generally necessary to find changes that are statistically significant. This requires a lot of computing resources.
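For reference, plain SPSA is only a few lines. A minimal sketch (with a hypothetical match_score callback that plays a short match with the candidate values and returns its score) might look like this:

Code: Select all

import random

def spsa_tune(theta, match_score, iterations=200, a=2.0, c=4.0):
    # theta: list of parameter values, e.g. [rfp_margin, lmr_factor]
    for k in range(1, iterations + 1):
        ak = a / k ** 0.602          # step-size schedule (standard SPSA exponents)
        ck = c / k ** 0.101          # perturbation-size schedule
        # Perturb all parameters at once with a random +/-1 pattern
        delta = [random.choice((-1, 1)) for _ in theta]
        plus  = [t + ck * d for t, d in zip(theta, delta)]
        minus = [t - ck * d for t, d in zip(theta, delta)]
        # Two noisy evaluations per iteration, regardless of dimension
        g = match_score(plus) - match_score(minus)
        # Gradient estimate and ascent step (we maximize the match score)
        theta = [t + ak * g / (2 * ck * d) for t, d in zip(theta, delta)]
    return theta

The step-size constants a and c need some care themselves, and integer engine parameters are usually rounded back before each match.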
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Tuning Search Parameters

Post by Sven »

One problem when trying automatic tuning of search parameters is that ultra-fast games help less for that purpose than they do for eval tuning. In general you need a deeper search to ensure that the search feature using the parameters you tune kicks in sufficiently often. For instance, you may only want to perform some kind of pruning within the last three plies above the full-width horizon, but that kind of pruning makes the search much weaker at low depth and therefore needs a higher depth to be helpful. That may lead to a drastic increase in tuning duration.
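For illustration, a minimal sketch of such a depth-gated rule (hypothetical constants, not from any particular engine):

Code: Select all

RFP_DEPTH_LIMIT = 3   # only prune within the last 3 plies above the horizon
RFP_MARGIN = 120      # centipawns per ply of remaining depth, a tunable constant

def reverse_futility_prune(depth, static_eval, beta, in_check):
    # Near the horizon, a static eval far above beta is assumed to hold up,
    # so the node is cut without searching.
    return (not in_check
            and depth <= RFP_DEPTH_LIMIT
            and static_eval - RFP_MARGIN * depth >= beta)

Because the rule only fires in the last few plies, whether a given margin helps or hurts only shows up once the test games search deep enough for the savings above those plies to matter.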
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
jdart
Posts: 4398
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Tuning Search Parameters

Post by jdart »

Another problem is that you are trying to optimize a function (results from varying a search parameter) that is usually nonlinear and often has a small gradient (i.e. varying the number within a range doesn't give you a big difference in results). Plus you frequently need to tune several parameters together. So in mathematical terms, you are trying to find the optimum on a very shallow, uneven surface, and on top of that the objective you are measuring is noisy and expensive to compute. That is about the worst optimization problem you can have.
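To put numbers on how noisy the objective is, here is a generic sketch of the 95% error bar on a measured Elo difference (standard logistic-Elo arithmetic, not tied to any particular setup):

Code: Select all

import math

def elo(score):
    # Logistic Elo model: expected score -> Elo difference
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_error(wins, draws, losses, z=1.96):
    # 95% error bar on the Elo difference implied by a W/D/L record
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - score * score   # per-game outcome variance
    stdev = math.sqrt(var / n)
    lo, hi = elo(score - z * stdev), elo(score + z * stdev)
    return elo(score), (hi - lo) / 2

# A ~5 Elo edge measured over 1000 games is still well inside the noise:
print(elo_with_error(wins=270, draws=474, losses=256))   # about (+4.9, +/- 16)

Resolving a few Elo reliably typically takes tens of thousands of games, which is why the objective is so expensive.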
Chessnut1071
Posts: 313
Joined: Tue Aug 03, 2021 2:41 pm
Full name: Bill Beame

Re: Tuning Search Parameters

Post by Chessnut1071 »

jtwright wrote: Thu Dec 16, 2021 11:44 pm As an engine matures, improvement becomes less about acquiring new techniques and more about honing what you already have. [...]
If your objective is accuracy, I use 300 puzzles with known mates and minimize the time to solution with an optimized evaluation function that includes approximately 20 variables.

If your objective is maximizing Elo, you need massive numbers of games with some type of objective criterion, like the percentage of games won or something similar.

The solution I use is a modified Hooke-Jeeves nonlinear optimization algorithm, which is very short and attaches to any model that can pass back an objective criterion. On checkmate puzzles it reduces time to solution by 96.75% over just alpha-beta and history by ply. It could be applied to Elo, but I have no data on performance.
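For reference, an unmodified, bare-bones Hooke-Jeeves pattern search looks roughly like this sketch; objective stands for whatever noisy criterion the model passes back, to be minimized:

Code: Select all

def hooke_jeeves(objective, x0, step=8.0, shrink=0.5, min_step=0.5):
    # Coordinate-wise exploration around a base point, plus pattern moves.
    def explore(base, base_val, s):
        x, val = list(base), base_val
        for i in range(len(x)):
            for delta in (s, -s):
                trial = list(x)
                trial[i] += delta
                v = objective(trial)
                if v < val:
                    x, val = trial, v
                    break
        return x, val

    base, base_val = list(x0), objective(x0)
    while step >= min_step:
        new, new_val = explore(base, base_val, step)
        if new_val < base_val:
            # Pattern move through the improved point, kept only if exploring
            # around the pattern point improves things further.
            pattern = [2 * n - b for n, b in zip(new, base)]
            base, base_val = new, new_val
            pat, pat_val = explore(pattern, objective(pattern), step)
            if pat_val < base_val:
                base, base_val = pat, pat_val
        else:
            step *= shrink   # no improvement at this mesh size: tighten it
    return base, base_val

With an Elo-style objective you would negate the score (or flip the comparisons), since higher is better there.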
jtwright
Posts: 48
Joined: Wed Sep 22, 2021 9:20 pm
Full name: Jeremy Wright

Re: Tuning Search Parameters

Post by jtwright »

Agreed. Such automated tuning didn't seem particularly effective, basically for all the reasons you've all mentioned. On the other hand, if you look at the Stockfish source, for instance, you'll see all sorts of strangely specific constants that feel like they weren't picked by hand (but I could be wrong).

So, shifting away from automated tuning, I was curious whether people have methods of their own for deciding which search changes to test. One straightforward idea might be as simple as: "I don't know whether the margin for my RFP should be higher or lower." Make two versions of the engine, one with the margin a bit higher and one a bit lower (say, a quarter-pawn each way), and play them against each other until you're convinced one is better (see the sketch below). This runs into some of the problems mentioned earlier: 1. time control matters, because a change might only help at deep depths or at shallow ones; 2. these parameters don't exist in isolation. You might come to the right conclusion for *just* modifying the RFP margin, but the change might be harmful once you edit or add something else. Still, it's probably a start.
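On the "until you're convinced one is better" part, the standard likelihood-of-superiority formula (as reported by cutechess-cli) turns a win/loss record into a probability:

Code: Select all

import math

def los(wins, losses):
    # Likelihood of superiority: probability that the observed winner is
    # genuinely stronger; draws carry no information here.
    return 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * (wins + losses))))

# e.g. margin-up beats margin-down 120-100 with 180 draws:
print(round(los(120, 100), 3))   # ~0.91 -- suggestive, but not yet conclusive

cutechess-cli's -sprt option automates this kind of stopping rule, ending the match as soon as the result is decisive either way.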

I'm curious what the design process looks like for really experienced chess programmers, and for people who work on engines that are already very fine-tuned, where finding improvements is hard.
jdart
Posts: 4398
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Tuning Search Parameters

Post by jdart »

jtwright wrote: Sat Dec 18, 2021 6:51 pm Agreed. Such automated tuning didn't seem particularly effective, basically for all the reasons you've all mentioned. [...]
You can turn the knob a little, measure, turn it some more, measure, etc., but it is time-consuming. I have tried a number of automated tuners, but not for a while. For noisy optimization with expensive functions, RBF methods are among the more effective ones: see https://pysot.readthedocs.io/en/latest/options.html for some Python versions. I used DYCORS a little at one point.
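Whichever optimizer you pick, the expensive part is the objective it has to call: one evaluation equals one engine match. A rough sketch of such a wrapper (placeholder paths and option names; the cutechess-cli flags shown are the commonly used ones) might look like:

Code: Select all

import re
import subprocess

ENGINE = "./myengine"     # placeholder path to the engine under test
BOOK = "openings.epd"     # placeholder opening book

def match_score(rfp_margin, lmr_factor, games=200):
    # Play candidate-vs-default and return the candidate's score in [0, 1].
    cmd = [
        "cutechess-cli",
        "-engine", f"cmd={ENGINE}", "name=candidate",
        f"option.RFPMargin={int(rfp_margin)}", f"option.LMRFactor={int(lmr_factor)}",
        "-engine", f"cmd={ENGINE}", "name=default",
        "-each", "proto=uci", "tc=15+0.05",
        "-openings", f"file={BOOK}", "format=epd", "order=random",
        "-games", "2", "-rounds", str(games // 2), "-repeat",
        "-concurrency", "4",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    # cutechess-cli prints lines like "Score of candidate vs default: 64 - 17 - 49 ..."
    w, l, d = map(int, re.findall(r"Score of .*?: (\d+) - (\d+) - (\d+)", out)[-1])
    return (w + 0.5 * d) / (w + l + d)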
Ferdy
Posts: 4846
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tuning Search Parameters

Post by Ferdy »

jtwright wrote: Thu Dec 16, 2021 11:44 pm As an engine matures, improvement becomes less about acquiring new techniques and more about honing what you already have. [...]
I have two Python packages for optimizing engine parameters, whether for search, evaluation or anything else:
1. Optuna Game Parameter Tuner, built on top of the Optuna hyperparameter framework.
2. Lakas, built on top of the Nevergrad framework.

I recently added Elo as an objective value in the Optuna tuner. Here is a sample log from one of the benchmarks I ran for this optimizer. The parameters being optimized come from both search and evaluation.

Code: Select all

starting trial: 20 ...
deterministic function: False
suggested param for test engine: {'FutilityMargin': 25, 'FutilityMoveCountFactor': 804, 'KingAttackWeightOp': 98, 'LMRFactor': 199, 'LateMovePruningMargin': 144, 'MobilityWeight': 60}
param for base engine          : {'FutilityMargin': 25, 'FutilityMoveCountFactor': 800, 'KingAttackWeightOp': 50, 'LMRFactor': 90, 'LateMovePruningMargin': 50, 'MobilityWeight': 50}
init param: {'FutilityMargin': 25, 'FutilityMoveCountFactor': 800, 'KingAttackWeightOp': 50, 'LMRFactor': 90, 'LateMovePruningMargin': 50, 'MobilityWeight': 50}
init objective value: 0.0
study best param: {'FutilityMargin': 33, 'FutilityMoveCountFactor': 800, 'KingAttackWeightOp': 128, 'LMRFactor': 216, 'LateMovePruningMargin': 200, 'MobilityWeight': 58}
study best objective value: Elo 177.0
study best trial number: 17
games_to_play: 16 ...
result: {intermediate: Elo 191.0, G/W/D/L: 16/9/6/1}
result: {average: Elo 191.0, G/W/D/L: 16/9/6/1}
games_to_play: 16 ...
result: {intermediate: Elo 112.0, G/W/D/L: 16/8/5/3}
result: {average: Elo 151.5, G/W/D/L: 32/17/11/4}
Actual match result: Elo 150.0, CI: [+44.2, +255.4], CL: 95%, G/W/D/L: 32/17/11/4, POV: optimizer
Elo Diff: +149.8, ErrMargin: +/- 105.6, CI: [+44.2, +255.4], LOS: 99.8%, DrawRatio: 34.38%
test param format for match manager: option.FutilityMargin=25 option.FutilityMoveCountFactor=804 option.KingAttackWeightOp=98 option.LMRFactor=199 option.LateMovePruningMargin=144 option.MobilityWeight=60
result sent to optimizer: 150.0
elapse: 0h:2m:32s
Trial 20 finished with value: 150.0 and parameters: {'FutilityMargin': 25, 'FutilityMoveCountFactor': 804, 'KingAttackWeightOp': 98, 'LMRFactor': 199, 'LateMovePruningMargin': 144, 'MobilityWeight': 60}. Best is trial 17 with value: 177.0.
Saving plots ...
After the default parameters used by the base engine are matched against the parameters the optimizer suggests for the test engine, the resulting Elo difference is sent back to the optimizer. In trial 20 the suggested parameters won by a margin of 150 Elo, but the best parameters so far were found as early as trial 17, with a 177 Elo margin.

It generates some plots when the --plot flag is activated. One plot I usually check is the hyperparameter importances, like the one below. It shows which parameters contribute most to a better objective value. That gives some idea of where to focus the optimization, and which parameters to review in the code: if a parameter contributes little to strength, there could be a bug in it, or its effect may have become redundant. This sample plot is only after 20 trials; the ranking may still change as the number of trials increases.

[Image: hyperparameter importances plot]

In this sample benchmark the base engine uses my engine's unoptimized parameters. The values are:

Code: Select all

option.FutilityMargin=25 option.FutilityMoveCountFactor=800 option.KingAttackWeightOp=50 option.LMRFactor=90 option.LateMovePruningMargin=50 option.MobilityWeight=50
The Optuna tuner reports trial 17 as the best trial, with the following parameters.

Code: Select all

option.FutilityMargin=33 option.FutilityMoveCountFactor=800 option.KingAttackWeightOp=128 option.LMRFactor=216 option.LateMovePruningMargin=200 option.MobilityWeight=58
In this benchmark each trial consists of only 32 games (more is better) at TC 15s+50ms.
After every 10 trials it logs some trial info.

Code: Select all

 trial  value  params_FutilityMargin  params_FutilityMoveCountFactor  params_KingAttackWeightOp  params_LMRFactor  params_LateMovePruningMargin  params_MobilityWeight     state  value_mean  trial_cnt
    17  177.0                     33                             800                        128               216                           200                     58  COMPLETE       177.0          1
     3  163.0                     39                             842                         82               218                           212                     76  COMPLETE       163.0          1
    14  150.0                     43                             844                        122               193                           100                    128  COMPLETE       150.0          1
    20  150.0                     25                             804                         98               199                           144                     60  COMPLETE       150.0          1
    18  137.0                     38                             878                        140               193                           210                     52  COMPLETE       137.0          1
    13  112.0                     46                             820                        138               216                            78                     86  COMPLETE       112.0          1
     1  100.0                     66                             910                        114               200                            50                     68  COMPLETE       100.0          1
     7  100.0                     99                             824                        184               165                           198                    144  COMPLETE       100.0          1
     2   89.0                     75                            1130                         70               165                           230                     80  COMPLETE        89.0          1
    10   89.0                     25                             932                         82               208                           188                     84  COMPLETE        89.0          1
     8   77.0                     69                             808                         80               161                           204                     88  COMPLETE        77.0          1
    11   77.0                     65                             926                        110               181                            50                     82  COMPLETE        77.0          1
     4   66.0                     87                             910                        114               213                           214                    100  COMPLETE        66.0          1
    12   66.0                     46                             818                        122               205                           248                    126  COMPLETE        66.0          1
    15   66.0                     40                             824                        118               190                            88                    174  COMPLETE        66.0          1
     5   55.0                     38                             948                         50               123                           210                     52  COMPLETE        55.0          1
     6   55.0                     70                            1042                         64               140                            56                    184  COMPLETE        55.0          1
     9   55.0                     46                            1142                        198               205                           122                    140  COMPLETE        55.0          1
    19   55.0                     48                             876                        176               160                           166                     98  COMPLETE        55.0          1
    16   11.0                     56                             834                         74               212                           144                     88  COMPLETE        11.0          1
     0    0.0                     25                             800                         50                90                            50                     50  COMPLETE         0.0          1
The numbers in the value column are the Elo gains achieved by the optimizer-suggested parameters against the default values used by the base engine. In practice you can try the parameters from the top trials.

It is necessary to run a verification test of the best parameters, now with more games; I use TC 15s+50ms.
The parameters suggested by the optimizer won.

Code: Select all

Score of trial_17 vs def: 64 - 17 - 49  [0.681] 130
...      trial_17 playing White: 36 - 8 - 21  [0.715] 65
...      trial_17 playing Black: 28 - 9 - 28  [0.646] 65
...      White vs Black: 45 - 36 - 49  [0.535] 130
Elo difference: 131.6 +/- 48.6, LOS: 100.0 %, DrawRatio: 37.7 %
That is only a benchmark to check that the tuner is working.

I agree that guessing is indeed inefficient; manually tuning one or two parameters can be tolerable, but it is no longer fun with more.

This is my command line for this benchmark. It only applies to the Optuna Game Parameter Tuner.

Code: Select all

set studyname=deu_tpe_search_and_eval

python tuner.py --study-name %studyname% ^
--noisy-result ^
--elo-objective ^
--sampler name=tpe multivariate=true seed=100 constant_liar=true ^
--threshold-pruner result=-100 ^
--games-per-trial 32 ^
--trials 200 ^
--concurrency 6 ^
--base-time-sec 15 ^
--inc-time-sec 0.05 ^
--draw-movenumber 20 --draw-movecount 6 --draw-score 0 ^
--resign-movecount 3 --resign-score 200 ^
--engine F:\Project\Deuterium2021-1\deuterium\\deuterium_v2021.1.38.118.exe ^
--input-param "{'LMRFactor': {'default':90, 'min':90, 'max':220, 'step':1}, 'FutilityMoveCountFactor': {'default':800, 'min':800, 'max':1200, 'step':2}, 'FutilityMargin': {'default':25, 'min':25, 'max':100, 'step':1}, 'LateMovePruningMargin': {'default':50, 'min':50, 'max':250, 'step':2}, 'KingAttackWeightOp': {'default':50, 'min':50, 'max':200, 'step':2}, 'MobilityWeight': {'default':50, 'min':50, 'max':200, 'step':2}}" ^
--opening-file ./start_opening/ogpt_chess_startpos.epd ^
--opening-format epd ^
--pgn-out %studyname%_games.pgn ^
--match-manager cutechess --plot
In the format for a parameter to be optimized,

Code: Select all

'LMRFactor': {'default':90, 'min':90, 'max':220, 'step':1}
the default value of 90 is the one that will be used by the base engine, while the value suggested by the optimizer will lie in the range [90, 220].

If you try to optimize your parameters, be sure to put your program's best or default values under the "default" key. The optimizer will try to beat those default values.

Studies can be very expensive, especially for search parameters. If the trials use a TC of 15s+50ms, the best parameters it comes up with may not be good at a higher TC. This can also be true for evaluation parameters, especially those involved in king safety.

This tuner has interrupt/resume capability: if you get bored or the computer is needed for another task, you can interrupt it and later use the same command line to resume from the previous state.

There is also a nice framework called nni. I have tried it but have not yet shared that work on GitHub. It has some interesting optimizers such as BOHB, and the interface is very good.

[Image: nni interface]

There is also a parameter tuner by kiudee at https://github.com/kiudee/chess-tuning-tools, which is used to tune the parameters of Lc0. I tried it and it is good.
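For anyone curious about the underlying mechanism, the core loop of an Optuna-based tuner is roughly the following sketch (not the actual tuner code; play_match is a placeholder that plays a short match of the suggested values against the defaults and returns the Elo difference):

Code: Select all

import optuna

def play_match(params):
    # Placeholder: run a cutechess match of `params` vs. the engine defaults
    # and return the measured Elo difference.
    raise NotImplementedError

def objective(trial):
    params = {
        "FutilityMargin": trial.suggest_int("FutilityMargin", 25, 100),
        "LMRFactor": trial.suggest_int("LMRFactor", 90, 220),
        "LateMovePruningMargin": trial.suggest_int("LateMovePruningMargin", 50, 250, step=2),
    }
    return play_match(params)   # noisy Elo estimate, maximized by the study

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(multivariate=True, seed=100))
study.optimize(objective, n_trials=200)
print(study.best_trial.number, study.best_params)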
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Tuning Search Parameters

Post by algerbrex »

Ferdy wrote: Sun Dec 19, 2021 5:33 am I have two Python packages for optimizing engine parameters, whether for search, evaluation or anything else... [...]
I'm not the OP, but thanks for the very detailed post. I tried using Optuna a couple of months ago, but I think I'll try using it again to see if it can help me tweak some margins in Blunder. I'm sure there's at least 50 Elo's worth still hiding in tuning all of the various margins...