Tuning Search Parameters
-
- Posts: 48
- Joined: Wed Sep 22, 2021 9:20 pm
- Full name: Jeremy Wright
Tuning Search Parameters
As an engine matures, growth becomes less about acquiring new techniques and more about honing what you have. Many aspects of search have constants/margins that can be tweaked. Examples:
- The margin on Reverse Futility Pruning
- The reduction factor on LMR
- The number of moves to look at before engaging LMP.
- etc. etc.
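To make the examples above concrete, here is a minimal Python sketch of how such constants typically show up in a search. The field names, default values, and formulas (a depth-scaled RFP margin, a log(depth)·log(move) LMR reduction, a depth² LMP move count) are illustrative assumptions, not Mantissa's actual code.

```python
import math
from dataclasses import dataclass


@dataclass
class SearchParams:
    # Hypothetical tunable constants; the values are placeholders, not recommendations.
    rfp_margin: int = 80       # centipawns of RFP margin per ply of remaining depth
    rfp_max_depth: int = 6     # RFP is only tried at shallow remaining depth
    lmr_base: float = 0.75     # constant term of the LMR reduction
    lmr_divisor: float = 2.25  # larger divisor -> smaller reductions
    lmp_base: int = 3          # quiet moves searched before LMP starts, plus depth^2
    lmp_max_depth: int = 8     # LMP is only tried at shallow remaining depth


def reverse_futility_prune(p: SearchParams, depth: int, static_eval: int, beta: int) -> bool:
    """True if the static eval beats beta by a depth-scaled margin (fail-high cutoff)."""
    return depth <= p.rfp_max_depth and static_eval - p.rfp_margin * depth >= beta


def lmr_reduction(p: SearchParams, depth: int, move_number: int) -> int:
    """Plies to reduce a late quiet move, using a common log(depth)*log(move) shape."""
    if depth < 3 or move_number < 2:
        return 0
    r = p.lmr_base + math.log(depth) * math.log(move_number) / p.lmr_divisor
    # Keep the reduced search (depth - 1 - r) at least one ply deep.
    return max(0, min(int(r), depth - 2))


def lmp_threshold(p: SearchParams, depth: int) -> int:
    """Quiet moves to search at this depth before the rest are pruned (LMP)."""
    if depth > p.lmp_max_depth:
        return 1 << 30  # effectively never prune at high depth
    return p.lmp_base + depth * depth
```

Each field is a knob that a tuner or a manual A/B test could turn. Because every term depends on remaining depth, a value that looks good at very fast time controls may behave differently at longer ones.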
To date, I've mostly dealt with these values by wild guessing followed by a lot of self-play using cutechess or endian, but I feel this is somewhat inefficient. I know that fishtest exists for SF and that there are multiple engines on the main OpenBench network, which helps a lot with checking the effectiveness of a change, but as a solo developer with a laptop and a desktop, I was curious whether people knew ways to improve my process. So I really have two questions:
- Are there good principles/methods for informing a change to the constants used in search and its techniques?
- What are efficient methods for testing a change once it's made?
For example, I once tried an elaborate automated search-parameter tuning scheme: I played a ton of ultrafast games between the Mantissas, extracted about 400 blunders (confirmed by SF, Komodo, and Ethereal agreeing), and then tried to make a kind of puzzle suite out of them. It turned out to be a bit of a bust: fitting to a few hundred positions doesn't seem to necessarily lead to great play in the general case, but it's possible I was simply clumsy in execution.
Mantissa: https://github.com/jtheardw/mantissa
-
- Posts: 4410
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Tuning Search Parameters
There are some automated methods for this, for example SPSA: see https://link.springer.com/content/pdf/1 ... 6888-8.pdf, or CLOP (https://www.chessprogramming.org/CLOP), or black-box methods such as the NOMAD optimizer (https://sourceforge.net/projects/nomad-bb-opt/).
I have had relatively poor results from these however. CLOP for example can be very slow to converge.
Testing with large numbers of games is generally necessary to find changes that are statistically significant. This requires a lot of computing resources.
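For reference, here is a minimal Python sketch of the basic SPSA update: every parameter is perturbed at once by ±c_k, the two perturbed versions play each other, and the match result drives a step along the estimated gradient. The gain schedules use the exponents 0.602 and 0.101 commonly recommended by Spall; `noisy_match`, `strength`, and `OPTIMUM` are a hypothetical synthetic stand-in for real engine games, so the sketch can run on its own.

```python
import math
import random

# Hypothetical synthetic problem: two parameters whose "true" best values are
# unknown to the tuner. In practice each match would be a batch of engine games.
OPTIMUM = [120.0, 60.0]
SCALE = 50.0


def strength(theta):
    """Synthetic playing strength: peaks at OPTIMUM, falls off quadratically."""
    return -sum(((t - o) / SCALE) ** 2 for t, o in zip(theta, OPTIMUM))


def noisy_match(plus, minus, rng, games=16):
    """Stand-in for a short match; returns a result in [-1, 1] from plus's POV."""
    p_win = 1.0 / (1.0 + math.exp(-(strength(plus) - strength(minus))))
    wins = sum(1 for _ in range(games) if rng.random() < p_win)
    return 2.0 * wins / games - 1.0


def spsa_maximize(match, theta0, a, c, iterations, alpha=0.602, gamma=0.101, seed=0):
    """Basic SPSA: perturb all parameters at once, play theta+ vs theta-, step."""
    rng = random.Random(seed)
    theta = list(theta0)
    big_a = 0.1 * iterations  # stability constant, a common rule of thumb
    for k in range(iterations):
        a_k = a / (k + 1 + big_a) ** alpha  # step size decays
        c_k = c / (k + 1) ** gamma          # perturbation size decays more slowly
        delta = [rng.choice((-1, 1)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        y = match(plus, minus, rng)
        # Gradient estimate per parameter is y / (2 * c_k * delta_i); move uphill.
        theta = [t + a_k * y / (2 * c_k * d) for t, d in zip(theta, delta)]
    return theta


if __name__ == "__main__":
    tuned = spsa_maximize(noisy_match, [0.0, 0.0], a=2000.0, c=10.0, iterations=300, seed=1)
    print([round(t, 1) for t in tuned])
```

With real engines, `noisy_match` would launch a short match (for example via cutechess) between the two parameter sets, and the a and c gains usually need scaling per parameter, which is a large part of the practical difficulty.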
-
- Posts: 4052
- Joined: Thu May 15, 2008 9:57 pm
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: Tuning Search Parameters
One problem when trying to tune search parameters automatically is that ultra-fast games help less for that purpose than they do for eval tuning. In general, you need a deeper search to ensure that the search feature using the parameters you tune kicks in sufficiently often. For instance, suppose you perform some kind of pruning only within the last three plies above the full-width horizon, but it makes your search much weaker at low depth and therefore needs a higher depth to be helpful. That may lead to a drastic increase in tuning time.
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
-
- Posts: 4410
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Tuning Search Parameters
Another problem is that you are trying to optimize a function (the results from varying a search parameter) that is usually nonlinear and often has a small gradient (i.e. varying the number within a range doesn't make a big difference in results). Plus, you frequently need to tune several parameters together. So in mathematical terms, you are trying to find the minimum of a very shallow, uneven surface, and on top of that the objective you are measuring is noisy and expensive to compute. That is about the worst optimization problem you can have.
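To put a number on how noisy that objective is, here is a rough Python sketch of the 95% error bar of a self-play match as a function of the number of games. It assumes two nearly equal engines and a fixed draw ratio, and uses the normal approximation, so treat the output as an order-of-magnitude guide.

```python
import math


def elo_margin_95(games: int, draw_ratio: float) -> float:
    """Approximate 95% error bar (in Elo) for a match between near-equal engines.

    Assumes a score close to 50% with wins and losses equally likely, so the
    per-game score variance is 0.25 * (1 - draw_ratio).
    """
    per_game_var = 0.25 * (1.0 - draw_ratio)
    score_se = math.sqrt(per_game_var / games)     # standard error of the mean score
    elo_per_score = 400.0 / (math.log(10) * 0.25)  # slope of the Elo curve at 50%
    return 1.96 * elo_per_score * score_se


if __name__ == "__main__":
    for n in (32, 1000, 10_000, 40_000):
        print(f"{n:>6} games: +/- {elo_margin_95(n, draw_ratio=0.35):.1f} Elo")
```

With a 35% draw ratio this gives roughly ±97 Elo for 32 games and about ±5.5 Elo for 10,000 games. Since the bar shrinks with the square root of the number of games, halving it costs four times as many games, which is why small parameter tweaks are so expensive to confirm.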
-
- Posts: 313
- Joined: Tue Aug 03, 2021 2:41 pm
- Full name: Bill Beame
Re: Tuning Search Parameters
If your objective is accuracy, I use 300 puzzles with known mates and minimize the time to solution with an optimized evaluation function that includes approximately 20 variables.
If your objective is maximizing Elo, you need massive numbers of games with some objective criterion, such as the percentage of games won, or a similar criterion.
The solution I use is a modified Hooke-Jeeves nonlinear optimization algorithm, which is very short and attaches to any model that can pass back an objective criterion. On checkmate puzzles, it reduces time to solution by 96.75% compared with just alpha/beta and history by ply. It can be applied to Elo, but I have no data on its performance there.
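For readers unfamiliar with it, here is a minimal Python sketch of the textbook Hooke-Jeeves pattern search: exploratory moves along each coordinate, followed by pattern moves that extrapolate along a successful direction, with the step size shrinking when nothing improves. This is the plain algorithm, not Bill's modified version; `f` is any function to minimize, such as total solve time over a puzzle set, and to maximize Elo you would minimize its negative.

```python
def hooke_jeeves(f, x0, steps, shrink=0.5, min_step=1e-3, max_iter=10_000):
    """Minimize f with textbook Hooke-Jeeves pattern search.

    f: callable taking a list of floats; x0: start point; steps: initial
    per-coordinate step sizes.
    """
    def explore(point, f_point, step):
        # Exploratory move: try +step then -step on each coordinate in turn,
        # keeping any change that improves f.
        x, fx = list(point), f_point
        for i, s in enumerate(step):
            for delta in (s, -s):
                trial = list(x)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    break
        return x, fx

    base, f_base = list(x0), f(list(x0))
    step = list(steps)
    for _ in range(max_iter):
        x, fx = explore(base, f_base, step)
        if fx < f_base:
            # Pattern moves: keep extrapolating along the successful direction.
            for _ in range(max_iter):
                prev_base, base, f_base = base, x, fx
                pattern = [2 * b - p for b, p in zip(base, prev_base)]
                x, fx = explore(pattern, f(pattern), step)
                if fx >= f_base:
                    break
        else:
            # No improvement around the base point: refine the step size.
            if max(step) < min_step:
                break
            step = [s * shrink for s in step]
    return base, f_base
```

For integer engine parameters you would round each trial point (or use integer steps). With a noisy objective such as match results, each comparison `f_trial < fx` can be fooled by noise, which is one reason such methods need many games per evaluation.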
-
- Posts: 48
- Joined: Wed Sep 22, 2021 9:20 pm
- Full name: Jeremy Wright
Re: Tuning Search Parameters
Agreed. Such automated tuning didn't seem particularly effective, basically for all the reasons you have mentioned. On the other hand, if you look at the Stockfish source, for instance, you'll see all sorts of strangely specific constants that don't look hand-picked (but I could be wrong).
So, shifting away from automated tuning, I was curious whether people have their own methods for deciding which search changes to test. One straightforward case might be as simple as "I don't know if the margin for my RFP should be higher or lower." Make two versions of the engine, one with the margin a bit higher and one a bit lower (say, a quarter-pawn each way), and play them until you're convinced one is better than the other. This runs into some of the same problems you mentioned earlier: 1. time control matters, because a change might only help at deep depths or at shallow ones; 2. these parameters don't exist in isolation. You might come to the right conclusion for *just* the RFP margin, but it might be harmful once you edit or add something else. Still, it's probably a start.
I'm curious what the design process is like for really experienced chess programmers, and for people who work on engines that are already very finely tuned, where finding improvements is hard.
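The "play them until you're convinced" step is what a sequential probability ratio test (SPRT) formalizes: keep playing until the log-likelihood ratio between "the change is worth elo1" and "the change is worth elo0" crosses a bound set by the error rates alpha and beta. Below is a minimal Python sketch using a common normal approximation of the LLR; it is similar in spirit to the SPRT options in fishtest, OpenBench, and cutechess-cli, but their exact formulas may differ.

```python
import math


def elo_to_score(elo: float) -> float:
    """Expected score for a given Elo difference under the logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt(wins, draws, losses, elo0, elo1, alpha=0.05, beta=0.05):
    """Return ('H1' | 'H0' | 'continue', llr) for H0: elo0 versus H1: elo1."""
    n = wins + draws + losses
    if n == 0:
        return "continue", 0.0
    s = (wins + 0.5 * draws) / n
    # Per-game variance of the score (1, 0.5 or 0 per game).
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    if var == 0:
        return "continue", 0.0  # all results identical; the approximation breaks down
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    # Normal-approximation log-likelihood ratio of H1 against H0.
    llr = n * (s1 - s0) * (2 * s - s0 - s1) / (2 * var)
    lower = math.log(beta / (1 - alpha))  # accept H0 at or below this
    upper = math.log((1 - beta) / alpha)  # accept H1 at or above this
    if llr >= upper:
        return "H1", llr
    if llr <= lower:
        return "H0", llr
    return "continue", llr
```

One option is to test each candidate (margin up, margin down) against the current version rather than against each other, with bounds such as elo0=0, elo1=5; narrower bounds need many more games before a decision.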
Mantissa: https://github.com/jtheardw/mantissa
-
- Posts: 4410
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Tuning Search Parameters
You can turn the knob a little, measure, turn it some more, measure, etc., but it is time-consuming. I have tried a number of automated tuners, but not for a while. For noisy optimization of expensive functions, RBF (radial basis function) methods are among the more effective ones: see https://pysot.readthedocs.io/en/latest/options.html for some Python implementations. I used DYCORS a little bit at one point.
-
- Posts: 4851
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: Tuning Search Parameters
I have two Python packages for optimizing engine parameters, whether search, evaluation, or others:
1. Optuna Game Parameter Tuner, built on top of the Optuna hyperparameter framework.
2. Lakas, built on top of the Nevergrad framework.
I recently added Elo as an objective value in the Optuna tuner. Here is a sample log from one of the benchmarks I ran for this optimizer. The parameters being optimized are from search and evaluation.
Code:
starting trial: 20 ...
deterministic function: False
suggested param for test engine: {'FutilityMargin': 25, 'FutilityMoveCountFactor': 804, 'KingAttackWeightOp': 98, 'LMRFactor': 199, 'LateMovePruningMargin': 144, 'MobilityWeight': 60}
param for base engine : {'FutilityMargin': 25, 'FutilityMoveCountFactor': 800, 'KingAttackWeightOp': 50, 'LMRFactor': 90, 'LateMovePruningMargin': 50, 'MobilityWeight': 50}
init param: {'FutilityMargin': 25, 'FutilityMoveCountFactor': 800, 'KingAttackWeightOp': 50, 'LMRFactor': 90, 'LateMovePruningMargin': 50, 'MobilityWeight': 50}
init objective value: 0.0
study best param: {'FutilityMargin': 33, 'FutilityMoveCountFactor': 800, 'KingAttackWeightOp': 128, 'LMRFactor': 216, 'LateMovePruningMargin': 200, 'MobilityWeight': 58}
study best objective value: Elo 177.0
study best trial number: 17
games_to_play: 16 ...
result: {intermediate: Elo 191.0, G/W/D/L: 16/9/6/1}
result: {average: Elo 191.0, G/W/D/L: 16/9/6/1}
games_to_play: 16 ...
result: {intermediate: Elo 112.0, G/W/D/L: 16/8/5/3}
result: {average: Elo 151.5, G/W/D/L: 32/17/11/4}
Actual match result: Elo 150.0, CI: [+44.2, +255.4], CL: 95%, G/W/D/L: 32/17/11/4, POV: optimizer
Elo Diff: +149.8, ErrMargin: +/- 105.6, CI: [+44.2, +255.4], LOS: 99.8%, DrawRatio: 34.38%
test param format for match manager: option.FutilityMargin=25 option.FutilityMoveCountFactor=804 option.KingAttackWeightOp=98 option.LMRFactor=199 option.LateMovePruningMargin=144 option.MobilityWeight=60
result sent to optimizer: 150.0
elapse: 0h:2m:32s
Trial 20 finished with value: 150.0 and parameters: {'FutilityMargin': 25, 'FutilityMoveCountFactor': 804, 'KingAttackWeightOp': 98, 'LMRFactor': 199, 'LateMovePruningMargin': 144, 'MobilityWeight': 60}. Best is trial 17 with value: 177.0.
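Saving plots ...
After matching the default param used by the base engine against the param suggested by the optimizer (used by the test engine), the Elo difference is sent to the optimizer. In this trial 20, the parameter suggested by the optimizer won by a margin of 150 Elo, but the best parameter so far was achieved as early as trial 17, with a 177 Elo margin.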
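The Elo figures in the log above can be reproduced from the W/D/L counts with the standard logistic formula, Elo = -400·log10(1/score - 1). The minimal Python sketch below (not the tuner's own code) does this for the two 16-game batches of trial 20 and for the pooled 32 games.

```python
import math


def score_to_elo(wins: int, draws: int, losses: int) -> float:
    """Logistic Elo difference implied by a W/D/L result (a draw is half a point)."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if not 0.0 < score < 1.0:
        raise ValueError("Elo is unbounded for a 0% or 100% score")
    return -400.0 * math.log10(1.0 / score - 1.0)


# The two 16-game batches of trial 20 and the pooled 32 games (G/W/D/L in the log).
batch1 = score_to_elo(9, 6, 1)
batch2 = score_to_elo(8, 5, 3)
pooled = score_to_elo(17, 11, 4)
print(f"batches: {batch1:.1f}, {batch2:.1f}; mean of batches: {(batch1 + batch2) / 2:.1f}; pooled: {pooled:.1f}")
```

This gives about 190.8 and 112.3 for the batches; their mean (about 151.6) is slightly above the 149.8 obtained from the pooled 32 games, which is the value the log reports as "Elo Diff". Because Elo is a nonlinear function of the score, converting the pooled W/D/L counts is generally the safer estimate.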
It generates some plots when the --plot flag is given. One interesting plot that I usually check is the hyperparameter importances, shown below. It shows which parameters contribute most to a better objective value. This gives some idea of where to focus the optimization, or which code to review for parameters that contribute little to strength: there could be a bug there, or their effect may now be redundant. This sample plot is from only 20 trials; anything can happen as the number of trials grows, so the ranking may still change.

[Plot: hyperparameter importances after 20 trials]
In this sample benchmark, the base engine uses the unoptimized parameters of my engine. The values are:
Code:
option.FutilityMargin=25 option.FutilityMoveCountFactor=800 option.KingAttackWeightOp=50 option.LMRFactor=90 option.LateMovePruningMargin=50 option.MobilityWeight=50
Optuna tuner gives trial 17 as the best trial, with the following parameters.
Code:
option.FutilityMargin=33 option.FutilityMoveCountFactor=800 option.KingAttackWeightOp=128 option.LMRFactor=216 option.LateMovePruningMargin=200 option.MobilityWeight=58
In this benchmark, the match per trial consists of only 32 games (more is better) at TC 15s+50ms.
After every 10 trials it logs some trial info.
Code:
trial value params_FutilityMargin params_FutilityMoveCountFactor params_KingAttackWeightOp params_LMRFactor params_LateMovePruningMargin params_MobilityWeight state value_mean trial_cnt
17 177.0 33 800 128 216 200 58 COMPLETE 177.0 1
3 163.0 39 842 82 218 212 76 COMPLETE 163.0 1
14 150.0 43 844 122 193 100 128 COMPLETE 150.0 1
20 150.0 25 804 98 199 144 60 COMPLETE 150.0 1
18 137.0 38 878 140 193 210 52 COMPLETE 137.0 1
13 112.0 46 820 138 216 78 86 COMPLETE 112.0 1
1 100.0 66 910 114 200 50 68 COMPLETE 100.0 1
7 100.0 99 824 184 165 198 144 COMPLETE 100.0 1
2 89.0 75 1130 70 165 230 80 COMPLETE 89.0 1
10 89.0 25 932 82 208 188 84 COMPLETE 89.0 1
8 77.0 69 808 80 161 204 88 COMPLETE 77.0 1
11 77.0 65 926 110 181 50 82 COMPLETE 77.0 1
4 66.0 87 910 114 213 214 100 COMPLETE 66.0 1
12 66.0 46 818 122 205 248 126 COMPLETE 66.0 1
15 66.0 40 824 118 190 88 174 COMPLETE 66.0 1
5 55.0 38 948 50 123 210 52 COMPLETE 55.0 1
6 55.0 70 1042 64 140 56 184 COMPLETE 55.0 1
9 55.0 46 1142 198 205 122 140 COMPLETE 55.0 1
19 55.0 48 876 176 160 166 98 COMPLETE 55.0 1
16 11.0 56 834 74 212 144 88 COMPLETE 11.0 1
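0 0.0 25 800 50 90 50 50 COMPLETE 0.0 1
The numbers in the value column are the Elo results achieved by the optimizer's suggested parameters against the default values used by the base engine. In practice, you can try the parameters of the top trials.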
It is necessary to run a verification test of the best param with more games; I use TC 15s+50ms.
The param suggested by the optimizer won.
Code:
Score of trial_17 vs def: 64 - 17 - 49 [0.681] 130
... trial_17 playing White: 36 - 8 - 21 [0.715] 65
... trial_17 playing Black: 28 - 9 - 28 [0.646] 65
... White vs Black: 45 - 36 - 49 [0.535] 130
Elo difference: 131.6 +/- 48.6, LOS: 100.0 %, DrawRatio: 37.7 %
That is only a benchmark to check that the tuner is working.
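The cutechess summary above can be reproduced with a short Python sketch. It follows what appears to be cutechess-cli's approach: compute the per-game score variance, form a 95% confidence interval on the mean score, convert both ends to Elo, and take half the width as the error margin; LOS uses the usual erf formula on wins and losses only. Note that cutechess prints results in W - L - D order.

```python
import math


def elo(score: float) -> float:
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int, z: float = 1.959964):
    """Elo difference and a symmetric 95% error margin from W/L/D counts.

    Assumes the score interval stays inside (0, 1), i.e. not a near-whitewash.
    """
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n
    # Per-game variance of the score (1 for a win, 0.5 for a draw, 0 for a loss).
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    se = math.sqrt(var / n)
    # Convert both ends of the score interval to Elo; report half the width.
    margin = (elo(s + z * se) - elo(s - z * se)) / 2
    return elo(s), margin


def los(wins: int, losses: int) -> float:
    """Likelihood of superiority from wins and losses (draws carry no information)."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))


if __name__ == "__main__":
    diff, margin = elo_with_margin(wins=64, losses=17, draws=49)
    print(f"Elo difference: {diff:.1f} +/- {margin:.1f}, LOS: {100 * los(64, 17):.1f} %")
```

For `wins=64, losses=17, draws=49` this gives about 131.6 ± 48.6 Elo and an LOS of essentially 100%, matching the output above.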
I agree that guessing is indeed inefficient. Manually tuning one or two parameters can be tolerable, but it is no longer fun when there are more.
This is my command line for this benchmark. It only applies to the Optuna Game Parameter Tuner.
Code:
set studyname=deu_tpe_search_and_eval
python tuner.py --study-name %studyname% ^
--noisy-result ^
--elo-objective ^
--sampler name=tpe multivariate=true seed=100 constant_liar=true ^
--threshold-pruner result=-100 ^
--games-per-trial 32 ^
--trials 200 ^
--concurrency 6 ^
--base-time-sec 15 ^
--inc-time-sec 0.05 ^
--draw-movenumber 20 --draw-movecount 6 --draw-score 0 ^
--resign-movecount 3 --resign-score 200 ^
--engine F:\Project\Deuterium2021-1\deuterium\\deuterium_v2021.1.38.118.exe ^
--input-param "{'LMRFactor': {'default':90, 'min':90, 'max':220, 'step':1}, 'FutilityMoveCountFactor': {'default':800, 'min':800, 'max':1200, 'step':2}, 'FutilityMargin': {'default':25, 'min':25, 'max':100, 'step':1}, 'LateMovePruningMargin': {'default':50, 'min':50, 'max':250, 'step':2}, 'KingAttackWeightOp': {'default':50, 'min':50, 'max':200, 'step':2}, 'MobilityWeight': {'default':50, 'min':50, 'max':200, 'step':2}}" ^
--opening-file ./start_opening/ogpt_chess_startpos.epd ^
--opening-format epd ^
--pgn-out %studyname%_games.pgn ^
--match-manager cutechess --plot
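In the format for the parameters to be optimized, for example
Code:
'LMRFactor': {'default':90, 'min':90, 'max':220, 'step':1}
the default value of 90 is the one used by the base engine, while the value suggested by the optimizer will be in the range [90, 220].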
If you try to optimize your parameters, be sure to put the best (or default) value of your program under the "default" key. The optimizer will try to beat those default values.
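The `default`/`min`/`max`/`step` format is easy to handle from Python. The sketch below, which is not the tuner's actual code, parses a spec like the `--input-param` string with `ast.literal_eval`, snaps a value onto the allowed grid, and formats a parameter set the way the log's "test param format for match manager" line does (`option.NAME=VALUE`, the form cutechess-cli accepts for engine options). `random_candidate` is a hypothetical uniform sampler; the command line above selects TPE instead.

```python
import ast
import random

# A shortened spec in the same dict-literal format as --input-param.
SPEC = ("{'LMRFactor': {'default':90, 'min':90, 'max':220, 'step':1}, "
        "'FutilityMargin': {'default':25, 'min':25, 'max':100, 'step':1}, "
        "'LateMovePruningMargin': {'default':50, 'min':50, 'max':250, 'step':2}}")


def parse_spec(text):
    """Parse the spec safely: literal_eval accepts literals only, no code."""
    return ast.literal_eval(text)


def snap(value, p):
    """Clamp to [min, max] and round to the nearest multiple of step above min."""
    value = min(max(value, p["min"]), p["max"])
    steps = round((value - p["min"]) / p["step"])
    return min(p["min"] + steps * p["step"], p["max"])


def random_candidate(spec, rng):
    """Hypothetical uniform sampler over the grid; real tuners sample smarter."""
    return {name: snap(rng.uniform(p["min"], p["max"]), p) for name, p in spec.items()}


def defaults(spec):
    """Parameter set for the base engine."""
    return {name: p["default"] for name, p in spec.items()}


def cutechess_options(params):
    """Format parameters as cutechess-cli engine options."""
    return " ".join(f"option.{name}={value}" for name, value in params.items())


if __name__ == "__main__":
    spec = parse_spec(SPEC)
    print("base:", cutechess_options(defaults(spec)))
    print("test:", cutechess_options(random_candidate(spec, random.Random(0))))
```

Giving `defaults(spec)` to the base engine and a sampled candidate to the test engine mirrors the base-versus-test setup described above.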
Studies can be very expensive, especially for search parameters. If we use a TC of 15s+50ms during the trials, the best parameters found may not be good at longer TCs. This can also be true for evaluation parameters, especially those involved in king safety.
This tuner can be interrupted and resumed: if you get bored or need the computer for another task, you can interrupt it, then run the same command line to resume from the previous state.
There is also a nice framework called nni. I have tried it; it is not yet shared on GitHub. It has some interesting optimizers, such as BOHB, and the interface is very good.

There is also a parameter tuner by kiudee at https://github.com/kiudee/chess-tuning-tools. This is used to tune the parameters of Lc0. I tried this and it is good.
-
- Posts: 608
- Joined: Sun May 30, 2021 5:03 am
- Location: United States
- Full name: Christian Dean
Re: Tuning Search Parameters
I'm not the OP, but thanks for the very detailed post. I tried using Optuna a couple of months ago, but I think I'll try it again to see if it can help me tweak some margins in Blunder. I'm sure there's at least 50 Elo still hiding in tuning all of the various margins...