New Tool for Tuning with Skopt

zenpawn
Posts: 349
Joined: Sat Aug 06, 2016 8:31 pm
Location: United States

Re: New Tool for Tuning with Skopt

Post by zenpawn »

I've run the script with my xboard engine, which doesn't set usermove=1. I do get timeout errors when loading the engine, even though it starts in well under a second and sends done=0 and done=1. Nevertheless, the script runs anyway.
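For reference, the startup handshake python-chess waits for after sending "protover 2" looks roughly like this on the engine side (a minimal sketch assuming the usual xboard conventions; the engine name is illustrative):

Code: Select all

# Engine side of the xboard feature negotiation.
# done=0 asks the GUI/adapter to extend its startup timeout;
# done=1 marks the end of feature negotiation.
print("feature done=0", flush=True)                        # buy extra setup time
print('feature myname="MyEngine" setboard=1', flush=True)  # engine features
print("feature done=1", flush=True)                        # ready to play
If the adapter still times out even though done=1 is sent promptly, that would point at the adapter's timeout handling rather than the engine.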
Erin Dame
Author of RookieMonster
thomasahle
Posts: 94
Joined: Thu Feb 27, 2014 8:19 pm

Re: New Tool for Tuning with Skopt

Post by thomasahle »

xr_a_y wrote: Sun Sep 08, 2019 12:53 pm I'm trying it now, using the xboard protocol, and I'm not having much success because python-chess seems to be rejecting "usermove=1".
Has anyone experienced this before?
Yes, this is frustrating. I had to implement usermove=0 in sunfish to get it to work with python-chess.

It would be great if anyone could send a pull request to Niklas about this.
JVMerlino
Posts: 1357
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: New Tool for Tuning with Skopt

Post by JVMerlino »

thomasahle wrote: Sun Sep 08, 2019 11:02 pm
xr_a_y wrote: Sun Sep 08, 2019 12:53 pm I'm trying it now, using the xboard protocol, and I'm not having much success because python-chess seems to be rejecting "usermove=1".
Has anyone experienced this before?
Yes, this is frustrating. I had to implement usermove=0 in sunfish to get it to work with python-chess.

It would be great if anyone could send a pull request to Niklas about this.
Well, Myrddin's in luck here. Since "usermove=0" is the default, I never implemented it.
thomasahle
Posts: 94
Joined: Thu Feb 27, 2014 8:19 pm

Re: New Tool for Tuning with Skopt

Post by thomasahle »

JVMerlino wrote: Mon Sep 09, 2019 12:44 am Well, Myrddin's in luck here. Since "usermove=0" is the default, I never implemented it.
Well, by "implement" I didn't mean anything more than trying to parse unrecognised commands as moves.
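In other words, with usermove=0 (the xboard default) moves arrive as bare commands, so the input loop just tries anything it doesn't recognise as a move. A minimal sketch using python-chess (illustrative, not sunfish's actual code):

Code: Select all

import sys
import chess

def xboard_loop():
    board = chess.Board()
    for line in sys.stdin:
        tokens = line.split()
        if not tokens:
            continue
        cmd = tokens[0]
        if cmd == "protover":
            print("feature usermove=0 done=1", flush=True)
        elif cmd == "quit":
            break
        elif cmd in ("xboard", "new", "force", "go"):
            pass  # handle the usual commands here
        else:
            # usermove=0: anything unrecognised might be a move like "e2e4"
            try:
                board.push_uci(cmd)
            except ValueError:
                print("Error (unknown command): " + cmd, flush=True)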
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: New Tool for Tuning with Skopt

Post by xr_a_y »

After trying this script a little on some search parameters, I haven't had much success.
Any advice on choosing a good script configuration and a reasonable number of games would be welcome.

For now, 1000 games to tune 4 or 5 parameters at a time does not seem to work well.
thomasahle
Posts: 94
Joined: Thu Feb 27, 2014 8:19 pm

Re: New Tool for Tuning with Skopt

Post by thomasahle »

xr_a_y wrote: Wed Sep 11, 2019 7:21 am After trying this script a little on some search parameters, I haven't had much success.
Any advice on choosing a good script configuration and a reasonable number of games would be welcome.
For now, 1000 games to tune 4 or 5 parameters at a time does not seem to work well.
People have had some problems with skopt not doing enough exploration.
I personally have found that increasing the assumed noise (say -acq-noise 10) made it much better at this.
Another approach is to tweak -n-initial-points and some of the other parameters under "Optimization parameters".
If you want to increase -n above 1000, it might be useful to change the -base-estimator to GBRT or ET, since the standard Gaussian process optimizer gets slow.

Reports of any experiences you've had would be beneficial. We are all learning.
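For anyone mapping those flags onto skopt directly, the core of such a tuner is skopt's ask/tell interface. A rough sketch (the flag-to-argument mapping is my reading, not necessarily exactly what tune.py does; play_games stands in for your own match runner):

Code: Select all

from skopt import Optimizer

opt = Optimizer(
    dimensions=[(0, 800), (0, 800), (0, 20)],  # one (low, high) pair per engine parameter
    base_estimator="GBRT",   # "GP" is the default; "GBRT"/"ET" scale better past ~1000 points
    n_initial_points=30,     # random exploration before the model takes over
)

for _ in range(1000):        # corresponds to -n
    x = opt.ask()            # next parameter vector to test
    score = play_games(x)    # mean result of a mini-match at x (your own function)
    opt.tell(x, -score)      # skopt minimizes, so negate the score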
nionita
Posts: 175
Joined: Fri Oct 22, 2010 9:47 pm
Location: Austria

Re: New Tool for Tuning with Skopt

Post by nionita »

thomasahle wrote: Wed Sep 11, 2019 7:09 pm
xr_a_y wrote: Wed Sep 11, 2019 7:21 am After trying this script a little on some search parameters, I haven't had much success.
Any advice on choosing a good script configuration and a reasonable number of games would be welcome.
For now, 1000 games to tune 4 or 5 parameters at a time does not seem to work well.
People have had some problems with skopt not doing enough exploration.
I personally have found that increasing the assumed noise (say -acq-noise 10) made it much better at this.
Another approach is to tweak -n-initial-points and some of the other parameters under "Optimization parameters".
If you want to increase -n above 1000, it might be useful to change the -base-estimator to GBRT or ET, since the standard Gaussian process optimizer gets slow.

Reports of any experiences you've had would be beneficial. We are all learning.
Thanks for sharing this! I don't use your script, mainly because I already had some Python scripts for tuning based on DSPSA, but after seeing it I also wrote a Bayesian optimizer using skopt and started experimenting with this again last week.

I have a feeling that this method will not work if the "measurements" (i.e. the game samples) are too noisy. That is, if you play only a few games per parameter configuration, I guess it will fail. For the past 2-3 days I have been experimenting with 1k to 2.5k games per configuration; I don't yet have confirmation of how strong the results are. But in the skopt documentation the function to be approximated is called "(very) expensive", which suggests that taking a measurement should take much longer than optimizing the surrogate function.
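A quick back-of-the-envelope supports this: the 1-sigma error of a match result shrinks only with the square root of the number of games, so a handful of games per configuration is dominated by noise. A small sketch, assuming independent games and an illustrative 40% draw ratio:

Code: Select all

import math

def elo_error(n_games, draw_ratio=0.4):
    # Per-game variance of the score in {0, 0.5, 1} around a 50% mean.
    var = 0.25 * (1.0 - draw_ratio)
    se = math.sqrt(var / n_games)   # standard error of the mean score
    # d(Elo)/d(score) at a 50% score is 400 / (ln(10) * 0.25) ~= 695
    return 695.0 * se

print(elo_error(10))     # ~85 Elo
print(elo_error(100))    # ~27 Elo
print(elo_error(2500))   # ~5 Elo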
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: New Tool for Tuning with Skopt

Post by Joerg Oster »

nionita wrote: Wed Sep 11, 2019 9:30 pm
thomasahle wrote: Wed Sep 11, 2019 7:09 pm
xr_a_y wrote: Wed Sep 11, 2019 7:21 am After trying this script a little on some search parameters, I haven't had much success.
Any advice on choosing a good script configuration and a reasonable number of games would be welcome.
For now, 1000 games to tune 4 or 5 parameters at a time does not seem to work well.
People have had some problems with skopt not doing enough exploration.
I personally have found that increasing the assumed noise (say -acq-noise 10) made it much better at this.
Another approach is to tweak -n-initial-points and some of the other parameters under "Optimization parameters".
If you want to increase -n above 1000, it might be useful to change the -base-estimator to GBRT or ET, since the standard Gaussian process optimizer gets slow.

Reports of any experiences you've had would be beneficial. We are all learning.
Thanks for sharing this! I don't use your script, mainly because I already had some Python scripts for tuning based on DSPSA, but after seeing it I also wrote a Bayesian optimizer using skopt and started experimenting with this again last week.

I have a feeling that this method will not work if the "measurements" (i.e. the game samples) are too noisy. That is, if you play only a few games per parameter configuration, I guess it will fail. For the past 2-3 days I have been experimenting with 1k to 2.5k games per configuration; I don't yet have confirmation of how strong the results are. But in the skopt documentation the function to be approximated is called "(very) expensive", which suggests that taking a measurement should take much longer than optimizing the surrogate function.
That's why I wanted to be able to increase the number of games per parameter setting.
Another question is how much of the parameter space (possible configurations) needs to be explored to get a good estimate.

I'm currently experimenting with GBRT as the base estimator and an increased kappa value to allow broader exploration.
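For reference, kappa enters through the lower-confidence-bound acquisition, LCB(x) = mu(x) - kappa * sigma(x), so a larger kappa rewards points where the model is still uncertain. Roughly, in plain skopt (the parameter ranges are illustrative):

Code: Select all

from skopt import Optimizer

opt = Optimizer(
    dimensions=[(0, 10), (80, 160)],   # illustrative parameter ranges
    base_estimator="GBRT",
    acq_func="LCB",                    # kappa only matters for LCB-style acquisition
    acq_func_kwargs={"kappa": 4.0},    # skopt's default is 1.96; larger => broader exploration
)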
Jörg Oster
xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Re: New Tool for Tuning with Skopt

Post by xr_a_y »

thomasahle wrote: Wed Sep 11, 2019 7:09 pm
xr_a_y wrote: Wed Sep 11, 2019 7:21 am After trying this script a little on some search parameters, I haven't had much success.
Any advice on choosing a good script configuration and a reasonable number of games would be welcome.
For now, 1000 games to tune 4 or 5 parameters at a time does not seem to work well.
People have had some problems with skopt not doing enough exploration.
I personally have found that increasing the assumed noise (say -acq-noise 10) made it much better at this.
Another approach is to tweak -n-initial-points and some of the other parameters under "Optimization parameters".
If you want to increase -n above 1000, it might be useful to change the -base-estimator to GBRT or ET, since the standard Gaussian process optimizer gets slow.

Reports of any experiences you've had would be beneficial. We are all learning.
Yes, sorry for not giving enough details.

I'm trying to tune search parameters (for example the static null-move depth and coefficient) in Minic.

My last try was:

Code: Select all

python3 tune.py minic_dev_uci \
    -opt staticNullMoveDepthInit0 0 800 -opt staticNullMoveDepthInit1 0 800 \
    -opt staticNullMoveDepthCoeff0 0 800 -opt staticNullMoveDepthCoeff1 0 800 \
    -opt staticNullMoveMaxDepth0 0 20 -opt staticNullMoveMaxDepth1 0 20 \
    -movetime 30 -conf ~/.config/cutechess/engines.json -concurrency=7 \
    -games-file out.pgn -n 8000 -base-estimator GBRT -acq-noise 5
which gave these very unclear results, 0 +/- 190 Elo:

Code: Select all

Best expectation (κ=0): [106 207 361 782   5   9] = -0.000 ± 0.500 (ELO-diff -0.000 ± 190.849)
Best expectation (κ=1): [112 477 663  48  13   0] = -0.000 ± 0.261 (ELO-diff -0.000 ± 92.678)
Best expectation (κ=2): [112 477 663  48  13   0] = -0.000 ± 0.261 (ELO-diff -0.000 ± 92.678)
Best expectation (κ=3): [112 477 663  48  13   0] = -0.000 ± 0.261 (ELO-diff -0.000 ± 92.678)
where only the first answer looks like it could plausibly be a good setting.

I think the current values of the search parameters in Minic are quite good, and I wonder why these optimization runs don't show that easily.

I've also tried various other margins and coefficients, without success.

I may be doing something wrong, of course! And my knowledge of optimization is very poor...
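By the way, the ± Elo numbers in that output appear to be just the score uncertainty pushed through the standard logistic Elo formula; a quick sanity check (my own arithmetic, not the script's code):

Code: Select all

import math

def score_to_elo(y):
    # y is a relative score in [-1, 1]; map it to an expected score p
    # in [0, 1], then apply the usual Elo formula.
    p = (y + 1.0) / 2.0
    return -400.0 * math.log10(1.0 / p - 1.0)

print(score_to_elo(0.5))    # ~190.85, matching the +/- 190.849 above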
pedrox
Posts: 1056
Joined: Fri Mar 10, 2006 6:07 am
Location: Basque Country (Spain)

Re: New Tool for Tuning with Skopt

Post by pedrox »

Once I have played a series of games and have the data.log file, if I run the test again reusing that file, I get different results on each run, and the results seem quite random.

Code: Select all

...
Using [8.0, 90.0, 5.0, 114.0] => 0.5 from log-file
Using [1.0, 144.0, 1.0, 121.0] => 0.0 from log-file
Using [9.0, 107.0, 6.0, 99.0] => 0.0 from log-file
Using [8.0, 158.0, 7.0, 125.0] => -0.5 from log-file
Using [8.0, 101.0, 9.0, 134.0] => -1.0 from log-file
Using [3.0, 116.0, 9.0, 112.0] => -0.5 from log-file
Using [1.0, 133.0, 3.0, 142.0] => -1.0 from log-file
Using [6.0, 154.0, 8.0, 145.0] => 0.0 from log-file
Using [3.0, 156.0, 7.0, 115.0] => 0.0 from log-file
Using [1.0, 135.0, 4.0, 144.0] => 0.0 from log-file
Using [5.0, 148.0, 10.0, 126.0] => -0.5 from log-file
Using [2.0, 90.0, 4.0, 111.0] => -1.0 from log-file
Using [4.0, 115.0, 3.0, 84.0] => -0.5 from log-file
Using [8.0, 92.0, 1.0, 123.0] => 0.0 from log-file
Using [8.0, 100.0, 4.0, 129.0] => 0.0 from log-file
Using [3.0, 104.0, 0.0, 128.0] => -0.5 from log-file
Using [2.0, 128.0, 9.0, 134.0] => 0.5 from log-file
Using [6.0, 100.0, 4.0, 104.0] => -0.5 from log-file
Using [5.0, 133.0, 3.0, 107.0] => 0.0 from log-file
Using [6.0, 117.0, 10.0, 105.0] => 0.5 from log-file
Using [3.0, 100.0, 8.0, 97.0] => -0.5 from log-file
Fitting first model
Summarizing best values
Best expectation (κ=0): [  1.  83.   2. 135.] = -0.000 ± 0.375 (ELO-diff -0.000 ± 338.039)
Best expectation (κ=1): [  9. 155.   2.  92.] = -0.000 ± 0.255 (ELO-diff -0.000 ± 195.793)
Best expectation (κ=2): [  9. 155.   2.  92.] = -0.000 ± 0.255 (ELO-diff -0.000 ± 195.793)
Best expectation (κ=3): [  9. 155.   2.  92.] = -0.000 ± 0.255 (ELO-diff -0.000 ± 195.793)

Code: Select all

...
Using [8.0, 90.0, 5.0, 114.0] => 0.5 from log-file
Using [1.0, 144.0, 1.0, 121.0] => 0.0 from log-file
Using [9.0, 107.0, 6.0, 99.0] => 0.0 from log-file
Using [8.0, 158.0, 7.0, 125.0] => -0.5 from log-file
Using [8.0, 101.0, 9.0, 134.0] => -1.0 from log-file
Using [3.0, 116.0, 9.0, 112.0] => -0.5 from log-file
Using [1.0, 133.0, 3.0, 142.0] => -1.0 from log-file
Using [6.0, 154.0, 8.0, 145.0] => 0.0 from log-file
Using [3.0, 156.0, 7.0, 115.0] => 0.0 from log-file
Using [1.0, 135.0, 4.0, 144.0] => 0.0 from log-file
Using [5.0, 148.0, 10.0, 126.0] => -0.5 from log-file
Using [2.0, 90.0, 4.0, 111.0] => -1.0 from log-file
Using [4.0, 115.0, 3.0, 84.0] => -0.5 from log-file
Using [8.0, 92.0, 1.0, 123.0] => 0.0 from log-file
Using [8.0, 100.0, 4.0, 129.0] => 0.0 from log-file
Using [3.0, 104.0, 0.0, 128.0] => -0.5 from log-file
Using [2.0, 128.0, 9.0, 134.0] => 0.5 from log-file
Using [6.0, 100.0, 4.0, 104.0] => -0.5 from log-file
Using [5.0, 133.0, 3.0, 107.0] => 0.0 from log-file
Using [6.0, 117.0, 10.0, 105.0] => 0.5 from log-file
Using [3.0, 100.0, 8.0, 97.0] => -0.5 from log-file
Fitting first model
Summarizing best values
Best expectation (κ=0): [  0.  95.   9. 113.] = -0.000 ± 0.375 (ELO-diff -0.000 ± 338.039)
Best expectation (κ=1): [ 10. 149.   1. 120.] = -0.000 ± 0.255 (ELO-diff -0.000 ± 195.793)
Best expectation (κ=2): [ 10. 149.   1. 120.] = -0.000 ± 0.255 (ELO-diff -0.000 ± 195.793)
Best expectation (κ=3): [ 10. 149.   1. 120.] = -0.000 ± 0.255 (ELO-diff -0.000 ± 195.793)

Code: Select all

...
Using [8.0, 90.0, 5.0, 114.0] => 0.5 from log-file
Using [1.0, 144.0, 1.0, 121.0] => 0.0 from log-file
Using [9.0, 107.0, 6.0, 99.0] => 0.0 from log-file
Using [8.0, 158.0, 7.0, 125.0] => -0.5 from log-file
Using [8.0, 101.0, 9.0, 134.0] => -1.0 from log-file
Using [3.0, 116.0, 9.0, 112.0] => -0.5 from log-file
Using [1.0, 133.0, 3.0, 142.0] => -1.0 from log-file
Using [6.0, 154.0, 8.0, 145.0] => 0.0 from log-file
Using [3.0, 156.0, 7.0, 115.0] => 0.0 from log-file
Using [1.0, 135.0, 4.0, 144.0] => 0.0 from log-file
Using [5.0, 148.0, 10.0, 126.0] => -0.5 from log-file
Using [2.0, 90.0, 4.0, 111.0] => -1.0 from log-file
Using [4.0, 115.0, 3.0, 84.0] => -0.5 from log-file
Using [8.0, 92.0, 1.0, 123.0] => 0.0 from log-file
Using [8.0, 100.0, 4.0, 129.0] => 0.0 from log-file
Using [3.0, 104.0, 0.0, 128.0] => -0.5 from log-file
Using [2.0, 128.0, 9.0, 134.0] => 0.5 from log-file
Using [6.0, 100.0, 4.0, 104.0] => -0.5 from log-file
Using [5.0, 133.0, 3.0, 107.0] => 0.0 from log-file
Using [6.0, 117.0, 10.0, 105.0] => 0.5 from log-file
Using [3.0, 100.0, 8.0, 97.0] => -0.5 from log-file
Fitting first model
Summarizing best values
Best expectation (κ=0): [ 5. 97. 10. 88.] = -0.000 ± 0.375 (ELO-diff -0.000 ± 338.039)
Best expectation (κ=1): [  5. 159.   6. 140.] = -0.000 ± 0.255 (ELO-diff -0.000 ± 195.793)
Best expectation (κ=2): [  5. 159.   6. 140.] = -0.000 ± 0.255 (ELO-diff -0.000 ± 195.793)
Best expectation (κ=3): [  5. 159.   6. 140.] = -0.000 ± 0.255 (ELO-diff -0.000 ± 195.793)
I thought you could reuse the data, whether after a crash or to extend the number of iterations, but I am confused that each run gives different results once all the games have been played.
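One plausible explanation: refitting the model and re-optimizing the acquisition function are themselves randomized (random restarts, sampled candidate points), so unless the script seeds skopt, replaying the same data.log can land on different near-equivalent optima, especially when the fitted surface is as flat as the -0.000 means above suggest. In plain skopt this would be pinned down with random_state (a sketch; the parameter ranges are illustrative):

Code: Select all

from skopt import Optimizer

# With a fixed random_state, fitting the same observations again
# reproduces the same model and the same suggested optimum.
opt = Optimizer(
    dimensions=[(0, 10), (80, 160), (0, 10), (80, 160)],  # illustrative
    base_estimator="GP",
    random_state=42,
)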