Stockfish's tuning method

Rémi Coulom · Post by **Rémi Coulom** » Sun Oct 09, 2011 1:04 pm

zamar wrote:
Rémi Coulom wrote:

Using a wide range and letting CLOP focus on the right interval by itself is the most efficient. As I wrote, this is one advantage of CLOP over Stockfish's tuning method: the user of the algorithm does not have to guess good values for such parameters of the optimization algorithm. CLOP will figure out good values by itself.

Rémi
I think that answering the following question would be of a great practical importance: How many iterations do you need to CLOP outperform SPSA method with a good starting value? (You've already showed that this happens in infinity)

This depends how good the starting value is. If the starting value is already optimal, there won't be any improvement over it.

In general, it is very difficult to guess the answer to this kind of question, especially since I have little experience with your algorithm. The best way would be to run an experiment.

Testing on artificial problems is most convenient, if you can do it. You can then compare to the plots in my paper. Otherwise, it would be interesting to test on a chess program.

This is how I would do it: for each algorithm, repeat a learning experiment 10-100 times, each time with the same total number of games, but with a different random seed. Then, plot, for each algorithm (and each replication of the experiment) the estimated optimal parameter as a function of time.

This would give a good comparison of how they behave.

But I have no time to do it.

Rémi

mcostalba · Post by **mcostalba** » Sun Oct 09, 2011 6:01 pm

Perhaps there is another key difference between SF tuning and CLOP.

SF tuning is based on self play, so it means that with a single game you are able to test 2 parameters vectors at once. In CLOP you select a parameter vector, send to the engine and start the game(s), then read the results compute the next vector and redo.

In case of SF we select 2 vectors, one is sent to the first engine, the other to the second engine, at the end of the match you get a result that has an information about _both_ the vectors. IMHO this is intrinsecally more efficient than using one vector per game. As a cons, not everybody likes self play.

Remi, would be possible to adapt CLOP to self play scenario ? Where 2 parameters vectors are chosen and the computation of the results takes in account the pair of vectors and not only one. I am not an expert but my guess is that you should converge faster because you can extract more information from a single game.

More, self play could have some drawback, but has also postive properties like the one to artificially increase the ELO difference so that signal to noise ratio is improved.

What do you think ?

mcostalba · Post by **mcostalba** » Sun Oct 09, 2011 6:17 pm

I am thinking also to another possible weakness if you use CLOP as is for self play.

Suppose you have reference setup A for your engine, you start tune the engine until you find a vector that maximizes the winning chances against A, call it setup B and suppose B is stronger than a (B > A)

Now use setup B as reference as start to tune for another setup until you find a vector C stronger than B (C > B).

Then you can't theoretically ignore the possibility that A > C so that you end up with B > A > C > B

Instead if self tuning is done changing _both_ vectors for both engines there is no more a fixed reference and you (intuitively, I am far from demonstarte this

) end up with a setup that on average is the strongest among all the possiible setups.

So self play changing both vectors intuitively seems theoretically stronger methodology and not only faster to converge.

Marco

Rémi Coulom · Post by **Rémi Coulom** » Sun Oct 09, 2011 8:04 pm

self-play is an interesting question. It is also in the TODO list. But it would take quite a bit of work.

It is not very difficult to perform quadratic regression from self-play data. It can be done with an Elo-rating model and the same kind of quadratic regression.

The big change is the sampling policy. In self-play, the two opponents should not be too close, because then, no information is obtained. The probability distribution should be clever. The choice of a Gaussian distribution in Stockfish's algorithm is, BTW, not efficient for this purpose. The distribution should have a low probability of zero difference, because zero difference is not informative.

There is some theory about that question:
@article{
Offen-1986a,
author = "Offen, Walter W. and Littell, Ramon C.",
title = "Design of paired comparison experiments when treatments are levels of
a single quantitative variable",
journal = "Journal of Statistical Planning and Inference",
volume = 15,
pages = "331--346",
year = "1986--1987"
}

@phdthesis{
VanBerkum-1985a,
author = "van Berkum, Emilius Eduardus Maria",
title = "Optimal Paired Comparison Designs for Factorial Experiments",
school = "Technische Universiteit Eindhoven",
publisher = "Centrum voor Wiskunde en Informatica, Amsterdam",
year = 1985
}

Of course you can use CLOP against a fixed version of your own program, but it won't be as efficient as it could be.

Rémi

jacobbl · Post by **jacobbl** » Sun Oct 09, 2011 8:28 pm

Is it possible to ask which parameters were the most important to tune (what gave the largest ELO gain?) For us with limited testing resources it would be nice to have an idea where we shoud start tuning.

mcostalba · Post by **mcostalba** » Sun Oct 09, 2011 9:44 pm

jacobbl wrote:Is it possible to ask which parameters were the most important to tune (what gave the largest ELO gain?) For us with limited testing resources it would be nice to have an idea where we shoud start tuning.

psq tables, mobility and piece values if I remember correctly. Anyhow now that SF git repository has been published you can check for old commits of tuned parameter, as a standard policy every commit that changes functionality stores the measured ELO increase in the log message.

Daniel Shawul · Post by **Daniel Shawul** » Mon Oct 17, 2011 2:07 am

I read that there is a second order equivalent of this method which computes the hessian in O(p) time. The first order SPSA does 2 evaluations for p parameters, compared to O(p) of regular Finite difference SA (FDSA). The authors suggested the second order SPSA beats both methods.

What is explanation without code ?

Please provide us that code along with cutechess-cli.py and a plugin to QLR gui (you know it is possible to do that?).
That is probably too much ask but at least give us the code.
regards,

Eelco de Groot · Post by **Eelco de Groot** » Sat Oct 22, 2011 2:01 am

mcostalba wrote:
Eelco de Groot wrote: Just an idea, it is not an official proposal for a patch or anything
No problem, I have done that. When I like some idea I am quite fast coding it

I have pushed to github a new branch called king_defenders and the last 2 patches are the result of this discussion.

https://github.com/mcostalba/Stockfish/ ... _defenders

The first just introduce the infrastructure without changing the functionality, the second enables it. Note that very possibly the parameters of the second patch are far from optimal and a tuning is really needed.

Thanks for the code and the GitHub Marco

I just wanted to add here that some new ideas for this king_defender branch have been posted elsewhere and Martin Wyngaarden has made a compile:

http://www.talkchess.com/forum/viewtopic.php?t=40766
http://dl.dropbox.com/u/11904592/stockfish_kd.zip Compile from Martin of mcostalba-Stockfish-a8b6f6b

I just posted some ideas for the king_defender branch in TDBB but these are not tested. I tried not to go overboard with added Rainbow Serpent code/ideas

http://www.talkchess.com/forum/viewtopic.php?t=40796
http://www.computerchess.info/tdbb/phpB ... ?f=9&t=617
mcostalba-Stockfish-a8b6f6b-ferrarifish-2011-10-22.rar archive contains some changes in evaluate.cpp, search.cpp, material.h, material.cpp

6. Changes for Ferrarifish branch
---------------------------------

This is essentially the mcostalba-Stockfish-a8b6f6b king_defender
branch with some changes from Rainbow Serpent, a thusfar private clone
of Stockfish 2.0. The changed files from king_defender are evaluate.cpp,
search.cpp, material.cpp and material.h.

In material.h I changed a int16_t type to int32_t This may not really be necessary,
but I wanted to be sure that the variable "value" in material.cpp could handle the negative inputs.

These negative inputs are possible because of the changes in the material
imbalance tables. (See comments in the chessprogramming forum Talkchess
http://www.talkchess.com/forum/viewtopic.php?t=38837) Basically it introduces
some constants so that the predicted synergies between pieces are no longer
linear with the number of those pieces in the ranges from zero to one.

In evaluate.cpp code is added for protected passed pawns, support by a rook
or queen behind the pawn, connected passed pawns and a probably superfluous piece
of code in case there are different colored bishops and passed pawns.
Also threat evaluation is made a little more aggressive and pieces blocking their own pawns
receive a penalty, thanks to a tip from Don Dailey about minor pieces behind pawns and
pawn mobility.

In search.cpp the changes are minor but futility pruning is made more aggressive
like in Rainbow Serpent. Futility pruning based on value and on movenumber
can now take place in the last 24 plies!

Copyright (C) all changes from mcostalba-Stockfish-a8b6f6b 2011 Eelco de Groot

Eelco

gaard · Post by **gaard** » Sat Oct 22, 2011 5:30 am

FYI, for g/6", and g/60", I have a 4-sigma result in favor of the default. This weekend I am going to set up for testing with modifications to the ...CheckBonus coefficients. Results to follow...

mcostalba · Post by **mcostalba** » Sat Oct 22, 2011 1:24 pm

gaard wrote:FYI, for g/6", and g/60", I have a 4-sigma result in favor of the default. This weekend I am going to set up for testing with modifications to the ...CheckBonus coefficients. Results to follow...

Yes, king_defenders does not work. I have tried also a version with reduced defenders weights:

Code: Select all

// Reduce attackUnits according to number and types of the defending
// pieces and the quality of the pawn shelter.
attackUnits -=  Min(15, (ei.kingAttackersCount[Us][Us] * ei.kingAttackersWeight[Us][Us]) / 4)
+ ei.kingAdjacentZoneAttacksCount[Us][Us]
+ mg_value(ei.pi->king_shelter<Us>(pos, ksq)) / 32;

It seems better but still around 15-20 ELO weaker than default.

Stockfish's tuning method

Re: Stockfish's tuning method

Re: Stockfish's tuning method

Re: Stockfish's tuning method

Re: Stockfish's tuning method

Re: Stockfish's tuning method

Re: Stockfish's tuning method

Re: Stockfish's tuning method

Re: Stockfish's tuning method

Re: Stockfish's tuning method

Re: Stockfish's tuning method