What about the following method for tuning evaluation:
- run through, say, 1000 positions
- generate a (semi-)random set of tuning parameters
- get their eval from your program
- get the eval from the must-be-good program (I'm using stockfish)
- compare those 1000 pairs of evals using the Pearson correlation coefficient: https://en.wikipedia.org/wiki/Pearson_c ... oefficient
- coefficient > previous_coefficient? then remember this set of tuning parameters
What do you think?
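For illustration, the loop described above could be sketched roughly like this. It is only a minimal sketch under assumptions: `evaluate` is a hypothetical linear toy eval standing in for your program, `reference_evals` stands in for the scores you would get from Stockfish, and the parameter range 0..1000 is arbitrary.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant series; treat as 0
    return cov / math.sqrt(vx * vy)


def evaluate(features, params):
    """Hypothetical toy eval: a weighted sum of per-position features."""
    return sum(f * p for f, p in zip(features, params))


def random_search(positions, reference_evals, n_params, iterations, rng):
    """Try random parameter sets and keep the one that correlates best."""
    best_params, best_r = None, -2.0  # any real coefficient is >= -1
    for _ in range(iterations):
        params = [rng.uniform(0, 1000) for _ in range(n_params)]  # assumed range
        evals = [evaluate(f, params) for f in positions]
        r = pearson(evals, reference_evals)
        if r > best_r:
            best_params, best_r = params, r
    return best_params, best_r
```

One thing to keep in mind: the Pearson coefficient does not change when all evals are multiplied by a positive constant or shifted by an offset, so this criterion alone cannot pin down the absolute scale of the parameters.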
You can use a genetic algorithm too. See this link for instance:
Sampling is OK for tuning two parameters or so. Beyond that it is terribly slow.
Oh wait, if you tune it badly it generalizes better, so it will do better at evaluating unseen positions. Somewhere between bad tuning and 'overtuning' there is an optimum.
The other constraint is that tuning should not take too much time, so it is better to use hill climbing with restarts. Or a genetic algorithm (with restarts?)
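A minimal sketch of hill climbing with random restarts, assuming the objective is some `error(params)` function you want to minimize. The function name, the start range `low`..`high` and the mutation size `sigma` are all made up for illustration:

```python
import random


def hill_climb_with_restarts(error, n_params, restarts, steps, rng,
                             low=0.0, high=100.0, sigma=5.0):
    """Minimize error(params) by repeated randomized hill climbing."""
    best_params, best_err = None, float("inf")
    for _ in range(restarts):
        # Each restart begins from a fresh random point.
        params = [rng.uniform(low, high) for _ in range(n_params)]
        err = error(params)
        for _ in range(steps):
            # Perturb one randomly chosen parameter.
            candidate = list(params)
            i = rng.randrange(n_params)
            candidate[i] += rng.gauss(0.0, sigma)
            cand_err = error(candidate)
            if cand_err < err:  # accept only strict improvements
                params, err = candidate, cand_err
        # Remember the best result over all restarts.
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err
```

The restarts are there because a single climb can get stuck in a local optimum; the cost is `restarts * steps` error evaluations, so the budget is easy to control.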
I tried Pearson correlation in the past, but it's not a good measure. Right now my experiments with evolving the eval parameters use Texel tuning on a set of 100K positions (so I can train ~6000 models per day).
In terms of the error rate, I got very close to the hand-tuned model (stable version). If you want to see the fully automatically trained eval with almost 0 human intervention, check [1] or [2]. The evolved version is 100 Elo weaker than the stable version of Zurichess. I need to add back the pawns cache, though.
If anyone is willing to explain the Texel tuning method, that would be great!
So far I understand I have to let it play (well, run QS + eval on FENs) millions of games and then do something with the evaluation value. But what? I don't understand the wiki explanation.
flok wrote:If anyone is willing to explain the Texel tuning method, that would be great!
So far I understand I have to let it play (well, run QS + eval on FENs) millions of games and then do something with the evaluation value. But what? I don't understand the wiki explanation.
The basic idea is pretty simple: calculate the error of the evaluation by comparing it to the actual outcome of the positions. Lower a particular evaluation parameter and check whether the error has improved; if not, raise the parameter instead; if that does not improve it either, keep the original value. Do this for all parameters until you have reached the lowest error.
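As a rough sketch of that loop: the error is usually taken as the mean squared difference between the game result (1, 0.5 or 0 from White's point of view) and a sigmoid of the score in centipawns. Everything named here is an assumption for illustration: `positions` is a list of (features, result) pairs, the eval is a toy weighted sum rather than a real QS + eval, and `k` is the scaling constant, which you would normally fit to your own data first.

```python
def sigmoid(score_cp, k=1.0):
    """Map a centipawn score to an expected game result in [0, 1]."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))


def mean_squared_error(params, positions, k=1.0):
    """Average squared gap between game results and predicted results."""
    total = 0.0
    for features, result in positions:
        # Toy eval (assumption): weighted sum of features, in centipawns.
        score = sum(f * p for f, p in zip(features, params))
        total += (result - sigmoid(score, k)) ** 2
    return total / len(positions)


def local_search(params, positions, step=1):
    """Nudge each parameter down, then up, keeping only improvements."""
    best = list(params)
    best_err = mean_squared_error(best, positions)
    improved = True
    while improved:  # repeat full passes until no parameter helps
        improved = False
        for i in range(len(best)):
            for delta in (-step, step):  # lower first, then raise
                trial = list(best)
                trial[i] += delta
                err = mean_squared_error(trial, positions)
                if err < best_err:
                    best, best_err = trial, err
                    improved = True
                    break  # keep the change, move to the next parameter
    return best, best_err
```

Note that this only finds a local minimum of the error, and every trial recomputes the error over the whole position set, which is why the number of positions and parameters matters so much for tuning time.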