Evert wrote: From what I remember, simulated annealing performs poorly when tuning chess evaluation functions. I'll try to find where I read that.
I think it was this post by Rémy Coulom: http://www.talkchess.com/forum/viewtopi ... 611#400611
Ab-initio evaluation tuning
Moderators: hgm, Rebel, chrisw
-
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: Ab-initio evaluation tuning
-
- Posts: 7234
- Joined: Mon May 27, 2013 10:31 am
Re: Ab-initio evaluation tuning
Simulated annealing is usually terribly slow. Maybe finding a (global) maximum is bad as well, since the evaluation must generalize to positions it has never seen. So, strangely enough, somewhat sloppy tuning might work better.
Tuning must be fast. Maybe that's the only requirement: it must find a solution well above average, but not too close to the top, or it won't generalize.
-
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: Ab-initio evaluation tuning
Gerd Isenberg wrote: Isn't it necessary to introduce at least a disjoint feature for the most volatile advanced passers?
The answer, it turns out, is yes.
I added a fairly simple term: passers get a bonus dependent on their rank that increases quadratically and saturates at ~200 cp on the 7th rank (so the total value of the pawn is ~ a minor).
With this term added, I get
Code:
MG EG
P 0.72 0.94
N 2.81 3.43
B 2.85 3.44
R 3.87 5.94
Q 9.42 10.22
BB 0.13 0.24
NN 0.17 -0.12
RR -0.03 -0.16
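For illustration, the passer term described above might look something like the sketch below. The exact curve and numbers are a guess at what "quadratic, saturating at ~200 cp on the 7th rank" could mean, not the author's actual code:

```python
def passer_bonus_cp(rank):
    """Hypothetical quadratic passed-pawn bonus, saturating at ~200 cp.

    `rank` is the pawn's rank from 2 (start square) to 7; the shape is an
    illustrative reconstruction, not code from the post.
    """
    steps = min(rank, 7) - 1            # 1..6 steps of advancement
    return min(200, round(200 * (steps / 6) ** 2))
```

A pawn on the 4th rank would then get round(200 * (3/6)^2) = 50 cp, while a pawn on the 7th gets the full 200 cp, so the pawn as a whole is worth roughly a minor there.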
First up is improving the tuning algorithm though.
-
- Posts: 28123
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Ab-initio evaluation tuning
Perhaps an open-file bonus would help. I noticed from imbalance testing that orthogonal sliders of all ranges tend to test 25cP below the value you would expect, as opening value. That makes a Wazir hardly better than a Pawn. If you start the Wazir in front of the Pawn chain it is about 130 cP, though.
-
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: Ab-initio evaluation tuning
hgm wrote: Perhaps an open-file bonus would help. I noticed from imbalance testing that orthogonal sliders of all ranges tend to test 25cP below the value you would expect, as opening value. That makes a Wazir hardly better than a Pawn. If you start the Wazir in front of the Pawn chain it is about 130 cP, though.
I thought about an open-file bonus, but I think it would go the wrong way. What's needed is an increase of the base value of the Rook relative to the minor pieces in middle-game positions. Giving a situational bonus to the Rook isn't going to do that (if anything, it will decrease the base value), but giving a situational bonus to the minors should. Of course, at the end of the day an evaluation function would have all these things.
-
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: Ab-initio evaluation tuning
Evert wrote: First up is improving the tuning algorithm though.
I used GSL again to build up the fitting code, but rather than using the minimisation family of functions (which are horrendously slow for this), I used the non-linear least-squares fitting routines (which seem to be much faster, depending on the algorithm). The Levenberg-Marquardt algorithm proves to be too slow, but what GSL calls the "Double dogleg" and "Two Dimensional Subspace" methods seem to perform well. I picked the latter as the default.
The downside of using the GSL routines is that they don't lend themselves to stochastic gradient descent. I tried fitting on a small batch and then again on another batch; the results were poor. What might work is first fitting a subset of the positions and then increasing the number of positions to be fitted once the result converges, but I haven't tried that yet. For now I just feed the full set of positions to the tuner, which is still reasonably fast.
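The post uses GSL's trust-region solvers; as a rough illustration of the kind of fit involved, here is a self-contained toy in Python with a hand-rolled Levenberg-Marquardt loop instead. The model (expected score = sigmoid of the material balance), the sigmoid scale, and the data are all invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(-2, 3, size=(2000, 3)).astype(float)  # toy material imbalances
w_true = np.array([1.0, 3.0, 5.0])                     # "true" piece values (pawns)
sig = lambda s: 1.0 / (1.0 + np.exp(-0.3 * s))         # eval -> expected score
y = sig(X @ w_true)                                    # toy game outcomes

def fit_lm(w, iters=50, lam=1e-3):
    """Nonlinear least squares via a basic Levenberg-Marquardt loop."""
    for _ in range(iters):
        s = sig(X @ w)
        r = s - y                                      # residuals
        J = (0.3 * s * (1.0 - s))[:, None] * X         # Jacobian of residuals
        # damped normal equations; lam interpolates Gauss-Newton <-> gradient step
        step = np.linalg.solve(J.T @ J + lam * np.eye(len(w)), J.T @ r)
        w_new = w - step
        if np.sum((sig(X @ w_new) - y) ** 2) < np.sum(r ** 2):
            w, lam = w_new, lam * 0.5                  # accept step, trust model more
        else:
            lam *= 10.0                                # reject step, damp harder
    return w

w_fit = fit_lm(np.ones(3))
```

The GSL dogleg and 2D-subspace methods solve the same damped subproblem more cleverly (restricting the step to a low-dimensional subspace of the trust region), which is presumably where the speed difference comes from.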
It can use two threads to calculate the evaluation for all positions, which gives a nice speedup (t1c/t2c = 1.72). I could extend that so it can use more threads on my desktop, but I'm not sure it's all that useful.
The tuner now spits out
Code:
0 0.76 3.11 3.05 3.78 9.18 0.93 3.12 3.25 5.90 10.28 0.10 0.22 0.06 -0.02 0.14 -0.16 0.75 3.26 1e+10 0.0598891
1 0.73 3.08 3.02 3.72 8.78 0.93 3.18 3.32 5.98 10.63 0.13 0.22 0.06 -0.02 0.17 -0.20 0.65 3.33 100.432 0.0598639
2 0.73 3.08 3.02 3.72 8.74 0.93 3.18 3.32 5.97 10.65 0.13 0.22 0.06 -0.02 0.17 -0.20 0.64 3.33 99.472 0.0598638
3 0.73 3.08 3.02 3.72 8.74 0.93 3.18 3.32 5.97 10.65 0.13 0.22 0.06 -0.02 0.17 -0.20 0.64 3.33 99.342 0.0598634
4 0.73 3.08 3.02 3.72 8.74 0.93 3.18 3.32 5.97 10.65 0.13 0.22 0.06 -0.02 0.17 -0.20 0.64 3.33 99.342 0.0598634
status = success
summary from method 'trust-region/2D-subspace'
number of iterations: 4
function evaluations: 89
Jacobian evaluations: 0
reason for stopping: small step size
initial |f(x)| = 182.124960
final |f(x)| = 182.087293
chisq/dof = 0.0598654
VALUEL_P_MG = 0.73 +/- 0.02 [186]
VALUEL_N_MG = 3.08 +/- 0.10 [788]
VALUEL_B_MG = 3.02 +/- 0.10 [774]
VALUEL_R_MG = 3.72 +/- 0.21 [953]
VALUEL_Q_MG = 8.74 +/- 0.46 [2238]
VALUEL_P_EG = 0.93 +/- 0.02 [238]
VALUEL_N_EG = 3.18 +/- 0.07 [814]
VALUEL_B_EG = 3.32 +/- 0.07 [849]
VALUEL_R_EG = 5.97 +/- 0.11 [1529]
VALUEL_Q_EG = 10.65 +/- 0.32 [2726]
VALUEQ_BB_MG = 0.13 +/- 0.04 [33]
VALUEQ_BB_EG = 0.22 +/- 0.04 [57]
VALUEQ_NN_MG = 0.06 +/- 0.03 [16]
VALUEQ_NN_EG = -0.02 +/- 0.04 [-6]
VALUEQ_RR_MG = 0.17 +/- 0.10 [43]
VALUEQ_RR_EG = -0.20 +/- 0.06 [-51]
VALUEQ_PASS_MG = 0.64 +/- 0.20 [164]
VALUEQ_PASS_EG = 3.33 +/- 0.11 [852]
{ 0.725672, 3.077, 3.02305, 3.72302, 8.74231, 0.930309, 3.17822, 3.31578, 5.9733, 10.6504, 0.13009, 0.222306, 0.0640046, -0.0233892, 0.166455, -0.200781, 0.639021, 3.32864,}
Now to improve that rook value.
By the way, if there's interest, I'm more than happy to share the code (after cleaning it up a bit).
-
- Posts: 2278
- Joined: Mon Sep 29, 2008 1:50 am
Re: Ab-initio evaluation tuning
Evert wrote: Now to improve that rook value.
How did you select your positions? From normal games?
Are you optimizing an evaluation function with only material terms?
In normal games a player would not give up an exchange without proper compensation. So if you only optimize for material values the value of a rook may indeed appear to be close to that of a minor, but that would be an artifact of the bias in the selection of the positions and the fact that there are no compensating terms in the eval for measuring the compensation.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: Ab-initio evaluation tuning
Michel wrote: How did you select your positions? From normal games?
They're positions taken from those sampled during gameplay. Positions that were not "quiet" were removed from the set; the result comes from a playout by Stockfish. See http://talkchess.com/forum/viewtopic.php?p=686204 for a description of the test positions.
I filtered these further by only using positions with unbalanced material (otherwise material doesn't matter anyway).
Michel wrote: Are you optimizing an evaluation function with only material terms?
For the moment. I'm in the stage of adding other positional terms, now that I'm reasonably satisfied that the code is working correctly.
Michel wrote: In normal games a player would not give up an exchange without proper compensation. So if you only optimize for material values the value of a rook may indeed appear to be close to that of a minor, but that would be an artifact of the bias in the selection of the positions and the fact that there are no compensating terms in the eval for measuring the compensation.
Indeed. I did a quick trial by adding a simple mobility term for B and N, and this has the desired effect of increasing the value of the Rook compared to the minors. Unfortunately it does this by reducing the value of the minors to ~200 cp or so, well below their EG value (which is problematic).
-
- Posts: 2278
- Joined: Mon Sep 29, 2008 1:50 am
Re: Ab-initio evaluation tuning
Evert wrote: Positions that were not "quiet" were removed from the set, the result comes from a playout by Stockfish. See http://talkchess.com/forum/viewtopic.php?p=686204 for a description of the test positions.
I wonder if these positions were sampled during game play or during search. If they were sampled during game play then they would be biased, as the material balance is not independent of other positional factors. This would lead to problems for tuning a "material only" evaluation (possibly not for tuning a "well rounded" evaluation).
I think it is more reasonable to first play a few random moves from the sampled positions before recording them for a playout.
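That suggestion could be sketched as follows; `legal_moves` and `apply_move` are hypothetical helpers standing in for a real move generator and make-move routine:

```python
import random

def randomize(position, legal_moves, apply_move, n_moves=4, rng=None):
    """Play a few random moves to decorrelate a sampled position from the
    game it came from, before handing it to the playout.

    `legal_moves(position)` and `apply_move(position, move)` are assumed
    helpers, not functions from the post.
    """
    rng = rng or random.Random(0)
    for _ in range(n_moves):
        moves = legal_moves(position)
        if not moves:                      # stalemate/checkmate: stop early
            break
        position = apply_move(position, rng.choice(moves))
    return position
```

The idea is that a few random plies break the link between the material balance and the positional features that caused it, so a material-only tune is less biased by the selection.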
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: Ab-initio evaluation tuning
Michel wrote: I wonder if these positions were sampled during game play or during search. If they were sampled during game play then they would be biased as the material balance is not independent of other positional factors. This would lead to problems for tuning a "material only" evaluation (possibly not for tuning a "well rounded" evaluation).
To my understanding, they were sampled during search:
Alexandru Mosoi wrote: From each game 20 positions were sampled from the millions of positions evaluated by the engine during the game play.
This seems fair.
Adding mobility to the evaluation and trying to tune it without tuning the rest of the evaluation terms doesn't go so well, but I guess my treatment may be too simple (for one thing, it is not centred, so the base value is the value of a piece with no moves, which is probably wrong). Anyway, the condition number of the Jacobian jumps up and the program aborts after a few iterations without assigning weights to the mobility term. I guess I'll do a proper job of it and try again. At some point I should try the evaluation function out in actual gameplay too...
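A centred mobility term of the kind hinted at above could look like this sketch. The average move counts and the per-move weight are invented for illustration; they are not values from the post:

```python
# Assumed typical move counts for a minor piece; illustrative, not measured.
AVG_MOBILITY = {'N': 5, 'B': 7}

def mobility_term_cp(piece, n_moves, weight_cp=3):
    # Centred: a piece with typical mobility gets no adjustment, so the tuned
    # base value remains the value of an averagely-placed piece rather than
    # the value of a piece with no moves at all.
    return weight_cp * (n_moves - AVG_MOBILITY[piece])
```

With the uncentred form `weight * n_moves`, the tuner pushes the minors' base values down (the bonus absorbs most of their worth); centring should keep the base values interpretable and closer to their endgame counterparts.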