For a set of positions, I took both Zurichess quiet datasets and filtered out every position where the quiescence search score didn't match the static eval score. That left me with about 2.5 million positions, which should be more than enough.
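For anyone curious, the filtering step itself is conceptually just this - a minimal sketch in C, where Position, qsearch(), static_eval(), positions[] and quiet[] are placeholders for whatever your engine actually provides, not Willow's real code:

Code: Select all

//sketch only: keep a position if the quiescence search finds nothing better
//than the static eval, i.e. the position is quiet enough to tune on directly
int qsearch(Position *pos, int alpha, int beta);
int static_eval(Position *pos);

int kept = 0;
for (int i = 0; i < total_positions; i++){
    if (qsearch(&positions[i], -32000, 32000) == static_eval(&positions[i]))
        quiet[kept++] = positions[i];
}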
Once I had the positions and could compute the mean error between the result the eval predicted and the actual game result, I first tried a simple method (described here http://www.talkchess.com/forum3/viewtop ... =7&t=76238) because I was afraid of gradient descent; there's a rough sketch of the error and that method after the list below. It did not work well for two reasons:
1. This may have been due to human error, but I found that if you started the tuning process with piece values such as {0,0,0,0,0}, it converged far worse than if you started with reasonable values. That was probably because the values were only increased or decreased by 1 at a time, but I couldn't get more flexible parameter changes to work either.
2. I would turn into a skeleton before I finished waiting for it to optimize the 788 parameters of my material+PSTs.
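For reference, the error being minimized is the usual Texel-style mean squared error between the game result and the sigmoid of the static eval, and the simple method from that thread boils down to nudging each parameter by ±1 and keeping the change only if the error drops. A rough sketch (evaluate(), results[] and K_SIGMOID are placeholders here, not Willow's actual code):

Code: Select all

#include <math.h>

//sketch, not Willow's actual code: evaluate() recomputes the static eval of
//position p (in centipawns) with the current params[], results[p] is the game
//result as 1.0 / 0.5 / 0.0, and K_SIGMOID is the usual sigmoid scaling constant
double mean_error(double params[], int a){
    double e = 0.0;
    for (int p = 0; p < a; p++){
        double eval = evaluate(p, params);
        double sigmoid = 1.0 / (1.0 + pow(10.0, -K_SIGMOID * eval / 400.0));
        double diff = results[p] - sigmoid;
        e += diff * diff;
    }
    return e / a;
}

//the simple method, roughly: try +1, then -1, on each parameter in turn and keep
//whichever direction lowers the error; the whole sweep is repeated until nothing
//improves any more
double best = mean_error(params, a);
for (int i = 0; i < N; i++){
    params[i] += 1;
    double e = mean_error(params, a);
    if (e < best){ best = e; continue; }
    params[i] -= 2;
    e = mean_error(params, a);
    if (e < best){ best = e; continue; }
    params[i] += 1; //neither direction helped, restore the old value
}

With 788 parameters and 2.5 million positions, every sweep like that costs on the order of a few billion evaluations, hence the skeleton.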
So I ended up having to confront gradient descent after all. Turns out it was... much less scary than I thought lol. This thread (http://www.talkchess.com/forum3/viewtopic.php?t=55265) was a massive help, and in code, calculating the gradient for each parameter turns out to be pretty much just:
Code: Select all
//after every evaluation: for each of the N eval parameters, add up how many times it was used
//(e.g. if white has 5 pawns to black's 7, weights[WPAWNMATERIAL] = -2)
for (int i = 0; i < N; i++){
    //sigmoid is the win probability the eval maps to: if the eval was too optimistic for white,
    //(result - sigmoid) is negative and the accumulator moves against the weight; if it was too
    //pessimistic, vice versa. The closer the prediction, the smaller the adjustment.
    GRADIENTEW[i] += (result - sigmoid) * weights[i];
}

//after all the positions have been evaluated (a is the number of positions), average out the gradient
for (int i = 0; i < N; i++){
    GRADIENTEW[i] = (-1.0 / a) * GRADIENTEW[i];
}

//between iterations, step each parameter against its gradient
for (int i = 0; i < N; i++){
    //K is the step size. It can be determined with a line search, but I kept it simple and made it
    //a constant that slowly decreases over time. It still works, it just takes longer.
    params[i] -= K * GRADIENTEW[i];
}
Some interesting things to note about the new sets of values:
In general, it despises pawns. Willow has always been a fairly enterprising engine with a tendency to sac a pawn or two for positional compensation, but now it values a middlegame pawn at just 70 centipawns, compared to 363 cps for knights and 368 cps for bishops. The pawn PSTs haven't changed much either, with one exception: far-advanced pawns in the endgame, where it shows a huge preference for flank pawns over center pawns.
The knight is worth pretty much the same as a bishop; in fact, the base value of a knight in the endgame is ever so slightly higher than that of a bishop. However, this is immediately explained by the mobility terms: 7 pseudolegal moves for a bishop give +30 cps, while the highest mobility bonus for a knight doesn't even hit 20.
The king table remained fairly similar, but the parameter for a king on e1 went down from "small gain" to "40 cps worse than being castled". This should probably encourage Willow to castle faster.
PST-based bonuses for developing pieces and controlling the center went down quite a bit. The improvement in pstscore from a knight on g1 to a knight on f3 went from 36 to 19, the bonus for e4 plunged from 32 to 3, and bishops on their starting squares now even get a small bonus! (Although they still improve their score by moving, of course.) This is again to be expected, as the mobility and space terms already reward moves like e4 and effectively do part of the old PSTs' job.
Not one parameter blew up to a ridiculous value due to overfitting, which I think confirms the quality of the dataset.

As for the Elo difference?
440-game test: +188 -124 =128, roughly +51 Elo
