a) The scale of centipawns vs Elo is not one-to-one when draw percentages are significant, so despite our earlier assumption that no scaling is needed, this has to be addressed, especially for Davidson. The tuning results I got using Davidson are significantly ramped up compared to the other models. Assuming the centipawn vs winning-percentage relation is logistic, we will try to morph the three draw models so that they give the evaluation scores obtained with no draw model. The criterion I am using to calculate the scaling factor is that the slope of the expected score at eval = 0 should match that of the logistic curve. The logistic curve 1/(1 + 10^(-x/400)) has slope K/4 at x = 0, with K = ln(10)/400, so the scale is each model's slope at zero divided by K/4. Let me know if there is a better way to calculate the scaling factor, especially for Davidson, which is still slightly off after scaling with this method. In code, it is:
#include <cmath>   // log, pow, exp, sqrt
#include <cstdio>  // printf

// Scale Elos so that they look more like EloStat's.
// Match the slope of the expected score at 0 Elo difference to that of the
// logistic curve: df(x)/dx = K/4 with K = ln(10)/400.
void calculate_scale() {
    const double K = log(10.0) / 400.0;
    double df = K / 4.0; // default for an unknown model: eloScale = 1 (no scaling)
    if(eloModel == 0) {
        // Rao-Kupper: slope is K*f*(1-f) with f = 1/(1 + gamma(eloDraw))
        double dg = elo_to_gamma(eloDraw);
        double f = 1 / (1 + dg);
        df = f * (1 - f) * K;
    } else if(eloModel == 1) {
        // Davidson: draw parameter dg = gamma(eloDraw) - 1
        double dg = elo_to_gamma(eloDraw) - 1;
        df = (dg / pow(2+dg,2.0)) * K;
    } else if(eloModel == 2) {
        // Gaussian model: slope of the erf-shaped expected score at 0
        const double pi = 3.14159265359;
        double x = -eloDraw/400.0;
        df = exp(-x*x) / (400.0 * sqrt(pi));
    }
    eloScale = (4.0 / K) * df;
    printf("EloScale %f\n",eloScale);
}
b) Some evaluation terms dealing with imbalances are very problematic for tuning. I had a bonus for a major or minor piece vs pawns. Tuning increased this bonus from its default value of 45 to as much as 500! I think this is because the dataset probably doesn't contain enough positions where a side is a piece up and does not win, so the tuner mostly sees the bonus in won positions and keeps pushing it up. When I carefully changed the evaluation condition so that a side must be up a piece AND down by at least as many pawns as the piece is worth, the tuning kept the value at more or less 45. I can imagine this kind of thing being a problem for other people tuning their evals too.
I will run the drawishness simulations to convergence once I have fixed these issues.
Daniel