Next, I modified my dynamic draw score calculation to compute the difference between Crafty's rating and the opponent's rating (obtained from a small file derived from previous test runs on the cluster). I rounded the rating difference to the nearest 100 and expressed it in units of 100, as follows:
difference = (opponent_rating - crafty_rating + 50) / 100
edit: if the opponent's rating is > Crafty's, I add 50; if the opponent's rating is < Crafty's, I subtract 50. Forgot about this until I re-read the post.
If the opponent is +200 better, "difference" = +2.
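As a rough illustration of the rounding described above, here is a minimal Python sketch. The function name `rating_difference_steps` is made up for this example and this is not Crafty's actual code (Crafty is written in C). It only mirrors the stated rule: add 50 when the opponent is higher rated, subtract 50 when lower, then divide by 100 and drop the remainder.

```python
def rating_difference_steps(opponent_rating, crafty_rating):
    """Rating difference in units of 100 Elo, rounded to the nearest unit.

    Hypothetical helper for illustration; not taken from Crafty's source.
    """
    diff = opponent_rating - crafty_rating
    if diff > 0:
        # Opponent is stronger: add 50 before dividing.
        return (diff + 50) // 100
    if diff < 0:
        # Opponent is weaker: subtract 50. Python's // floors, so round the
        # magnitude and then restore the sign.
        return -((-diff + 50) // 100)
    return 0


print(rating_difference_steps(2744, 2544))  # opponent is 200 higher
print(rating_difference_steps(2344, 2544))  # opponent is 200 lower
```

The two calls print 2 and -2. The sign-dependent offset is what keeps the rounding symmetric around zero.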
The draw score is a parameter set during testing, called "scale", multiplied by the rating difference above. I tried a range of "scale" values, from zero to +200 (2.0 pawns). The small values all land close together; the big values get progressively worse.
The version tested is 23.5R10, which is the version with the new way of setting the default contempt based on rating. The old approach had a limit of +/-50 regardless of the rating difference, and I wanted to see what happens with really big numbers...
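To make the arithmetic concrete, here is a small sketch of the draw score as described: "scale" times the rounded rating difference. The name `draw_score` and the choice of centipawns as the unit (inferred from "+200 (2.0 pawns)") are assumptions for this example, not Crafty's actual code.

```python
def draw_score(scale, opponent_rating, crafty_rating):
    """Draw score in centipawns from Crafty's point of view (hypothetical helper).

    Positive means a draw is treated as good for Crafty (stronger opponent),
    negative means a draw is treated as bad (weaker opponent).
    """
    diff = opponent_rating - crafty_rating
    # Round the magnitude to the nearest 100 Elo, then restore the sign.
    steps = (abs(diff) + 50) // 100
    if diff < 0:
        steps = -steps
    return scale * steps


for scale in (0, 30, 200):
    print(scale, draw_score(scale, 2744, 2544), draw_score(scale, 2344, 2544))
```

With scale=200 and an opponent rated 200 points higher, this gives a draw score of 400, i.e. a draw is valued like being four pawns up.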
Results:
Name                 Elo    +    -  games  score  oppo.  draws
Crafty-23.5R10-30   2648    4    4  30000    62%   2544    20%
Crafty-23.5R10-40   2647    4    4  30000    62%   2544    20%
Crafty-23.5R10-20   2646    4    4  30000    62%   2544    20%
Crafty-23.5R10-10   2646    3    3  60000    63%   2544    20%
Crafty-23.5R10-60   2645    4    4  30000    62%   2544    20%
Crafty-23.5R10-80   2642    4    4  30000    61%   2544    20%
Crafty-23.5R10-0    2640    4    4  30000    62%   2544    20%
Crafty-23.5-1       2639    4    4  30000    62%   2544    21%
Crafty-23.5R10-100  2636    4    4  30000    60%   2544    19%
Crafty-23.5-2       2636    4    4  30000    62%   2544    21%
Crafty-23.5R10-120  2636    4    4  30000    60%   2544    19%
Crafty-23.5R10-140  2630    4    4  30000    59%   2544    18%
Crafty-23.5R10-200  2624    4    4  30000    58%   2544    17%
Observations. The error bar is generally +/-4, so don't lose sight of that. I do not know how I ran 23.5R10-10 twice, but I did, which dropped its error bar somewhat. Notice that the normal version was rated 2636 in one run and 2639 in another. The new version with scale=0 was 2640 and, as it should be, was right in line with the normal versions. Notice that the scale values from 10 through 60 came out a bit higher than scale=0 (2645 to 2648 versus 2640), but it doesn't take much before the ratings start dropping: scale=80 is at 2642, and from 100 on the rating falls steadily. By the time you get to 200 (which means for an opponent 200 points higher rated, drawscore = 400) it is down to 2624.
Next: check the draw percentages. As the draw score climbs, Crafty tries harder to avoid draws against weaker opponents and tries harder to reach a drawn position against stronger ones. The normal draw rate is 21%, but with scale=200 it is down to 17%. But as the draw rate drops, so does the winning percentage, from the usual 62% to 58%. Obviously it is drawing less and losing more, no big surprise with a score that large.
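Since the error bars matter so much here, a quick way to keep them in view is to check whether two ratings' intervals overlap. This is only a crude eyeball check under the assumption that the +/- columns can be treated as symmetric intervals; it is not a proper significance test, and the function name is invented for this sketch.

```python
def intervals_overlap(elo_a, err_a, elo_b, err_b):
    """True if [elo_a - err_a, elo_a + err_a] touches or overlaps
    [elo_b - err_b, elo_b + err_b]."""
    return abs(elo_a - elo_b) <= err_a + err_b


# Ratings and error bars taken from the results table.
print(intervals_overlap(2639, 4, 2636, 4))  # the two runs of the normal version
print(intervals_overlap(2648, 4, 2640, 4))  # scale=30 versus scale=0
print(intervals_overlap(2648, 4, 2624, 4))  # scale=30 versus scale=200
```

By this check the two normal runs overlap, scale=30 and scale=0 just barely touch, and scale=200 is clearly separated from scale=30.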
Against the cluster opponents, I see the following loss/draw rates when looking at just the scale=0 (default) games:
edit: in the first row, 78% means the opponent beats Crafty 78% of the time and 22% is the draw rate, from Bayeselo. Opponents are listed strongest first.
78% 22%
62% 20%
36% 24%
34% 23%
22% 19%
18% 15%
When I take the highest-rated version above, which is scale=30 (ignoring the error bar, of course), I see this:
75% 32%
62% 20%
37% 22%
35% 21%
23% 17%
18% 11%
What changed? Against the best opponent, it won 3% more games and drew 10% more games. Good change. Against the worst, it drew 4% fewer games but did not increase its winning percentage, so it lost most of those it would have drawn. Not so good.
Finally, for the worst case, scale=200, I see these numbers:
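The row-by-row changes can be tabulated with a few lines of Python. The numbers are copied from the scale=0 and scale=30 lists above (strongest opponent first); the variable and function names are made up for this sketch, and the two columns are taken exactly as the post describes them.

```python
# (opponent's percentage, draw percentage) per opponent, strongest first.
scale_0 = [(78, 22), (62, 20), (36, 24), (34, 23), (22, 19), (18, 15)]
scale_30 = [(75, 32), (62, 20), (37, 22), (35, 21), (23, 17), (18, 11)]


def column_changes(before, after):
    """Per-opponent change in each column, in percentage points."""
    return [
        (a_opp - b_opp, a_draw - b_draw)
        for (b_opp, b_draw), (a_opp, a_draw) in zip(before, after)
    ]


for row, (d_opp, d_draw) in enumerate(column_changes(scale_0, scale_30), start=1):
    print(f"opponent {row}: opponent column {d_opp:+d}, draws {d_draw:+d}")
```

The first row shows the opponent's figure down 3 and draws up 10; the last row shows the opponent's figure unchanged and draws down 4.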
78% 38%
58% 17%
41% 15%
38% 16%
27% 11%
23% 6%
Against the weakest opponent, draws are down to 6%, a reduction of 9%. But wins are up 5%, so it wins a bit more than it loses from that, though the trade is questionable. Against the strongest opponent, draws go up to 38% but the opponent's winning percentage climbs, which means that the draws we are adding are coming from games we were winning: we found a way to draw, and that big draw score pulled us away from (say) a +1.5 that might well win and took us to a +2.0 that definitely draws.
Thoughts: My really ad-hoc way of computing the draw score probably needs some work. I don't think one really wants to give away pawns to avoid draws against weaker opponents, while against a stronger opponent it might well be worthwhile to try to draw a little harder. At the least, you should try harder to draw against strong opponents than you try to avoid a draw against weaker ones. I'm going to experiment a bit with the "rating ramp" that sets the draw score. Just choosing Elo increments of 100 was a simple first step. I might now try increments of 100 if the opponent is better, but increments of 200 if the opponent is weaker, to see how that works.
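The +1.5 versus +2.0 effect can be shown with a toy move chooser. This is not how Crafty's search is written; it is a deliberately simplified sketch in which each candidate line has either a static evaluation in centipawns or is marked as a forced draw, and the draw gets whatever the draw score says. All names here are hypothetical.

```python
DRAW = None  # marker for a line that ends in a forced draw


def effective_score(line_eval, draw_score):
    """Score the search would compare: the draw score for drawn lines,
    otherwise the line's own evaluation (centipawns)."""
    return draw_score if line_eval is DRAW else line_eval


def choose_line(candidates, draw_score):
    """Pick the candidate with the highest effective score."""
    best = max(candidates, key=lambda item: effective_score(item[1], draw_score))
    return best[0]


candidates = [
    ("keep playing at +1.5", 150),
    ("force a repetition", DRAW),
]

print(choose_line(candidates, draw_score=0))    # draw is worth 0.00
print(choose_line(candidates, draw_score=200))  # draw is worth +2.00
```

With a draw score of 0 the chooser keeps the +1.5 position; with a draw score of 200 the repetition looks better than +1.5 and gets picked, which is the behavior described above.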
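One possible shape for that asymmetric ramp, as a sketch only: steps of 100 Elo when the opponent is stronger and 200 Elo when the opponent is weaker. Rounding to the nearest step (by adding half a step) is an assumption carried over from the 100-point version, and the function and parameter names are invented for this example.

```python
def ramp_steps(opponent_rating, crafty_rating, stronger_step=100, weaker_step=200):
    """Rating difference in ramp steps, with a coarser step for weaker opponents.

    Hypothetical sketch of the proposed experiment, not tested code from Crafty.
    """
    diff = opponent_rating - crafty_rating
    if diff > 0:
        # Stronger opponent: one step per stronger_step Elo, rounded to nearest.
        return (diff + stronger_step // 2) // stronger_step
    if diff < 0:
        # Weaker opponent: one step per weaker_step Elo, rounded to nearest.
        return -((-diff + weaker_step // 2) // weaker_step)
    return 0


print(ramp_steps(2744, 2544))  # opponent 200 higher
print(ramp_steps(2344, 2544))  # opponent 200 lower
```

Under these assumptions an opponent 200 points stronger gives +2 steps, while an opponent 200 points weaker gives only -1, so the draw score pushes half as hard when avoiding draws as when seeking them.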
Comments anyone???