dynamic draw scores (dynamic contempt)

bob · Post by **bob** » Thu Sep 08, 2011 10:06 pm

This is intended to be a continuation of the discussion going on a week or two back, where Don and I were discussing draw scores. I thought it was worth testing again. Before I did, I had to do a little cleaning up on my draw scoring code as one place (drawish endgames) assumed drawscore was zero and pulled the score closer to zero if it was drawish. It now pulls it closer to "drawscore".
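As a rough illustration of the cleanup described above, here is a minimal sketch of pulling a drawish endgame score toward "drawscore" instead of toward zero. The function name and the `divisor` parameter are hypothetical; the post does not say how strongly Crafty actually scales such scores.

```python
def scale_drawish(score_cp: int, draw_score_cp: int, divisor: int = 4) -> int:
    """Pull a drawish score toward the draw score (all values in centipawns).

    `divisor` is an assumed illustration value, not Crafty's actual factor.
    """
    # Distance of the current evaluation from the value of a draw.
    distance = score_cp - draw_score_cp
    # int() truncates toward zero, so the pull is symmetric on both sides.
    return draw_score_cp + int(distance / divisor)
```

With a draw score of zero this reduces to the old behavior (the score is simply shrunk toward zero); with a nonzero draw score the evaluation converges on that value instead.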

Next, I modified my dynamic draw score calculation to basically compute the difference between Crafty's rating and the opponent's rating (obtained from a small file that was derived from previous test runs on the cluster.) I rounded the rating difference to multiples of 100 as follows:

difference = (opponent_rating - crafty_rating + 50) / 100

edit: if opponent rating is > crafty, I add 50, if opponent rating is < crafty, I subtract 50. Forgot about this until after I re-read it.

If the opponent is +200 better, "difference" = +2.

The draw score is set to a parameter set during testing, called "scale" multiplied by the rating difference above. I tried a bunch of different "scale" values, from zero to +200 (2.0 pawns). Small numbers are close, big numbers get worse.
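The calculation described above can be sketched as follows. This is only an interpretation of the formula plus the "edit" note (add 50 when the opponent is stronger, subtract 50 when weaker, with integer division that truncates toward zero); the function names are made up for illustration and this is not Crafty's source.

```python
def rating_steps(opponent_rating: int, crafty_rating: int) -> int:
    """Round the rating difference to the nearest multiple of 100, in steps."""
    diff = opponent_rating - crafty_rating
    if diff >= 0:
        # Opponent is stronger (or equal): add 50, then divide.
        return (diff + 50) // 100
    # Opponent is weaker: subtract 50, then divide, truncating toward zero.
    return -((-diff + 50) // 100)


def draw_score(opponent_rating: int, crafty_rating: int, scale: int) -> int:
    """Draw score in centipawns: 'scale' times the rounded rating difference."""
    return scale * rating_steps(opponent_rating, crafty_rating)
```

Under this reading, an opponent 200 points stronger gives a difference of +2, and a scale of 10 against an opponent 300 points stronger gives a draw score of 30.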

The version tested is 23.5R10, which is the version with the new way of setting the default contempt based on rating. Old approach had a limit of +/-50 regardless of the rating difference, I wanted to see what happens on really big numbers...

Results:


Name                  Elo    +    - games score oppo. draws
Crafty-23.5R10-30    2648    4    4 30000   62%  2544   20%
Crafty-23.5R10-40    2647    4    4 30000   62%  2544   20%  
Crafty-23.5R10-20    2646    4    4 30000   62%  2544   20%  
Crafty-23.5R10-10    2646    3    3 60000   63%  2544   20%  
Crafty-23.5R10-60    2645    4    4 30000   62%  2544   20%  
Crafty-23.5R10-80    2642    4    4 30000   61%  2544   20%  
Crafty-23.5R10-0     2640    4    4 30000   62%  2544   20%  
Crafty-23.5-1        2639    4    4 30000   62%  2544   21%  
Crafty-23.5R10-100   2636    4    4 30000   60%  2544   19%  
Crafty-23.5-2        2636    4    4 30000   62%  2544   21%  
Crafty-23.5R10-120   2636    4    4 30000   60%  2544   19%  
Crafty-23.5R10-140   2630    4    4 30000   59%  2544   18%  
Crafty-23.5R10-200   2624    4    4 30000   58%  2544   17%

23.5-1 and 23.5-2 are just two calibration runs of the default version, which assumed draw-score = 0. The R10-n versions used a scale of N. For example, with a scale of 10, and a difference of 3, the draw score was set to 30, because the opponent is 300 points better.

Observations. The error bar is generally +/-4 so don't lose sight of that. I do not know how I ran 23.5R10-10 twice, but I did, which dropped its error bar somewhat. Notice that the normal version was rated 2636 in one run, 2639 in another. The new version with scale=0 was 2640, and as it should be, was right in line with the others. Notice that some of the bigger numbers are better, but it doesn't take much before they start dropping. And by the time you get to 200 (which means for an opponent 200 points higher rated, drawscore = 400) things are dropping.
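To keep the error bars in view when reading the table, a crude screen is to check whether two ratings' intervals overlap. This is only a rough sanity check, not a proper significance test, and the helper name is made up for illustration.

```python
def intervals_overlap(elo_a: int, margin_a: int, elo_b: int, margin_b: int) -> bool:
    """True if [elo - margin, elo + margin] for the two entries overlap or touch."""
    return abs(elo_a - elo_b) <= margin_a + margin_b
```

For example, scale=30 (2648 +/-4) and scale=0 (2640 +/-4) just touch, so that difference is marginal, while scale=200 (2624 +/-4) is clearly separated from scale=0.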

Next: check the draw percentages. As the draw score climbs, Crafty tries harder to avoid draws against weaker opponents, and tries harder to reach a drawn position against stronger opponents. The normal draw % is 21, but with scale=200, it is down to 17%. But as the drawing percentage drops, so does the winning percentage, from the usual 62% to 58%. Obviously it is drawing less and losing more, no big surprise with a score that large.

Against the cluster opponents, I see the following loss/draw rates when looking at just the scale=0 (default) games:

edit: 78% means opponent beats Crafty 78% of time and draws 22%, from Bayeselo...

78% 22%
62% 20%
36% 24%
34% 23%
22% 19%
18% 15%

When I take the highest rating of the above, which is scale=30 (ignoring the error bar of course) I see this:

75% 32%
62% 20%
37% 22%
35% 21%
23% 17%
18% 11%

What changed? Against the best opponent, it won 3% more games, and drew 10% more games. Good change. Against the worst, it drew 4% fewer games, but did not increase its winning percentage, so it lost most of those it would have drawn. Not so good.
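The comparison above is just a row-by-row subtraction of the two lists. A small sketch using the numbers posted for scale=0 and scale=30 (first column is the opponent's percentage, second is the draw percentage, as described in the edit note); the variable and function names are made up for illustration.

```python
# (opponent %, draw %) per opponent, strongest first, copied from the post.
baseline = [(78, 22), (62, 20), (36, 24), (34, 23), (22, 19), (18, 15)]
scale_30 = [(75, 32), (62, 20), (37, 22), (35, 21), (23, 17), (18, 11)]


def deltas(before, after):
    """Per-opponent change in (opponent %, draw %) going from before to after."""
    return [(a_opp - b_opp, a_draw - b_draw)
            for (b_opp, b_draw), (a_opp, a_draw) in zip(before, after)]


changes = deltas(baseline, scale_30)
```

The first entry of `changes` is (-3, 10): the strongest opponent scores 3% less and draws go up 10%. The last is (0, -4): against the weakest opponent, draws drop 4% with no change in the opponent's percentage.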

Finally, for the worst-case, scale=200, I see these numbers:

78% 38%
58% 17%
41% 15%
38% 16%
27% 11%
23% 6%

Against the weakest opponent, draws are down to 6%, a reduction of 9%. But wins are up 5%, so it wins a bit more than it loses from that, but questionable. Against the strongest opponent, draws go up to 38% but the opponent's winning percentage climbs, which means that the draws we are adding are coming from the games where we were winning but found a way to draw. That big draw score pulled us away from (say) a +1.5 that might well win, and took us to a +2.0 which definitely draws.
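The +1.5 versus +2.0 effect can be shown with a toy comparison. This is a deliberately simplified sketch: a real search makes this comparison inside alpha-beta whenever a drawn position (repetition, 50-move rule, and so on) is reached, and the function name here is hypothetical.

```python
def choose_line(eval_cp: int, draw_score_cp: int) -> str:
    """Pick between a playable line and a forced draw, both scored in centipawns."""
    # The search simply takes whichever alternative has the higher score.
    return "draw" if draw_score_cp >= eval_cp else "play on"
```

With a draw score of 200 (scale=200 against an opponent one step stronger), a line evaluated at +150 loses out to the draw, whereas with a draw score of 0 the same line would be played on.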

Thoughts: My really ad-hoc way of computing the draw score probably needs some work. I don't think one really wants to give away pawns to avoid draws to weaker opponents, while when playing against a stronger opponent, it might well be worthwhile to try to draw a little harder. At least try to draw a little harder against strong opponents, than you try to avoid a draw against weaker ones. I'm going to experiment a bit on the "rating ramp" that sets the draw score. Just choosing Elo increments of 100 was a simple first step. I might now try increments of 100 if the opponent is better, but increments of 200 if the opponent is weaker, to see how that works.
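One way the asymmetric "rating ramp" idea could look is sketched below: 100-point increments when the opponent is stronger, 200-point increments when weaker. The rounding (half a step, as in the original formula) and all names are assumptions; nothing like this has been tested in the post.

```python
def ramp_steps(opponent_rating: int, crafty_rating: int,
               step_stronger: int = 100, step_weaker: int = 200) -> int:
    """Rating difference in steps, with a coarser step for weaker opponents."""
    diff = opponent_rating - crafty_rating
    if diff >= 0:
        # Stronger opponent: round to the nearest multiple of step_stronger.
        return (diff + step_stronger // 2) // step_stronger
    # Weaker opponent: round to the nearest multiple of step_weaker.
    return -((-diff + step_weaker // 2) // step_weaker)


def ramp_draw_score(opponent_rating: int, crafty_rating: int, scale: int) -> int:
    """Draw score in centipawns using the asymmetric ramp."""
    return scale * ramp_steps(opponent_rating, crafty_rating)
```

With these defaults, an opponent 200 points stronger still gives +2 steps, but an opponent 200 points weaker gives only -1, so the engine tries harder to draw against strong opponents than it tries to avoid draws against weak ones.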

Comments anyone???

sje · Post by **sje** » Thu Sep 08, 2011 10:54 pm

I have to say that I disagree with both nonzero draw scores and nonzero contempt values.

My idea is to follow the old adage "Play the board and not the man."

Now I realize that this may not always maximize the end result tally. But I feel that ignoring extraneous data such as the opponent's rating or any title will lead to better chess overall.

mhull · Post by **mhull** » Thu Sep 08, 2011 11:27 pm

sje wrote: I have to say that I disagree with both nonzero draw scores and nonzero contempt values.

My idea is to follow the old adage "Play the board and not the man."

Now I realize that this may not always maximize the end result tally. But I feel that ignoring extraneous data such as the opponent's rating or any title will lead to better chess overall.

On the other hand, AI is implicit in computer chess. It is enjoyable to watch a program that can seem truly intelligent, dynamic and capable of some level of gamesmanship, which is the essence of joie de jeu, a goal evident in the names of many programs, e.g. "crafty".

Don · Post by **Don** » Thu Sep 08, 2011 11:39 pm

sje wrote: I have to say that I disagree with both nonzero draw scores and nonzero contempt values.

My idea is to follow the old adage "Play the board and not the man."

Now I realize that this may not always maximize the end result tally. But I feel that ignoring extraneous data such as the opponent's rating or any title will lead to better chess overall.

All general principles should be applied with a degree of balance. The general principle is good, but slavish devotion to it is bad.

Another principle is "do not bring your queen out early" but a master friend of mine once told me that this principle has screwed up a lot of beginners who would not bring their queen out for anything. He proposed a better principle which was, "bring out your queen as early as you can." His main point was that you need to develop your queen too. Some silly rule should not prevent you from making a strong developing move.

The general principle of playing the board to me simply means that you need to make strong sound moves and not be "overly" concerned with your opponent. This same master told me that many weaker players lose to strong players because they are afraid to "mix it up" tactically out of undue respect for their opponents' ratings. So instead of pressing any advantage they give it away without any fight. Instead of playing the board they played the opponent and lost.

All the top players prepare carefully for their opponents in important matches when they know who they will be playing. Sometimes they keep ideas for years to spring on specified opponents. I think they know what they are doing and they represent the best balance between slavish devotion to rules and balanced consideration of general principles.

If Komodo is playing a program 700 ELO weaker and the opponent has white, Komodo will likely come out of book with a slight disadvantage. There is no way we are going to happily consider a draw in this situation due to slavish devotion to a "general" principle.

By the same token, if Magnus Carlsen offers me a draw in a game we play - I'm probably going to accept even if I have a slight advantage. I would need to be up more than a pawn or two to say no.

BubbaTough · Post by **BubbaTough** » Thu Sep 08, 2011 11:39 pm

You may want to ask Brian Richardson how he auto-sets contempt for Tinker on ICC. I seem to remember he did something decent related to that and automated draw offers.

I am guessing most of your draw heuristics are from either very drawn endgames or for "no progress" positions where pawns and captures have not happened for a while. For these types of things it is no surprise to me that it helps against strong opponents AND weak opponents to not push for a win from either a slightly better or slightly worse position. It is consistent with my early results as well. My results were +20-30 elo against stronger opposition to have contempt of 20-30, and break even against weaker opposition to have contempt ranging from 0 - 30. Stronger and weaker were at least 100 elo difference between Hannibal and the opposing program if I remember right.

I would guess the more you added about understanding drawish opening/middlegames, and improved understanding about not trying to push for wins in drawn endgames, your dynamic contempt will improve more.

-Sam

bob · Post by **bob** » Fri Sep 09, 2011 2:47 am

sje wrote: I have to say that I disagree with both nonzero draw scores and nonzero contempt values.

My idea is to follow the old adage "Play the board and not the man."

Now I realize that this may not always maximize the end result tally. But I feel that ignoring extraneous data such as the opponent's rating or any title will lead to better chess overall.

I don't "play the board" myself. If I don't know the opponent, I certainly will, but if I know my opponent is much weaker than me, I will certainly play differently, particularly when assessing whether to take an easy draw or a potentially complicated win.
