Tuning for rating lists ?

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Tuning for rating lists ?

Post by mcostalba »

There is this magical parameter called "contempt" that has an interesting property: if enabled, it lets your engine become strong against the weakest and weak against the strongest. Yes, it is not a very ethical one :-) Apart from the technical merit of this: I have serious doubts that just tweaking the draw score is enough to enable a more aggressive and risky style of play, but this is another topic.

Here the topic is: assuming that contempt does the trick, in rating lists with many engines weaker than yours (that is, when your engine is in the top half of the list), contempt can, more or less artificially, push you up. The side effect is that your engine becomes weaker against the strongest, so if you want, for instance, to participate in a tournament with elimination rounds up to the final, it is perhaps wise to disable contempt.

Personally I'd prefer to be strong with the strongest and... merciful :-) with the weakest. IOW, I prefer that the engine does well in tournaments and in one-to-one direct matches, even if this means giving up some points in the rating lists.
Rein Halbersma
Posts: 751
Joined: Tue May 22, 2007 11:13 am

Re: Tuning for rating lists ?

Post by Rein Halbersma »

So why not let contempt be a function of opponent rating (or even the opponent identity) and let the tournament manager pass the current TPR of the opponent (or its name) to each engine? I don't think there is anything unethical about adapting playing style to an opponent, as long as there is an agreed-upon interface to handle such things.
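
A minimal Python sketch of that idea, assuming the tournament manager knows both ratings. The function names, the linear mapping (10 centipawns per 100 Elo) and the clamp are illustrative assumptions, not something any engine or GUI is known to do. The option name "Contempt" is also an assumption and differs between engines; only the `setoption name ... value ...` shape comes from the UCI protocol.

```python
def contempt_from_rating(own_elo, opp_elo, cp_per_100_elo=10.0, max_cp=50):
    """Map a rating difference to a contempt value in centipawns.

    Hypothetical linear mapping: positive when we are the stronger side,
    negative when we are the weaker side, clamped to [-max_cp, max_cp].
    """
    raw = (own_elo - opp_elo) / 100.0 * cp_per_100_elo
    return int(max(-max_cp, min(max_cp, round(raw))))


def setoption_command(value, option_name="Contempt"):
    """Build the UCI line a tournament manager could send before a game.

    The option name is an assumption; check what the engine reports
    in its own "option name ..." lines.
    """
    return f"setoption name {option_name} value {value}"


if __name__ == "__main__":
    c = contempt_from_rating(own_elo=3000, opp_elo=2800)
    print(setoption_command(c))
```

The clamp keeps a huge rating gap from producing an absurd draw score; where to put the limit is a tuning question.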
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Tuning for rating lists ?

Post by mcostalba »

Rein Halbersma wrote: So why not let contempt be a function of opponent rating (or even the opponent identity) and let the tournament manager pass the current TPR of the opponent (or its name) to each engine? I don't think there is anything unethical about adapting playing style to an opponent, as long as there is an agreed-upon interface to handle such things.
We already have the contempt UCI option, which the user can modify directly. The point is to have a strong default: an engine people take off the shelf and run. One possibility would be to "infer" in some way the strength of the opponent while playing and adapt to it.

Just off the top of my head, a possible algorithm could be: when you send the best move, remember the ponder move (even if you are not pondering) and then check how many times the opponent replies with what you think is the best reply. If the opponent misses the best reply many times _and_ your eval goes up over the course of the game, then you can increase contempt... just an idea.
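
The idea above could be sketched roughly as follows in Python. Everything here is hypothetical: the class name, the thresholds (at least 10 observed replies, a miss rate of 50%), the step size and the way "eval goes up" is measured (last eval versus first eval) are assumptions chosen for illustration, not what Stockfish or any other engine does.

```python
class OpponentModel:
    """Hypothetical tracker: how often does the opponent miss our predicted reply?"""

    def __init__(self, base_contempt=0, step=5, max_contempt=50,
                 min_moves=10, miss_threshold=0.5):
        self.base_contempt = base_contempt
        self.step = step
        self.max_contempt = max_contempt
        self.min_moves = min_moves
        self.miss_threshold = miss_threshold
        self.moves = 0
        self.misses = 0
        self.first_eval = None
        self.last_eval = None

    def record(self, predicted_reply, actual_reply, eval_cp):
        """Call once per opponent move with our ponder move and our current eval."""
        self.moves += 1
        if actual_reply != predicted_reply:
            self.misses += 1
        if self.first_eval is None:
            self.first_eval = eval_cp
        self.last_eval = eval_cp

    def contempt(self):
        """Raise contempt only if misses are frequent AND our eval has improved."""
        if self.moves < self.min_moves:
            return self.base_contempt  # too little evidence so far
        miss_rate = self.misses / self.moves
        eval_rising = self.last_eval > self.first_eval
        if miss_rate >= self.miss_threshold and eval_rising:
            return min(self.max_contempt, self.base_contempt + self.step)
        return self.base_contempt
```

Requiring both conditions matters: a strong opponent can also play moves we did not predict, but then our eval should not keep climbing.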
Rein Halbersma
Posts: 751
Joined: Tue May 22, 2007 11:13 am

Re: Tuning for rating lists ?

Post by Rein Halbersma »

mcostalba wrote:
Just off the top of my head, a possible algorithm could be: when you send the best move, remember the ponder move (even if you are not pondering) and then check how many times the opponent replies with what you think is the best reply. If the opponent misses the best reply many times _and_ your eval goes up over the course of the game, then you can increase contempt... just an idea.
Yes, or the sum of the score drops after ponder misses, or the move number when you drop out of book. Indeed, a dynamic opponent assessment sounds like what humans would do when playing anonymously online.
pohl4711
Posts: 2901
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: Tuning for rating lists ?

Post by pohl4711 »

mcostalba wrote: There is this magical parameter called "contempt" that has an interesting property: if enabled, it lets your engine become strong against the weakest and weak against the strongest. Yes, it is not a very ethical one :-) Apart from the technical merit of this: I have serious doubts that just tweaking the draw score is enough to enable a more aggressive and risky style of play, but this is another topic.

Here the topic is: assuming that contempt does the trick, in rating lists with many engines weaker than yours (that is, when your engine is in the top half of the list), contempt can, more or less artificially, push you up. The side effect is that your engine becomes weaker against the strongest, so if you want, for instance, to participate in a tournament with elimination rounds up to the final, it is perhaps wise to disable contempt.

Personally I'd prefer to be strong with the strongest and... merciful :-) with the weakest. IOW, I prefer that the engine does well in tournaments and in one-to-one direct matches, even if this means giving up some points in the rating lists.
The first problem is rating lists with too many opponents that are too weak: the strongest engines will always have distorted results in those rating lists. But that's not your problem (and not mine: in the LS rating list the engines only have strong opponents...).
But the second problem is that Stockfish (in my LS test runs) produces many (!) more draws by threefold repetition in the middlegame than most of the other engines. It is obvious that an engine which does so will always score badly against weak opponents, because of too many draws in (early) middlegame positions, so that the weaker opponents don't get the chance to make mistakes and lose the game... I tried some high contempt factors (35, 50) for Stockfish, but the problem was still there.

Stefan
Steve Maughan
Posts: 1315
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: Tuning for rating lists ?

Post by Steve Maughan »

Hi Marco,
mcostalba wrote: [...] Apart from the technical merit of this: I have serious doubts that just tweaking the draw score is enough to enable a more aggressive and risky style of play, but this is another topic [...].
I agree - I doubt it has much impact, and I really doubt it changes the style of play. The only real impact will be when the engine perceives its position to be weak (i.e. score < 0) and there is a possibility of a draw by threefold repetition (and possibly in some cases by the 50-move rule). How many games will this affect? I don't know - but I assume it will be <3% (and even less if you're Stockfish 4 :lol: )
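
To make the "score < 0" point concrete, here is a small Python sketch of the usual draw-score convention in a negamax-style search: the draw value is returned from the point of view of the side to move, so it is minus the contempt when the engine is to move and plus the contempt when the opponent is. The function names and the string labels are hypothetical, and the numbers are only examples.

```python
def draw_score(contempt_cp, side_to_move, engine_side):
    """Value of a drawn position from the side to move's point of view."""
    return -contempt_cp if side_to_move == engine_side else contempt_cp


def takes_repetition(best_alternative_cp, contempt_cp):
    """Would the engine, when it is to move, choose the repetition?

    best_alternative_cp is the score of the best non-drawing line,
    seen from the engine's side.
    """
    return draw_score(contempt_cp, "engine", "engine") > best_alternative_cp


if __name__ == "__main__":
    for alt in (-30, -10, 10):
        print(alt, takes_repetition(alt, 0), takes_repetition(alt, 20))
```

With a contempt of 20, the decision differs from the zero-contempt case only when the best alternative scores between -20 and just below 0, which is the narrow window described above.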

Steve
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine
kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 4:48 am

Re: Tuning for rating lists ?

Post by kbhearn »

It can affect significantly more games if your draw score is also used as the anchor you pull towards in drawish endings (instead of pulling towards zero). That way the engine tends to keep more pieces on the board, so your opponent still has a chance to blunder the game.
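
A rough Python sketch of the difference, assuming the engine scales its evaluation by some factor between 0 and 1 in drawish endings. The function name, the scale factor and the numbers are illustrative assumptions, not taken from any particular engine.

```python
def scale_toward(eval_cp, anchor_cp, scale):
    """Shrink an evaluation toward an anchor value.

    scale = 1.0 leaves the eval unchanged; scale = 0.0 returns the anchor.
    Using anchor_cp = 0 is the common "pull towards zero" behaviour;
    using anchor_cp = draw score makes drawish endings inherit the contempt.
    """
    return anchor_cp + (eval_cp - anchor_cp) * scale


if __name__ == "__main__":
    contempt_cp = 20
    draw_cp = -contempt_cp          # draw value from the engine's side
    raw_eval = 100                  # engine is up roughly a pawn
    print(scale_toward(raw_eval, 0, 0.5))        # anchored at zero
    print(scale_toward(raw_eval, draw_cp, 0.5))  # anchored at the draw score
```

Anchored at the draw score, a simplified drawish ending is worth less to the engine than with a zero anchor, so trading down into it becomes less attractive.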
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Tuning for rating lists ?

Post by Don »

mcostalba wrote: There is this magical parameter called "contempt" that has an interesting property: if enabled, it lets your engine become strong against the weakest and weak against the strongest. Yes, it is not a very ethical one :-) Apart from the technical merit of this: I have serious doubts that just tweaking the draw score is enough to enable a more aggressive and risky style of play, but this is another topic.

Here the topic is: assuming that contempt does the trick, in rating lists with many engines weaker than yours (that is, when your engine is in the top half of the list), contempt can, more or less artificially, push you up. The side effect is that your engine becomes weaker against the strongest, so if you want, for instance, to participate in a tournament with elimination rounds up to the final, it is perhaps wise to disable contempt.

Personally I'd prefer to be strong with the strongest and... merciful :-) with the weakest. IOW, I prefer that the engine does well in tournaments and in one-to-one direct matches, even if this means giving up some points in the rating lists.
But the problem with not having contempt was illustrated dramatically for us in the Leiden tournament. We were playing against a program significantly weaker than us and had the black pieces. Immediately out of the opening we had a slightly bad position - just as it should be if both sides have a reasonable book. The opponent program played some move which gave Komodo the opportunity to repeat the position, and of course Komodo was willing to repeat even though the chances of winning that game were very high. The story has a happy ending for us: the other program couldn't decide whether it wanted the draw or not and ultimately refused - but our fate was in someone else's hands! We actually implemented a drawscore feature before the next round as a result of this game.

A couple of years ago I actually studied the effect of contempt on games and was pretty surprised. It does in fact make a pretty large difference when you are playing up or down by more than 100 Elo. If you are playing down 500 Elo it's worth a bundle of Elo, because otherwise you are going to want to draw these very weak players when you are Black, and that is going to happen too often.

So I personally think that the Elo difference of the players should be used and told to the programs before the games are played in any rating list or competition, and if it is not known, that should be specified too.

For a strong program like Stockfish you should AT LEAST set the default contempt to be the same as the white advantage as estimated by Stockfish - and you could use that when playing Black and zero when playing White if you want to be as neutral as possible without being unduly generous about giving away Elo points. If your weaker opponent is able to overcome that contempt by out-playing Stockfish for a few moves, then at least you made him earn it.
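
That "neutral" default can be written down in a few lines of Python. This is only a sketch of the rule as stated: the function names are hypothetical, and the white-advantage figure passed in is an assumed input (the value used below is a placeholder, not a measured Stockfish number).

```python
def default_contempt(engine_color, white_advantage_cp):
    """Contempt equal to White's estimated edge when Black, zero when White."""
    if engine_color == "black":
        return white_advantage_cp
    return 0


def draw_value_for_engine(engine_color, white_advantage_cp):
    """How the engine itself values a draw under this default (its own point of view)."""
    return -default_contempt(engine_color, white_advantage_cp)


if __name__ == "__main__":
    assumed_white_edge = 20  # placeholder value in centipawns
    print(draw_value_for_engine("black", assumed_white_edge))
    print(draw_value_for_engine("white", assumed_white_edge))
```

With this setting, a repetition offered to Black right out of book is valued about the same as the normal slightly worse opening position, so it no longer looks like an improvement worth grabbing.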
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
syzygy
Posts: 5861
Joined: Tue Feb 28, 2012 11:56 pm

Re: Tuning for rating lists ?

Post by syzygy »

mcostalba wrote: There is this magical parameter called "contempt" that has an interesting property: if enabled, it lets your engine become strong against the weakest and weak against the strongest. Yes, it is not a very ethical one :-) Apart from the technical merit of this: I have serious doubts that just tweaking the draw score is enough to enable a more aggressive and risky style of play, but this is another topic.
Note that not all engines implement "contempt" by tweaking the draw score. For example, H3 seems to implement it by making the evaluation asymmetric (piece value imbalance, king safety imbalance). See here under the heading "Contempt".
kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 4:48 am

Re: Tuning for rating lists ?

Post by kbhearn »

From observation and what I could glean from Robert's comments at the last TCEC, H3's contempt does indeed mostly just move the draw score, but how much it moves it is dynamic (I think based on the root position, but I'm not sure about that). For example, if there were still queens on the board when Houdini was showing repetition lines, it was scoring them around 0.2 to 0.3 in favor of the opponent, whereas when it was down to a very drawish ending, drawn lines were scoring around 0.1 in favor of the opponent.
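
One way such a dynamic draw score could work is to interpolate the contempt by remaining material. The Python sketch below is a guess for illustration only, not Houdini's actual method: the phase weights (knight 1, bishop 1, rook 2, queen 4, total 24) are a common convention, and the 10 and 25 centipawn endpoints are simply picked to match the ranges observed above.

```python
def game_phase(knights, bishops, rooks, queens):
    """Return 1.0 for full non-pawn material, 0.0 for none (both sides combined)."""
    weight = knights + bishops + 2 * rooks + 4 * queens
    return min(weight, 24) / 24.0


def dynamic_contempt(phase, endgame_cp=10.0, middlegame_cp=25.0):
    """Linear interpolation between an endgame and a middlegame contempt."""
    return endgame_cp + (middlegame_cp - endgame_cp) * phase


if __name__ == "__main__":
    # Full material at the root position
    print(dynamic_contempt(game_phase(4, 4, 4, 2)))
    # Two rooks and two queens left on the board
    print(dynamic_contempt(game_phase(0, 0, 2, 2)))
    # Bare kings and pawns
    print(dynamic_contempt(game_phase(0, 0, 0, 0)))
```

Computing the phase once at the root, rather than at every node, would match the guess that the value depends on the root position.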