Standard candles

sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Standard candles

Post by sje »

In the realm of astrophysics, standard candles are classes of astronomical objects which allow measurement of cosmic distances. Without these candles, we would not know with any certainty the distances to a faraway object unless somehow physical travel were possible.

http://en.wikipedia.org/wiki/Cosmic_distance_ladder

It occurs to me that we might construct standard candles on the Elo strength scale. For example, no one knows the actual Elo measurement of a totally random player. But that measurement could be determined by artificially constructing progressively weaker machine players. A program would be made with no randomness and earn a (say) 2800 Elo rating. From that point, a very slight amount of randomness would be added to produce a program which performed 100 Elo points lower after extended (> 1000 game) match competition with the prior version. The downward revision iteration would be continued as necessary.
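As a rough sketch of the arithmetic behind such a ladder, assuming the usual logistic Elo model: the weakened version sits 100 Elo below its predecessor when it scores about 36% against it. The function names and the `ladder` helper are hypothetical, chosen only for illustration.

```python
import math

def expected_score(elo_diff):
    """Expected score of a player rated elo_diff above its opponent (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_diff_from_score(score):
    """Invert the model: rating difference implied by a match score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def ladder(top_rating, step, rungs):
    """Nominal ratings of the candles: the top program plus `rungs` weakened versions."""
    return [top_rating - i * step for i in range(rungs + 1)]

# Score the weakened version should reach against its predecessor for a 100 Elo gap.
target = expected_score(-100)
print(f"target score per rung: {target:.3f}")
print(ladder(2800, 100, 5))
```

In practice the amount of randomness would be tuned until the measured match score, fed through `elo_diff_from_score`, comes out near -100.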
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Standard candles

Post by michiguel »

sje wrote:In the realm of astrophysics, standard candles are classes of astronomical objects which allow measurement of cosmic distances. Without these candles, we would not know with any certainty the distances to a faraway object unless somehow physical travel were possible.

http://en.wikipedia.org/wiki/Cosmic_distance_ladder

It occurs to me that we might construct standard candles on the Elo strength scale. For example, no one knows the actual Elo measurement of a totally random player. But that measurement could be determined by artificially constructing progressively weaker machine players. A program would be made with no randomness and earn a (say) 2800 Elo rating. From that point, a very slight amount of randomness would be added to produce a program which performed 100 Elo points lower after extended (> 1000 game) match competition with the prior version. The downward revision iteration would be continued as necessary.
Adam did this:
http://adamsccpages.blogspot.com/p/also ... -list.html

"Rating Program: Ordo 0.6 with the switch '-W', and adjusted so that Brutus RND (which is a random mover) has a rating of 0 Elo."

Miguel
Henk
Posts: 7221
Joined: Mon May 27, 2013 10:31 am

Re: Standard candles

Post by Henk »

michiguel wrote:
sje wrote:In the realm of astrophysics, standard candles are classes of astronomical objects which allow measurement of cosmic distances. Without these candles, we would not know with any certainty the distances to a faraway object unless somehow physical travel were possible.

http://en.wikipedia.org/wiki/Cosmic_distance_ladder

It occurs to me that we might construct standard candles on the Elo strength scale. For example, no one knows the actual Elo measurement of a totally random player. But that measurement could be determined by artificially constructing progressively weaker machine players. A program would be made with no randomness and earn a (say) 2800 Elo rating. From that point, a very slight amount of randomness would be added to produce a program which performed 100 Elo points lower after extended (> 1000 game) match competition with the prior version. The downward revision iteration would be continued as necessary.
Adam did this:
http://adamsccpages.blogspot.com/p/also ... -list.html

"Rating Program: Ordo 0.6 with the switch '-W', and adjusted so that Brutus RND (which is a random mover) has a rating of 0 Elo."

Miguel
I guess a rating below zero is not possible. But it is possible to play in a way that forces the opponent to deliver checkmate.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Standard candles

Post by Sven »

Henk wrote:I guess a rating below zero is not possible. But it is possible to play in a way that forces the opponent to deliver checkmate.
You are guessing wrong. The whole rating scale in each pool is arbitrary. FIDE could change the scale such that Carlsen has +500 Elo. In that case players with around 1850 Elo today would have -500 Elo, and the rating system would still be valid. Prof. Elo, who invented the Elo rating system, did not intend such negative values, but the system still works with them.
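A small sketch of why the shift is harmless, assuming the usual logistic Elo formula: predictions depend only on the rating difference, so subtracting the same constant from everyone changes nothing. The figure of 2850 for the top player is an assumption implied by the numbers above, not an official rating.

```python
def expected_score(r_a, r_b):
    """Expected score of player A against player B (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

TOP_TODAY = 2850          # assumed current rating of the top player
CLUB_TODAY = 1850
shift = 500 - TOP_TODAY   # constant that moves the top player to +500

before = expected_score(TOP_TODAY, CLUB_TODAY)
after = expected_score(TOP_TODAY + shift, CLUB_TODAY + shift)

print(CLUB_TODAY + shift)            # the club player's rating on the shifted scale
print(abs(before - after) < 1e-12)   # predictions are unchanged
```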

Sven
hgm
Posts: 27837
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Standard candles

Post by hgm »

I once made an attempt to accurately measure the ratings of the very weakest engines in the ChessWar Promo division. They did end up below zero. And I had the feeling these were even over-estimated, and would go down several hundred Elo more if there just had been enough intermediate engines to push them further down. The gap between engines like Ccp, Pos and N.E.G. and the weakest, most buggy alpha-beta searchers is truly enormous.

Another problem is that the standard rating models are not valid for these engines. There is always a very sizable probability that they score points against other extremely weak engines that are 500, 1000 or even 3000 Elo stronger, because these weak engines are so buggy that they often hang before the opponent is checkmated, and thus forfeit on time (or play an illegal move, etc.). Of course the standard rating models then refuse to believe that they are really 3000 Elo weaker when they score points. But it would be easy to construct a sequence of engines all differing by 100 Elo (where the occasional forfeit of the stronger one would not completely corrupt the measurements), and then you would see that the sequence would have to be very, very long before you reach the strength of Pos or Brutus random.
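The forfeit effect can be illustrated with a toy model, assuming the logistic Elo formula and a hypothetical fixed chance that the stronger engine forfeits a game regardless of the position. With a 1% forfeit rate, the gap a single direct match can show saturates just under 800 Elo, however large the true gap is.

```python
import math

def expected_score(elo_diff):
    """Expected score of a player rated elo_diff above its opponent (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def apparent_gap(true_gap, forfeit_rate):
    """Elo gap implied by a direct match when the stronger engine sometimes forfeits.

    Toy assumption: the weaker engine gets the full point whenever the stronger
    one forfeits, and otherwise scores according to the true gap.
    """
    weak_score = forfeit_rate + (1.0 - forfeit_rate) * expected_score(-true_gap)
    return 400.0 * math.log10(1.0 / weak_score - 1.0)

for gap in (500, 1000, 3000):
    print(gap, round(apparent_gap(gap, forfeit_rate=0.01)))
```

A chain of engines spaced 100 Elo apart is much less sensitive, because at that spacing the weaker side already scores about 36% and a 1% forfeit rate barely moves the result.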

Another problem is that engines weakened by randomizing their eval still might recognize very deep mates. This makes them very unbalanced. They play like complete idiots, but (when put up against other complete idiots) then suddenly announce "mate in 11", and play it out perfectly. That is not what you expect of a weak player.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Standard candles

Post by Adam Hair »

Unlike ChessWar (if I remember correctly), I throw out all time forfeits and games adjudicated for illegal moves. Because of this, I exclude engines that tend to hang or make an illegal move when they are losing. Still, there is a major source of rating distortion in my list. Some engines cannot deliver mate even with an overwhelming material advantage (I have forgotten the reason for this and would not mind if someone explained it again). It is these engines that give up half points to Brutus random, POS, etc. in my testing.
Henk
Posts: 7221
Joined: Mon May 27, 2013 10:31 am

Re: Standard candles

Post by Henk »

I think my chess program won't be able to finish a KRK endgame either. Only if there is a mate in 12 or so will it probably see it. I have not tested it, but I have encountered some awkward endgames.
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: Standard candles

Post by Robert Pope »

Or, a weak program will see multiple moves that are mate in 3+, and always end up picking the move that doesn't progress toward mate, until it draws by repetition.
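One common remedy, sketched here as an assumption and not as anything the engines in this thread do, is to make the mate score depend on the distance to mate, so that a shorter mate always scores higher than a longer one. The constant `MATE` and the move names are hypothetical.

```python
MATE = 100_000  # hypothetical score for delivering mate on the spot

def mate_score(plies_to_mate):
    """Score a forced mate so that fewer plies to mate means a higher score."""
    return MATE - plies_to_mate

def pick_move(candidates):
    """Pick the move with the best mate score.

    candidates maps a move (any label) to the number of plies until mate.
    """
    return max(candidates, key=lambda move: mate_score(candidates[move]))

# All three moves still win, but only one of them makes progress.
print(pick_move({"Kb6": 5, "Ra8": 1, "Rh1": 5}))
```

If every mating line gets the same flat score instead, nothing distinguishes the move that mates now from the move that merely keeps the mate available, which is how the repetition described above can arise.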
Henk
Posts: 7221
Joined: Mon May 27, 2013 10:31 am

Re: Standard candles

Post by Henk »

Robert Pope wrote:Or, a weak program will see multiple moves that are mate in 3+, and always end up picking the move that doesn't progress toward mate, until they draw by repetition.
Yes, something like that will happen with my chess program too. By mate in 12 I meant mate in 12 plies. But even that might be too optimistic.