Standard candles

Post by **sje** » Tue Aug 27, 2013 1:52 pm

In astrophysics, standard candles are classes of astronomical objects of known intrinsic brightness that allow cosmic distances to be measured. Without these candles, we would not know with any certainty the distance to a faraway object unless physical travel there were somehow possible.

http://en.wikipedia.org/wiki/Cosmic_distance_ladder

It occurs to me that we might construct standard candles on the Elo strength scale. For example, no one knows the actual Elo rating of a totally random player. But that rating could be determined by artificially constructing progressively weaker machine players. A program with no randomness would be built and would earn a rating of (say) 2800 Elo. From that point, a very slight amount of randomness would be added to produce a program that performs 100 Elo points lower after an extended (> 1000 game) match against the prior version. This downward iteration would be repeated as many times as necessary.
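A minimal sketch of how such a ladder could be tallied, assuming the standard logistic Elo model (rating tools differ in their exact model). The function names are hypothetical. Each rung's match score is converted to an Elo difference and added to the previous rung, starting from the 2800 anchor. The error estimate ignores draws, which would only make it smaller.

```python
import math


def score_from_elo_diff(diff: float) -> float:
    """Expected score for a player rated `diff` Elo above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a match score (negative = weaker)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def chain_ratings(anchor: float, step_scores: list[float]) -> list[float]:
    """step_scores[i] is the score version i+1 made against version i.

    Returns the estimated rating of every rung, starting with the anchor.
    """
    ratings = [anchor]
    for s in step_scores:
        ratings.append(ratings[-1] + elo_diff_from_score(s))
    return ratings


def step_stderr(score: float, games: int) -> float:
    """Approximate 1-sigma Elo error of one rung, ignoring draws."""
    score_se = math.sqrt(score * (1.0 - score) / games)
    # Derivative of the Elo difference with respect to the score.
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return slope * score_se
```

A version that is 100 Elo weaker is expected to score about 0.36 against its predecessor. With 1000 games, `step_stderr(0.36, 1000)` is roughly 11 Elo per rung. Because the rungs are independent matches, the errors add in quadrature: after 20 rungs the bottom of the ladder is uncertain by roughly ±50 Elo.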

Post by **michiguel** » Tue Aug 27, 2013 5:52 pm

sje wrote:It occurs to me that we might construct standard candles on the Elo strength scale. [...]

Adam did this:
http://adamsccpages.blogspot.com/p/also ... -list.html

"Rating Program: Ordo 0.6 with the switch '-W', and adjusted so that Brutus RND (which is a random mover) has a rating of 0 Elo."

Miguel

Post by **Henk** » Tue Aug 27, 2013 6:11 pm

michiguel wrote:Adam did this:
http://adamsccpages.blogspot.com/p/also ... -list.html

"Rating Program: Ordo 0.6 with the switch '-W', and adjusted so that Brutus RND (which is a random mover) has a rating of 0 Elo."

I guess a rating below zero is not possible. But it is possible to play in a way that forces the opponent to play a checkmating move.

Post by **Sven** » Tue Aug 27, 2013 7:06 pm

Henk wrote:I guess a rating below zero is not possible. But it is possible to play such that it forces the opponent to play a check mate move.

You are guessing wrong. The whole rating scale in each pool is arbitrary; only rating differences matter. FIDE could shift the scale so that Carlsen has +500 Elo. In that case players rated around 1850 today would have -500 Elo, and the rating system would still be valid. Prof. Elo, who invented the Elo rating system, did not intend such negative values, but the system still works with them.
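A small sketch of why the offset is arbitrary, assuming the logistic Elo expectation: expected scores depend only on rating differences, so adding the same constant to every rating in a pool changes nothing. The pool below is illustrative (a hypothetical top player at 2850), and `shift_pool` is a hypothetical helper; pinning a random mover at 0, as in the Brutus RND list quoted earlier, is the same kind of re-anchoring, though this is not Ordo's code.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Logistic Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def shift_pool(ratings: dict[str, float], anchor: str, anchor_value: float) -> dict[str, float]:
    """Re-anchor a pool so `anchor` gets `anchor_value`; every gap is kept."""
    offset = anchor_value - ratings[anchor]
    return {name: r + offset for name, r in ratings.items()}


# Illustrative pool, not real ratings.
pool = {"top": 2850.0, "club": 1850.0, "random_mover": 0.0}
shifted = shift_pool(pool, "top", 500.0)
# shifted == {"top": 500.0, "club": -500.0, "random_mover": -2350.0}
```

The expected score of "top" against "club" is identical in both pools, which is all the rating system ever predicts.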

Sven

Post by **hgm** » Tue Aug 27, 2013 7:19 pm

I once made an attempt to accurately measure the ratings of the very weakest engines in the ChessWar Promo division. They did end up below zero. And I had the feeling that even these values were overestimated, and that they would have dropped several hundred Elo more if there had just been enough intermediate engines to push them further down. The gap between engines like Ccp, Pos and N.E.G. and the weakest, most buggy alpha-beta searchers is truly enormous.

Another problem is that the standard rating models are not valid for these engines. There is always a sizable probability that they score points against other extremely weak engines that are 500, 1000 or even 3000 Elo stronger, because those engines are themselves so buggy that they often hang before the opponent is checkmated, and thus forfeit on time (or play an illegal move, etc.). Of course, the standard rating models then refuse to believe that they are really 3000 Elo weaker when they score points. But it would be easy to construct a sequence of engines each differing by 100 Elo (where the occasional forfeit of the stronger one would not completely corrupt the measurements), and then you would see that the sequence would have to be very, very long before you reach the strength of Pos or Brutus random.
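A rough illustration of the forfeit effect, assuming the logistic Elo model and a hypothetical stronger engine that forfeits 1% of its games regardless of the position. The helper names are made up for this sketch.

```python
import math


def weak_side_expectation(gap: float) -> float:
    """Expected score of the weaker engine when it is `gap` Elo below."""
    return 1.0 / (1.0 + 10.0 ** (gap / 400.0))


def observed_score(gap: float, forfeit_rate: float) -> float:
    """Weaker engine's score when the stronger one also loses a fixed
    fraction of games by forfeit (hang, time loss, illegal move)."""
    return forfeit_rate + (1.0 - forfeit_rate) * weak_side_expectation(gap)


def apparent_gap(score: float) -> float:
    """Elo gap a standard model infers from the weaker engine's score."""
    return 400.0 * math.log10(1.0 / score - 1.0)
```

For a true 3000 Elo gap the model expects the weaker engine to score about 3e-8, but a 1% forfeit rate alone gives it about 1%. `apparent_gap(observed_score(3000, 0.01))` therefore comes out near 800 Elo, and roughly 2200 Elo simply disappear. With a 100 Elo gap, the same forfeit rate shifts the result only to about 95 Elo. That is the reason a long chain of small steps measures the bottom of the scale far better than a single lopsided pairing, even though the small biases still accumulate along the chain.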

Another problem is that engines weakened by randomizing their evaluation may still recognize very deep mates. This makes them very unbalanced. They play like complete idiots, but then (when put up against other complete idiots) suddenly announce "mate in 11" and play it out perfectly. That is not what you expect of a weak player.
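A toy sketch of why this happens, under the assumption that the weakening adds noise only to ordinary evaluation scores while mates, which the search finds on its own, keep their huge scores. Real engines add noise at the leaves of the search; here it is collapsed to the root for brevity. `MATE`, `noisy` and `pick_move` are hypothetical names.

```python
import random

MATE = 100_000               # hypothetical mate score; mate in n plies = MATE - n
MATE_BOUND = MATE - 1_000    # anything at least this large counts as a mate score


def noisy(score: int, sigma: float, rng: random.Random) -> float:
    """Randomize ordinary evaluations; mate scores pass through unchanged."""
    if abs(score) >= MATE_BOUND:
        return float(score)
    return score + rng.gauss(0.0, sigma)


def pick_move(scored: dict[str, int], sigma: float, rng: random.Random) -> str:
    """Choose the move whose (possibly noisy) score is highest."""
    return max(scored, key=lambda move: noisy(scored[move], sigma, rng))


rng = random.Random(1)
# Mate in 11 moves for the side to move is 21 plies away.
moves = {"Nf3": 25, "Qh5": 30, "a3": -10, "Rxe8": MATE - 21}
```

With `sigma=300` centipawns the choice among the quiet moves is close to a coin toss, yet `pick_move(moves, 300.0, rng)` returns `"Rxe8"` every time: the player is idiotic in quiet positions and perfect once a mate is in the tree.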

Post by **Adam Hair** » Tue Aug 27, 2013 9:21 pm

Unlike ChessWar (if I remember correctly), I throw out all time forfeits and all games adjudicated for illegal moves. Because of this, I exclude engines that tend to hang or play an illegal move when they are losing. Still, there is a major source of rating distortion in my list. Some engines cannot deliver mate even with an overwhelming material advantage (I have forgotten the reason for this and would not mind if someone explained it again). It is these engines that give up half points to Brutus random, POS, etc. in my testing.
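A minimal sketch of this kind of filtering, assuming each game carries a termination reason like the PGN `Termination` tag, whose defined values include "time forfeit" and "rules infraction". Whether a given rating list records games this way is an assumption, and the `Game` record, the helpers and the engine names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Games ending this way are dropped before rating.
EXCLUDED_TERMINATIONS = {"time forfeit", "rules infraction"}


@dataclass
class Game:
    white: str
    black: str
    result: str        # "1-0", "0-1" or "1/2-1/2"
    termination: str   # e.g. "normal", "time forfeit", "rules infraction"


def rated_games(games: list[Game]) -> list[Game]:
    """Keep only games decided over the board."""
    return [g for g in games if g.termination not in EXCLUDED_TERMINATIONS]


def score_fractions(games: list[Game]) -> dict[str, float]:
    """Each engine's points divided by the number of games it played."""
    points: dict[str, float] = defaultdict(float)
    played: dict[str, int] = defaultdict(int)
    white_points = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}
    for g in games:
        w = white_points[g.result]
        points[g.white] += w
        points[g.black] += 1.0 - w
        played[g.white] += 1
        played[g.black] += 1
    return {name: points[name] / played[name] for name in played}


games = [
    Game("Buggy", "RandomMover", "0-1", "time forfeit"),
    Game("Buggy", "RandomMover", "1-0", "normal"),
    Game("RandomMover", "Buggy", "1/2-1/2", "normal"),
]
```

Filtering lifts the hypothetical "Buggy" engine from 0.5 to 0.75, but the drawn game stays in: a draw conceded because an engine cannot convert a won position is a legitimate over-the-board result, so no filter of this kind removes that distortion.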

Post by **Henk** » Tue Aug 27, 2013 9:30 pm

I think my chess program won't be able to finish a KRK endgame either. It will probably only see the mate if it is within 12 or so. I have not tested it, but I have encountered some awkward endgames.

Post by **Robert Pope** » Tue Aug 27, 2013 9:41 pm

Or, a weak program will see multiple moves that are mate in 3+, and always end up picking a move that doesn't progress toward mate, until it draws by repetition.
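One common remedy (not something proposed in this thread) is to make mate scores depend on the distance to mate, so that a shorter mate always outscores a longer one. The sketch below uses a hypothetical `MATE` constant and a made-up KRK candidate list mapping each move to the number of plies until the forced mate. With a flat mate score every mating line ties, and which move wins the tie depends only on move order, so the engine can shuffle forever.

```python
from collections.abc import Callable

MATE = 100_000  # hypothetical mate score


def flat_mate_score(plies_to_mate: int) -> int:
    """Every mate scores the same, so near and far mates look identical."""
    return MATE


def distance_mate_score(plies_to_mate: int) -> int:
    """Shorter mates score higher, so each move makes progress."""
    return MATE - plies_to_mate


def best_move(candidates: dict[str, int], scorer: Callable[[int], int]) -> str:
    """candidates maps move -> plies until the forced mate after that move.

    max() returns the first of several equal maxima, mimicking a search that
    keeps the first move it found among equally scored ones.
    """
    return max(candidates, key=lambda move: scorer(candidates[move]))


candidates = {"Kc6": 7, "Ra7": 5, "Rh8": 1}
```

Here the flat score keeps `"Kc6"` simply because it comes first, while the distance-adjusted score picks the immediate mate `"Rh8"`.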

Post by **Henk** » Tue Aug 27, 2013 9:49 pm

Robert Pope wrote:Or, a weak program will see multiple moves that are mate in 3+, and always end up picking the move that doesn't progress toward mate, until they draw by repetition.

Yes, something like that will happen with my chess program too. By mate in 12 I meant mate in 12 plies. But even that might be too optimistic.
