In the realm of astrophysics, standard candles are classes of astronomical objects which allow measurement of cosmic distances. Without these candles, we would not know with any certainty the distances to a faraway object unless somehow physical travel were possible.
http://en.wikipedia.org/wiki/Cosmic_distance_ladder
It occurs to me that we might construct standard candles on the Elo strength scale. For example, no one knows the actual Elo rating of a totally random player. But that rating could be determined by artificially constructing progressively weaker machine players. A program would be made with no randomness and earn a (say) 2800 Elo rating. From that point, a very slight amount of randomness would be added to produce a program which performed 100 Elo points lower after an extended (> 1000 game) match against the prior version. This downward iteration would be repeated as necessary.
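To make the 100-point step concrete, here is a minimal sketch using the usual logistic Elo model. It shows what match score a weakened version would need against its predecessor, and roughly how much uncertainty a 1000-game match leaves. The function names are illustrative, and the error estimate assumes independent win/loss games (draws would shrink it somewhat).

```python
import math

def expected_score(elo_diff: float) -> float:
    """Expected score of a player rated elo_diff points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_diff_from_score(score: float) -> float:
    """Inverse of expected_score: rating gap implied by a score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_standard_error(score: float, games: int) -> float:
    """Rough one-sigma Elo uncertainty, treating games as win/loss trials."""
    se_score = math.sqrt(score * (1.0 - score) / games)
    # Slope of the score-to-Elo mapping at this score.
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return se_score * slope

# Score the weakened version should reach against the prior version.
target = expected_score(-100.0)
print(f"target score: {target:.3f}")
print(f"implied gap: {elo_diff_from_score(target):.1f} Elo")
print(f"approx. error over 1000 games: {elo_standard_error(target, 1000):.1f} Elo")
```

Under these assumptions the weakened version should score about 36% against its predecessor, and the measurement error per step is on the order of ten Elo, which accumulates as the ladder gets longer.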
Standard candles
-
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Standard candles
sje wrote:
In the realm of astrophysics, standard candles are classes of astronomical objects which allow measurement of cosmic distances. Without these candles, we would not know with any certainty the distances to a faraway object unless somehow physical travel were possible.
http://en.wikipedia.org/wiki/Cosmic_distance_ladder
It occurs to me that we might construct standard candles on the Elo strength scale. For example, no one knows the actual Elo rating of a totally random player. But that rating could be determined by artificially constructing progressively weaker machine players. A program would be made with no randomness and earn a (say) 2800 Elo rating. From that point, a very slight amount of randomness would be added to produce a program which performed 100 Elo points lower after an extended (> 1000 game) match against the prior version. This downward iteration would be repeated as necessary.
Adam did this:
http://adamsccpages.blogspot.com/p/also ... -list.html
"Rating Program: Ordo 0.6 with the switch '-W', and adjusted so that Brutus RND (which is a random mover) has a rating of 0 Elo."
Miguel
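Anchoring a list so that a reference player sits at 0 amounts to adding one constant to every rating. The sketch below illustrates that idea only; it is not a description of how Ordo does it internally, and the ratings in `pool` are made-up numbers.

```python
def anchor_ratings(ratings: dict[str, float], anchor: str,
                   anchor_value: float = 0.0) -> dict[str, float]:
    """Shift every rating by the same offset so `anchor` lands on `anchor_value`."""
    offset = anchor_value - ratings[anchor]
    return {name: rating + offset for name, rating in ratings.items()}

# Hypothetical pool on an arbitrary scale (numbers invented for illustration).
pool = {"Engine A": 2310.0, "Engine B": 1540.0, "Brutus RND": 215.0}

anchored = anchor_ratings(pool, "Brutus RND")
for name, rating in anchored.items():
    print(f"{name}: {rating:.0f}")
```

The differences between engines are unchanged by the shift; only the zero point moves.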
-
- Posts: 7221
- Joined: Mon May 27, 2013 10:31 am
Re: Standard candles
michiguel wrote:
sje wrote:
In the realm of astrophysics, standard candles are classes of astronomical objects which allow measurement of cosmic distances. Without these candles, we would not know with any certainty the distances to a faraway object unless somehow physical travel were possible.
http://en.wikipedia.org/wiki/Cosmic_distance_ladder
It occurs to me that we might construct standard candles on the Elo strength scale. For example, no one knows the actual Elo rating of a totally random player. But that rating could be determined by artificially constructing progressively weaker machine players. A program would be made with no randomness and earn a (say) 2800 Elo rating. From that point, a very slight amount of randomness would be added to produce a program which performed 100 Elo points lower after an extended (> 1000 game) match against the prior version. This downward iteration would be repeated as necessary.
Adam did this:
http://adamsccpages.blogspot.com/p/also ... -list.html
"Rating Program: Ordo 0.6 with the switch '-W', and adjusted so that Brutus RND (which is a random mover) has a rating of 0 Elo."
Miguel
I guess a rating below zero is not possible. But it is possible to play in such a way that the opponent is forced to play a checkmating move.
-
- Posts: 4052
- Joined: Thu May 15, 2008 9:57 pm
- Location: Berlin, Germany
- Full name: Sven Schüle
Re: Standard candles
Henk wrote:
I guess a rating below zero is not possible. But it is possible to play in such a way that the opponent is forced to play a checkmating move.
You are guessing wrong. The whole rating scale in each pool is arbitrary. FIDE could change the scale such that Carlsen has +500 Elo. In that case players with around 1850 Elo today would have -500 Elo, and the rating system would still be valid. Prof. Elo, who invented the Elo rating system, did not intend such negative values, but the system still works with them.
Sven
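The reason a shifted scale remains valid can be written out in a few lines, assuming the common logistic form of the Elo expectation (the symbols $R_A$, $R_B$, $E_A$ and the offset $c$ are notation introduced here, and the figure of about 2850 for Carlsen is what the 1850 to -500 arithmetic above implies):

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}

% Shift every rating in the pool by the same constant c:
R_A' = R_A + c, \qquad R_B' = R_B + c

E_A' = \frac{1}{1 + 10^{\left((R_B + c) - (R_A + c)\right)/400}}
     = \frac{1}{1 + 10^{(R_B - R_A)/400}} = E_A

% Example: c = 500 - 2850 = -2350 maps 2850 to +500 and 1850 to -500.
```

Only rating differences enter the prediction, so the offset cancels and negative values carry no special meaning.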
-
- Posts: 27837
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Standard candles
I once made an attempt to accurately measure the ratings of the very weakest engines in the ChessWar Promo division. They did end up below zero. And I had the feeling these were even over-estimated, and would go down several hundred Elo more if there just had been enough intermediate engines to push them further down. The gap between engines like Ccp, Pos and N.E.G. and the weakest, most buggy alpha-beta searchers is truly enormous.
Another problem is that the standard rating models are not valid for these engines. There is always a very sizable probability that they score points against other extremely weak engines that are 500, 1000 or even 3000 Elo stronger, because those stronger engines are so buggy that they often hang before the opponent is checkmated, and thus forfeit on time (or play an illegal move, etc.). Of course the standard rating models then refuse to believe that they are really 3000 Elo weaker, when they score points. But it would be easy to construct a sequence of engines all differing by 100 Elo (where the occasional forfeit of the stronger one would not completely corrupt the measurements), and then you would see that the sequence would have to be very, very long before you reach the strength of Pos or Brutus random.
Another problem is that engines weakened by randomizing their eval still might recognize very deep mates. This makes them very unbalanced. They play like complete idiots, but (when put up against other complete idiots) then suddenly announce "mate in 11", and play it out perfectly. That is not what you expect of a weak player.
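The forfeit effect described above can be sketched numerically. Assume, purely for illustration, that the stronger engine forfeits a fixed fraction of its games regardless of the position, and that the remaining games follow the logistic Elo model. The 2% forfeit rate and the function names are assumptions, not measurements.

```python
import math

def expected_score(elo_diff: float) -> float:
    """Expected score of a player rated elo_diff points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_diff_from_score(score: float) -> float:
    """Rating gap implied by a match score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

def apparent_gap(true_gap: float, forfeit_rate: float) -> float:
    """Gap a logistic model would infer when the stronger engine
    hands over a fraction of games by forfeit."""
    clean = expected_score(-true_gap)  # weaker engine's score in completed games
    observed = forfeit_rate + (1.0 - forfeit_rate) * clean
    return -elo_diff_from_score(observed)

# Hypothetical case: a true gap of 3000 Elo, stronger engine forfeits 2% of games.
print(f"apparent gap: {apparent_gap(3000.0, 0.02):.0f} Elo")
# The same forfeit rate distorts a 100 Elo step far less.
print(f"apparent gap: {apparent_gap(100.0, 0.02):.0f} Elo")
```

With these assumed numbers the model sees a gap of well under 1000 Elo instead of 3000, while a 100 Elo step is only mildly compressed, which is the argument for measuring along a chain of closely spaced engines.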
-
- Posts: 3226
- Joined: Wed May 06, 2009 10:31 pm
- Location: Fuquay-Varina, North Carolina
Re: Standard candles
Unlike ChessWar (if I remember correctly), I throw out all time forfeits and games adjudicated for illegal moves. Because of this, I exclude engines that tend to hang or make an illegal move when they are losing. Still, there is a major source of rating distortion in my list. Some engines cannot deliver mate even with an overwhelming material advantage (I have forgotten the reason for this and would not mind if someone explained it again). It is these engines that give up half points to Brutus random, POS, etc. in my testing.
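A filter of the kind described could look like the sketch below. It assumes each game is available as a dictionary of PGN header tags and that the tournament interface writes the standard PGN `Termination` tag; whether a given interface does so, and how the list above is actually compiled, are assumptions. The sample games are invented.

```python
# Termination values from the PGN standard that mark a game as not decided over the board.
EXCLUDED = {"time forfeit", "rules infraction"}

def keep_game(headers: dict[str, str]) -> bool:
    """True if the game ended normally enough to be used for rating."""
    return headers.get("Termination", "normal").lower() not in EXCLUDED

# Hypothetical games, reduced to the tags that matter here.
games = [
    {"White": "Engine A", "Black": "Brutus RND", "Result": "1-0", "Termination": "normal"},
    {"White": "Engine B", "Black": "Brutus RND", "Result": "0-1", "Termination": "time forfeit"},
    {"White": "Brutus RND", "Black": "Engine C", "Result": "1-0", "Termination": "rules infraction"},
    {"White": "Engine C", "Black": "Brutus RND", "Result": "1/2-1/2"},
]

rated = [g for g in games if keep_game(g)]
print(f"{len(rated)} of {len(games)} games kept")
```

Games without a `Termination` tag are kept here; a stricter list could choose to drop them instead.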
-
- Posts: 7221
- Joined: Mon May 27, 2013 10:31 am
Re: Standard candles
I think my chess program won't be able to finish a KRK endgame either. Only if there is a mate in 12 or so will it probably see it. I have not tested this, but I have encountered some awkward endgames.
-
- Posts: 558
- Joined: Sat Mar 25, 2006 8:27 pm
Re: Standard candles
Or a weak program will see multiple moves that are mate in 3+, and always end up picking a move that doesn't progress toward mate, until it draws by repetition.
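One common cause of this behaviour, assumed here since the thread does not name it, is that every forced mate gets the same score, so the engine has no reason to prefer the shorter one. A usual remedy is to subtract the distance to mate from the mate score. The sketch below uses a hypothetical `MATE` constant and made-up move names and mate distances.

```python
MATE = 100_000  # hypothetical score for an immediate mate

def mate_in(plies: int) -> int:
    """Score for a forced mate `plies` half-moves away; nearer mates score higher."""
    return MATE - plies

def pick_move(scored_moves: dict[str, int]) -> str:
    """Choose the highest-scoring move (the first one listed wins ties)."""
    return max(scored_moves, key=scored_moves.get)

# Flat scoring: every mating move looks identical, so move order decides.
flat = {"Ra1": MATE, "Rb7": MATE, "Kc6": MATE}

# Distance-aware scoring for the same (invented) position.
graded = {"Ra1": mate_in(7), "Rb7": mate_in(3), "Kc6": mate_in(5)}

print("flat scoring picks:", pick_move(flat))
print("graded scoring picks:", pick_move(graded))
```

With flat scores the choice falls to whichever move happens to be listed first, which can repeat indefinitely; with graded scores each move played shortens the mate.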
-
- Posts: 7221
- Joined: Mon May 27, 2013 10:31 am
Re: Standard candles
Robert Pope wrote:
Or a weak program will see multiple moves that are mate in 3+, and always end up picking a move that doesn't progress toward mate, until it draws by repetition.
Yes, something like that will happen with my chess program too. By mate in 12 I meant mate in 12 plies. But even that might be too optimistic.