I do not believe that the standard rating model is valid for something.
hgm wrote:
I once made an attempt to accurately measure the ratings of the very weakest engines in the ChessWar Promo division. They did end up below zero. And I had the feeling these were even over-estimated, and would go down several hundred Elo more if there just had been enough intermediate engines to push them further down. The gap between engines like Ccp, Pos and N.E.G. and the weakest, most buggy alpha-beta searchers is truly enormous.
Another problem is that the standard rating models are not valid for these engines. There is always a very sizable probability that they score points against other extremely weak engines that are 500, 1000 or even 3000 Elo stronger, because these weak engines are so buggy that they often hang before the opponent is checkmated, and thus forfeit on time (or play an illegal move, etc.). Of course the standard rating models then refuse to believe that they are really 3000 Elo weaker, when they score points. But it would be easy to construct a sequence of engines all differing by 100 Elo (where the occasional forfeit of the stronger one would not completely corrupt the measurements), and then you would see that the sequence would have to be very, very long before you reach the strength of Pos or Brutus random.
Another problem is that engines weakened by randomizing their eval still might recognize very deep mates. This makes them very unbalanced. They play like complete idiots, but (when put up against other complete idiots) then suddenly announce "mate in 11", and play it out perfectly. That is not what you expect of a weak player.
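To put rough numbers on the forfeit point above, here is a minimal sketch assuming the usual logistic Elo formula (one common form of the "standard rating model"; other models differ in detail). The function names and the 1% forfeit rate are made up for illustration and are not measurements.

```python
import math

def expected_score(elo_gap):
    """Expected score of the weaker side when it is elo_gap Elo below
    its opponent, under the logistic Elo model (an assumption here)."""
    return 1.0 / (1.0 + 10.0 ** (elo_gap / 400.0))

def implied_gap(score):
    """Elo gap that the same model infers from an observed score (0 < score < 1)."""
    return 400.0 * math.log10((1.0 - score) / score)

# What the model expects the weaker engine to score at the gaps mentioned above.
for gap in (500, 1000, 3000):
    print(gap, expected_score(gap))

# Hypothetical: the stronger engine forfeits 1% of its games, so the weaker
# one scores about 0.01 regardless of how badly it actually plays.
print(implied_gap(0.01))
```

With any fixed floor on the score from forfeits, the gap the model can infer from a single pairing is capped, however large the real difference in playing strength is.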
Rating is always dependent on the pool of opponents that you play.
I think that it may be interesting to compute the rating advantage that some engine has against the random mover by using the following players:
1) A strong program X (call it X(0))
2) The same program, except that it plays a random move in 1% of the cases (call it X(1))
3) The same program, except that it plays a random move in 2% of the cases (call it X(2))
This way you have 101 programs X(0), X(1), ..., X(100), where X(100) is the random mover.
Now play a match of 1000 games between X(i) and X(i+1) for 0<=i<=99.
After 100,000 games you get, based on the matches, a rating for each X(i), and you can calculate the difference between X(0) and the random mover X(100).
You can also play 50 matches of 1000 games between X(i) and X(i+2) for i=0,2,4,...,98, and it may be interesting to see the gap between X(0) and X(100) under these conditions as well.
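The chain calculation could be sketched roughly like this, assuming the logistic Elo formula and simply summing the per-match differences (a rating tool fitting all games at once may give somewhat different numbers). The function names, the clamping of 0% and 100% scores, and the example scores are my own assumptions, not results.

```python
import math

def elo_diff_from_score(score, games):
    """Elo advantage implied by a match score (fraction of points, 0..1).
    Scores of exactly 0 or 1 are clamped to half a point from the edge
    (an arbitrary choice) so the result stays finite."""
    eps = 0.5 / games
    s = min(max(score, eps), 1.0 - eps)
    return 400.0 * math.log10(s / (1.0 - s))

def chain_gap(match_scores, games_per_match=1000):
    """Sum the neighbour-to-neighbour Elo differences along the chain.
    match_scores[k] is the score of the stronger player in the k-th match."""
    return sum(elo_diff_from_score(s, games_per_match) for s in match_scores)

# Hypothetical example: X(i) scores 55% against X(i+1) in every one of the 100 matches.
step1_scores = [0.55] * 100
print(chain_gap(step1_scores))

# Same idea for the 50 matches X(i) vs X(i+2); the 60% is again made up.
step2_scores = [0.60] * 50
print(chain_gap(step2_scores))
```

Comparing the two totals on real match results would show whether the step size changes the measured gap between X(0) and X(100).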