Question about ratings

Bill Rogers
Posts: 3562
Joined: Thu Mar 09, 2006 3:54 am
Location: San Jose, California

Question about ratings

Post by Bill Rogers »

I am testing some really low-rated chess programs, i.e. below 1,000 Elo.
To do this I am using the TSCP program because of its known rating.
Now, assuming that its rating is about 1700, can I assume that if I play it at only, say, 6 plies, its rating will stay proportional?
I also intend to test TSCP at all lower ply levels, from 5 down to 1 ply, just to see what kind of ratings it might have.
I know, or at least I think I know, that for most programs the rating stays pretty close to the overall rating even when playing blitz.
Am I assuming too much here, or should I make some allowances for playing at lower ply levels?
Bill
Matthias Gemuh
Posts: 3245
Joined: Thu Mar 09, 2006 9:10 am

Re: Question about ratings

Post by Matthias Gemuh »

Bill Rogers wrote: ... if I play it at only say 6 plys can I assume that its rating will stay proportional.

Bill

No. Rating will not stay proportional because search extensions are not scaled accordingly.


Matthias.
My engine was quite strong till I added knowledge to it.
http://www.chess.hylogic.de
hgm
Posts: 27700
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Question about ratings

Post by hgm »

Below 1000 is really extremely poor!

The problem with rating determination in this range is that most programs there have such low ratings because they are extremely buggy, like printing a resign message when they can checkmate the opponent. As a result, the score-vs-strength assumptions underlying rating theory do not apply to such engines at all, which confuses most rating-extraction algorithms. And even if the score percentage is a monotonic function of opponent strength, the width of their distribution is often extreme, and not at all the 280 pts assumed in the Elo model: they can beat engines rated 2000, and the next game lose to an engine rated 800 because a bug triggered a random move, or time pressure made them play a fatal move without search.
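
To make the score-vs-strength assumption concrete, here is a minimal Python sketch of the two usual forms of the Elo expectation curve, plus the inverse used to turn a match score into a rating difference. It is only an illustration: the function names are made up for this example, the 280 pts figure is taken as the standard deviation of the performance difference between two players (an assumption about what the width refers to), and the logistic form with a 400-point scale is the common approximation, not necessarily what any particular rating tool uses.

```python
import math


def expected_score_normal(rating_diff, sigma=280.0):
    """Expected score for a player rated `rating_diff` points above the
    opponent, assuming the performance difference is normally distributed
    with standard deviation `sigma` (280 is the assumed width)."""
    return 0.5 * (1.0 + math.erf(rating_diff / (sigma * math.sqrt(2.0))))


def expected_score_logistic(rating_diff):
    """Expected score under the common logistic approximation
    (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def rating_diff_from_score(score):
    """Invert the logistic curve: rating difference implied by a score
    fraction strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # Hypothetical example: an engine scores 25% against an opponent
    # assumed to be rated 1700.
    anchor = 1700
    print(round(anchor + rating_diff_from_score(0.25)))
```

With these assumptions, a 25% score against a 1700-rated opponent maps to a rating of roughly 1509. The estimate is only meaningful if the engine's results really follow such a curve, which is exactly what fails for the buggy engines described above.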

Trying to reduce the strength of a sound and reliable engine to get a well-calibrated rating scale might thus not be a bad idea. I would recommend using the time control (with ponder off) as a means to reduce strength, though, rather than fixing the ply depth to a low value. The latter makes even good engines play like idiots in the end-game. Note that Winboard_x (and hence WinBoard_F) does support time-odds games (although I never used that myself).

Another reliable and stable engine, significantly weaker than TSCP, is micro-Max 1.6. It should be rated around 1400. It is weak not because it is buggy, but because it is simple (no hash, poor move ordering, very few evaluation terms, slow move generator, primitive search without null move or check extension, QS only considering recaptures). The disadvantage is that it has no book and no randomization of moves, so unless you play from a set of positions (like Nunn or Silver), you cannot play many independent games against another reproducible opponent.

To get still weaker, you have to turn to engines without Quiescence Search. If you want something that really plays like an idiot, use my engine N.E.G. It does not search at all, and does not know what checkmate is: it just moves pieces based on the SEE value of the From and To squares.