Charles Roberson wrote: When you say CC, do you mean computer chess or correspondence chess?
This is a nice stab at an approximation method. Things like this
have been tried for years and have failed. However, maybe it could be
accurate to, say, 200 points.
There are flaws in the method. It would take more positions to do this
accurately. Also, your method of allowing only 4 seconds and then adding
4 more for failures has multiple issues.
1) A program may produce the "correct" answer in 4 seconds, then
switch to something else at 5 seconds and never return to it.
2) You are lumping together all programs that don't see the answer
in 4 seconds as programs that will see it in 8 seconds. Some will
see it in 8, and others may never see it.
3) Sometimes there are multiple correct answers.
4) How do you come up with the "correct" answer?
Your method ignores issues in creating a good or great
computer chess program such as the timing algorithm.
I think your luck has been in testing only top programs. Try a set
of programs scattered through the CCRL list. Let's say one program
for every 50 points; that would be 13 programs from 2200 to 2800.
Actually, I see your method as a decent way to detect potential
clones.
Correspondence chess.
My average absolute error is 38 Elo, and this all assumes that CCRL ratings are more or less correct. My test does not measure "game Elo", where a game score is also partly a function of the quality of an engine's time management.
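For what it's worth, the error figure is just the mean absolute difference between my estimates and the CCRL list. A minimal sketch of that comparison, with placeholder engine names and ratings rather than my actual data:

```python
# Mean absolute error between estimated ratings and CCRL ratings.
# The engine names and numbers below are placeholders, not real results.
estimated = {"EngineA": 2810, "EngineB": 2655, "EngineC": 2520}
ccrl = {"EngineA": 2775, "EngineB": 2700, "EngineC": 2530}

errors = [abs(estimated[name] - ccrl[name]) for name in estimated]
mae = sum(errors) / len(errors)
print(f"average absolute error: {mae:.0f} Elo")
```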
1) Of course you are correct. My hypothesis was that higher-rated engines would find the move in less time, and they did. I have no doubt that Fruit 2.1 with 1000x more time, or Naum with 2x more time, would be comparable to Rybka with x time. My very hypothesis rests on the premise that engines of different strengths will find the "solution" in varying amounts of time, depending on their "absolute" strength, if such a thing could be measured.
2) See 1)
3) I know of no way around this problem at this time. I am sure some engines were penalized for finding solutions that were even better than the proposed solution but I have no way to determine this.
4) My "correct" answers were taken from the CC games. I believe these CC games are played at a level at least 500 Elo higher than these engines at t=4. So, for all practical purposes, the moves played in these games should be judged as more or less correct from the perspective of a player 500+ Elo below these CC GMs. What I end up with is this: the more similarly an engine plays to your average CC GM, the higher rated that engine will actually be.
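To make the procedure concrete, here is a rough sketch of the solve-time loop as I run it. The `engine_best_move` callable is a hypothetical stand-in for querying an actual engine, and the time cap is my own simplification, not part of the method as stated:

```python
TIME_STEP = 4   # seconds allowed on the first attempt and added per failure
MAX_TIME = 64   # assumed cap; give up on a position after this much time

def solve_time(engine_best_move, position, gm_move):
    """Return the first time budget at which the engine's best move
    matches the move played in the CC game, or None if it never
    matches within MAX_TIME (Roberson's point 2: some engines may
    simply never find it)."""
    t = TIME_STEP
    while t <= MAX_TIME:
        if engine_best_move(position, t) == gm_move:
            return t
        t += TIME_STEP
    return None
```

An engine's score over the test set is then some function of its solve times; mapping those times to an Elo number is the step that gets calibrated against CCRL.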
"Your method ignores issues in creating a good or great
computer chess program such as the timing algorithm. "
It was not my intention to measure the performance of the "timing algorithm", assuming we are talking about something like time management. In fact, I wish CCRL and CEGT did not either. An engine's usefulness for raw analysis might be overestimated because it has exceptional time management and therefore scores better in tournaments. It is not clear to me how effective time management lends itself to actual analysis.
I threw Movei into the mix to see what would happen, and it scored 2763 against its CCRL rating of 2732. I will test other lower-rated engines when I get a chance.