Based on the answers, the table now looks as follows.
List  Since     hum  Start/Elo  Rating        Time       Clock  CPU    Ponder  Book     32/64
                                Rybka3/32Bit  moves/min  GHz                            Bit
                                1 CPU
---------------------------------------------------------------------------------------------
SSDF  1984      y    ?          /             40/120     2,4    4****  yes     ?        64
CCRL  2005      y    pool       3098          40/40*     2,4    1,2,4  no      several  par
CEGT  2006      ?    2750**     3048          40/20*     2.0    1,2,4  no      several  par
IPON  2009/Dec  y    2800***    2848          5'+3''     3.0    1,2    yes     50 pos.  64
SWCR  2009/Dec  y    2800***    2851          40/10      2.8    1      yes     Shr12    par
---------------------------------------------------------------------------------------------
hum  = human-oriented Elo calibration
*    = several time controls available
**   = reference engine: Shredder 9.1, 1 CPU
***  = reference engine: Shredder 12, 1 CPU
**** = old programs and chess computers are listed too; 4 CPU since 2008
64   = only the 64-bit version tested, if available
par  = most engines tested in parallel as 32- and 64-bit versions
pool = calibration based on a pool of engines
This little inquiry showed that the included ranking lists are clearly calibrated to human results. (CEGT gets a question mark because nobody seems to remember why the start Elo of Shredder 9.1 was set to 2750.) CCRL took over the SSDF calibration, while IPON and SWCR started a new calibration based on a recommendation by two German GMs. The SSDF calibration dates from 2000 and rests on a selection of 115 man-vs-machine games. Since there has been no reaction from SSDF and no comment on its site, I assume this calibration is still the current state.
Using the values for Rybka 3/32 bit/1 CPU, you find significant differences between CCRL, CEGT and IPON/SWCR. In practical terms, IPON and SWCR still give Carlsen, Anand and Kramnik a chance against the top chess programs; how many cores and which hardware may be used is another question. Following the SSDF and CCRL calibration, the human chess elite would fight losing battles against the top engines. Who is right?
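To make the gap between the calibrations concrete, here is a small sketch using the standard Elo expected-score formula. The Rybka 3 ratings are taken from the table above; the 2800 rating for the human side is an illustrative assumption, not a claim about any particular player.

```python
# Sketch: what each list's calibration implies for a 2800-rated human
# facing Rybka 3 (32 bit, 1 CPU), via the standard Elo expectation formula.
# The 2800 human rating is an assumption chosen for illustration.

def expected_score(own_elo, opp_elo):
    """Standard Elo expected score for the player rated own_elo."""
    return 1.0 / (1.0 + 10.0 ** ((opp_elo - own_elo) / 400.0))

human = 2800
for list_name, rybka in [("CCRL", 3098), ("CEGT", 3048),
                         ("IPON", 2848), ("SWCR", 2851)]:
    e = expected_score(human, rybka)
    print(f"{list_name}: expected score of a 2800 human = {e:.2f}")
```

Under the CCRL calibration the human expects roughly 15% of the points; under IPON/SWCR, about 43% — which is exactly the practical difference described above.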
In the context of human and artificial intelligence, the Elo system is a most popular and controversial topic which has often been discussed in the fora. Not infrequently you find the statement that the scale was made for humans and does not correctly represent the ranking of chess programs.
GM, computer chess expert and multiple World Senior Chess Champion LARRY KAUFMAN honoured this thread with a few thoughts and experiences on the issue. To him, the IPON/SWCR calibration seems too low and the CCRL/CEGT calibration too high. As far as I have understood his explanations, I'll attempt an interpretation. Larry points to the high similarity in how the programs play chess. The high congruence of the algorithms means that implemented improvements very quickly translate into better scores for a program. Humans play more individually and less consistently. The top players therefore score less decisively than programs, and their matches often end much closer. The consequence is an inflated and overstated rating for the superior engines when only engines play against each other. IMO a very complex hypothesis which should be researched scientifically. If human games and engine games cannot be subsumed under one statistical population, then another, or at least an adapted, scale has to be developed for the ranking of chess programs. I think there are plenty of statistical methods to test the relevant parameters. As long as such a study is not available, try it with Larry's 25% rule.
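The inflation argument above can be illustrated with the inverse of the Elo expectation formula, which converts a match score into the rating difference it implies. This is a standard Elo identity, not anything specific to the lists discussed here; the two scores below are assumed values chosen only to contrast a close human-style match with a more decisive engine-style one.

```python
import math

# Sketch of the inflation effect: the same pair of opponents looks
# further apart in Elo the more decisively the stronger side scores.
# Inverse of the standard Elo expectation formula.

def elo_gap(score):
    """Rating difference implied by a match score (0 < score < 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

# A close, human-style result vs a decisive, engine-style one
# (both scores are illustrative assumptions):
print(round(elo_gap(0.55)))  # 35  -> a 55% score implies ~35 Elo
print(round(elo_gap(0.70)))  # 147 -> a 70% score implies ~147 Elo
```

If engines convert the same underlying superiority into more one-sided scores than humans do, engine-only pools will stretch the rating differences accordingly.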

Thanks for the mainly constructive answers. The threatening, virulent, thread-destroying, ever-lurking clone-discussion plague has thankfully remained limited. More about calibration in the next GGT report.
Rainer