50 games are usually enough to estimate the rating of a chess program with an error of less than 100 Elo, and you can even play against only a single opponent (there is no reason to spend time testing against many opponents, because performance seems to depend little on the opponent).
I checked the CEGT 120/40 rating list, which is based on many 50-game matches between programs (310 matches).
I looked for matches where the difference between a program's rating and its performance in that match is at least 100 Elo.
I found only 4 matches.
Rybka 2.3.2a 64 2CPU (3048) vs Fruit 2.2 (2782): performance 3155 (+107)
Rybka 2.1c 64 2CPU (3000) vs Loop 13.5 2CPU (2839): performance 3124 (+124)
Rybka 1.2f 64-bit (2969) vs Loop 13.5 2CPU (2839): performance 3080 (+111)
Naum 2.2 x64 2CPU (2899) vs Deep Shredder 10 x64 2CPU (2853): performance 3018 (+119)
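As a rough sanity check on the 100 Elo figure, here is a minimal sketch of the error margin one might expect from a single 50-game match. It assumes the usual logistic Elo model, independent games, a match score near 50%, and a draw ratio of 30%; the draw ratio and the function names `elo_diff` and `elo_margin` are illustrative assumptions, not values taken from the CEGT list.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction (0 < score < 1), logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_margin(games, score=0.5, draw_ratio=0.3, z=1.96):
    """Approximate z-sigma error margin, in Elo, of a match performance.

    Assumes independent games. draw_ratio is an assumed value, not measured.
    """
    # Variance of one game result (win=1, draw=0.5, loss=0) with mean `score`.
    var_per_game = score * (1.0 - score) - draw_ratio / 4.0
    se_score = math.sqrt(var_per_game / games)
    # Slope of the Elo curve at `score` (first-order approximation).
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * se_score * slope

# 95% margin for a 50-game match, without draws and with 30% draws.
print(round(elo_margin(50, draw_ratio=0.0)))
print(round(elo_margin(50, draw_ratio=0.3)))
```

Under these assumptions the 95% margin for 50 games comes out somewhat below 100 Elo, which is in line with only a handful of the 310 matches deviating by 100 Elo or more.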
Uri
50 games are usually enough to get a good estimate
Uri Blass
bob
Re: 50 games are usually enough to get a good estimate
Interesting but irrelevant. The interesting issue is determining whether a new version is better than an old version. You are not going to find 100 Elo changes in such testing; 10 would be a big improvement. 50 games won't come within a light-year of giving a reliable indication of that kind of difference.
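To put a rough number on the 10 Elo case, here is a minimal sketch that estimates how many games are needed before the 95% error margin of a match result shrinks to a given Elo value. It assumes the logistic Elo model, independent games, a score near 50%, and a 30% draw ratio; the draw ratio and the function name `games_needed` are illustrative assumptions. It is a simple margin-of-error calculation, not a full hypothesis test between two versions.

```python
import math

def games_needed(target_elo, score=0.5, draw_ratio=0.3, z=1.96):
    """Games required so that the z-sigma Elo margin is at most target_elo.

    Assumes independent games. draw_ratio is an assumed value, not measured.
    """
    # Standard deviation of a single game result (win=1, draw=0.5, loss=0).
    sigma = math.sqrt(score * (1.0 - score) - draw_ratio / 4.0)
    # Slope of the Elo curve at `score` (first-order approximation).
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return math.ceil((z * sigma * slope / target_elo) ** 2)

print(games_needed(100))  # margin of 100 Elo
print(games_needed(10))   # margin of 10 Elo
```

Because the margin shrinks only with the square root of the number of games, a tenfold tighter margin costs about a hundred times as many games: a few dozen games suffice for 100 Elo under these assumptions, while 10 Elo needs a few thousand.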