50 games are usually enough to get a good estimate

Uri Blass
Posts: 10985
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

50 games are usually enough to get a good estimate

Post by Uri Blass

for rating chess programs with an error of less than 100 Elo, and you can even play against only a single opponent (there is no reason to spend time testing against many opponents, because performance seems to depend little on the opponent).

I checked the CEGT 120/40 rating list, which is based on many 50-game matches between programs (310 matches).

I looked for matches where the difference between a program's rating and its performance is at least 100 Elo.

I found only 4 matches.

Rybka 2.3.2a 64 2CPU (3048) vs. Fruit 2.2 (2782): performance 3155 (+107)
Rybka 2.1c 64 2CPU (3000) vs. Loop 13.5 2CPU (2839): performance 3124 (+124)
Rybka 1.2f 64-bit (2969) vs. Loop 13.5 2CPU (2839): performance 3080 (+111)
Naum 2.2 x64 2CPU (2899) vs. Deep Shredder 10 x64 2CPU (2853): performance 3018 (+119)
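
As a rough cross-check on the count above, here is a minimal Python sketch of how large the random error of a single 50-game match is expected to be. It is not taken from the CEGT data: it assumes evenly matched opponents, an illustrative 30% draw ratio, independent games, and a normal approximation, and it ignores the uncertainty in the list ratings themselves. The function names are made up for this example.

```python
import math

def score_std_error(p, draw_ratio, games):
    # Per-game score is 1, 0.5 or 0; its variance is p*(1-p) - draws/4.
    variance = p * (1.0 - p) - draw_ratio / 4.0
    return math.sqrt(variance / games)

def elo_std_error(p, draw_ratio, games):
    # Elo(p) = -400*log10(1/p - 1); convert the score error to Elo
    # with the slope of that curve at p (first-order approximation).
    slope = 400.0 / (math.log(10.0) * p * (1.0 - p))
    return slope * score_std_error(p, draw_ratio, games)

def two_sided_tail(z):
    # P(|Z| > z) for a standard normal variable.
    return math.erfc(z / math.sqrt(2.0))

# Assumed values: equal opponents, 30% draws, 50 games per match.
se = elo_std_error(p=0.5, draw_ratio=0.3, games=50)
z = 100.0 / se
expected_outliers = 310 * two_sided_tail(z)

print(f"standard error of a 50-game match: {se:.1f} Elo")
print(f"100 Elo is {z:.2f} standard errors")
print(f"expected matches out of 310 off by 100+ Elo: {expected_outliers:.1f}")
```

Under these assumptions only a handful of the 310 matches would be expected to miss by 100 Elo or more. For lopsided pairings the Elo curve is steeper, so the real error there is somewhat larger than this sketch suggests.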

Uri
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: 50 games are usually enough to get a good estimate

Post by bob

Uri Blass wrote: for rating chess programs with an error of less than 100 Elo, and you can even play against only a single opponent (there is no reason to spend time testing against many opponents, because performance seems to depend little on the opponent).

I checked the CEGT 120/40 rating list, which is based on many 50-game matches between programs (310 matches).

I looked for matches where the difference between a program's rating and its performance is at least 100 Elo.

I found only 4 matches.

Rybka 2.3.2a 64 2CPU (3048) vs. Fruit 2.2 (2782): performance 3155 (+107)
Rybka 2.1c 64 2CPU (3000) vs. Loop 13.5 2CPU (2839): performance 3124 (+124)
Rybka 1.2f 64-bit (2969) vs. Loop 13.5 2CPU (2839): performance 3080 (+111)
Naum 2.2 x64 2CPU (2899) vs. Deep Shredder 10 x64 2CPU (2853): performance 3018 (+119)

Uri

Interesting but irrelevant. The interesting issue is determining whether a new version is better than an old version. You are not going to find 100 Elo changes in such testing. A 10 Elo gain would be a big improvement. 50 games won't come within a light-year of giving a reliable indication of that kind of measurement...
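
To put a rough number on that, the sketch below estimates how many games it takes before the 95% error margin of a match shrinks to a given size. It is only an illustration under stated assumptions: equal-strength opponents, a 30% draw ratio picked for the example, independent games, and a normal approximation. The helper name games_needed is hypothetical.

```python
import math

def games_needed(margin_elo, draw_ratio=0.3, z=1.96):
    # Assumed: evenly matched engines, so the expected score is 0.5.
    p = 0.5
    # Standard deviation of a single game's score (1, 0.5 or 0).
    per_game_sd = math.sqrt(p * (1.0 - p) - draw_ratio / 4.0)
    # Slope of Elo(p) = -400*log10(1/p - 1) at p, in Elo per unit score.
    slope = 400.0 / (math.log(10.0) * p * (1.0 - p))
    # Solve z * slope * per_game_sd / sqrt(n) = margin_elo for n.
    return math.ceil((z * slope * per_game_sd / margin_elo) ** 2)

for margin in (100.0, 50.0, 10.0):
    print(f"+/-{margin:.0f} Elo at 95%: about {games_needed(margin)} games")
```

With these assumptions a 100 Elo margin fits inside 50 games, while a 10 Elo margin needs a few thousand, since the required number of games grows with the inverse square of the margin.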