Hey Bob,
As a bit of a newbie, catching up on the history:
When, why, and how was it realized that it takes so many games
to establish an Elo difference between programs?
Regards
Laurie
Some history from Bob please
Moderator: Ras
-
syzygy
- Posts: 5869
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Some history from Bob please
That's just basic statistics.
Repeatedly flip a coin and count the number of heads and tails. At what moment can you tell with any confidence that the coin is biased?
This is the subject of statistics, a branch of mathematics.
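A minimal sketch of the coin-flip question, assuming a normal approximation and a roughly 95% confidence level (z = 1.96). The function name `flips_needed` is hypothetical and only serves the illustration; it estimates how many flips it takes before a given bias stands out from the noise of a fair coin.

```python
import math

def flips_needed(p_heads, z=1.96):
    """Rough number of flips before a coin with true heads probability
    p_heads becomes distinguishable from a fair coin.

    Assumption: normal approximation; the standard error of the observed
    heads fraction for a fair coin is 0.5 / sqrt(n).
    """
    delta = abs(p_heads - 0.5)
    if delta == 0:
        raise ValueError("a fair coin can never be shown to be biased")
    # Require the bias to exceed z standard errors: delta >= z * 0.5 / sqrt(n)
    return math.ceil((z * 0.5 / delta) ** 2)

if __name__ == "__main__":
    for p in (0.75, 0.6, 0.51):
        print(p, flips_needed(p))
```

The required number of flips grows with the inverse square of the bias, so a bias ten times smaller needs about a hundred times as many flips.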
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Some history from Bob please
lauriet wrote: Hey Bob,
As a bit of a newbie, catching up on the history:
What, when, why, how was it realized that it takes so many games
to establish an ELO difference between programs.
Regards
Laurie
When Elo wrote his book?
The only rule I use is that the error bar has to fit well inside the probable Elo range for the change being tested. If the expected gain or loss is small, many games are needed. To figure out whether null-move works or not, you can get by with far fewer games, since the expected gain is quite large (roughly +80 to +120 Elo).
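As a rough sketch of what "error bar" means here: the snippet below turns a win/draw/loss record into an Elo estimate with an approximate 95% interval. It assumes the standard logistic Elo formula and a normal approximation for the mean score; the function names are hypothetical and this is not a description of any particular rating tool.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an average score (logistic Elo model)."""
    # Clamp so that a 0% or 100% score does not divide by zero.
    score = min(max(score, 1e-9), 1 - 1e-9)
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_error(wins, draws, losses, z=1.96):
    """Return (elo, low, high): estimate and approximate 95% bounds."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, which is 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)  # half-width of the score interval
    return (elo_from_score(score),
            elo_from_score(score - margin),
            elo_from_score(score + margin))

if __name__ == "__main__":
    print(elo_with_error(40, 20, 40))        # 100 games
    print(elo_with_error(4000, 2000, 4000))  # 10,000 games
```

With an even 100-game result the interval spans several tens of Elo either way, which is enough to confirm a large gain such as null-move but far too wide for a change worth only a few Elo.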
-
hgm
- Posts: 28456
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Some history from Bob please
Of course it also has to do with the fact that today there are hundreds of engines in thousands of versions, all practically identical, so that the Elo differences you want to measure are almost non-existent. In the days when there were only five programs, which differed by hundreds of Elo in strength, not many games were needed at all to determine their Elo difference to the desired precision.
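To put approximate numbers on this, here is a small sketch that estimates how many games are needed before a given Elo difference exceeds a 95% error bar. It assumes the logistic Elo model, a normal approximation, and a per-game spread taken at a 50% score with a chosen draw ratio; `games_needed` and `draw_ratio` are hypothetical names for illustration only.

```python
import math

def games_needed(elo_diff, draw_ratio=0.0, z=1.96):
    """Rough number of games before an Elo difference of elo_diff
    is larger than the z-sigma error bar on the measured score."""
    if elo_diff == 0:
        raise ValueError("a zero difference can never be resolved")
    # Expected score of the stronger side under the logistic Elo model.
    expected = 1.0 / (1.0 + 10.0 ** (-abs(elo_diff) / 400.0))
    delta = expected - 0.5
    # Assumption: per-game standard deviation near a 50% score,
    # where draws reduce the spread of results.
    sigma = math.sqrt((1.0 - draw_ratio) / 4.0)
    return math.ceil((z * sigma / delta) ** 2)

if __name__ == "__main__":
    for diff in (200, 100, 20, 5):
        print(diff, games_needed(diff, draw_ratio=0.3))
```

Differences of a hundred Elo or more show up within a few dozen games, while a difference of about 5 Elo needs on the order of ten thousand or more.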
-
syzygy
- Posts: 5869
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Some history from Bob please
hgm wrote: Of course it also has to do with the fact that today there are hundreds of engines in thousands of versions, all practically identical, so that the Elo differences you want to measure are almost non-existent. In the days where there were only 5 programs, which differed hundreds of Elo in strength, not many games were needed at all to determine their Elo difference to the desired precision.
And it was possible to observe with the naked eye that the engine stopped playing 2.Ke2.
-
lauriet
- Posts: 199
- Joined: Sun Nov 03, 2013 9:32 am
Re: Some history from Bob please
Does that mean the ratings on sites like SSDF are pretty unreliable?
Sometimes they only have 50 to 100 games against a machine that is closely rated to the one being tested.
Regards
Laurie.
-
Vinvin
- Posts: 5312
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: Some history from Bob please
lauriet wrote: Does that mean the ratings on sites like SSDF are pretty unreliable ?
Sometimes they only have 50->100 games against a machine that is closely rated to the one being tested.
Regards
Laurie.
http://ssdf.bosjo.net/list.htm has shown the error bars ("+ -") for a long time.
-
zullil
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: Some history from Bob please
syzygy wrote: And it was possible to observe with the naked eye that the engine stopped playing 2.Ke2
Might be better than 2.Rh2.
2.Ke2 at least implies that move 1 was decent.
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Some history from Bob please
lauriet wrote: Does that mean the ratings on sites like SSDF are pretty unreliable ?
Sometimes they only have 50->100 games against a machine that is closely rated to the one being tested.
Regards
Laurie.
Depends on the TOTAL number of games a program has played. You can play 30K games against one opponent, or 1 game against each of 30K opponents (the latter is actually preferred).
-
syzygy
- Posts: 5869
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Some history from Bob please
zullil wrote:
syzygy wrote: And it was possible to observe with the naked eye that the engine stopped playing 2.Ke2
Might be better than 2.Rh2.
2.Ke2 at least implies that move 1 was decent.
Maybe the very first version of my engine wasn't so bad after all