An objective test process for the rest of us?

hristo

Re: An objective test process for the rest of us?

Post by hristo »

hgm wrote: But I did not want to be too hard on Hristo (and at the same time probe how many games he had true data for). :lol:
You could have been, but it wouldn't help you when you make erroneous statements. :-)

Over a period of 4 years I have run approximately 10K games (normally 100 games per match, at fast time controls).

hgm wrote: Of course the fact that you can get it with the same engines, does prove the point that even if he had actually observed this, it would be no proof whatsoever that the strength difference between the engines actually did vary with time control.
You seem to agree that the strength difference varies with time control, which you initially appeared to dismiss outright. (perhaps that was a misunderstanding)
You only contest the amount (detectability) of such variance with respect to given time controls.

How I came up with this conclusion (strength difference between engines depends on time controls) is irrelevant since others (including you .. doh) seem to agree with it.
I don't remember claiming to have a proof, BTW.

Please note that I was not talking about the case where the overall winner changes with the time control, but rather about the case where the winner wins by a different margin. (yes, one can challenge this on the premise of not enough data ...)
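
To put a rough number on the "not enough data" objection, here is a minimal sketch of the statistical noise in a single match. It assumes independent games, two evenly matched engines, and the usual logistic Elo formula; the function names and the draw_ratio parameter are made up for illustration and do not come from any tool mentioned in this thread.

```python
import math


def score_margin(games, draw_ratio=0.0, z=1.96):
    """Approximate 95% half-width of the observed score fraction.

    Assumes independent games between evenly matched engines, scored
    win=1, draw=0.5, loss=0. Under that assumption the per-game
    variance is (1 - draw_ratio) * 0.25.
    """
    variance = (1.0 - draw_ratio) * 0.25
    return z * math.sqrt(variance / games)


def elo_from_score(score):
    """Convert a score fraction (strictly between 0 and 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # Hypothetical example: one 100-game match with no draws.
    margin = score_margin(100)
    print("score margin:", round(margin, 3))
    print("Elo margin:", round(elo_from_score(0.5 + margin), 1))
```

Under these assumptions a 100-game match between equal engines can swing by close to ten percentage points of score in either direction from noise alone, so two such matches at different time controls can show different winning margins without any real change in relative strength.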

p.s.
What has come over you lately? You have become quite militant in your pursuit of not being "wrong".

bob

Post by bob »

I think the thing that is getting on everyone's nerves is the fact that it appears that the number of games required to make sensitive decisions is far larger than was originally thought. Large enough that it is actually very difficult to play enough unless you have some wild hardware resources.

I (and others) have said for years that events like the WCCC don't prove a thing about which program is best, unless you have a run like Chess 3.x/4.x had, where they won almost every year for nearly 10 years, or Deep Thought, which did the same. If you win enough, it becomes convincing, but even that just identifies when one program is far superior to the others. Determining whether a change is good or not requires far more games, and that is a bit disappointing... particularly when quite a few of us (myself included) have been using what is obviously way too few games in the past...
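
As a back-of-the-envelope illustration of why the numbers get so large, the sketch below estimates how many games it takes before the 95% error bar on the match score shrinks to the size of the edge you are trying to detect. It assumes independent games, the standard logistic Elo expectation, and a per-game variance taken near an even score; games_needed and its parameters are hypothetical names, not part of any existing testing tool.

```python
import math


def expected_score(elo_diff):
    """Expected score fraction for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_needed(elo_diff, draw_ratio=0.0, z=1.96):
    """Games required so the 95% half-width equals the expected edge.

    Assumption: per-game variance is approximated by its value at an
    even score, (1 - draw_ratio) * 0.25, which is reasonable only for
    small Elo differences.
    """
    if elo_diff == 0:
        raise ValueError("a zero Elo difference can never be resolved")
    edge = abs(expected_score(elo_diff) - 0.5)
    variance = (1.0 - draw_ratio) * 0.25
    return math.ceil(variance * (z / edge) ** 2)


if __name__ == "__main__":
    # Hypothetical Elo differences a single change might produce.
    for diff in (50, 20, 10, 5):
        print(diff, "Elo:", games_needed(diff), "games")
```

Because the required count grows roughly with the inverse square of the edge, halving the Elo difference you want to resolve about quadruples the number of games, which is how a small improvement ends up needing thousands of games rather than hundreds.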
hristo

Post by hristo »

bob wrote:I think the thing that is getting on everyone's nerves is the fact that it appears that the number of games required to make sensitive decisions is far larger than was originally thought. Large enough that it is actually very difficult to play enough unless you have some wild hardware resources.
Chess is fun for me and I try to avoid letting it get on my nerves.

Sharing experiences with others helps, IMO, because in some small way it increases the sample pool even if it doesn't prove anything.

The sheer number of games that you run is mind-boggling, so perhaps someone who is in dire need of more data can lobby you to run a particular test. :-)

bob wrote: I (and others) have said for years that events like the WCCC don't prove a thing about which program is best, unless you have a run like Chess 3.x/4.x had, where they won almost every year for nearly 10 years, or Deep Thought, which did the same. If you win enough, it becomes convincing, but even that just identifies when one program is far superior to the others. Determining whether a change is good or not requires far more games, and that is a bit disappointing... particularly when quite a few of us (myself included) have been using what is obviously way too few games in the past...
I suspect that most people (non-professionals) use far fewer test games than are needed to determine "goodness of change". But that is part of the fun: we get to extrapolate and even read tea leaves from time to time -- and feel good about where things are going. ;-)

Regards,
Hristo