TalkChess.com
Hosted by Your Move Chess & Games

H.G.Muller

Joined: 10 Mar 2006
Posts: 12911
Location: Amsterdam

Post subject: Re: Observator bias or...    Posted: Fri Jun 08, 2007 9:40 am

Well, excuse me for saying so, but if all you said was that 80 games is not enough, then I don't understand why you said that at all. The topic under discussion, or at least the one that I addressed and that you reacted to, was the one raised by Uri: whether it is preferable to test based on time or based on node count, not how many games are enough:
bob wrote:
 hgm wrote: Yes, for this reason testing at a fixed number of nodes and recording the time, rather than fixing the time, seems preferable. But of course you cannot get rid of the randomness induced by SMP that way. For this reason I still want to implement the tree comparison idea I proposed here lately. This would eliminate the randomness not by sampling enough games and relying on the (tediously slow) 1/sqrt(N) convergence, but by exhaustively generating all possible realizations of the game from a given initial position. If the versions under comparison are quite close (the case that is most difficult to test with conventional methods), the entire game tree might consist of fewer than 100 games, but might give you the accuracy of 10,000 games that are subject to chance effects.

fixed number of nodes is absolutely worthless. To prove that to yourself, do the following. Play a match using the same starting position, where _both_ programs search a fixed number of nodes (say 20,000,000). Record the results. Then re-play but have both search 20,010,000 nodes (10K more nodes than before). Now look at the results. They won't be anywhere near the same. Which one is more correct? Answer: that's hopeless, as you take a small random sample (the games with 20M nodes per side) from a much larger set of random results, and you base your decisions on that? May as well flip a coin...

my upcoming ICGA paper will show just how horrible this is...

Here you make the general statement that testing with a fixed number of nodes is useless, without referring to any number of games. And I don't think that 'useless' is the same as 'not enough' (for a particular purpose). Even if you want to stick to the 80 games that suddenly popped out of nowhere, if I have two versions and one of them scored 45 out of 80, while the other scored 0 out of 80, then 80 games are clearly enough to draw the far-reaching conclusion that you have broken something, and it would be plain silly to continue playing 100,000 games with this version. But all of that is standard statistics, which was never an issue in this discussion thread.
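To put a rough number on the 45-out-of-80 versus 0-out-of-80 case, here is a minimal sketch in Python. It assumes every game is an independent win/loss trial (draws are ignored, which overestimates the variance), and the function names are made up for this illustration. It uses a pooled two-proportion z statistic, which is one standard choice and not necessarily what anyone in this thread had in mind.

```python
import math

def score_std_error(points, games):
    """Standard error of the score fraction for `games` independent
    win/loss games (assumption: no draws)."""
    p = points / games
    return math.sqrt(p * (1.0 - p) / games)

def two_version_z(points_a, points_b, games):
    """Pooled two-proportion z statistic for two versions that each
    played `games` games against the same opposition."""
    pooled = (points_a + points_b) / (2.0 * games)
    se = math.sqrt(pooled * (1.0 - pooled) * 2.0 / games)
    if se == 0.0:
        return 0.0  # both versions scored 0% or 100%: no measurable difference
    return (points_a - points_b) / games / se

# The example from the post: 45/80 for one version, 0/80 for the other.
z = two_version_z(45, 0, 80)
print(f"z = {z:.2f}")

# The 1/sqrt(N) convergence: 100 times more games only shrinks
# the error bar by a factor of 10.
ratio = score_std_error(40, 80) / score_std_error(4000, 8000)
print(f"error bar ratio, 80 vs 8000 games: {ratio:.1f}")
```

A z value far above 3 means the difference is well outside anything chance could plausibly produce in 80 games, while a difference of only a few points would need vastly more games to resolve.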

To talk about things that are absolutely useless: testing uMax against Crafty, Fruit, Arasan, comes pretty close to that. The more games I would play, the more useless it would be, for in 100,000 games both the old and the improved version of uMax would score 0 points. So how would I know if my improvement worked, or if I had completely broken it? It would just be a giant waste of time. An easy analysis shows that you obtain maximum information per game (so that you get the desired reliability with the smallest number of games) if you test against engines of about equal strength.
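One way to sketch that analysis, assuming the usual logistic Elo model and win/loss games only (both are assumptions made here, not something spelled out in the post): the information a single game carries about the rating difference is proportional to p(1-p), where p is the expected score, and that peaks at p = 0.5. The function names below are hypothetical.

```python
import math

# Slope constant of the logistic Elo curve (400-point scale).
K = math.log(10) / 400.0

def expected_score(elo_diff):
    """Expected score of the engine under test, given its rating minus
    the opponent's rating (logistic Elo model, assumption)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def information_per_game(elo_diff):
    """Fisher information about the rating difference from one
    win/loss game: K^2 * p * (1 - p)."""
    p = expected_score(elo_diff)
    return K * K * p * (1.0 - p)

# Compare an equal opponent with increasingly stronger ones.
for diff in (0, -200, -400, -800, -1200):
    rel = information_per_game(diff) / information_per_game(0)
    print(f"{diff:6d} Elo: expected score {expected_score(diff):.4f}, "
          f"relative information {rel:.4f}")
```

The number of games needed for a given accuracy scales as the inverse of this information, so against an opponent that beats you nearly every time the required match length explodes.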

On top of that, for those that are testing engines in a higher Elo range, I would even reverse the statement: if Crafty, Fruit, Arasan,... are not able to reproduce their games despite 'random' being switched off, and despite being set for a fixed ply depth (so that random factors outside of the engines cannot affect their logic), they are clearly not suitable test opponents and are best excluded from any gauntlets you make to evaluate tiny changes in your engine, as using such unpredictable engines needlessly adds an enormous statistical variance to the quantity under measurement. Better stick to engines that behave according to specifications.

After all, the idea is to make testing to a certain accuracy as easy as possible. That you could also make it much harder on yourself by picking certain engines with nasty peculiarities, is quite irrelevant if you are smart enough to stay away from them!
