ChessUSA.com TalkChess.com
Hosted by Your Move Chess & Games
 

Observator bias or...
TalkChess.com Forum Index -> Computer Chess Club: Programming and Technical Discussions
H.G.Muller



Joined: 10 Mar 2006
Posts: 12911
Location: Amsterdam

Post subject: Re: Observator bias or...    Posted: Fri Jun 08, 2007 9:40 am

Well, excuse me for saying so, but if all you meant was that 80 games are not enough, then I don't understand why you brought it up at all. The topic under discussion, or at least the one I addressed and that you reacted to, was the one raised by Uri: whether it is preferable to test based on time or based on node count, not how many games are enough:
bob wrote:
hgm wrote:
Yes, for this reason testing at a fixed number of nodes and recording the time, rather than fixing the time, seems preferable. But of course you cannot get rid of the randomness induced by SMP that way.

For this reason I still want to implement the tree-comparison idea I proposed here recently. This would eliminate the randomness not by sampling enough games and relying on the (tediously slow) 1/sqrt(N) convergence, but by exhaustively generating all possible realizations of the game from a given initial position. If the versions under comparison are quite close (the case that is most difficult to test with conventional methods), the entire game tree might consist of fewer than 100 games, yet might give you the accuracy of 10,000 games that are subject to chance effects.


Fixed number of nodes is absolutely worthless. To prove that to yourself, do the following. Play a match using the same starting position, where _both_ programs search a fixed number of nodes (say 20,000,000). Record the results. Then replay it, but have both search 20,010,000 nodes (10K more nodes than before). Now look at the results. They won't be anywhere near the same. Which one is more correct? Answer: that's hopeless, as you take a small random sample (the games with 20M nodes per side) from a much larger set of random results, and you base your decisions on that? May as well flip a coin...

My upcoming ICGA paper will show just how horrible this is...
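To put rough numbers on the "(tediously slow) 1/sqrt(N) convergence" and on the "small random sample" objection, here is a minimal sketch of the error bar on a match result as a function of the number of games. It assumes the standard logistic Elo model, an evenly matched pair, and treats every game as a win or a loss (draws only lower the per-game variance, so this is an upper bound); the function names are hypothetical, not taken from any tool mentioned in the thread.

```python
import math

def score_standard_error(p: float, n_games: int) -> float:
    """Standard error of the mean score per game after n_games.

    Treats each game as win (1) or loss (0) with expected score p.
    Draws only reduce the real variance, so this is an upper bound.
    """
    return math.sqrt(p * (1.0 - p) / n_games)

def elo_from_score(s: float) -> float:
    """Logistic Elo difference corresponding to an expected score s (0 < s < 1)."""
    return -400.0 * math.log10(1.0 / s - 1.0)

def elo_error_bar(n_games: int) -> float:
    """Approximate 1-sigma Elo uncertainty for an evenly matched pair."""
    se = score_standard_error(0.5, n_games)
    return elo_from_score(0.5 + se)

for n in (80, 100, 1000, 10000):
    print(f"{n:6d} games: score SE = {score_standard_error(0.5, n):.4f}, "
          f"~{elo_error_bar(n):.1f} Elo (1 sigma)")
```

Under these assumptions 80 games leave a 1-sigma error bar of roughly 39 Elo and 10,000 games about 3.5 Elo: a hundredfold increase in games buys only a tenfold reduction in uncertainty.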

Here you make the general statement that testing with a fixed number of nodes is useless, without referring to any number of games. And I don't think that 'useless' is the same as 'not enough' (for a particular purpose). Even if you want to stick to the 80 games that suddenly popped up out of nowhere: if I have two versions, and one of them scored 45 out of 80 while the other scored 0 out of 80, then 80 games are clearly enough to draw the far-reaching conclusion that you have broken something, and it would be plain silly to continue playing 100,000 games with this version. But all of that is standard statistics, which was never an issue in this discussion thread.
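The "standard statistics" behind the 45-out-of-80 versus 0-out-of-80 example can be sketched as a two-proportion z test. This is an illustration, not the poster's own calculation: it assumes each game is a win or a loss, uses the pooled binomial variance (draws would only lower the true variance, making the result conservative), and `two_score_z` is a hypothetical helper name.

```python
import math

def two_score_z(score_a: float, score_b: float, n_games: int) -> float:
    """z statistic for the difference between two match scores.

    score_a and score_b are total points, each out of n_games.
    Uses the pooled win/loss (binomial) variance under the null
    hypothesis that both versions have the same expected score.
    """
    pa = score_a / n_games
    pb = score_b / n_games
    pooled = (score_a + score_b) / (2 * n_games)
    se = math.sqrt(pooled * (1.0 - pooled) * (2.0 / n_games))
    return (pa - pb) / se

z = two_score_z(45, 0, 80)
print(f"z = {z:.1f}")
```

For 45 versus 0 out of 80 this gives z of about 7.9, far beyond any conventional significance threshold. With one score at zero the normal approximation is rough, but the gap is so large that this does not change the conclusion.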

As for things that are absolutely useless: testing uMax against Crafty, Fruit, Arasan, ... comes pretty close to that. The more games I played, the more useless it would be, for in 100,000 games both the old and the improved version of uMax would score 0 points. So how would I know whether my improvement worked, or whether I had completely broken it? It would just be a giant waste of time. An easy analysis shows that you obtain the maximum information per game (so that you reach the desired reliability with the smallest number of games) if you test against engines of about equal strength.
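One standard way to carry out such an analysis (the post does not say which one it had in mind) is to compute the Fisher information that a single game carries about the Elo difference. Under the logistic Elo curve p = 1/(1 + 10^(-D/400)), with games counted as win/loss only, the slope is dp/dD = (ln 10 / 400) p(1 - p), so the information per game works out to (ln 10 / 400)^2 p(1 - p), which peaks at D = 0. The sketch below, with hypothetical function names, turns that into the number of games needed for a given error bar.

```python
import math

K = math.log(10) / 400.0  # slope constant of the logistic Elo curve

def expected_score(elo_diff: float) -> float:
    """Expected score of the tested engine against an opponent elo_diff Elo weaker."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def information_per_game(elo_diff: float) -> float:
    """Fisher information about the Elo difference from one win/loss game.

    (dp/dD)^2 / (p (1 - p)) simplifies to K^2 * p * (1 - p).
    """
    p = expected_score(elo_diff)
    return K * K * p * (1.0 - p)

def games_needed(elo_diff: float, target_sigma_elo: float) -> float:
    """Games required to measure the Elo difference to +/- target_sigma_elo (1 sigma)."""
    return 1.0 / (information_per_game(elo_diff) * target_sigma_elo ** 2)

for d in (0, 200, 400, 800):
    print(f"opponent {d:4d} Elo weaker: ~{games_needed(d, 10):8.0f} games for +/-10 Elo")
```

Under these assumptions about 1,200 games pin an evenly matched pair to +/-10 Elo, while against an opponent 800 Elo weaker (or stronger) roughly 25 times as many are needed; as the score approaches 0 or 100%, as with uMax against the top engines, the information per game goes to zero.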

On top of that, for those who are testing engines in a higher Elo range, I would even reverse the statement: if Crafty, Fruit, Arasan, ... are not able to reproduce their games despite 'random' being switched off and despite being set to a fixed ply depth (so that random factors outside the engines cannot affect their logic), they are clearly not suitable test opponents, and are best excluded from any gauntlet you run to evaluate tiny changes in your engine. Using such unpredictable engines needlessly adds an enormous statistical variance to the quantity under measurement. Better to stick to engines that behave according to specifications.
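Screening candidate opponents for this kind of reproducibility can be as simple as replaying the same position several times at a fixed ply depth with randomness switched off and comparing the resulting move lists. The sketch below is a minimal, hypothetical filter: it assumes the replays have already been collected as lists of move strings (coordinate notation is just an example) and does not talk to any engine or GUI.

```python
from typing import Optional, Sequence

def first_divergence(run_a: Sequence[str], run_b: Sequence[str]) -> Optional[int]:
    """Return the 0-based ply at which two replays differ, or None if identical.

    If one game stops earlier than the other, that also counts as a
    divergence, reported at the ply where the shorter game ended.
    """
    for ply, (move_a, move_b) in enumerate(zip(run_a, run_b)):
        if move_a != move_b:
            return ply
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b))
    return None

def is_reproducible(runs: Sequence[Sequence[str]]) -> bool:
    """True if every replay matches the first one move for move."""
    if not runs:
        return True
    return all(first_divergence(runs[0], r) is None for r in runs[1:])

# Example: two fixed-depth replays that split at the third ply (index 2).
print(first_divergence(["e2e4", "e7e5", "g1f3"], ["e2e4", "e7e5", "f1c4"]))
```

An engine for which `is_reproducible` fails under these settings would, by the argument above, be a candidate for exclusion from the gauntlet.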

After all, the idea is to make testing to a certain accuracy as easy as possible. That you could also make it much harder on yourself by picking engines with nasty peculiarities is quite irrelevant if you are smart enough to stay away from them!