Beginners testing methodology

Post by **hgm** » Wed Sep 12, 2012 7:33 pm

Note that WinBoard has a special timing mode /firstNPS=0 /secondNPS=0, where it updates the clocks from the times reported by the engines in their thinking output. When dealing with engines that reliably send a line of thinking output just before they move (some engines don't do this when they time out during an iteration!), this can be used to achieve completely lag-free timing. If the engine implements the WB-protocol nps command, it would report its times in CPU-sec rather than wall-clock time in this situation, so that you would even become insensitive to random interruptions of their search by other tasks stealing a little CPU.

Note, however, that in WB protocol times are sent in centiseconds, which becomes a problem in ultrafast games. Even if it did not, the timing calls used by WinBoard and most engines only tick 60 times per second, even when they claim millisecond precision.

Perhaps WB protocol should be extended with a feature timing=N (default N=100), to indicate that the basic timing unit is 1/N sec? Engines could then send feature timing=1000000 to indicate that they want to report (in thinking output) and be informed (with the time and otim commands) in microseconds.
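To make the proposal concrete, here is a minimal Python sketch of how a clock value would be rounded under a 1/N sec unit. Note that feature timing=N is only the suggestion made above, not part of the existing protocol, and the helper names to_units and time_command are made up for illustration.

```python
def to_units(seconds: float, n: int = 100) -> int:
    """Convert seconds to whole units of 1/n sec (n=100 means centiseconds)."""
    return round(seconds * n)


def time_command(seconds: float, n: int = 100) -> str:
    """Build a 'time' command for a hypothetical timing=N unit (assumption)."""
    return f"time {to_units(seconds, n)}"


if __name__ == "__main__":
    remaining = 1.2345  # seconds left on the clock
    print(time_command(remaining))             # centisecond unit
    print(time_command(remaining, 1_000_000))  # microsecond unit
    # A 3.4 ms move is invisible at centisecond resolution:
    print(to_units(0.0034), to_units(0.0034, 1_000_000))
```

With the default unit, any move that takes less than half a centisecond rounds to zero, which is the resolution problem in ultrafast games mentioned above.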

Post by **jdart** » Thu Sep 13, 2012 2:34 am

As you probably know, achieving microsecond timing requires some special program settings and is expensive in terms of overhead.

Changing the unit from centiseconds to microseconds (or something similar) might be helpful, but it would require re-engineering programs to accommodate it, so I am not sure it is a good idea. I can change my program to do this, but how many others will change?

I currently use cutechess-cli for testing, with the time control set to tc=0:05+0.1. At this rate I am not seeing losses on time.
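For readers unfamiliar with the notation, a small Python sketch of how a time-control string such as 0:05+0.1 breaks down may help. It assumes the moves/time+increment layout, with time given either in seconds or as minutes:seconds; the function name parse_tc and the 60-moves-per-side figure are illustrative only.

```python
def parse_tc(tc: str):
    """Split 'moves/time+increment' into (moves, base_seconds, increment_seconds).

    Assumption: time is either plain seconds or minutes:seconds, and the
    moves and increment parts are optional.
    """
    moves = None
    if "/" in tc:
        moves_part, tc = tc.split("/", 1)
        moves = int(moves_part)
    increment = 0.0
    if "+" in tc:
        tc, inc_part = tc.split("+", 1)
        increment = float(inc_part)
    if ":" in tc:
        minutes, seconds = tc.split(":", 1)
        base = int(minutes) * 60 + float(seconds)
    else:
        base = float(tc)
    return moves, base, increment


if __name__ == "__main__":
    moves, base, inc = parse_tc("0:05+0.1")
    # Rough time budget per side for an assumed 60-move game.
    print(base, inc, base + 60 * inc)
```

Under that reading, 0:05+0.1 means 5 seconds of base time plus 0.1 seconds added per move.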

--Jon

Post by **brianr** » Sat Sep 15, 2012 2:42 pm

Thanks for the comments and suggestions.

Yes, I know most test faster than this (0:4+0.4 or 0:6+0.6).
As I mentioned, I am trying to get deep enough into search to see changes there, not just for eval.
Do others feel that 0.1 sec is long enough to measure extension/LMR/razor type search tuning?

Re starting positions: I don't think it should matter that much. Recently I just use unique 12-ply game positions from a large PGN set, sorted by frequency, and take the top N positions, down to a minimum frequency of occurrence, in random order. The PGN is here: http://www.arasanchess.org/testpos.pgn (about 4500 positions).
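The selection procedure described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: games are given as plain lists of move strings rather than parsed from PGN, the first 12 plies of a game stand in for the position reached (so transpositions are not merged), and pick_openings with its parameters is a made-up name.

```python
import random
from collections import Counter


def pick_openings(games, plies=12, top_n=1000, min_count=2, seed=0):
    """Return the most frequent opening lines of `plies` half-moves, shuffled.

    games: iterable of move lists, e.g. [["e4", "e5", "Nf3", ...], ...]
    """
    # Count how often each opening line occurs; skip games that are too short.
    counts = Counter(
        tuple(moves[:plies]) for moves in games if len(moves) >= plies
    )
    # Keep the top_n lines by frequency, dropping those seen too rarely.
    frequent = [line for line, c in counts.most_common(top_n) if c >= min_count]
    # Randomize the order so the test does not run openings by popularity.
    random.Random(seed).shuffle(frequent)
    return frequent


if __name__ == "__main__":
    sample = [
        ["e4", "e5", "Nf3"], ["e4", "e5", "Bc4"],
        ["d4", "d5", "c4"], ["d4", "d5", "Nf3"],
        ["c4", "e5", "Nc3"], ["e4"],
    ]
    print(pick_openings(sample, plies=2, top_n=10, min_count=2))
```

The fixed seed keeps the shuffled order reproducible between test runs, which makes results from different engine versions easier to compare.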

The point of this post was that it DOES seem to matter quite a bit which test positions are used.
I also ran a test with your starting positions, which seem more like the "Krill subset of Adam" above.

Rank Name           Elo    +    - games score oppo. draws
   1 Tinker852x64    23    9    9  2026   57%   -13   37%
   2 Tinker863x64    -8   10   10  1750   47%     0   31%
   3 Tinker850x64   -15    8    8  2392   46%     9   36%
ResultSet-EloRating>los
              Ti Ti Ti
Tinker852x64     99 99
Tinker863x64   0    84
Tinker850x64   0 15
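For readers who have not met the LOS (likelihood of superiority) matrix before, here is a minimal Python sketch of a commonly used head-to-head approximation. It is not the calculation that produced the matrix above, which comes from the rating tool's own model over all games, so it should not be expected to reproduce those numbers; the win and loss counts in the example are invented.

```python
import math


def los(wins: int, losses: int) -> float:
    """Approximate likelihood that the first engine is the stronger one.

    Uses the normal approximation 0.5 * (1 + erf((w - l) / sqrt(2 * (w + l)))).
    Draws do not enter this approximation.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


if __name__ == "__main__":
    # Hypothetical head-to-head result: 60 wins, 40 losses, any number of draws.
    print(f"{los(60, 40):.3f}")
```

A value near 0.5 means the games do not separate the two engines; values close to 0 or 1 indicate that one side is very likely stronger.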
