Causes for inconsistent benchmark signatures

Evert · Post by **Evert** » Thu Mar 28, 2013 12:11 am

I added a "benchmark" command to Jazz (searching a given set of positions to a given depth and reporting total nodes as well as time and NPS), which will hopefully be very useful. At the moment it's a bit of a pain, however.

The problem is I get different total node-counts in different environments.

OS X, 32 and 64 bit (GCC and Clang):
signature 3356838

Linux 64 bit
signature 3356838

Linux 32 bit
signature 3358082

Windows 64 bit (cross compiled and running under Wine)
signature 3340172

Windows 32 bit (cross compiled and running under Wine)
signature 3357543

What are some common causes of inconsistencies like these? I've eliminated a random component in root move ordering (there so that Jazz doesn't always play the same moves early on; disabled for this) and calls to libc's qsort() function (which is not guaranteed to be stable, so it can order equal elements differently on different platforms because of differences in the implementation).
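
For illustration, one way to keep qsort() and still get the same order everywhere is to give the comparator a tie-breaker, so that no two distinct entries ever compare equal. This is only a sketch: the `ScoredMove` struct, its fields, and `sort_moves` are hypothetical and are not Jazz's actual types.

```cpp
#include <cstdlib>

// Hypothetical move record; Jazz's real move type is not shown in the thread.
struct ScoredMove {
    int move;   // encoded move, assumed unique within one move list
    int score;  // ordering score, higher is searched first
};

// Comparator for qsort: descending score, then ascending move encoding.
// Distinct entries never compare equal, so every correct qsort
// implementation has to produce the same final order.
int compare_scored_moves(const void* a, const void* b) {
    const ScoredMove* x = static_cast<const ScoredMove*>(a);
    const ScoredMove* y = static_cast<const ScoredMove*>(b);
    if (x->score != y->score) return (x->score < y->score) ? 1 : -1;
    if (x->move != y->move) return (x->move < y->move) ? -1 : 1;
    return 0;
}

void sort_moves(ScoredMove* list, std::size_t count) {
    std::qsort(list, count, sizeof(ScoredMove), compare_scored_moves);
}
```

Note that the resulting order of equal-scored moves follows the move encoding rather than the generation order, so node counts may differ from those of a stable sort, but they should no longer depend on the libc in use.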

Michel · Post by **Michel** » Thu Mar 28, 2013 2:06 am

Did you try valgrind?

A common cause of such inconsistencies is uninitialized variables.

AlvaroBegue · Post by **AlvaroBegue** » Thu Mar 28, 2013 2:15 am

Are you using rand() anywhere? (e.g., in initializing Zobrist tables)

Evert · Post by **Evert** » Thu Mar 28, 2013 9:35 am

Michel wrote: Did you try valgrind?

A common cause of such inconsistencies is uninitialized variables.

Good question. I hadn't tried valgrind recently (but I did run the code through the Clang static analyser). For some reason valgrind refuses to check my 32-bit builds, but it does check the 64-bit ones. However, it finds nothing...

Evert · Post by **Evert** » Thu Mar 28, 2013 9:38 am

AlvaroBegue wrote: Are you using rand() anywhere? (e.g., in initializing Zobrist tables)

I use a random component in sorting the root moves at the beginning of the search, but that's disabled for the benchmark. With it enabled, benchmark results are not reproducible even on the same machine.

That's the only one (apart from opening books, but those are disabled as well).

hgm · Post by **hgm** » Thu Mar 28, 2013 9:56 am

Well, if you really want to know, I guess you should look for the lowest search depth where you see a difference, and then do a search that reports the 'split node counts' for every move separately at 1, 2, 3... ply, and zoom in on the difference using that, to see which node makes the difference.
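
A rough sketch of the comparison step, assuming each build can dump its split node counts as a table from root move to subtree node count. The `SplitCounts` alias and `differing_moves` are made-up names for illustration, not part of any engine mentioned here.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical table: root move (as text) -> nodes searched below it.
typedef std::map<std::string, std::uint64_t> SplitCounts;

// Return the moves whose subtree counts differ between two builds,
// followed by the moves that appear only in the second table.
std::vector<std::string> differing_moves(const SplitCounts& a,
                                         const SplitCounts& b) {
    std::vector<std::string> out;
    for (SplitCounts::const_iterator it = a.begin(); it != a.end(); ++it) {
        SplitCounts::const_iterator other = b.find(it->first);
        if (other == b.end() || other->second != it->second)
            out.push_back(it->first);
    }
    for (SplitCounts::const_iterator it = b.begin(); it != b.end(); ++it) {
        if (a.find(it->first) == a.end())
            out.push_back(it->first);
    }
    return out;
}
```

Making one of the reported moves and repeating the comparison one ply deeper narrows things down to the single node where the two builds disagree. A move that shows up in only one table points straight at a move generation difference.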

An alternative would be to print all hash keys to a file, one per line, and use the diff command to find the first difference. Then patch the engine to print the path when it encounters a position with that hash key. If the number of nodes for which you get a difference is manageable, this might be the quickest method.
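
If the two key logs are already loaded into memory, the same idea as running diff can be sketched in a few lines. This assumes 64-bit hash keys logged in visiting order; `first_divergence` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of the first entry where the two key logs disagree.
// If one log is a strict prefix of the other, the length of the shorter
// log is returned. Returns -1 when the logs are identical.
std::ptrdiff_t first_divergence(const std::vector<std::uint64_t>& a,
                                const std::vector<std::uint64_t>& b) {
    const std::size_t common = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (a[i] != b[i]) return static_cast<std::ptrdiff_t>(i);
    }
    if (a.size() != b.size()) return static_cast<std::ptrdiff_t>(common);
    return -1;
}
```

The key just before the reported index belongs to the last node both builds agree on, which is usually the position worth dumping.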

Evert · Post by **Evert** » Thu Mar 28, 2013 11:13 am

hgm wrote: Well, if you really want to know, I guess you should look for the lowest search depth where you see a difference, and then do a search that reports the 'split node counts' for every move separately at 1, 2, 3... ply, and zoom in on the difference using that, to see which node makes the difference.

Yup, that's what I ended up doing (more or less): identify the search depth where the problem first appeared (which turned out to be 2) and then identify the exact position (the benchmark searches a couple) that causes the deviation, then print out the movelist that is searched (which isn't hard for a depth of 2).

This revealed the presence of a spurious en-passant capture in the movelist in 32-bit Linux that wasn't there in 64 bits or on OS X, which was generated because of a spurious en-passant target square being set from loading the FEN position (I don't mean an en-passant square being set while the capture is illegal, I mean a random square on the board was marked as "en passant" square). This turned out to be a harmless square in 64 bit mode, but generated a spurious pawn capture in 32 bit mode.

Tracking where the en-passant square came from, it turned out that the FEN position was incorrect and was missing the en-passant field (or the castling field, it's hard to tell the difference if one of them is missing and the other one is -).

So I corrected the FEN string and made my FEN parser more robust when dealing with incorrect FENs.
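
As a sketch of what "more robust" could look like (this is not Jazz's actual parser): split the FEN into fields, refuse to read an en-passant square unless the fourth field exists and is well-formed, and always leave the target square in a defined "none" state otherwise. The function names and the a1 = 0 ... h8 = 63 square numbering are assumptions made for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a FEN string into its whitespace-separated fields.
std::vector<std::string> fen_fields(const std::string& fen) {
    std::istringstream in(fen);
    std::vector<std::string> fields;
    std::string field;
    while (in >> field) fields.push_back(field);
    return fields;
}

// Parse the en-passant field (the fourth FEN field).
// On success returns true and sets `square` to 0..63 (a1 = 0, h8 = 63),
// or to -1 when the field is "-".
// On failure (fewer than four fields, or a malformed field) returns false
// and leaves `square` at -1, so no stray target square survives.
bool parse_en_passant(const std::string& fen, int& square) {
    square = -1;
    const std::vector<std::string> fields = fen_fields(fen);
    if (fields.size() < 4) return false;
    const std::string& ep = fields[3];
    if (ep == "-") return true;
    if (ep.size() != 2) return false;
    if (ep[0] < 'a' || ep[0] > 'h') return false;
    if (ep[1] != '3' && ep[1] != '6') return false;  // only ranks 3 and 6 are possible
    square = (ep[1] - '1') * 8 + (ep[0] - 'a');
    return true;
}
```

The important part is the first line of `parse_en_passant`: the square is set to a known value before anything can fail, so a short FEN cannot leave whatever happened to be in memory.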

That corrected the problem and I now get consistent node-counts across different platforms, compilers and compiler options.

Phew.

hgm · Post by **hgm** » Thu Mar 28, 2013 1:03 pm

Life is so much easier at 2 ply!

In the new Fairy-Max derivative I am making (introducing move sorting in a real move list) I fixed a problem this morning that made random pieces disappear from the board, in a rather irreproducible way. Saving and restoring the piece on the to-square of e.p. captures seems to have made this problem go away. I guess that is what you get when you transfer the info whether a move is an e.p. capture through the move encoding in the hash table, instead of basing it on the move generator's knowledge that it was a Pawn that was moving, and it went to the e.p. square.

Rein Halbersma · Post by **Rein Halbersma** » Thu Mar 28, 2013 3:24 pm

Evert wrote: What are some common causes of inconsistencies like these? I've eliminated a random component in root move ordering (there so that Jazz doesn't always play the same moves early on; disabled for this) and calls to libc's qsort() function (which is not guaranteed to be stable, so it can order equal elements differently on different platforms because of differences in the implementation).

I would try using std::stable_sort if you can use C++/STL, or write your own insertion sort to rule out algorithm-related inconsistencies. E.g. in Stockfish they do:

  // Our insertion sort, guaranteed to be stable, as is needed
  void insertion_sort(MoveStack* begin, MoveStack* end)
  {
    MoveStack tmp, *p, *q;

    for (p = begin + 1; p < end; ++p)
    {
        tmp = *p;
        for (q = p; q != begin && *(q-1) < tmp; --q)
            *q = *(q-1);
        *q = tmp;
    }
  }
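
For comparison, a minimal sketch of the std::stable_sort route. The `RankedMove` struct and `sort_moves_stable` are hypothetical names, not Stockfish's or Jazz's types; the descending order matches what the insertion sort above produces.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical move record used only for this example.
struct RankedMove {
    int move;   // encoded move
    int score;  // ordering score, higher is searched first
};

// Comparison used for ordering: strictly higher score comes first.
bool higher_score_first(const RankedMove& a, const RankedMove& b) {
    return a.score > b.score;
}

// Sort by descending score. std::stable_sort keeps equal-scored moves in
// their original (generation) order, so the result does not depend on
// which standard library implementation is used.
void sort_moves_stable(std::vector<RankedMove>& moves) {
    std::stable_sort(moves.begin(), moves.end(), higher_score_first);
}
```

The standard library guarantees the relative order of equivalent elements here, which plain qsort() does not.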

bob · Post by **bob** » Thu Mar 28, 2013 10:23 pm

Evert wrote: I added a "benchmark" command to Jazz (searching a given set of positions to a given depth and reporting total nodes as well as time and NPS), which will hopefully be very useful. At the moment it's a bit of a pain, however.

The problem is I get different total node-counts in different environments.

OS X, 32 and 64 bit (GCC and Clang):
signature 3356838

Linux 64 bit
signature 3356838

Linux 32 bit
signature 3358082

Windows 64 bit (cross compiled and running under Wine)
signature 3340172

Windows 32 bit (cross compiled and running under Wine)
signature 3357543

What are some common causes of inconsistencies like these? I've eliminated a random component in root move ordering (there so that Jazz doesn't always play the same moves early on; disabled for this) and calls to libc's qsort() function (which is not guaranteed to be stable, so it can order equal elements differently on different platforms because of differences in the implementation).

That is almost certainly a bug that needs fixing. There is NO reason for the node counts to vary, assuming NOTHING changes but the 32/64 bit compiler differences, i.e. same hash size, etc. Anything else indicates a bug.
