64-bit and 32-bit exes producing different results

JVMerlino
Posts: 1357
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: 64-bit and 32-bit exes producing different results

Post by JVMerlino »

rbarreira wrote:How do you calculate dwHashSize? Hopefully you're not assuming that sizeof (HASH_ENTRY) is a power of 2, or that it is the same for both versions.
dwHashSize = allocated size of hash table / sizeof(HASH_ENTRY)

and the allocated hash size is SUPPOSED to be a power of 2, but I'm checking that code thoroughly now.

jm
JVMerlino
Posts: 1357
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: 64-bit and 32-bit exes producing different results

Post by JVMerlino »

Desperado wrote:Just a quick idea before i go to bed :) .

The problem may be caused by the _&_ operation when
_dwHashSize_ is no longer a power of 2.

My example is padded to 8 bytes (not 12). If I used it without
knowing about the issue, my _dwHashSize_ would not be a power of 2
and the & operation would fail.

Michael
Very good point, and I am checking over that code now. dwHashSize is supposed to be a power of 2, but now I'm not so sure.

jm
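
To make Desperado's point concrete, here is a minimal sketch of why masking breaks when the entry count is not a power of 2. The helper names and the 12-byte entry size are assumptions for illustration only, not Myrddin's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when n is a nonzero power of two. */
static int is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Rounds n down to the nearest power of two (returns 0 for n == 0). */
static size_t floor_power_of_two(size_t n)
{
    size_t p = 1;
    if (n == 0)
        return 0;
    while (p <= n / 2)
        p *= 2;
    return p;
}

/* Hypothetical helper: entry count for a table of table_bytes bytes,
   rounded down so that masking stays valid for any entry size. */
static size_t usable_entries(size_t table_bytes, size_t entry_bytes)
{
    return floor_power_of_two(table_bytes / entry_bytes);
}

/* Index by masking: only correct when entries is a power of two. */
static size_t index_by_mask(uint64_t signature, size_t entries)
{
    return (size_t)(signature & (uint64_t)(entries - 1));
}

/* Index by modulo: correct for any nonzero entry count. */
static size_t index_by_modulo(uint64_t signature, size_t entries)
{
    return (size_t)(signature % (uint64_t)entries);
}
```

With a 1 MB allocation and a 12-byte entry, the raw count is 87381, which is not a power of 2, so the mask has gaps and some slots can never be reached. Rounding the count down (or using modulo) avoids that.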
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: 64-bit and 32-bit exes producing different results

Post by bob »

JVMerlino wrote:
rbarreira wrote:Do you use any external libraries which might change the behavior of your program? For example random number generators.

Failing that, it means it's something internal to your code. In that case, you probably have a bug somewhere (accessing uninitialized memory or an invalid memory location for example).
I do not. All of my zobrist hashing values are in a fixed table.

As I type this, Andrew Fan (Firefly) is looking at it. His debug builds show identical behavior, but the release builds do not. Which is even stranger because there is definitely no debug-specific code -- not even asserts. :?

The mystery deepens....

jm
This sounds like an uninitialized variable. Have you tried GCC and let it build the dependency graph? It does this when optimizing (the -O option; add -Wall to get the warnings), and it will point out cases where it sees a reference to a variable before it is assigned. Note that this does not work very well across procedures, so if you set something in one procedure and use it in another, it may well not give a warning even if there is a path that reaches the procedure that uses it before the procedure that sets it.

Changing debug options and the like alters the stack layout, which changes the garbage an uninitialized variable picks up, and so changes the results.
JVMerlino
Posts: 1357
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: 64-bit and 32-bit exes producing different results

Post by JVMerlino »

JVMerlino wrote:
Desperado wrote:Just a quick idea before i go to bed :) .

The problem may be caused by the _&_ operation when
_dwHashSize_ is no longer a power of 2.

My example is padded to 8 bytes (not 12). If I used it without
knowing about the issue, my _dwHashSize_ would not be a power of 2
and the & operation would fail.

Michael
Very good point, and I am checking over that code now. dwHashSize is supposed to be a power of 2, but now I'm not so sure.

jm
I've verified that dwHashSize is indeed a power of 2, but I also tried switching to a modulo operation just to be sure:

    PosSignature	index = (dwSignature % (dwHashSize));
But the problem is still there. :(

jm
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: 64-bit and 32-bit exes producing different results

Post by michiguel »

JVMerlino wrote:Many thanks in advance to anybody who can provide some guidance on this.

In a nutshell, the 32-bit and 64-bit versions of Myrddin have always given very different results in analysis mode -- not just move count but PV and sometimes even best move. It's only now that I've decided to devote some time to the problem.

I thought it might be the compile setup, since the two versions are compiled on two different machines (but with the same compiler, Visual Studio 2010, and the same compile/link settings). But since Jim Ablett's compiles show the same issue, I have started to suspect the code itself, as I seem to recall that Jim does not exclusively use VS for his builds.

Jim pointed me to a thread from a couple of years ago about Stockfish exhibiting the same problem, but that was due to an MS library sort function which Myrddin does not use. I've searched through all of Myrddin's code many times and cannot find any 64-bit specific code, and I'm just not familiar enough with the MS libraries to guess which functions might be causing this problem.

Again, any help will be very much appreciated (you'll be mentioned in the release notes!) :D

jm
IMHO, this should never happen, ever, ever, not even between linux and windows compiles. It could be benign, if there is anything in the code that is not stable, like different sort routines from different libraries. But there is no reason in a chess engine to rely on those libraries.

I suggest:
1) Disable the hash tables and see whether the problem persists. If it does...
2) Dump the tree to a file from both builds, with the hash tables disabled. Compare the two files with a diff program and see where you hit the first difference. Without hash tables, you should be able to go to that particular node and debug it easily.

I should go and check gaviota to see if the counts are the same :-)

Miguel
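
The tree dump suggested above could look roughly like the following. This is only a sketch: `format_node` and `dump_node` are hypothetical names, and the fields an engine would want to log (ply, move, window, score) are assumptions.

```c
#include <stdio.h>

/* Hypothetical trace helper: formats one search node as a single line.
   Only plain ints and a move string are printed (no pointers, no size_t),
   so logs from 32-bit and 64-bit builds can be compared with a diff tool.
   Note: snprintf is C99; Visual Studio 2010 offers _snprintf instead. */
static int format_node(char *buf, size_t len, int ply,
                       const char *move, int alpha, int beta, int score)
{
    return snprintf(buf, len, "%2d %-5s a=%6d b=%6d s=%6d\n",
                    ply, move, alpha, beta, score);
}

/* Appends one node to an already opened trace file. */
static void dump_node(FILE *out, int ply, const char *move,
                      int alpha, int beta, int score)
{
    char line[64];
    if (format_node(line, sizeof line, ply, move, alpha, beta, score) > 0)
        fputs(line, out);
}
```

Calling `dump_node` at the same point of the search in both builds (for example, just after a move has been searched) gives two files whose first differing line marks the node to debug.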
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: 64-bit and 32-bit exes producing different results

Post by Michel »

This sounds like an uninitialized variable. Have you tried GCC and let it build the dependency graph (it will do this with -O option)
Much faster in my opinion is to use valgrind. If there is an uninitialized variable it will immediately tell you.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: 64-bit and 32-bit exes producing different results

Post by michiguel »

Michel wrote:
This sounds like an uninitialized variable. Have you tried GCC and let it build the dependency graph (it will do this with -O option)
Much faster in my opinion is to use valgrind. If there is an uninitialized variable it will immediately tell you.
I use gcc with these switches, and I get warnings for uninitialized variables:

-Wwrite-strings -Wconversion -Wshadow -Wparentheses -Wlogical-op -Wunused -Wmissing-prototypes -Wmissing-declarations -Wdeclaration-after-statement -W -Wall -Wextra

Miguel
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: 64-bit and 32-bit exes producing different results

Post by Michel »

I use gcc with these switches, and I get warnings for uninitialized variables:

-Wwrite-strings -Wconversion -Wshadow -Wparentheses -Wlogical-op -Wunused -Wmissing-prototypes -Wmissing-declarations -Wdeclaration-after-statement -W -Wall -Wextra
Ok I will try that. Does it cover all cases?

Valgrind (which is basically a virtual cpu) works by tagging memory with extra bits to distinguish between initialized data and uninitialized data.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: 64-bit and 32-bit exes producing different results

Post by rbarreira »

JVMerlino wrote:
JVMerlino wrote:
Desperado wrote:Just a quick idea before i go to bed :) .

The problem may be caused by the _&_ operation when
_dwHashSize_ is no longer a power of 2.

My example is padded to 8 bytes (not 12). If I used it without
knowing about the issue, my _dwHashSize_ would not be a power of 2
and the & operation would fail.

Michael
Very good point, and I am checking over that code now. dwHashSize is supposed to be a power of 2, but now I'm not so sure.

jm
I've verified that dwHashSize is indeed a power of 2, but I also tried switching to a modulo operation just to be sure:

    PosSignature	index = (dwSignature % (dwHashSize));
But the problem is still there. :(

jm
But did you check that sizeof(HASH_ENTRY) is the same in both the 32-bit and 64-bit builds, debug and release?
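
One way to check this at compile time is sketched below. Both structs are hypothetical stand-ins (the real HASH_ENTRY layout is not shown in this thread); the point is only that a pointer member changes the size between targets while fixed-width members do not.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry whose size depends on the target: the pointer is
   4 bytes on a typical 32-bit build and 8 bytes on a 64-bit build. */
typedef struct {
    void    *next;
    uint32_t signature;
    int16_t  score;
    uint8_t  depth;
    uint8_t  flags;
} PointerEntry;

/* Hypothetical entry built only from fixed-width fields: 16 bytes on
   both targets with mainstream compilers. */
typedef struct {
    uint64_t signature;
    uint32_t move;
    int16_t  score;
    uint8_t  depth;
    uint8_t  flags;
} FixedEntry;

/* Compile-time check that also works on pre-C11 compilers: the array
   size becomes -1, a compile error, if the entry is not 16 bytes. */
typedef char fixed_entry_size_check[(sizeof(FixedEntry) == 16) ? 1 : -1];

/* Run-time helper, handy for printing the size from each build. */
static size_t pointer_entry_size(void)
{
    return sizeof(PointerEntry);
}
```

If the same check on the real entry type fails in one build but not the other, the number of entries that fit in the table differs between the builds, and so do the search results.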
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: 64-bit and 32-bit exes producing different results

Post by Desperado »

OK, now that we know the problem is hidden in the transposition table code,
and _if_ you can rule out different hash sizes for 32/64 bit, here is my next
bet:

* Mixing up the lo/hi bits in the index/signature code.
I mean, you are using the low bits for indexing, but are you really using the _high bits_
for the signature? That would explain the 32/64-bit differences
immediately, and also the different debug/release behaviour.

(edit: or is dwSignature the _complete_ hash key for the position, i.e. a 32-bit key?)

One more question:
are you using one slot or several slots where a position can be stored?

Michael
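
For reference, the lo/hi split described above might be sketched like this, assuming a 64-bit Zobrist key and a power-of-2 entry count. The function names are hypothetical, and whether Myrddin uses a 64-bit key at all is exactly the open question in this post.

```c
#include <stdint.h>

/* Low bits of the 64-bit key select the slot.
   entries must be a power of two for the mask to be valid. */
static uint32_t slot_index(uint64_t key, uint32_t entries)
{
    return (uint32_t)(key & (uint64_t)(entries - 1u));
}

/* High 32 bits are stored in the entry and compared on probe, so the
   verification bits are independent of the bits used for indexing. */
static uint32_t slot_signature(uint64_t key)
{
    return (uint32_t)(key >> 32);
}
```

Using explicit fixed-width types here means the split does not depend on the width of `long` or `size_t`, which can differ between 32-bit and 64-bit builds.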