Transposition Tables and ply depth

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Transposition Tables and ply depth

Post by stegemma »

bob wrote:...zobrist_signature ^= random[white knight][g5];
zobrist_signature ^= random[white knight][f3];

And I am done. And the order of the above is irrelevant, although I did it in the order the piece moves for clarity. Is that what you are doing? If so, that is how everybody is doing it.
That's what i'm doing:

Code: Select all


(mm5 is the actual 64 bit Zobrist key of the position)

    mov edi, [esi.nmPezzoP]             ; edi = pointer to taken piece
    test edi, edi
    je em_nopreso                          ; if zero jump to em_nopreso
    pxor mm5, [edi.pLastKey]          ; mm5 ^= key(taken, square)
em_nopreso:
    mov edi, [edx.mPezzo]               ; edi = moved piece
    mov ebx, [esi.nmSrc]                ; ebx = source square
    mov ecx, [esi.nmDst]                ; ecx = destination square

    movq mm3, [edi.pKey]              ; mm3 = key(moved piece)
    movq mm2, mm3                     ; save key(pezzo) for later use

    paddb mm3, [ecx.cKey]            ; key(mov.p., dst sq)
    paddb mm2, [ebx.cKey]            ; key(mov.p., src sq)
    movq [edi.pLastKey], mm3 ; save new lastKey for the piece

    pxor mm5, mm2                       ; mm5 ^= key(piece, src sq)
    pxor mm5, mm3                       ; mm5 ^= key(piece, dst sq)

(i've omitted some QWORD PTR "cast", for clarity)

I know, it is absolutly unportable, but i don't care about that, at present. The interesting (or completly crazy!) part is this:

paddb mm3, [ecx.cKey] ; key(mov.p., dst sq)
paddb mm2, [ebx.cKey] ; key(mov.p., src sq)

here's where i compute the key(piece, source) and key(piece, destination). For those who don't know what paddb is (not for you): it is an assembly instruction that add two 64 bit values but byte per byte. That means that there are not a carry sum between any 8 bit boundary. I suppose that adding that way two random numbers we still get a good random number, for our purpose. Cache hits similar to those given by Bob can suggest this, i think. Whe can use a word or dword sum as well (qword sum is not available on mmx).

Finally i think that i can use directly my pointers to access the full random keys array but maybe i should do some more operations. Still this way is "elegant", from an assembly programmer point of view, because it requires only 16+64 random values and no extra indexes (piece+square).

Of course i put in pKey member of the piece structure the same random key for piece of the same type. By that way i havent to store the piece type index because i already have the key(piece)=key(type).

Using assembly struct makes the assembly program enough readable as C program, i think... just adding a lot of comment ;)
User avatar
hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Transposition Tables and ply depth

Post by hgm »

Your method sounds a bit risky: if you would use XOR to generate the OldKey and NewKey, rather than +', it would obviously not work at all, as you could not distinguish the situation pieceA@sqr1+pieceB@sqr2 from pieceA@sqr2+pieceB@sqr1. Now you use +' in stead of XOR, but + and XOR are nearly te same operation. The only difference is a carry propagation here and there (which in the MMX instruction is even suppressed on byte boundaries).

Have you really checked if this method does not give an unduly large number of key collissions? (E.g. by counting the number of dependency loops of 6 or 7 keys in 56-bit or 48-bit keys generated by this method, and comparing it with that for purely random keys?)

I think it would be much safer to stick to the method of overlapping keys. That also requires only about 1KB for the entire Zobrist table, and it is very doubtful if there would be much to be gained by reducing it more. And it is likely faster to calculate as well.
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Transposition Tables and ply depth

Post by stegemma »

hgm wrote:Your method sounds a bit risky: if you would use XOR to generate the OldKey and NewKey, rather than +', it would obviously not work at all, as you could not distinguish the situation pieceA@sqr1+pieceB@sqr2 from pieceA@sqr2+pieceB@sqr1. Now you use +' in stead of XOR, but + and XOR are nearly te same operation. The only difference is a carry propagation here and there (which in the MMX instruction is even suppressed on byte boundaries).

Have you really checked if this method does not give an unduly large number of key collissions? (E.g. by counting the number of dependency loops of 6 or 7 keys in 56-bit or 48-bit keys generated by this method, and comparing it with that for purely random keys?)

I think it would be much safer to stick to the method of overlapping keys. That also requires only about 1KB for the entire Zobrist table, and it is very doubtful if there would be much to be gained by reducing it more. And it is likely faster to calculate as well.
In fact i've soon noticed that using xor instead of +' would be completly wrong. My first try was to use *', instead of +', but i've get some assembler error trying to use that operation and i've opted for +'. It is true that + is similar to xor but with the 64 bit random values we still have enough different bits between the two operations.

Thanks to you, i've changed +' with -'. this is equal to a byte per byte:

a - b = a + (~b +1)

so i think that this one could be good enough, for our purpose.

Aftwer trying -', i've noticed that the first bit never change in two complement numbers so we have 8 bit that never change, in second operand. Using dwords (32+32 bits) instead of bytes (4x8 bits) could be better because that way we have only 2 bits that doesn't change. Notice that i can't directly subtract two 64 bits, with mmx, as far as i know. Doing some very brief test, all methods (+' 8 bits, -' 8 bits and -' 32 bits) gives almost the same cache hits.

I i've tested even *' (pmullw) with similar cache hits. This one could be the best one but it seems to me that executing two consecutive multiply instructions could be slower, with mmx.

Surely i should check for the key collisions number but i think that this would not be be very higher than standard Zobrist key.
User avatar
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Transposition Tables and ply depth

Post by stegemma »

bob wrote:...

and I am done. When I unmake that move, I do this:

zobrist_signature ^= random[white knight][g5];
zobrist_signature ^= random[white knight][f3];
...
I think that it could be a little faster to store zobrist_signature before make move and restore it in unmake move. The difference is probably very very little.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Transposition Tables and ply depth

Post by bob »

stegemma wrote:
bob wrote:...

and I am done. When I unmake that move, I do this:

zobrist_signature ^= random[white knight][g5];
zobrist_signature ^= random[white knight][f3];
...
I think that it could be a little faster to store zobrist_signature before make move and restore it in unmake move. The difference is probably very very little.
I actually do that. But only because it makes UnmakeMove() a few lines shorter. I could not measure any speed difference either way on 64 bit machines.