Dirty hashing trick

leanchess · Post by **leanchess** » Sun Apr 26, 2020 12:45 pm

As I finally got to implementing TT, I decided to forego generating pseudorandom values for castling rights. I instead update a "dirty" bitboard, the relevant bits of which contribute to the lower 16 bits of the key. This guarantees that any two positions that only differ in their castling rights map to separate buckets:

Code: Select all

static uint64_t get_key_castling(register const uint64_t dirty) {
	return ((dirty & dirty_masks[SIDE_WHITE]) | ((dirty << 56) & dirty_masks[SIDE_BLACK])) >> 48;
}

(for orthodox chess both dirty_masks are 0x91).

I measured a ~5% performance gain in perft(7) from SP. I understand that there will be false negatives in more advanced positions, but those should seemingly be quite rare.

Any thoughts?

leanchess · Post by **leanchess** » Sun Apr 26, 2020 3:42 pm

With the same idea applied to e.p. square:

Code: Select all

static uint64_t get_key_ep_square(register const int8_t ep_square) {
	return ep_square < 0
		? 0
		: ((ep_square & 7) + 8);
}

static uint64_t get_key_castling(register const uint64_t dirty) {
	return ((dirty & dirty_masks[SIDE_WHITE]) | ((dirty << 56) & dirty_masks[SIDE_BLACK])) >> 44;
}

Since the hash is already split between the sides, this makes the piece-square the only pseudorandom component.

leanchess · Post by **leanchess** » Sun Apr 26, 2020 4:56 pm

Scratch that. The code is incorrect, no performance gain with corrected code.

hgm · Post by **hgm** » Sun Apr 26, 2020 5:09 pm

In micro-Max I use index = (boardKey + (stm + 128)*epSqr) & hashMask, where stm = 8 or 16, and epSqr = 128 if there are no e.p. rights. If the rest of the keys is random, this doesn't increase the chance for collisions. Micro-Max doesn't bother to hash the castling rights.

leanchess · Post by **leanchess** » Sun Apr 26, 2020 6:01 pm

OK, I think I got it right this time around:

Code: Select all

static uint64_t get_key_ep_square(register const int8_t ep_square) {
	return ep_square < 0
		? 0
		: ((ep_square & 7) ^ 8);
}

static uint64_t rotll(uint64_t n, uint8_t c)
{
	return (n << c) | (n >> (64 - c));
}

static uint64_t get_key_castling(register const uint64_t dirty) {
	return rotll(dirty & dirty_mask, 12);
}

According to this, rotll() is compiled into an actual rotate instruction. The performance gain isn't as great as I'd hoped (and falsely advertised), but it is noticeable.

hgm wrote: ↑Sun Apr 26, 2020 5:09 pm In micro-Max I use index = (boardKey + (stm + 128)*epSqr) & hashMask, where stm = 8 or 16, and epSqr = 128 if there are no e.p. rights. If the rest of the keys is random, this doesn't increase the chance for collisions. Micro-Max doesn't bother to hash the castling rights.

Thanks, it's good to know that my approach isn't completely off.

Now I need to somehow get rid of the ternary. I tried using ep_square = 0xF8 for none, adding 8, and then clearing the upper 4 bits, but that doesn't seem to work.

leanchess · Post by **leanchess** » Mon Apr 27, 2020 7:04 am

The following works:

Code: Select all

static uint64_t get_key_ep_square(register const int8_t ep_square) {
	return (ep_square >= 0) + (ep_square & 7);
}

provided that the negative ep_square is divisible by 8. However, that seems to actually slow things down, so I'll stick with the ternary.

Using 128 as the negative value, on the other hand, results in a small speedup. Thanks again @hgm!

MOBMAT · Post by **MOBMAT** » Mon Apr 27, 2020 8:02 am

I use the Polyglot hash values for my TT. It saves some space since I don't need another dedicated array for the hash values and the current position's TT hash value can be used directly when looking up openings from a Polyglot book.

I maintain castling rights as 4 bits (low order in a byte), setting or clearing them as needed.

I pre-calculate (at program start) the 16 possible castling hash values (index 0 = zero) and use the castling rights bits as an index into the array of 16 hash values. No loops or if statements are required, just a memory fetch.

I couldn't think of a faster way.

leanchess · Post by **leanchess** » Mon Apr 27, 2020 10:49 am

MOBMAT wrote: ↑Mon Apr 27, 2020 8:02 am I use the Polyglot hash values for my TT. It saves some space since I don't need another dedicated array for the hash values and the current position's TT hash value can be used directly when looking up openings from a Polyglot book.

I maintain castling rights as 4 bits (low order in a byte), setting or clearing them as needed.

I pre-calculate (at program start) the 16 possible castling hash values (index 0 = zero) and use the castling rights bits as an index into the array of 16 hash values. No loops or if statements are required, just a memory fetch.

I couldn't think of a faster way.

If built-in opening book lookup is required, there is indeed no alternative. The speedup is due to the transparent dirty BB replacing the 4-bit castling rights.

hgm · Post by **hgm** » Mon Apr 27, 2020 11:41 am

MOBMAT wrote: ↑Mon Apr 27, 2020 8:02 amI couldn't think of a faster way.

You could just XOR the castling-right bits with the key. That would be faster. Or multiply them with some magic constant before XORing them in.

To save space I sometimes use overlapping Zobrist keys: access a table filled with random uint8 as if they are uint64:

uint8 Zobrists[];

#define KEY(piece, square) *(uint64 *) (Zobrists + 64*piece + square)

If there is a huge number of pieces on a huge board, I synthesize the keys on-the-fly:

uint32 pieceKey[NPIECES];
uint32 squareKey[BOARDSIZE];

#define KEY(piece,square) (pieceKey[piece] * (unit64) squareKey[square])

leanchess · Post by **leanchess** » Mon Apr 27, 2020 3:03 pm

hgm wrote: ↑Mon Apr 27, 2020 11:41 am uint32 pieceKey[NPIECES];
uint32 squareKey[BOARDSIZE];

#define KEY(piece,square) (pieceKey[piece] * (unit64) squareKey[square])

How about the following?

Code: Select all

uint64_t piece_key[NPIECES];
uint64_t row_key[NPIECES];
uint64_t col_key[NPIECES];

#define KEY(piece, square) (piece_key[piece] + row_key[piece] * (square >> 3) + col_key[piece] * (square & 7))

I realise that using the macro as it is might be slower than lookup, but consider incremental updates, where a single value to be added to the key would be trivially calculated from the piece/colour/vector.

Dirty hashing trick

Dirty hashing trick

Re: Dirty hashing trick

Re: Dirty hashing trick

Re: Dirty hashing trick

Re: Dirty hashing trick

Re: Dirty hashing trick

Re: Dirty hashing trick

Re: Dirty hashing trick

Re: Dirty hashing trick

Re: Dirty hashing trick