TT false hits at lower depths

chessbit · Post by **chessbit** » Mon Sep 29, 2025 3:26 pm

I have tried several implementations to read/write in my TT while doing a perft with multiple threads. I have correct results up to depth 11, but when going too deep, I get slightly off results.

Locks might work, I haven't tried to compute perft 12 with them, but the performance hit is too significant, so I want to avoid this (unless you have a fast approach?).

My TT entry is just 2 64bits ints, one for the zobrist key, and one for the nodes count. The suggested lockless implementation on the chess wiki does not work => key = zobrist ^ nodes

What is commonly used at lower depths, with hopefully good performance results?

petero2 · Post by **petero2** » Mon Sep 29, 2025 7:50 pm

chessbit wrote: ↑Mon Sep 29, 2025 3:26 pm I have tried several implementations to read/write in my TT while doing a perft with multiple threads. I have correct results up to depth 11, but when going too deep, I get slightly off results.

Locks might work, I haven't tried to compute perft 12 with them, but the performance hit is too significant, so I want to avoid this (unless you have a fast approach?).

My TT entry is just 2 64bits ints, one for the zobrist key, and one for the nodes count. The suggested lockless implementation on the chess wiki does not work => key = zobrist ^ nodes

What is commonly used at lower depths, with hopefully good performance results?

Lockless hashing works perfectly if you implement it correctly, for example using C++ memory_order_relaxed atomics. See for example: viewtopic.php?p=671589#p671589 Be warned though that there were quite a bit of confused posts in that thread.

I suspect the problem in your case is that you don't take the remaining depth into account. In my perft program I handle that by having separate hash tables for each depth. This way it is also easier to control how much hash memory to use for different depths. An alternative is to instead use a single hash table but also store the depth somehow in the hash entries.

chessbit · Post by **chessbit** » Mon Sep 29, 2025 9:26 pm

petero2 wrote: ↑Mon Sep 29, 2025 7:50 pm
chessbit wrote: ↑Mon Sep 29, 2025 3:26 pm I have tried several implementations to read/write in my TT while doing a perft with multiple threads. I have correct results up to depth 11, but when going too deep, I get slightly off results.

Locks might work, I haven't tried to compute perft 12 with them, but the performance hit is too significant, so I want to avoid this (unless you have a fast approach?).

My TT entry is just 2 64bits ints, one for the zobrist key, and one for the nodes count. The suggested lockless implementation on the chess wiki does not work => key = zobrist ^ nodes

What is commonly used at lower depths, with hopefully good performance results?
Lockless hashing works perfectly if you implement it correctly, for example using C++ memory_order_relaxed atomics. See for example: viewtopic.php?p=671589#p671589 Be warned though that there were quite a bit of confused posts in that thread.

I suspect the problem in your case is that you don't take the remaining depth into account. In my perft program I handle that by having separate hash tables for each depth. This way it is also easier to control how much hash memory to use for different depths. An alternative is to instead use a single hash table but also store the depth somehow in the hash entries.

Great thanks for the reply, I'll adapt my implementation tomorrow with atomics.

My TT is declared as a matrix [depth][2^28], but I haven't really played with memory yet, I was more focused on having it work first. I hope that with better memory allocation, I would be able to increase the size.

What do you mean with the remaining depth? Depth 1 is never stored in the TT because it's faster to count the moves than it is to use the TT.

syzygy · Post by **syzygy** » Mon Sep 29, 2025 10:14 pm

chessbit wrote: ↑Mon Sep 29, 2025 9:26 pmMy TT is declared as a matrix [depth][2^28], but I haven't really played with memory yet, I was more focused on having it work first. I hope that with better memory allocation, I would be able to increase the size.

What do you mean with the remaining depth? Depth 1 is never stored in the TT because it's faster to count the moves than it is to use the TT.

If you do a perft 15 and you are in a node that is 10 ply from the root position, then remaining depth is 5. So for that position you still need to calculate perft 5.

You might encounter the same position at different plies from the root. To avoid this, you could include either the remaining depth (or alternatively/equivalently the distance to the root) in the hash key. But then you cannot use the depth in your decision whether to replace a TT entry or not.

Since your TT is a two-dimensional array [depth][2^28], it seems you already have separate tables for different remaining depths. (In this case replace-always is probably the best replacement strategy.)

Note that with 2^28 entries and a 64-bit key, once the TT is full, there will be a false hit every 2^(64-28) = 2^36 = 68.7 billion probes into that part of the table which should not have hit. So 64 bits is not going to be enough for higher perfts.

chessbit · Post by **chessbit** » Tue Sep 30, 2025 6:16 pm

syzygy wrote: ↑Mon Sep 29, 2025 10:14 pm
chessbit wrote: ↑Mon Sep 29, 2025 9:26 pmMy TT is declared as a matrix [depth][2^28], but I haven't really played with memory yet, I was more focused on having it work first. I hope that with better memory allocation, I would be able to increase the size.

What do you mean with the remaining depth? Depth 1 is never stored in the TT because it's faster to count the moves than it is to use the TT.
If you do a perft 15 and you are in a node that is 10 ply from the root position, then remaining depth is 5. So for that position you still need to calculate perft 5.

You might encounter the same position at different plies from the root. To avoid this, you could include either the remaining depth (or alternatively/equivalently the distance to the root) in the hash key. But then you cannot use the depth in your decision whether to replace a TT entry or not.

Since your TT is a two-dimensional array [depth][2^28], it seems you already have separate tables for different remaining depths. (In this case replace-always is probably the best replacement strategy.)

Note that with 2^28 entries and a 64-bit key, once the TT is full, there will be a false hit every 2^(64-28) = 2^36 = 68.7 billion probes into that part of the table which should not have hit. So 64 bits is not going to be enough for higher perfts.

I see, indeed that makes sense. That might be my problem. Thanks for the clarification.
Is it common to use 128 bits keys for higher perfts?

syzygy · Post by **syzygy** » Wed Oct 01, 2025 1:32 am

chessbit wrote: ↑Tue Sep 30, 2025 6:16 pm
syzygy wrote: ↑Mon Sep 29, 2025 10:14 pm
chessbit wrote: ↑Mon Sep 29, 2025 9:26 pmMy TT is declared as a matrix [depth][2^28], but I haven't really played with memory yet, I was more focused on having it work first. I hope that with better memory allocation, I would be able to increase the size.

What do you mean with the remaining depth? Depth 1 is never stored in the TT because it's faster to count the moves than it is to use the TT.
If you do a perft 15 and you are in a node that is 10 ply from the root position, then remaining depth is 5. So for that position you still need to calculate perft 5.

You might encounter the same position at different plies from the root. To avoid this, you could include either the remaining depth (or alternatively/equivalently the distance to the root) in the hash key. But then you cannot use the depth in your decision whether to replace a TT entry or not.

Since your TT is a two-dimensional array [depth][2^28], it seems you already have separate tables for different remaining depths. (In this case replace-always is probably the best replacement strategy.)

Note that with 2^28 entries and a 64-bit key, once the TT is full, there will be a false hit every 2^(64-28) = 2^36 = 68.7 billion probes into that part of the table which should not have hit. So 64 bits is not going to be enough for higher perfts.
I see, indeed that makes sense. That might be my problem. Thanks for the clarification.
Is it common to use 128 bits keys for higher perfts?

Yes. For example, Steven Edwards used 128-bit keys in his attempt to calculate perft 14:
viewtopic.php?p=667349#p667349

chessbit · Post by **chessbit** » Wed Oct 01, 2025 8:39 pm

petero2 wrote: ↑Mon Sep 29, 2025 7:50 pm
chessbit wrote: ↑Mon Sep 29, 2025 3:26 pm I have tried several implementations to read/write in my TT while doing a perft with multiple threads. I have correct results up to depth 11, but when going too deep, I get slightly off results.

Locks might work, I haven't tried to compute perft 12 with them, but the performance hit is too significant, so I want to avoid this (unless you have a fast approach?).

My TT entry is just 2 64bits ints, one for the zobrist key, and one for the nodes count. The suggested lockless implementation on the chess wiki does not work => key = zobrist ^ nodes

What is commonly used at lower depths, with hopefully good performance results?
Lockless hashing works perfectly if you implement it correctly, for example using C++ memory_order_relaxed atomics. See for example: viewtopic.php?p=671589#p671589 Be warned though that there were quite a bit of confused posts in that thread.

I suspect the problem in your case is that you don't take the remaining depth into account. In my perft program I handle that by having separate hash tables for each depth. This way it is also easier to control how much hash memory to use for different depths. An alternative is to instead use a single hash table but also store the depth somehow in the hash entries.

FYI I have tried with atomics but it didn't help. I have tried two different sets of zobrist keys, but I suspect my hashes may not be good enough. I guess you would run into the same issue at lower perfts? Maybe your zobrist keys are better and so still work at perft 12, maybe even 13? I'm a bit out of ideas otherwise, except trying 128 bits hashes, but that still doesn't solve the problem of collisions, which would yield potentially wrong results at even lower depths... which sucks.

syzygy · Post by **syzygy** » Wed Oct 01, 2025 9:14 pm

chessbit wrote: ↑Wed Oct 01, 2025 8:39 pmFYI I have tried with atomics but it didn't help. I have tried two different sets of zobrist keys, but I suspect my hashes may not be good enough. I guess you would run into the same issue at lower perfts? Maybe your zobrist keys are better and so still work at perft 12, maybe even 13? I'm a bit out of ideas otherwise, except trying 128 bits hashes, but that still doesn't solve the problem of collisions, which would yield potentially wrong results at even lower depths... which sucks.

Unless your Zobrist keys have been generated with a really bad random generator, they are almost certainly good enough.
But if you're not sure, just take numbers from https://www.random.org/bytes/

If you have no errors when running single-threaded, then your problem is not lack of randomness but the interaction between threads.

You could try this. Assuming your key is 128 bits (16 bytes) and your perft value is 64 bits (8 bytes), arrange them as follows:
5 bytes key
3 bytes perft value
---
5 bytes key
3 bytes perft value
---
6 bytes key
2 bytes perft value

Read this entry with 3 64-bit reads, then check that the key bytes match the current position's key. If it does, you can use the perft value.
Writes work the same. Just prepare the values and write in one go.
Perhaps you want to use inline assembly to read/write, so the compiler does not spread around the read and write operations.

syzygy · Post by **syzygy** » Wed Oct 01, 2025 10:31 pm

syzygy wrote: ↑Wed Oct 01, 2025 9:14 pm
chessbit wrote: ↑Wed Oct 01, 2025 8:39 pmFYI I have tried with atomics but it didn't help. I have tried two different sets of zobrist keys, but I suspect my hashes may not be good enough. I guess you would run into the same issue at lower perfts? Maybe your zobrist keys are better and so still work at perft 12, maybe even 13? I'm a bit out of ideas otherwise, except trying 128 bits hashes, but that still doesn't solve the problem of collisions, which would yield potentially wrong results at even lower depths... which sucks.
Unless your Zobrist keys have been generated with a really bad random generator, they are almost certainly good enough.
But if you're not sure, just take numbers from https://www.random.org/bytes/

If you have no errors when running single-threaded, then your problem is not lack of randomness but the interaction between threads.

You could try this. Assuming your key is 128 bits (16 bytes) and your perft value is 64 bits (8 bytes), arrange them as follows:
5 bytes key
3 bytes perft value
---
5 bytes key
3 bytes perft value
---
6 bytes key
2 bytes perft value

Read this entry with 3 64-bit reads, then check that the key bytes match the current position's key. If it does, you can use the perft value.
Writes work the same. Just prepare the values and write in one go.
Perhaps you want to use inline assembly to read/write, so the compiler does not spread around the read and write operations.

What should also work, without interleaving the key and they perft value bytes:
To read:
- read key (2 x 64-bit reads)
- if a match, read perft value and then read the key again.
- if still a match, use the perft value.

To write:
- clear the key to 0 (2 x 64-bit writes) (I guess clearing half the key should be suffiicient)
- write the perft value, then write the key.

Do this with inline assembly to make sure memory locations are actually read and written in the intended error.
(This can be achieved with atomics, but I am too lazy to check what kind of atomic semantics is needed.)

If someone sees how this could go wrong, please let me know.

Aleks Peshkov · Post by **Aleks Peshkov** » Thu Oct 02, 2025 1:17 am

Your problem is not collisions. You should store depth in TT record. "position startpos moves g1f3 g8f6 f3g1 f6g8" is legal transposition and have different perft from startpos, because it has remaining depth-4.

'perft 2' and 'perft 6' are different perft numbers of the same position, but you store them in the same slot in TT.

TT false hits at lower depths

TT false hits at lower depths

Re: TT false hits at lower depths

Re: TT false hits at lower depths

Re: TT false hits at lower depths

Re: TT false hits at lower depths

Re: TT false hits at lower depths

Re: TT false hits at lower depths

Re: TT false hits at lower depths

Re: TT false hits at lower depths

Re: TT false hits at lower depths