Data structure choice for TT

hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Data structure choice for TT

Post by hgm »

My "optimal in this respect" only referred to the strategy of sticking to the (cache-line-sized) bucket the initial hashing function maps you to, and replace what you consider least valuable there, rather than trying to look elsewhere to see if you can find something even less valuable to replace. There still can be a huge difference in preformance due to the criteria used to decide which entry is least valuable.

I don't think an age clock based on irreversible moves would be very useful. For one, the way you propose it doesn't catch every unreachable position: if the root has age clock N, moves that push one Pawn get age N+1, but moves that push another Pawn also get age N+1. If the engine decides to push the first Pawn, the root gets age N+1, and the sub-tree after pushing the other Pawn (now largely unreachable) still has the same age as the root, and is thus not recognized as obsolete. But even without this problem, not being able to distinguish positions that are no longer visited in practice (because the preceding reversible moves have taught the search how to alpha-beta prune them) would be a severe disadvantage.

I also doubt that neural nets can be of much use here, because there isn't much you could feed them to base their decision on. Unless you would store much more info in the hash entries, the only info you can go on is the depths and possibly ages. And that it is harmless to overwrite an entry when its age tells you the search no longer visits it seems a no-brainer.
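To make that concrete: a minimal sketch (not from any particular engine; the entry layout, field names and the weighting are my own assumptions) of picking the least valuable slot in a bucket from nothing but depth and age could look like this:

Code: Select all

#define BUCKET_SIZE 4

typedef struct {
    unsigned short key16;  /* upper bits of the hash key, for verification */
    unsigned short move;
    short          score;
    unsigned char  depth;  /* draft to which this entry was searched */
    unsigned char  age;    /* number of the search that last wrote or probed it */
} TTEntry;                 /* 8 bytes in this sketch */

/* Pick the slot we consider least valuable: stale entries first,
   lowest depth as tie-breaker. The 4 plies per search of staleness
   is an arbitrary weight. */
int least_valuable(const TTEntry *bucket, unsigned char cur_age)
{
    int i, victim = 0, lowest = 1 << 20;
    for (i = 0; i < BUCKET_SIZE; i++) {
        int staleness = (unsigned char)(cur_age - bucket[i].age);
        int value = bucket[i].depth - 4 * staleness;
        if (value < lowest) { lowest = value; victim = i; }
    }
    return victim;
}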
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Data structure choice for TT

Post by Zenmastur »

hgm wrote: Fri Jul 31, 2020 4:08 pm My "optimal in this respect" only referred to the strategy of sticking to the (cache-line-sized) bucket the initial hashing function maps you to, and replacing what you consider least valuable there, rather than trying to look elsewhere to see if you can find something even less valuable to replace. There can still be a huge difference in performance due to the criteria used to decide which entry is least valuable.
I guess you would have to do some testing to determine the penalty/benefit ratio of increasing the bucket size to a multiple of the cache-line size. So, I think it's premature to conclude that the cache-line size is the optimal bucket size. It could be that the optimal size is larger than the cache-line size. This would depend on the CPU's cache architecture, of course, and on the amount of benefit obtained with new replacement algorithms.
hgm wrote: Fri Jul 31, 2020 4:08 pm I don't think an age clock based on irreversible moves would be very useful. For one, the way you propose it doesn't catch every unreachable position: if the root has age clock N, moves that push one Pawn get age N+1, but moves that push another Pawn also get age N+1. If the engine decides to push the first Pawn, the root gets age N+1, and the sub-tree after pushing the other Pawn (now largely unreachable) still has the same age as the root, and is thus not recognized as obsolete. But even without this problem, not being able to distinguish positions that are no longer visited in practice (because the preceding reversible moves have taught the search how to alpha-beta prune them) would be a severe disadvantage.
If the age clock is used by itself I would agree, but there is other information in the TT entry that could be of help, namely the part of the Zobrist key used to differentiate two entries that hash to the same TT bucket. Even if no special form of Zobrist key is used, an NN could in theory be trained to use this information to distinguish between positions that are still useful and those that aren't. While this couldn't be made 100% accurate, it would likely be much better than what's commonly in use today. It's difficult to see, from a human perspective, how this would work, but an NN could pick out patterns that you would miss in your wildest imaginings.

If the Zobrist key were constructed so that the part stored in the TT entry contained a pawn-constellation key and a piece-constellation key, either combined or separately, it becomes easier to see how this would work. Since this information needs to be stored somewhere in the key in any case, little if any loss of efficiency of the Zobrist key would result. I'm not sure this is necessary, as I'm pretty sure an NN would find the patterns in any case, but it's easier to think about how it would work if we mentally separate these parts of the key.
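For illustration only (the key split, field widths and names below are invented, not taken from any engine), the stored part could be composed so that one half still identifies the pawn constellation and the other half the piece constellation:

Code: Select all

typedef unsigned long long u64;

/* Maintained incrementally with the usual Zobrist XOR updates,
   but kept as two separate keys. */
extern u64 pawn_key;    /* XOR of the Zobrist numbers of the pawns  */
extern u64 piece_key;   /* XOR of the Zobrist numbers of the pieces */

/* The bucket index comes from the full combined key... */
unsigned bucket_index(unsigned mask)
{
    return (unsigned)((pawn_key ^ piece_key) & mask);
}

/* ...while the 32-bit signature stored in the entry keeps a 16-bit
   pawn-constellation slice and a 16-bit piece-constellation slice,
   so a net (or any other heuristic) can still see to which family
   of positions an entry belongs. */
unsigned stored_signature(void)
{
    return ((unsigned)(pawn_key >> 48) << 16) | (unsigned)(piece_key >> 48);
}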
hgm wrote: Fri Jul 31, 2020 4:08 pm I also doubt that neural nets can be of much use here, because there isn't much you could feed them to base their decision on. Unless you would store much more info in the hash entries, the only info you can go on is the depths and possibly ages. And that it is harmless to overwrite an entry when its age tells you the search no longer visits it seems a no-brainer.
I think you're wrong; I think there is already enough information in the TT to make this possible. You might shorten training times if the needed data were separated out in the Zobrist key. Finding positions that are no longer useful even though only reversible moves have been made would be left up to the NN. This is a simple enough problem that I think it will find the patterns that everyone else has missed.

Regards,

Zenmastur
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Data structure choice for TT

Post by hgm »

As even with bucket sizes of 4 there would virtually always be something of very little value in the bucket, and a cache line can hold up to 7 entries if you economize on entry size, there seems to be no significant upside to using multiple cache lines, and there definitely is a downside. So the burden of proof would be on the multi-line scheme. Another point to keep in mind is that modern CPUs often use automatic prefetching of sequential memory accesses: when you access two neighboring lines, the CPU will attempt to prefetch the third, which backfires when you use two-line buckets.
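For concreteness, a one-cache-line bucket could look like the sketch below, assuming 16-byte entries (four per 64-byte line); the alignment attribute and the prefetch call are the GCC/Clang builtins:

Code: Select all

#include <stdint.h>

typedef struct {
    uint16_t key16;      /* verification bits of the hash key */
    uint16_t move;
    int16_t  score;
    int16_t  eval;
    uint8_t  depth;
    uint8_t  age_bound;  /* age and bound type packed together */
    uint8_t  pad[6];     /* pad the entry to 16 bytes */
} TTEntry;

typedef struct {
    TTEntry slot[4];     /* exactly one 64-byte cache line */
} __attribute__((aligned(64))) TTBucket;

/* Probing touches (and prefetches) a single line only. */
static inline TTBucket *probe_bucket(TTBucket *table, uint64_t mask, uint64_t key)
{
    TTBucket *bucket = &table[key & mask];
    __builtin_prefetch(bucket);
    return bucket;
}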

Even if you could reconstruct the complete position from the hash key, how could that possibly tell you how important the position is? Much more relevant is to which depth it was searched, how many nodes that took, how many nodes ago it was first encountered...
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Data structure choice for TT

Post by Zenmastur »

hgm wrote: Fri Jul 31, 2020 11:32 pm As even with bucket sizes of 4 there would virtually always be something of very little value in the bucket,
I guess this would be determined by the amount of pressure there is on the TT. If there is little or no TT pressure, I agree there will be plenty of space in the TT. But this isn't the situation anyone cares about. When the TT is massively oversubscribed, replacing the least valuable entry is much more important. Being able to determine that a vast number of entries are completely worthless (especially those associated with deep searches, and regardless of the number of cutoffs they caused in the past) would seem to be of value.
hgm wrote: and a cache line can hold up to 7 entries if you economize on entry size,
Really?? Which top tier program uses a bucket size of 7? Or even 6? I don't know of any, although I must admit I haven't looked. The last time I checked (it's been a while) SF used a bucket size of 3 with two buckets per cache line. I'm not sure this is still true, as they recently changed their TT structure to accommodate much larger TT sizes, so I suppose it's changed.
hgm wrote: Even if you could reconstruct the complete position from the hash key, how could that possibly tell you how important the position is? Much more relevant is to which depth it was searched, how many nodes that took, how many nodes ago it was first encountered...
The depth of search is completely useless information if the position can no longer occur. Further, if the aging method uses depth of search as a criterion for determining when to replace an entry, this will simply clog the TT with entries that are no longer useful but haven't aged out yet due to their depth. I run into this problem all the time. If the search is continued long enough, it will become pathological due to the huge number of useless entries that have been searched to great depth. Clearing the TT solves this problem, but it also loses all the info in the TT, which defeats the purpose of a long search in the first place.

[Moderation] The remainder of this message was lost due to moderator error.
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Data structure choice for TT

Post by Zenmastur »

Zenmastur wrote: Sat Aug 01, 2020 1:33 am
hgm wrote: Fri Jul 31, 2020 11:32 pm I guess this would be determined by the amount of pressure there is on the TT. If there is little or no TT pressure, I agree there will be plenty of space in the TT. But this isn't the situation anyone cares about. When the TT is massively oversubscribed, replacing the least valuable entry is much more important. Being able to determine that a vast number of entries are completely worthless (especially those associated with deep searches, and regardless of the number of cutoffs they caused in the past) would seem to be of value.
Sure, replacement only becomes an issue with severely overloaded tables. I could detect virtually no effect on time-to-depth until the number of nodes gets to be more than 10 times the table size. But the statement doesn't really depend on that; it depends more on the distribution of nodes in the tree. This is vastly dominated by low-depth nodes. And all competitive replacement schemes always store every node. So the lowest depth in the bucket will reflect the distribution of nodes in the tree, and thus almost always be 0 or 1, not saving you very much search effort on a cutoff hit.

And even in a table with 100K buckets (extremely small by today's memory standards) an entry stored in the always-replace slot of a bucket would sit there for 100K nodes on average before that bucket is visited for a probe on another position, as these probes should be distributed randomly over the table. That is independent of tree size. During that period it will be available for probing by transpositions, and will cause several hits due to transpositions with moves 2, 4 or 6 ply earlier, as long as the sub-tree from those positions is not yet larger than 100K unique nodes.
hgm wrote: Fri Jul 31, 2020 11:32 pm Really?? Which top tier program uses a bucket size of 7? Or even 6? I don't know of any, although I must admit I haven't looked. The last time I checked (it's been a while) SF used a bucket size of 3 with two buckets per cache line. I'm not sure this is still true, as they recently changed their TT structure to accommodate much larger TT sizes, so I suppose it's changed.
Well, two buckets of three is 6 per cache line, right? The fact that they split the line into two buckets, while there would have been no downside in terms of memory bandwidth to use a bucket of 6, should already tell you that bucket size is far from being a problem.

BTW, I am not really convinced that 'top tier' programs use good TT replacement. They are often tested in a regime where the table is far from overloaded, so that having a good replacement scheme is not on their agenda. Even with simplistic schemes like shallowest-of-4 the search speed is only a very weak function of table size.
hgm wrote: Fri Jul 31, 2020 11:32 pm The depth of search is completely useless information if the position can no longer occur.
Sure. This is why it is also relevant to know how long ago it was last used.
Further, if the aging method uses depth of search as a criterion for determining when to replace an entry, this will simply clog the TT with entries that are no longer useful but haven't aged out yet due to their depth. I run into this problem all the time. If the search is continued long enough, it will become pathological due to the huge number of useless entries that have been searched to great depth. Clearing the TT solves this problem, but it also loses all the info in the TT, which defeats the purpose of a long search in the first place.
You are making a flawed assumption here. "Using depth as a criterion for determining when to replace an entry" is not at all the same as mindlessly hanging on to the entries of the largest depth. Always replacing the highest depth would also be a decision that uses no other information than the depth. (Not a very smart one, of course...) To prevent the problem you describe (which is all too real) we have equi-distributed-depth replacement, or under-cut replacement.
Hmmm....


Why is this post under my name? I didn't make it!

And why do I appear to be quoting you when it's actually the other way around?


Are you hacking my account?
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Data structure choice for TT

Post by hgm »

Oops, I goofed again. :oops: I am so sorry. I must have accidentally hit the Edit instead of the Quote button...

Unfortunately there is no way to recover the original posting, but since I quoted most of it in my reply, a large part of it can still be salvaged. I will try to repair as much as I can.
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Data structure choice for TT

Post by hgm »

Zenmastur wrote: Sat Aug 01, 2020 1:33 am I guess this would be determined by the amount of pressure there is on the TT. If there is little or no TT pressure, I agree there will be plenty of space in the TT. But this isn't the situation anyone cares about. When the TT is massively oversubscribed, replacing the least valuable entry is much more important. Being able to determine that a vast number of entries are completely worthless (especially those associated with deep searches, and regardless of the number of cutoffs they caused in the past) would seem to be of value.
Sure, replacement only becomes an issue with severely overloaded tables. I could detect virtually no effect on time-to-depth until the number of nodes gets to be more than 10 times the table size. But the statement doesn't really depend on that; it depends more on the distribution of nodes in the tree. This is vastly dominated by low-depth nodes. And all competitive replacement schemes always store every node. So the lowest depth in the bucket will reflect the distribution of nodes in the tree, and thus almost always be 0 or 1, not saving you very much search effort on a cutoff hit.

And even in a table with 100K buckets (extremely small by today's memory standards) an entry stored in the always-replace slot of a bucket would sit there for 100K nodes on average before that bucket is visited for a probe on another position, as these probes should be distributed randomly over the table. That is independent of tree size. During that period it will be available for probing by transpositions, and will cause several hits due to transpositions with moves 2, 4 or 6 ply earlier, as long as the sub-tree from those positions is not yet larger than 100K unique nodes.
Zenmastur wrote: Fri Jul 31, 2020 11:32 pm Really?? Which top tier program uses a bucket size of 7? Or even 6? I don't know of any, although I must admit I haven't looked. The last time I checked (it's been a while) SF used a bucket size of 3 with two buckets per cache line. I'm not sure this is still true, as they recently changed their TT structure to accommodate much larger TT sizes, so I suppose it's changed.
Well, two buckets of three is 6 per cache line, right? The fact that they split the line into two buckets, while there would have been no downside in terms of memory bandwidth to use a bucket of 6, should already tell you that bucket size is far from being a problem.

BTW, I am not really convinced that 'top tier' programs always use good TT replacement. They are often tested in a regime where the table is far from overloaded, so that having a good replacement scheme is not on their agenda. Even with simplistic schemes like shallowest-of-4 the search speed is only a very weak function of table size.
Zenmastur wrote: Fri Jul 31, 2020 11:32 pm The depth of search is completely useless information if the position can no longer occur.
Sure. This is why I mentioned age as one of the other criteria.
Further, if the aging method uses depth of search as a criterion for determining when to replace an entry, this will simply clog the TT with entries that are no longer useful but haven't aged out yet due to their depth. I run into this problem all the time. If the search is continued long enough, it will become pathological due to the huge number of useless entries that have been searched to great depth. Clearing the TT solves this problem, but it also loses all the info in the TT, which defeats the purpose of a long search in the first place.
You are making a flawed assumption here. "Using depth as a criterion for determining when to replace an entry" is not at all the same as mindlessly hanging on to the entries of the largest depth. Always replacing the highest depth in the bucket would also be a decision that uses no other information than the depth. (Not a very smart one, of course... But with completely different behavior.) To prevent the problem you describe (which is all too real) we have equi-distributed-depth replacement, or under-cut replacement.
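As a sketch only (one possible formulation; the exact rule differs between implementations), under-cut replacement of the depth-preferred slot could look like this:

Code: Select all

/* Sketch of under-cut replacement for the depth-preferred slot of a
   bucket: replace on greater-or-equal draft as usual, but also when
   the new draft is exactly one below the stored one, so that deep
   entries which are never probed again cannot clog the table forever. */
int undercut_replace(int stored_depth, int new_depth)
{
    if (new_depth >= stored_depth)
        return 1;               /* normal depth-preferred replacement */
    if (new_depth == stored_depth - 1)
        return 1;               /* the under-cut case */
    return 0;                   /* otherwise use the always-replace slot */
}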
OliverBr
Posts: 725
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Data structure choice for TT

Post by OliverBr »

Tony P. wrote: Mon Jul 13, 2020 5:05 pm What are some examples of fast and strong engines with non-Zobrist hashes?
OliThink uses self-defined hash tables and has no collisions at all. Moreover, it uses its own random generator within its code, so the results are independent of the machine.

It uses:
1) A predefined array of random values, indexed by 12 bits. It's important that those values are perfectly random.

Code: Select all

static u64 hashxor[4096];
for (i = 0; i < 4096; i++) hashxor[i] = _rand_64();
"hashb" being the 64-bit absolute unique hash value of a particular position is being updated with every move. "f" ist the square (0..63), "p" the piece (0...7) and "c" the color on move (0...1).
This hash value ist updated by:

Code: Select all

u64 hashb;
#define XORHASH(f, p, c) hashb ^= hashxor[(f) | (p) << 6 | (c) << 9]
where en-passant captures get their own value of "p".
With XOR you can use the same method for moving and unmoving a piece from square "f" to square "t":

Code: Select all

void move(Move m, int c, int d) {
        int f = FROM(m);
        int t = TO(m);
        int p = PIECE(m);
        XORHASH(f, p, c);
        XORHASH(t, p, c);
        ...
Of course, captures and promotions need more operations.

PS: The 64 bit random generator:

Code: Select all

typedef unsigned long long u64;
typedef unsigned long u32;
typedef int Move;

#define LOW16(x) ((x) & 0xFFFF)
#define LOW32(x) ((x) & 0xFFFFFFFF)
static u32 r_x = 30903, r_y = 30903, r_z = 30903, r_w = 30903, r_carry = 0;
u32 _rand_32() {
        r_x = LOW32(r_x * 69069 + 1);
        r_y ^= LOW32(r_y << 13);
        r_y ^= LOW32(r_y >> 17);
        r_y ^= LOW32(r_y << 5);
        u32 t = LOW32((r_w << 1) + r_z + r_carry);
        r_carry = (LOW32(r_z >> 2) + LOW32(r_w >> 3) + LOW32(r_carry >> 2)) >> 30;
        r_z = r_w;
        r_w = t;
        return LOW32(r_x + r_y + r_w);
}

u64 _rand_64() { u64 c = _rand_32(); return _rand_32() | (c << 32); }
Note that in ANSI C you cannot simply do:

Code: Select all

u64 _rand_64() { return _rand_32() | ((u64)_rand_32() << 32); }
because the order in which the two _rand_32() calls (the operands of the OR) are evaluated is not defined.
Chess Engine OliThink: http://brausch.org/home/chess
OliThink GitHub: https://github.com/olithink
Kotlov
Posts: 266
Joined: Fri Jul 10, 2015 9:23 pm
Location: Russia

Re: Data structure choice for TT

Post by Kotlov »

OliverBr wrote: Sun Aug 02, 2020 1:11 am PS: The 64 bit random generator:

Code: Select all

typedef unsigned long long u64;
typedef unsigned long u32;
typedef int Move;

#define LOW16(x) ((x) & 0xFFFF)
#define LOW32(x) ((x) & 0xFFFFFFFF)
static u32 r_x = 30903, r_y = 30903, r_z = 30903, r_w = 30903, r_carry = 0;
u32 _rand_32() {
        r_x = LOW32(r_x * 69069 + 1);
        r_y ^= LOW32(r_y << 13);
        r_y ^= LOW32(r_y >> 17);
        r_y ^= LOW32(r_y << 5);
        u32 t = LOW32((r_w << 1) + r_z + r_carry);
        r_carry = (LOW32(r_z >> 2) + LOW32(r_w >> 3) + LOW32(r_carry >> 2)) >> 30;
        r_z = r_w;
        r_w = t;
        return LOW32(r_x + r_y + r_w);
}

u64 _rand_64() { u64 c = _rand_32(); return _rand_32() | (c << 32); }
Why so much complexity?

P.S.
my engine initialize function hash keys:

Code: Select all

void initialize_hash_keys()
{
    for(U64 m=6364136223846793005,s=1,n=1,i=768; i--; hashkeys[i]=n=n*m+s);
}
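For readability, an equivalent expanded form of that one-liner (my expansion; it assumes hashkeys is an array of 768 U64 values, e.g. 12 piece codes times 64 squares):

Code: Select all

typedef unsigned long long U64;

extern U64 hashkeys[768];   /* assumed: 12 piece codes x 64 squares */

void initialize_hash_keys(void)
{
    U64 m = 6364136223846793005ULL;  /* 64-bit LCG multiplier */
    U64 s = 1, n = 1;
    for (int i = 768; i--; ) {       /* fills hashkeys[767] down to hashkeys[0] */
        n = n * m + s;
        hashkeys[i] = n;
    }
}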
Eugene Kotlov
Hedgehog 2.1 64-bit coming soon...
OliverBr
Posts: 725
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Data structure choice for TT

Post by OliverBr »

Kotlov wrote: Tue Aug 11, 2020 9:40 am Why so much complexity?
Because I thought this was a perfect random generator.
P.S.
my engine initialize function hash keys:

Code: Select all

void initialize_hash_keys()
{
    for(U64 m=6364136223846793005,s=1,n=1,i=768; i--; hashkeys[i]=n=n*m+s);
}
This would be very interesting if it had the same level of randomness. Are you sure it does? Am I free to use it?

EDIT: I see this magic number in https://en.wikipedia.org/wiki/Linear_co ... _generator.
Chess Engine OliThink: http://brausch.org/home/chess
OliThink GitHub: https://github.com/olithink