Data structure choice for TT

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Data structure choice for TT

Post by hgm »

Pio wrote: Mon Jul 13, 2020 10:12 pm
Tony P. wrote: Mon Jul 13, 2020 5:52 pm
Tony P. wrote: Mon Jul 13, 2020 5:05 pm What are some examples of fast and strong engines with non-Zobrist hashes?
Oops, Leela is an obvious example :oops: (of a strong one). Its hash function, based on a few additions, multiplications, bit shifts and concatenations, doesn't seem slow.
My engine also doesn't use Zobrist, but instead a series of multiplications, shifts and XORs, though of course my engine is really bad. I think Zobrist hashing is extremely smart and much better than what I currently do.
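
For reference, a well-known mixer of that family is the splitmix64 finalizer, a standard multiply-shift-XOR construction. It is shown here purely as an illustration of the style, not as Leela's or Pio's actual hash function:

Code: Select all

#include <stdint.h>

/* splitmix64 finalizer: a generic multiply/shift/XOR mixer. Any 64-bit
   position encoding could be fed through it to obtain a TT key. */
uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}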

I have an idea for a possibly new type of replacement strategy: what about a scheme that keeps the recently read entries and kicks out the seldom-accessed ones? My idea is as follows (let's take a four-entry bucket as an example):

1) When saving an entry, always save it as the last entry and kick out the existing entry in that place, i.e. save the new position as entry 4 by overwriting whatever is there. (This should not be done if the positions are the same and the new entry's depth is lower; then you just keep the old information, since it is more precious.)

2) When reading the TT and getting a hit, you bump the read entry up and bump the entry it displaces down. For example, if you access/read entry 3, you bump it up to position 2 and bump the previous entry 2 down to position 3. If you read the first entry, you don't do anything.

I guess my replacement strategy will work better for engines that also probe at interior nodes, since otherwise it might kick out deeper entries too often.

Some nice properties of the strategy are that:
1) it is very simple
2) unreachable positions will quickly be kicked out, since they won't be accessed
3) you will not need an age field, and can thus make the entries a little smaller and simplify the logic
4) if the TT is probed at interior nodes, it will keep more of the deeper entries

What do you think?

/Pio
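
For illustration, the bucket scheme quoted above might look like this in C. The entry layout and all names are invented; this is a sketch of the proposal, not code from any engine in this thread:

Code: Select all

#include <stddef.h>
#include <stdint.h>

#define BUCKET_SIZE 4

typedef struct {            /* illustrative entry layout */
    uint64_t key;
    int16_t  score;
    int8_t   depth;
    uint8_t  flags;
} TTEntry;

/* Store: always write into the last slot, unless the same position is
   already in the bucket with greater or equal depth. */
void tt_store(TTEntry *bucket, const TTEntry *fresh)
{
    for (int i = 0; i < BUCKET_SIZE; i++)
        if (bucket[i].key == fresh->key) {
            if (fresh->depth >= bucket[i].depth)
                bucket[i] = *fresh;   /* refresh in place */
            return;                   /* keep the deeper old data */
        }
    bucket[BUCKET_SIZE - 1] = *fresh; /* evict whatever sits last */
}

/* Probe: on a hit, swap the entry one position toward the front, so
   frequently read entries migrate to the safe end of the bucket. */
TTEntry *tt_probe(TTEntry *bucket, uint64_t key)
{
    for (int i = 0; i < BUCKET_SIZE; i++)
        if (bucket[i].key == key) {
            if (i > 0) {
                TTEntry tmp = bucket[i - 1];
                bucket[i - 1] = bucket[i];
                bucket[i] = tmp;
                return &bucket[i - 1];
            }
            return &bucket[i];
        }
    return NULL;  /* miss */
}

With this, position 1 is the safest slot and position 4 the always-replace slot, so the bucket behaves like a small LRU queue.
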
Everyone probes at interior nodes; the main point of the TT is to get the cut-move from the previous iteration. Leaf nodes would not have been searched in the previous iteration, and getting their score from a transposition will hardly save any work: you are already in a leaf, so there is no tree below it to search for the score.

Your method is not simpler than what is usually done: you would have to swap entries, and entries usually consist of more than a single machine word. Normally one leaves the entries in place and just overwrites one of them.

The way the deepest entries are preserved in your scheme is rather indirect and unreliable. Reading the depth directly is simple and reliable.

Note that the way we walk the tree is such that transpositions occur with decreasing frequency. If a position can be reached by playing 4 independent moves of player A in any order, say A-B-C-D (assuming player B always plays the same moves), there are 4! = 24 transpositions. The first one you encounter is A-B-D-C, which only needs a different move 3 ply before the leaf, so it comes fast. Then you will encounter A-C-D-B, which requires a move change 5 ply before the leaf, and that takes much longer. (You will skip A-C-B-D, which already gave a hash cut at A-C-B.) So accesses occur in bursts. This can easily make a shallow entry temporarily pass a deep entry in your scheme, leading to overwriting of the deep entry.
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: Data structure choice for TT

Post by Tony P. »

hgm wrote: Mon Jul 13, 2020 11:08 pm So accesses occur in bursts. This can easily make a shallow entry temporarily pass a deep entry in your scheme, leading to overwriting of the deep entry.
Ah, so that's why Leela is using the LRU cache replacement policy instead of the LFU one. Thanks for enlightening me again! :idea:
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Data structure choice for TT

Post by Pio »

Tony P. wrote: Mon Jul 13, 2020 11:19 pm
hgm wrote: Mon Jul 13, 2020 11:08 pm So accesses occur in bursts. This can easily make a shallow entry temporarily pass a deep entry in your scheme, leading to overwriting of the deep entry.
Ah, so that's why Leela is using the LRU cache replacement policy instead of the LFU one. Thanks for enlightening me again! :idea:
If I am not mistaken, LRU is an equivalent method to the one I proposed, but it uses a little more space than mine, since I do not need an age field. Of course HGM knows much better than me, since I have not done any experiments, but I don't think it will be so easy to replace a deep entry: it will be accessed more often, and the deep entries will usually be at positions 1 and 2, so they will need 4 and 3 replacements respectively before being kicked out for good.

/Pio
User avatar
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Data structure choice for TT

Post by hgm »

Tony P. wrote: Mon Jul 13, 2020 11:19 pm
hgm wrote: Mon Jul 13, 2020 11:08 pm So accesses occur in bursts. This can easily make a shallow entry temporarily pass a deep entry in your scheme, leading to overwriting of the deep entry.
Ah, so that's why Leela is using the LRU cache replacement policy instead of the LFU one. Thanks for enlightening me again! :idea:
Indeed, and it is also the reason why it is so important to always store the result of a search, even if you have to overwrite a deeper result that represented more effort to obtain. The chance that you will quickly get a hit on the new result is much larger. And when the table gets overloaded, what matters is how efficiently you use the TT slots. A slot that holds a succession of cheaply obtained entries, each giving one quick hit before being replaced, can save you more work than a very deep entry that was hard to obtain but occupied the slot for a very long time before you got the first hit on it. The turn-over of deep results will always be much slower than that of shallow results, and probes for shallow results hardly ever hit deep entries. Most transpositions are found at the same ply level of the tree.

So one should not overdo the effort to preserve very deep results, and make sure all depths are reasonably represented in the table. Devoting a larger number of slots to a certain depth only increases the fraction of hits at that tree level logarithmically, and thus has diminishing returns. In that model, the best overall savings factor is reached when all depths have an equal number of slots in the table.

'Under-cut replacement' is a poor man's approximation to that: it allows large depths to survive longer than small depths, because you do not allow replacement by entries that are more than 1 ply less deep. The frequency of arrival of entries that could replace a stored entry thus gets lower and lower as its depth gets higher. Yet the slot will not be dedicated to ever higher depths, as entries of depth exactly one lower arrive more frequently than all entries of the same or higher depth together, so eventually the slot will be recycled for holding low depths.
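
A one-slot sketch of under-cut replacement as described above, with an invented entry layout:

Code: Select all

#include <stdint.h>

typedef struct {              /* illustrative layout */
    uint64_t key;
    int16_t  score;
    int8_t   depth;
    uint8_t  flags;
} TTEntry;

/* Under-cut replacement: a new result may replace the stored one if it
   is at most one ply shallower (or refreshes the same position). Deep
   entries survive longer because qualifying replacements arrive ever
   more rarely, yet a steady stream of depth d-1 results will
   eventually recycle a depth-d slot for shallower data. */
void tt_store_undercut(TTEntry *slot, const TTEntry *fresh)
{
    if (slot->key == fresh->key || fresh->depth >= slot->depth - 1)
        *slot = *fresh;
}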
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Data structure choice for TT

Post by Pio »

hgm wrote: Mon Jul 13, 2020 11:08 pm
Pio wrote: Mon Jul 13, 2020 10:12 pm
Tony P. wrote: Mon Jul 13, 2020 5:52 pm
Tony P. wrote: Mon Jul 13, 2020 5:05 pm What are some examples of fast and strong engines with non-Zobrist hashes?
Oops, Leela is an obvious example :oops: (of a strong one). Its hash function, based on a few additions, multiplications, bit shifts and concatenations, doesn't seem slow.
My engine also doesn't use Zobrist, but instead a series of multiplications, shifts and XORs, though of course my engine is really bad. I think Zobrist hashing is extremely smart and much better than what I currently do.

I have an idea for a possibly new type of replacement strategy: what about a scheme that keeps the recently read entries and kicks out the seldom-accessed ones? My idea is as follows (let's take a four-entry bucket as an example):

1) When saving an entry, always save it as the last entry and kick out the existing entry in that place, i.e. save the new position as entry 4 by overwriting whatever is there. (This should not be done if the positions are the same and the new entry's depth is lower; then you just keep the old information, since it is more precious.)

2) When reading the TT and getting a hit, you bump the read entry up and bump the entry it displaces down. For example, if you access/read entry 3, you bump it up to position 2 and bump the previous entry 2 down to position 3. If you read the first entry, you don't do anything.

I guess my replacement strategy will work better for engines that also probe at interior nodes, since otherwise it might kick out deeper entries too often.

Some nice properties of the strategy are that:
1) it is very simple
2) unreachable positions will quickly be kicked out, since they won't be accessed
3) you will not need an age field, and can thus make the entries a little smaller and simplify the logic
4) if the TT is probed at interior nodes, it will keep more of the deeper entries

What do you think?

/Pio
Everyone probes at interior nodes; the main point of the TT is to get the cut-move from the previous iteration. Leaf nodes would not have been searched in the previous iteration, and getting their score from a transposition will hardly save any work: you are already in a leaf, so there is no tree below it to search for the score.

Your method is not simpler than what is usually done: you would have to swap entries, and entries usually consist of more than a single machine word. Normally one leaves the entries in place and just overwrites one of them.

The way the deepest entries are preserved in your scheme is rather indirect and unreliable. Reading the depth directly is simple and reliable.

Note that the way we walk the tree is such that transpositions occur with decreasing frequency. If a position can be reached by playing 4 independent moves of player A in any order, say A-B-C-D (assuming player B always plays the same moves), there are 4! = 24 transpositions. The first one you encounter is A-B-D-C, which only needs a different move 3 ply before the leaf, so it comes fast. Then you will encounter A-C-D-B, which requires a move change 5 ply before the leaf, and that takes much longer. (You will skip A-C-B-D, which already gave a hash cut at A-C-B.) So accesses occur in bursts. This can easily make a shallow entry temporarily pass a deep entry in your scheme, leading to overwriting of the deep entry.
You are right that most if not all engines probe at interior nodes; of course my engine does. I just wanted to say that it is a very important precondition, since otherwise my replacement strategy would fail miserably.

I do think you might be mistaken about how easy it will be to kick out a deep entry: if a shallower entry is accessed a lot in bursts, that will just put it in the number 1 position. It will push the deep entry down by at most one position, no matter how many times it is accessed.

/Pio
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: Data structure choice for TT

Post by Tony P. »

Pio wrote: Mon Jul 13, 2020 11:57 pm If I am not mistaken, LRU is an equivalent method to the one I proposed, but it uses a little more space than mine, since I do not need an age field.
LRU doesn't require an age field; it's indeed enough to put entries into a queue like you've proposed. That way you save space, at the cost of the time it takes to move entries within the queue. Having an age field makes sense if the entries are large and the buckets would otherwise have padding bytes (to align them to cache lines).
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Data structure choice for TT

Post by Pio »

Tony P. wrote: Tue Jul 14, 2020 12:21 am
Pio wrote: Mon Jul 13, 2020 11:57 pm If I am not mistaken, LRU is an equivalent method to the one I proposed, but it uses a little more space than mine, since I do not need an age field.
LRU doesn't require an age field; it's indeed enough to put entries into a queue like you've proposed. That way you save space, at the cost of the time it takes to move entries within the queue. Having an age field makes sense if the entries are large and the buckets would otherwise have padding bytes (to align them to cache lines).
Hi Tony!

The method of using age fields is not equivalent to my method, but it is very similar. A very frequently accessed entry is less likely to be kicked out with age fields, since it is not enough that all the other entries are each visited once to move them up in the hierarchy. My method is closer to an always-replace strategy than LRU is (if I understood it correctly from your link). There are pros and cons to my strategy compared to LRU. The con is that my strategy kicks out deeper, more valuable entries more easily. The pro is that my method won't keep really old entries that are less likely to be accessed, and might not even be reachable after one has made a move.

/Pio
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Data structure choice for TT

Post by bob »

This is a bit confusing, using normal terminology. "Age" refers to the search in which an entry was stored, i.e. it advances once for each real move played on the board. The idea is to let you recognize entries that come from a previous search (which might be great for the current search, or completely useless). When you choose to store an entry, the old-age entries are the first to be overwritten. If one of those old entries helps, its age is updated to the current one, so that normal replacement rules apply again.

It sounds more like you are calling "age" some sort of time-stamp. Can't comment on that as I have never tried it.
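
A sketch of this generation-style aging, with invented names and a reduced entry layout; it is one plausible reading of the description above, not code from any particular engine:

Code: Select all

#include <stdbool.h>
#include <stdint.h>

typedef struct {          /* reduced illustrative layout */
    uint64_t key;
    int8_t   depth;
    uint8_t  age;         /* generation of the search that stored it */
} TTEntry;

static uint8_t tt_generation;    /* bumped once per real move played */

void new_search(void) { tt_generation++; }

/* Prefer to overwrite entries from an older search; among equals,
   take the shallowest. */
int pick_victim(const TTEntry *bucket, int n)
{
    int victim = 0;
    for (int i = 1; i < n; i++) {
        bool i_stale = bucket[i].age != tt_generation;
        bool v_stale = bucket[victim].age != tt_generation;
        if (i_stale && !v_stale)
            victim = i;
        else if (i_stale == v_stale &&
                 bucket[i].depth < bucket[victim].depth)
            victim = i;
    }
    return victim;
}

/* On a useful hit, refresh the age so normal rules apply again. */
void on_hit(TTEntry *e) { e->age = tt_generation; }

An entry surviving from an older search gets its age refreshed on the first useful hit, after which it competes on depth like any current entry.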
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Data structure choice for TT

Post by Pio »

bob wrote: Sun Jul 19, 2020 4:57 am This is a bit confusing, using normal terminology. "Age" refers to the search in which an entry was stored, i.e. it advances once for each real move played on the board. The idea is to let you recognize entries that come from a previous search (which might be great for the current search, or completely useless). When you choose to store an entry, the old-age entries are the first to be overwritten. If one of those old entries helps, its age is updated to the current one, so that normal replacement rules apply again.

It sounds more like you are calling "age" some sort of time-stamp. Can't comment on that as I have never tried it.
Sorry Bob, I used the "age" terminology in a bad way. I just read how LRU works (by clicking the link from Tony), and it seems LRU is closely related to my method. LRU could be used efficiently in engines, I guess, but then you would have to prepend what you and everyone else in the chess community call age to the LRU age, as the most significant bits.
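
For illustration, that combination could be packed like this (a purely hypothetical sketch with invented names):

Code: Select all

#include <stdint.h>

/* Hypothetical combined replacement key: the search age goes in the
   most significant bits, the LRU rank in the least significant ones,
   so entries from older searches always compare as better victims. */
uint16_t replace_key(uint8_t age_delta, uint8_t lru_rank)
{
    return (uint16_t)(((uint16_t)age_delta << 8) | lru_rank);
}

/* The entry with the highest key is the preferred victim. */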

I cannot yet test my swapping method, because I use the built-in C# dictionary as my TT and clear it after each move. I know that is really bad, but I work very little on my engine, and I first just want something working that I can improve on.

It would be really fun if you had time to test my method. It should be very easy to implement, at least in a single-threaded engine.

/Pio
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: Data structure choice for TT

Post by Zenmastur »

bob wrote: Sun Jul 19, 2020 4:57 am This is a bit confusing, using normal terminology. "Age" refers to the search in which an entry was stored, i.e. it advances once for each real move played on the board. The idea is to let you recognize entries that come from a previous search (which might be great for the current search, or completely useless). When you choose to store an entry, the old-age entries are the first to be overwritten. If one of those old entries helps, its age is updated to the current one, so that normal replacement rules apply again.

It sounds more like you are calling "age" some sort of time-stamp. Can't comment on that as I have never tried it.
I guess a more useful definition of age would be the number of irreversible moves made. A simple piece move is reversible and so doesn't change the positions that can be reached in the tree (excluding halfmove-clock differences). Captures and pawn pushes permanently remove positions from ever occurring in the tree again. At most there can be 96 pawn moves and 30 captures, so a single byte can encompass the aging clock for an entire game. Any TT entry with an age less than the current clock should be replaced before any entry with the current age, regardless of any other parameters, because that entry has become completely useless.
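
A sketch of such a clock, with invented names; the dead-entry test relies on the byte never wrapping, which the 126-event bound above guarantees:

Code: Select all

#include <stdbool.h>
#include <stdint.h>

static uint8_t irrev_clock;   /* captures + pawn moves so far, <= 126 */

/* Bump the clock whenever an irreversible move is made on the game
   board (not during search). */
void on_game_move(bool is_capture, bool is_pawn_move)
{
    if (is_capture || is_pawn_move)
        irrev_clock++;
}

/* An entry stamped before the current clock describes a position that
   can never occur again; it is the first candidate to recycle. */
bool entry_is_dead(uint8_t entry_age)
{
    return entry_age < irrev_clock;
}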
hgm wrote: Fri Jul 10, 2020 10:21 am Note that the hash table used as the transposition table in alpha-beta tree search is not really a standard hash table. Normally, loss of data by overwriting would be completely unacceptable, so the 'standard' hashing methods have developed all kinds of techniques for adding new entries to a nearly full table. These can sometimes become very time consuming when you are looking for an empty slot in an almost saturated table, and they interfere with the ability to find entries when others are deleted. The discussion about the various techniques focuses on this aspect, e.g. "do I have to do 100 retries in a table that is 99% filled, or just 20?"

For chess this is all totally unimportant. It is unavoidable that your hash table gets overloaded, as you search more nodes than you have memory to store. So you will have to live with replacement. Once you accept that, expensive methods for avoiding replacement while you are filling the last 1% (say) of an initially empty TT become totally non-competitive. Just use, from the very start, the replacement that you will be forced to do once the table is 100% filled.

The commonly used replacement methods are pretty much optimal in this respect. ...
This statement may be correct, but it's also misleading. It gives the impression that the current replacement algorithms are "optimal" in a more general respect. I highly doubt that they are. I suspect that a very small NN could be trained to find a much better replacement strategy than any currently used. I also suspect that the optimal bucket size is larger than those commonly in use today. I guess that if you could find the simplest NN that does a significantly better job of choosing which entries to replace, you might be able to encode it in a non-NN form, so it's fast enough to be of use.

I know you have stated many times in the past that a TT can't/doesn't add much to an engine's Elo. I suspect this would change if a better TT replacement algorithm were readily available.

Regards,

Zenmastur
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.