I want to implement history in an engine that is supposed to play games with board sizes upto 36 x 36. The larger the board, the more positions will resemble each other throughout the tree, as almost all pieces stay in place, and only a tiny fraction of them will move in the depth you can reach. So I expect history to work really great for huge boards.
But using a 1296 x 1296 array of counters doesn't seem a great idea. Even with 16-bit counters that would be 3.4 MB, and exceed even the L3 cache on most CPUs. Of course most of the (from,to) entries would not be used ever, as moves tend to go only along files, ranks and diagonals, or with very close range (e.g. Knight jumps). A Queen on a 36 x 36 board would have at most 4x35-1 = 139 moves, so squeezing out the non-existing moves by clever indexing (e.g. using a 0x88-type square numbering and a 72x72 array indexed by from-to as a translation table to indirectly access the counters) would save an order of magnitude. But 320KB is still uncomfortably large (exceeding L2 on modern CPUs). Using a (color, type, to) history table is also no big gain, as there are over 250 piece types.
So I thought of hashing the history counters. History is not a very precise algorithm, and an occasional loss of a high count or mis-identification of a move giving it an undeserved high count is not a disaster. (The (from, to) scheme suffers from that too, as it ignores piece type.) It is a bit more tricky than hashing for the Transposition Table, however. There you know in advance what depth the entries you want to store have, and can use that info to decide on replacement. With history every unstored entry might now have a count of 1 for as far as you know, and you have to give it a chance to accumulate counts for some time before you can start to recognize their importance, and start to protect their presence in the table.
The whole scheme only makes sense if you could do it in a table significantly smaller than L2 (256KB on modern CPUs). The number of moves likely to be present in a position can be estimated by reasoning from the to-squares: each square can be reached from 8 directions, and if you scan in those directions you will eventually encounter something. The chances that you are out of range of that piece are appreciable, as at least half of the piece types will not move in any given direction at all, and those that do mostly move only a single step, not enough to reach you. So that means that each square would be attacked by only about 2 pieces on average. In addition there would be a small number of moves by the occasional piece with an 'irregular' move (such as Knights). So that makes only some 2,500 moves, so that a 4K-entry hash table would not seem overloaded. That sounds a lot more reasonable; if you want to score every non-capture for history, you want access to the history counters to be fast, so it would be a definite advantage if it would fit in L1!
In the simplest implementation you would not worry about collisions at all. The table index could be the difference of the Zobrist keys for the from and the to-square (the to-square using the promoted piece type if promotion took place) clipped to the required number of bits. (Assuming the table would only be used for non-captures, so there is no need to distinguish by victim.) Because the table will be close to full, there will be many collissions. If these are completely undetected, (ie. if you don't store a signature of any kind in the table) the history counts of all moves mapping to the same entry would simply be lumped together, non-deserving moves mapping there stealing the good history value of the usefull moves, and there would be no such thing as 'replacement'.
It might be good to be a bit more clever, and store a 1-byte signature. (It is very unlikely more than 16 moves would map to the same table entry, so 1 byte should be enough.) You could store the signature of the last contributor to the count there, and assume on probing that the entire count is due to that move, and other moves mapping to that entry would receive zero history score. (Not very accurate, though.) You could also remember the two most-recent contributors, with a killer-like replacement, distributing the count evenly over the two of them.
Another method could group the entries into buckets. When you remember the most-recent contributor to each entry in the bucket, and always add a count of a new move mapping to the bucket to the entry with the lowest count (usurping the place of most-recent contribotor for it), the count would tend to be equally divided over the entries in the bucket until one move surfaces as most-recent contributor that performs better than any of the lumped moves together: this counter would then be exclusively contributed to by that move, able to keep ahead of the others despite their lumping. If you do this in buckets of 4, you would be able to fish out 3 good moves from each bucket, as long as the third-best is able to outcompete number 4 and lower lumped together.
History heuristic and big boards
Moderators: hgm, Dann Corbit, Harvey Williamson
-
hgm
- Posts: 27703
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
-
Daniel Shawul
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: History heuristic and big boards
I confused this game with chu-shogi, which had a reasonable board size but way too many pieces. Well it goes without saying 36x36 is going to be
hard on the cpu in general. Mayb be handling four 9x9 boards separately is a good idea even though that does not help with your current issue.
I have heard that the 19x19 go board benefits very much from such kind of local analysis. Well Go is such a game anyway. My engine never makes a move away from the proximity of the the last move. Well chess is a bit different as there are long range pieces, but unless the inventors of tyoku shogi are really crazy they would limit the number of such pieces.
benefit of local information by hashing the last move's to square. It means you have to check the hash table not only for the best move but for other information
that will help in move ordering. Sounds a good idea for such a large board. I am sure you would have way too many situations in qsearch where you have to make
drastic measures.
and was a bit surprized not to find one quickly with reasonable TB sizes. I went down to 1kb and started encountering them in a 1 sec search. I think it is a good sanity check for TT for anyone that hasn't done it.
The table will be full as you said so that raises efficiency questions for any replacement scheme you may use. But I understand you don't replace but just always increment
and probably clear up every now and then, sounds like a good idea.
it seems reasonable. I guess that is what you meant by the latter killer like scheme but only to the last two recent moves.
As a side note, I remembered that Daniel Homan used hashing for killers/counter-moves, so this vindicates use of hashing as long as you think the idea brings significant advantages.
hard on the cpu in general. Mayb be handling four 9x9 boards separately is a good idea even though that does not help with your current issue.
I have heard that the 19x19 go board benefits very much from such kind of local analysis. Well Go is such a game anyway. My engine never makes a move away from the proximity of the the last move. Well chess is a bit different as there are long range pieces, but unless the inventors of tyoku shogi are really crazy they would limit the number of such pieces.
IOW an impossible game to handle directly, so indeed storing most relevant information through hashing sounds a good idea. You might also want to takeBut using a 1296 x 1296 array of counters doesn't seem a great idea. Even with 16-bit counters that would be 3.4 MB, and exceed even the L3 cache on most CPUs. Of course most of the (from,to) entries would not be used ever, as moves tend to go only along files, ranks and diagonals, or with very close range (e.g. Knight jumps). A Queen on a 36 x 36 board would have at most 4x35-1 = 139 moves, so squeezing out the non-existing moves by clever indexing (e.g. using a 0x88-type square numbering and a 72x72 array indexed by from-to as a translation table to indirectly access the counters) would save an order of magnitude. But 320KB is still uncomfortably large (exceeding L2 on modern CPUs). Using a (color, type, to) history table is also no big gain, as there are over 250 piece types.
benefit of local information by hashing the last move's to square. It means you have to check the hash table not only for the best move but for other information
that will help in move ordering. Sounds a good idea for such a large board. I am sure you would have way too many situations in qsearch where you have to make
drastic measures.
That the size is controllable is an advantage since so you can make it any size less than 256kb to be nice to the rest of the code. Recently I wanted to reproduce real hash collisions for a parallel searchThe whole scheme only makes sense if you could do it in a table significantly smaller than L2 (256KB on modern CPUs). The number of moves likely to be present in a position can be estimated by reasoning from the to-squares: each square can be reached from 8 directions, and if you scan in those directions you will eventually encounter something. The chances that you are out of range of that piece are appreciable, as at least half of the piece types will not move in any given direction at all, and those that do mostly move only a single step, not enough to reach you. So that means that each square would be attacked by only about 2 pieces on average. In addition there would be a small number of moves by the occasional piece with an 'irregular' move (such as Knights). So that makes only some 2,500 moves, so that a 4K-entry hash table would not seem overloaded. That sounds a lot more reasonable; if you want to score every non-capture for history, you want access to the history counters to be fast, so it would be a definite advantage if it would fit in L1!
In the simplest implementation you would not worry about collisions at all. The table index could be the difference of the Zobrist keys for the from and the to-square (the to-square using the promoted piece type if promotion took place) clipped to the required number of bits. (Assuming the table would only be used for non-captures, so there is no need to distinguish by victim.) Because the table will be close to full, there will be many collissions. If these are completely undetected, (ie. if you don't store a signature of any kind in the table) the history counts of all moves mapping to the same entry would simply be lumped together, non-deserving moves mapping there stealing the good history value of the usefull moves, and there would be no such thing as 'replacement'.
and was a bit surprized not to find one quickly with reasonable TB sizes. I went down to 1kb and started encountering them in a 1 sec search. I think it is a good sanity check for TT for anyone that hasn't done it.
The table will be full as you said so that raises efficiency questions for any replacement scheme you may use. But I understand you don't replace but just always increment
and probably clear up every now and then, sounds like a good idea.
Is it possible to add ageing to it so that you give more weight to the most recent move while keeping old information? You are storing count only (and not a move) so
It might be good to be a bit more clever, and store a 1-byte signature. (It is very unlikely more than 16 moves would map to the same table entry, so 1 byte should be enough.) You could store the signature of the last contributor to the count there, and assume on probing that the entire count is due to that move, and other moves mapping to that entry would receive zero history score. (Not very accurate, though.) You could also remember the two most-recent contributors, with a killer-like replacement, distributing the count evenly over the two of them.
it seems reasonable. I guess that is what you meant by the latter killer like scheme but only to the last two recent moves.
As a side note, I remembered that Daniel Homan used hashing for killers/counter-moves, so this vindicates use of hashing as long as you think the idea brings significant advantages.
You have to excuse me because I am writing before finishing reading the whole post. It seems that you have considered all options and it is probably inferior method to use one bucket with aging as I suggested. Btw the 16move/bucket is for what size of hash table ?Another method could group the entries into buckets. When you remember the most-recent contributor to each entry in the bucket, and always add a count of a new move mapping to the bucket to the entry with the lowest count (usurping the place of most-recent contribotor for it), the count would tend to be equally divided over the entries in the bucket until one move surfaces as most-recent contributor that performs better than any of the lumped moves together: this counter would then be exclusively contributed to by that move, able to keep ahead of the others despite their lumping. If you do this in buckets of 4, you would be able to fish out 3 good moves from each bucket, as long as the third-best is able to outcompete number 4 and lower lumped together.
-
Don
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: History heuristic and big boards
Some ideas for you:hgm wrote:I want to implement history in an engine that is supposed to play games with board sizes upto 36 x 36. The larger the board, the more positions will resemble each other throughout the tree, as almost all pieces stay in place, and only a tiny fraction of them will move in the depth you can reach. So I expect history to work really great for huge boards.
But using a 1296 x 1296 array of counters doesn't seem a great idea. Even with 16-bit counters that would be 3.4 MB, and exceed even the L3 cache on most CPUs. Of course most of the (from,to) entries would not be used ever, as moves tend to go only along files, ranks and diagonals, or with very close range (e.g. Knight jumps). A Queen on a 36 x 36 board would have at most 4x35-1 = 139 moves, so squeezing out the non-existing moves by clever indexing (e.g. using a 0x88-type square numbering and a 72x72 array indexed by from-to as a translation table to indirectly access the counters) would save an order of magnitude. But 320KB is still uncomfortably large (exceeding L2 on modern CPUs). Using a (color, type, to) history table is also no big gain, as there are over 250 piece types.
So I thought of hashing the history counters. History is not a very precise algorithm, and an occasional loss of a high count or mis-identification of a move giving it an undeserved high count is not a disaster. (The (from, to) scheme suffers from that too, as it ignores piece type.) It is a bit more tricky than hashing for the Transposition Table, however. There you know in advance what depth the entries you want to store have, and can use that info to decide on replacement. With history every unstored entry might now have a count of 1 for as far as you know, and you have to give it a chance to accumulate counts for some time before you can start to recognize their importance, and start to protect their presence in the table.
The whole scheme only makes sense if you could do it in a table significantly smaller than L2 (256KB on modern CPUs). The number of moves likely to be present in a position can be estimated by reasoning from the to-squares: each square can be reached from 8 directions, and if you scan in those directions you will eventually encounter something. The chances that you are out of range of that piece are appreciable, as at least half of the piece types will not move in any given direction at all, and those that do mostly move only a single step, not enough to reach you. So that means that each square would be attacked by only about 2 pieces on average. In addition there would be a small number of moves by the occasional piece with an 'irregular' move (such as Knights). So that makes only some 2,500 moves, so that a 4K-entry hash table would not seem overloaded. That sounds a lot more reasonable; if you want to score every non-capture for history, you want access to the history counters to be fast, so it would be a definite advantage if it would fit in L1!
In the simplest implementation you would not worry about collisions at all. The table index could be the difference of the Zobrist keys for the from and the to-square (the to-square using the promoted piece type if promotion took place) clipped to the required number of bits. (Assuming the table would only be used for non-captures, so there is no need to distinguish by victim.) Because the table will be close to full, there will be many collissions. If these are completely undetected, (ie. if you don't store a signature of any kind in the table) the history counts of all moves mapping to the same entry would simply be lumped together, non-deserving moves mapping there stealing the good history value of the usefull moves, and there would be no such thing as 'replacement'.
It might be good to be a bit more clever, and store a 1-byte signature. (It is very unlikely more than 16 moves would map to the same table entry, so 1 byte should be enough.) You could store the signature of the last contributor to the count there, and assume on probing that the entire count is due to that move, and other moves mapping to that entry would receive zero history score. (Not very accurate, though.) You could also remember the two most-recent contributors, with a killer-like replacement, distributing the count evenly over the two of them.
Another method could group the entries into buckets. When you remember the most-recent contributor to each entry in the bucket, and always add a count of a new move mapping to the bucket to the entry with the lowest count (usurping the place of most-recent contribotor for it), the count would tend to be equally divided over the entries in the bucket until one move surfaces as most-recent contributor that performs better than any of the lumped moves together: this counter would then be exclusively contributed to by that move, able to keep ahead of the others despite their lumping. If you do this in buckets of 4, you would be able to fish out 3 good moves from each bucket, as long as the third-best is able to outcompete number 4 and lower lumped together.
In Komodo we found that piece/from/to is not superior to piece/to and that from/to (without piece) is inferior to both while taking more memory.
So I would suggest simply piece/to using a standard array. The best scheme however might be domain specific.
If you need to use a hash table with origin and destination I would see if I could produce a perfect hash with a reasonable table size by trying every legal move and test cases to find the hash table that works, if possible. If you cannot do that with a reasonable size table you might do better with set associative hashing - taking care that each bucket fits in a cache line. I'll bet you can away with 4 bits for the checking in a bucket-size = 4 scheme.
The beauty of this is that you can determine in advance by testing to see if all possible values can be placed in the table without collision.
By the way, the hash function hash(from, to) does not have to be very good, in fact it can be crappy. As long as your test shows that it produces no collisions the most important thing is that it be trivial to compute and fast.
Don
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
-
hgm
- Posts: 27703
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: History heuristic and big boards
Thanks for the input! I will study your ideas carefully.
The inventors of this game were Buddhist monks with nothing useful on their hands. I have not studied the rules in detail yet, but there are lots of sliders. (At least, pieces that slide in some directions, and step in others, in very asymmetric ways. There also are limited-range sliders, that slide upto a maximum of 5 or 7 squares. There even are sliders that jump over (an arbitrary number of) other pieces, and capture anything they jump over (friend or foe)...) So I guess the inventors qualify at least as a bit crazy!
Fortunately it is not only hard on the CPU, but also on the human player. Thinking 2 ply ahead will probably beat any human!
Piece/to is a good suggestion. The game is such that most pieces occur only once or twice, so the from-square will almost always be the square the piece is on in the root.
The inventors of this game were Buddhist monks with nothing useful on their hands. I have not studied the rules in detail yet, but there are lots of sliders. (At least, pieces that slide in some directions, and step in others, in very asymmetric ways. There also are limited-range sliders, that slide upto a maximum of 5 or 7 squares. There even are sliders that jump over (an arbitrary number of) other pieces, and capture anything they jump over (friend or foe)...) So I guess the inventors qualify at least as a bit crazy!
Fortunately it is not only hard on the CPU, but also on the human player. Thinking 2 ply ahead will probably beat any human!
Piece/to is a good suggestion. The game is such that most pieces occur only once or twice, so the from-square will almost always be the square the piece is on in the root.