Storage and ordering strategy in move generation

Fguy64 · Post by **Fguy64** » Sun Jan 24, 2010 7:34 pm

In my engine, I have taken a square-centric approach as opposed to a piece-centric approach. So in my move generation scheme, there are four pieces of information that are stored together in a single integer. These are source and target squares, promotion piece, and a sort key for those lists that are ordered. The sort key occupies the most significant bits of the integer, to facilitate sorting of a list of moves. So a list of candidate moves for a given node is a one dimensional array of integers. Come hell or high water, I'm sticking with this plan for now.

Currently, I am doing move ordering on captures and promotions only. Thus only a small percentage of moves get ordered. So, I maintain two arrays of candidate moves, Those moves that will be ordered, and those moves that won't. This means that in alphaBeta search, there are two more or less identical blocks of code, that differ only in the list of moves that are being searched. I maintain two arrays so I don't have to sort moves that aren't going to be ordered anyway. This approach seems to work quite well.

I am aware that there are additional ordering criteria besides just captures. So I'm wondering if I shouldn't just switch to one array now, and sort the whole list. As my program becomes more sophisticated, I expect that maybe more and more moves will be ordered, and there might be little point in maintaining both an ordered and an unordered list.

How do other people handle this?

hgm · Post by **hgm** » Sun Jan 24, 2010 7:53 pm

In HaQiKi D I use a move stack that consists of blocks of ~200 moves, one block per ply level. So I have a ' frame pointer' into this stack that is incremented by 200 each time I enter search. I then start two stacks from this frame pointer, one for the non-captures that grows upward, another for the capture that grows downward. That way I get a single contiguous block of moves, (from firstCapture to lastNonCapture) with the captures leading. I then only sort the part (firstCapture to framePointer).

I also put the sort key in the high byte of the move word. I use the spare byte to flag and encode non-standard moves, such as promotions, castlings, pawn double-pushes, e.p. captures. In the hash table I store this extra byte rather than the real to-square, as the to-square can be obtained from the flags byte by a simple table lookup. (The off-board squares coding for special moves).

Greg Strong · Post by **Greg Strong** » Sun Jan 24, 2010 7:56 pm

A couple of thoughts... For ordering non-captures, using the piece-square-table values works pretty well. (value of move = value of target square - value of origin square.) Also, I'd recommend not sorting at all. When it's time to pick the next move, scan the list and grab the one with the best value. Then replace it with the element from the end of the list and shorten the list so that you don't look at it again. The advantage of this is that if you get an early cutoff, you've saved the effort of sorting the list...

Fguy64 · Post by **Fguy64** » Sun Jan 24, 2010 8:11 pm

hgm wrote:In HaQiKi D I use a move stack that consists of blocks of ~200 moves, one block per ply level. So I have a ' frame pointer' into this stack that is incremented by 200 each time I enter search. I then start two stacks from this frame pointer, one for the non-captures that grows upward, another for the capture that grows downward. That way I get a single contiguous block of moves, (from firstCapture to lastNonCapture) with the captures leading. I then only sort the part (firstCapture to framePointer).

I also put the sort key in the high byte of the move word. I use the spare byte to flag and encode non-standard moves, such as promotions, castlings, pawn double-pushes, e.p. captures. In the hash table I store this extra byte rather than the real to-square, as the to-square can be obtained from the flags byte by a simple table lookup. (The off-board squares coding for special moves).

hmm, I'm missing something here. If you add captures from the top, and non-captures from the bottom. how is it that you can get a single contiguous block of moves without knowing in advance how many moves you have?

hgm · Post by **hgm** » Sun Jan 24, 2010 8:18 pm

I only use the move-by-move extraction for the captures, as there are never very many. Then I first extract the move with the best MVV/LVA key. If that happens to be a HxL capture, I run SEE on them, and if it turns out to be a bad capture, I lower the key to some value that will be picked again even after captures of a Pawn (and at that time it will not be subjected to SEE again).

When I am done with all captures, I make one pass through all non-captures, to score them according to PST, and swap all those above a certain threshold to the front. In the process I also identify killers. A killer is tried immediately. Then I run a qsort on the moves with keys above the threshold.

hgm · Post by **hgm** » Sun Jan 24, 2010 8:20 pm

Fguy64 wrote:hmm, I'm missing something here. If you add captures from the top, and non-captures from the bottom. how is it that you can get a single contiguous block of moves without knowing in advance how many moves you have?

The block is chosen so large, that I never will run into the edge. (That would only happen if number of captures of this ply + number of non-captures from previous ply > 200.)

bob · Post by **bob** » Mon Jan 25, 2010 12:33 am

hgm wrote:In HaQiKi D I use a move stack that consists of blocks of ~200 moves, one block per ply level. So I have a ' frame pointer' into this stack that is incremented by 200 each time I enter search. I then start two stacks from this frame pointer, one for the non-captures that grows upward, another for the capture that grows downward. That way I get a single contiguous block of moves, (from firstCapture to lastNonCapture) with the captures leading. I then only sort the part (firstCapture to framePointer).

I also put the sort key in the high byte of the move word. I use the spare byte to flag and encode non-standard moves, such as promotions, castlings, pawn double-pushes, e.p. captures. In the hash table I store this extra byte rather than the real to-square, as the to-square can be obtained from the flags byte by a simple table lookup. (The off-board squares coding for special moves).

Just a note, but for chess you need at _least_ 218 entries. There is an existing position that has this many legal moves. And there is no proof that there is not one with a larger number. I don't have any "fixed" size n Crafty, just one big move list with a pointer to the start of the list used at ply 1, and as I generate ply-1 moves, I adjust the ply-2 pointer to point one beyond the end so that it is ready for the next ply to generate moves. This way I have a total move count limit, which I think is around 5,000. I've checked many times and have never come anywhere near filling that up.

MattieShoes · Post by **MattieShoes** » Mon Jan 25, 2010 7:36 am

bob wrote:
hgm wrote:In HaQiKi D I use a move stack that consists of blocks of ~200 moves, one block per ply level. So I have a ' frame pointer' into this stack that is incremented by 200 each time I enter search. I then start two stacks from this frame pointer, one for the non-captures that grows upward, another for the capture that grows downward. That way I get a single contiguous block of moves, (from firstCapture to lastNonCapture) with the captures leading. I then only sort the part (firstCapture to framePointer).

I also put the sort key in the high byte of the move word. I use the spare byte to flag and encode non-standard moves, such as promotions, castlings, pawn double-pushes, e.p. captures. In the hash table I store this extra byte rather than the real to-square, as the to-square can be obtained from the flags byte by a simple table lookup. (The off-board squares coding for special moves).
Just a note, but for chess you need at _least_ 218 entries. There is an existing position that has this many legal moves. And there is no proof that there is not one with a larger number. I don't have any "fixed" size n Crafty, just one big move list with a pointer to the start of the list used at ply 1, and as I generate ply-1 moves, I adjust the ply-2 pointer to point one beyond the end so that it is ready for the next ply to generate moves. This way I have a total move count limit, which I think is around 5,000. I've checked many times and have never come anywhere near filling that up.

And in crazyhouse it is far worse!

hgm · Post by **hgm** » Mon Jan 25, 2010 8:28 am

bob wrote:Just a note, but for chess you need at _least_ 218 entries.

Indeed, but HaQiKi D is a Xiangqi engine.

In Xiangqi there are no promotions, so the maximum number of moves can be easily calculated, and indeed that maximum can be easily realized. It is 111. (4x17=68 for the Rooks and Cannons, 2x8=16 for the Knights, 5x3=15 for the Pawns, 4+2=6 for the Elephants, and 4+2=6 for one Adviser (on e1) plus a King.)

So ~200 is ample safety margin. For Chess I would probably take ~300.

The number of moves in a typical in Xiangqi and Chess is not very different. A cache line can hold 16 moves, and typically 3-4 cache lines will be enough to hold all the moves. I aligned he stack such that there is still room for 14 captures in the cache line where the frame Pointer is pointing. That leaves room for 34 non-captures in 3 cache lines, and 50 non-captures in 4 cache lines.

On Intel machines a cache way is 4KB, i.e. 64 cache lines. So making the stack frame exactly 12 cache lines (192 moves) would give it an offset of 4 cache lines after one wrap, which is just about the size of the typically used part. So even at 36 ply there would never be any collision of move-stack memory in L1.

I keep the stack for other search variables in sync with this, so that these variables would only start to collide with the move stack of 18 ply closer to the root. That means that even if the local search variales and the move stack have to share one cache way, there are no collissions in any nodes close to the leaves.

Actually, today's caches are so large, that it hardly matters. The only fun is to design it for efficient running on a Pentium IV, which has only an 2-way 8KB L1 cache. There you must make sure the move stack and loca-variables stack can share one cache way with the gobal core data (board, piece list, PST and Zobrist tables) without colliding, so the second cache way is available for occasional memory accesses (hash probes). That is a real challenge.

Storage and ordering strategy in move generation

Storage and ordering strategy in move generation

Re: Storage and ordering strategy in move generation

Re: Storage and ordering strategy in move generation

Re: Storage and ordering strategy in move generation

Re: Storage and ordering strategy in move generation

Re: Storage and ordering strategy in move generation

Re: Storage and ordering strategy in move generation

Re: Storage and ordering strategy in move generation

Re: Storage and ordering strategy in move generation