CPW Move Table compression

Discussion of chess software programming and technical issues.

Moderator: Ras

ZirconiumX
Posts: 1361
Joined: Sun Jul 17, 2011 11:14 am
Full name: Hannah Ravensloft

CPW Move Table compression

Post by ZirconiumX »

This isn't a thought of mine, but merely an improvement on what the CPW offers. It is slightly more expensive due to the decompression, but works none the less.

Instead of having:

Code: Select all

struct SlidingMoveNode
{
  TSquare tosq;
  SMove   move;
  SlidingMoveNode* pNext[2];
};

SlidingMoveNode  slidingMoveTable[1456+896+560]; /* queen, rook, bishop ->  2912 * 16 = 46592 bytes 45.5 Kib */
SlidingMoveNode* slidingMoveHead[3][64];
which is slightly ugly IMO. What if you had:

Code: Select all

U64 slidingMoveTable[1456+896+560]; /* 2912 * 8 = 23296 22 KiB */
int slidingMoveHead[3][64];
Here slidingMoveHead tells us the position of the move in the table.

I use a U64 instead of a struct because it allows us to do some relatively cheap decompression. My encoding is this:

Byte 0 - the to square plus two filler bits
Byte 1-2 - the next index for the table if this square is empty
Byte 3-4 - the next index for the table if this square has a piece on it (i.e. next direction)
Byte 5-7 - filler bits - perhaps you could use them as a REBEL-type attack table (probably not cheap)

In essence it is just a version of the above with no pointers and about half the size.

Matthew:out
tu ne cede malis, sed contra audentior ito
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: CPW Move Table compression

Post by Gerd Isenberg »

Hi Matthew,

sure that would be possible and desirable in 64-bit mode, where pointers need more space. Even with indices and zero as valid head index, one may still use zero index as sentinel value to terminate the loop. With some more twiddling, one may even go with a 32-bit structure 12:12:2:6 or so I guess. The 2912 range for the indices requires 12 bits. Instead of indexing next from an array in memory by isPiece, one may shift right the 32-bit value by (isPiece*12 + 8, value from a 0x1c piece code mask of the snelboard) and then mask with 0xfff for the next index if any, the target square and possibly 2-bit sliding piece type are inside the lower byte.

Best,
Gerd
User avatar
hgm
Posts: 28464
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: CPW Move Table compression

Post by hgm »

The most efficient storage scheme on i86/x64 architectures would actually be to separate the fields of the struct rather than pack them:

Code: Select all

int successorEmpty[SIZE];
int successorOccupied[SIZE];
Smove moves[SIZE];
Tsquare toSquares[SIZE];
Then the node number has only to be multiplied with a power of two, and fillers are not needed. And neither is unpacking. All elements can be accessed thouh the addressing mode CONSTANT + INDEX * SCALE (SCALE = 1, 2, 4 or 8), causing zero CPU load as it is done in the AGUs.

The main disadvantage is that it uses full-width ints (32 bits), where short ints (16-bit) would have been sufficient. Most compilers handle short ints very inefficiently compared to char and int, though. Packing the two successors in a 32-bit int is a possibility, but it requires an extra &0xFFFF or >>32 operation to use the Empty and Occupied successor index, respectively.

Perhaps the best solution would be to index the successors relative compared to the current node, and lay them out in such an order that you never need to jump more than 256. Then the successors could be (unsigned) char.

I am always a bit suspicious towards solutions that use large (L1-overflowing) tables. On modern CPUs, caching is really the speed-determining factor. So my instinct tells me it would be better to go for a move table that does not try to 'predict' when it will run into the board edge (because that would need a different table for each from-square), but just let it run into the boundary guards, and treat it as an occupied square. It would reduce the amount of table space required by a factor 64. This might well make up for more than the occasional extra iteration at the end of a ray. Long rays almost never run into the board edge anyway.