smaller tables for PEXT-style attack getters

wgarvin · Post by **wgarvin** » Wed Jan 13, 2016 11:36 pm

A half-baked idea just came to me, but maybe Gerd or someone, will pick it up and run with it.

I was thinking about PEXT/PDEP for rook or bishop attacks, and it occurred to me that you could use two different masks for the same square (i.e. instead of a table lookup of up to 12 bits, you'd have two table lookups using up to 6 inputs bits, and producing 8 bits of output--in other words, only one byte per table entry!) The tables would be small enough to virtually always hit in the cache. The two PEXTs before the lookup and the two PDEPs at the end, could be done in parallel. So it would only cost a few cycles more, which might be offset by fewer/cheaper cache misses.

To take this idea a bit further, confining the input/output bits of each table to a single rank/file or a single diagonal would also make most of the tables shareable. (i.e. for input square in a given file, you would not need 8 separate tables for each of the 8 ranks, you'd only need 1 such table). In fact the *rank* tables would also work just fine as *file* tables, saving another factor of two. [edit: and maybe some of the same tables could be used for bishop diagonals, if the upper index bits were zeroes?]

I haven't worked out the details, but it wouldn't surprise me if the complete tables for rooks would fit easily in L1 cache.

wgarvin · Post by **wgarvin** » Wed Jan 13, 2016 11:55 pm

For rooks I guess you'd need 64*(4*sizeof(BB) + 2*sizeof(PTR)), plus the 8 occupancy lookup tables which are 64 bytes each. So that adds up to a measly 3.5 KB on a 64bit machine. Bishops might need more occupancy tables, I am not sure. Even with a pessimistic assumption of needing 64 occupancy tables, that would still be a mere 7 KB for the bishops. And I suspect there are fewer unique tables.

[Edit: since the tables are so small anyway, you don't need 2 full pointers per square--two 16- or 32-bit offsets would easily be enough. That reduces rooks to 3 KB total, and bishops to 6.5 KB or probably less. That seems pretty tiny!]

jwes · Post by **jwes** » Thu Jan 14, 2016 4:09 am

wgarvin wrote:For rooks I guess you'd need So that adds up to a measly 3.5 KB on a 64bit machine. Bishops might need more occupancy tables, I am not sure. Even with a pessimistic assumption of needing 64 occupancy tables, that would still be a mere 7 KB for the bishops. And I suspect there are fewer unique tables.

[Edit: since the tables are so small anyway, you don't need 2 full pointers per square--two 16- or 32-bit offsets would easily be enough. That reduces rooks to 3 KB total, and bishops to 6.5 KB or probably less. That seems pretty tiny!]

I use 64*(2*sizeof(BB)), plus the 8 occupancy lookup tables which are 64 bytes each for rooks. I could reduce that to 8*(2*sizeof(BB)) at the cost of an array lookup. My code is:

Code: Select all

		c = _pext_u64(board, sq[square].masks[0] & 0x7e7e7e7e7e7e7e7eull);  // mask off bit 0 and 7 from rank mask
		c = oclkup[sq[square].p0][c];  // p0 is file of square
		tboard = _pdep_u64(c, sq[piece[5].sq].masks[0]);
		c = _pext_u64(bo.board, sq[piece[5].sq].masks[1] & 0xffffffffffff00ull);  // mask off bit 0 and 7 from file mask
		c = oclkup[sq[square].p90][c];  // p90 is rank of square
		tboard |= _pdep_u64(c, sq[piece[5].sq].masks[1]);

jwes · Post by **jwes** » Thu Jan 14, 2016 3:12 pm

wgarvin wrote:For rooks I guess you'd need 64*(4*sizeof(BB) + 2*sizeof(PTR)), plus the 8 occupancy lookup tables which are 64 bytes each. So that adds up to a measly 3.5 KB on a 64bit machine. Bishops might need more occupancy tables, I am not sure. Even with a pessimistic assumption of needing 64 occupancy tables, that would still be a mere 7 KB for the bishops. And I suspect there are fewer unique tables.

[Edit: since the tables are so small anyway, you don't need 2 full pointers per square--two 16- or 32-bit offsets would easily be enough. That reduces rooks to 3 KB total, and bishops to 6.5 KB or probably less. That seems pretty tiny!]

I haven't implemented this yet, but it looks like you could use the same occupancy tables for ranks, files, and diagonals. Shorter diagonals might have spurious attack bits, but those bits would be ignored by the PDEP.

jwes · Post by **jwes** » Sat Jan 16, 2016 2:01 am

jwes wrote:
wgarvin wrote:For rooks I guess you'd need 64*(4*sizeof(BB) + 2*sizeof(PTR)), plus the 8 occupancy lookup tables which are 64 bytes each. So that adds up to a measly 3.5 KB on a 64bit machine. Bishops might need more occupancy tables, I am not sure. Even with a pessimistic assumption of needing 64 occupancy tables, that would still be a mere 7 KB for the bishops. And I suspect there are fewer unique tables.

[Edit: since the tables are so small anyway, you don't need 2 full pointers per square--two 16- or 32-bit offsets would easily be enough. That reduces rooks to 3 KB total, and bishops to 6.5 KB or probably less. That seems pretty tiny!]
I haven't implemented this yet, but it looks like you could use the same occupancy tables for ranks, files, and diagonals. Shorter diagonals might have spurious attack bits, but those bits would be ignored by the PDEP.

I tried it and it seems to work fine.

smaller tables for PEXT-style attack getters

smaller tables for PEXT-style attack getters

Re: smaller tables for PEXT-style attack getters

Re: smaller tables for PEXT-style attack getters

Re: smaller tables for PEXT-style attack getters

Re: smaller tables for PEXT-style attack getters