Black magic bitboards

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Black magic bitboards

Post by Gerd Isenberg »

AlvaroBegue wrote:Let's see. The difference between (occ & mask) and (occ & ~mask) is precisely ~mask. So the traditional formula is `(occ & mask) * magic' and the new one is `((occ & mask) + Constant) * black_magic'. I believe any magic multiplier will work as a black_magic multiplier and vice versa.

So there is virtually no difference with the traditional magics, as far as I can tell.
I had the same objections. My vague idea now is that "moving" the index of the empty board occupancy from zero (0) more to the center of the whole 12-bit index range per square, 4096, makes it easier to overlap/pack the sparse arrays of different squares more efficiently ...
User avatar
Volker Annuss
Posts: 180
Joined: Mon Sep 03, 2007 9:15 am

Re: Black magic bitboards

Post by Volker Annuss »

Let's look at the smallest example from my magics found, a bishop on square 9 (usually this is b7, but possibly also b2, g2 or g7). The mask for this sqare is 0x0040201008040000.

With

Code: Select all

index = ((occ | ~mask) * 0x083e3ee040080801) >> 55
we get an index in the range {488..504}.

With

Code: Select all

index = ((occ & mask) * 0x083e3ee040080801) >> 55
we get a bad collision for occ == 0 and occ == 0x0040000000000000 (usually a piece at g2) at index == 0.

Your constant C is

Code: Select all

0xFFBFDFEFF7FBFFFF*0x083e3ee040080801 == 0xF87FE0AF97F3F7FF
The multiplication gives

Code: Select all

0x0040000000000000*0x083e3ee040080801 == 0x0040000000000000
Adding C gives

Code: Select all

0x0040000000000000+0xF87FE0AF97F3F7FF == 0xF8BFE0AF97F3F7FF
with an overflow into the top 9 bits.

I have not found any white magic factor that gives a lookup table as compact as 17 consecutive positions and i believe there is no such one. Maybe I could find one that uses the ranges {0..n} and {495+n..511}

The sum of the sizes of my most compact lookup tables is

Code: Select all

black magic rook   99511*8
white magic rook  113490*8
black magic bishop  4625*8
white magic bishop  5773*8
and I still used more CPU time on finding white magics than on finding black ones.

Unfortunately when combining the per square tables to the overlapping table, the advantage of black magics is much less. Now I have a table size of 88316*8 for black magics and 88772*8 for white ones.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Black magic bitboards

Post by AlvaroBegue »

OK, now I finally understand the point. So it's a micro-optimization that will not change anything in practical terms for chess programming, but it's somewhat entertaining for some people as a challenge.

Since you have the infrastructure to test these things, would you be able to test one variation for me? (((occ & mask) * magic) + 0x8000000000000000ull) >> (64 - n)

Whether a magic multiplier works or not doesn't depend on the presence of the constant, but the resulting offsets might be in a narrower range, because we have shifted 0 to the middle of the range.
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Black magic bitboards

Post by Gerd Isenberg »

Ok, thanks. So with black magics it is easier to find factors with smaller index range (maxindex - minindex) per square, mainly because minindex is greater than zero opposed to white magics with zero for empty boards. However, since the smaller black table is denser or has smaller gaps at it tails, it is harder to merge/overlap with other square tables, so the overall win is less than the individual gains have promised.

How did you come to the idea of using black magics? Was it already used in Hermann or more recently tried in Arminius?
User avatar
Volker Annuss
Posts: 180
Joined: Mon Sep 03, 2007 9:15 am

Re: Black magic bitboards

Post by Volker Annuss »

Gerd Isenberg wrote:How did you come to the idea of using black magics? Was it already used in Hermann or more recently tried in Arminius?
There were three things that came together for the idea.

1st I recently installed openSUSE 64 bit on my raspberry pi 3 and I wanted to test how fast my magic generator is on that machine. Optimizing the overlapping heavyly relies on testing if vectors of 64 bit numbers fit together. A set bit stands for a used table entry, a not set bit stands for an unused entry.

2nd there was this thread asking for dense magics that made me think about how magics work again.

And last but not least holidays, giving me some time to try it out. The changes to the optimizer were not that big, but it took some time to get rid of old optimized code that relied on the assumption that index == 0 is always used.

The changes are in Arminius since yesterday.
User avatar
Volker Annuss
Posts: 180
Joined: Mon Sep 03, 2007 9:15 am

Re: Black magic bitboards

Post by Volker Annuss »

AlvaroBegue wrote:OK, now I finally understand the point. So it's a micro-optimization that will not change anything in practical terms for chess programming, but it's somewhat entertaining for some people as a challenge.
It was intended to get a smaller table, but since the size of the table does not drop significantly any more, I think you are right.[/quote]
Since you have the infrastructure to test these things, would you be able to test one variation for me? (((occ & mask) * magic) + 0x8000000000000000ull) >> (64 - n)

Whether a magic multiplier works or not doesn't depend on the presence of the constant, but the resulting offsets might be in a narrower range, because we have shifted 0 to the middle of the range.
The test of your variation is running. First results indicate, that it gives less good collisions to exploit and less big gaps of unused entries. Interesting enough to use some more CPU time on it to come closer to the optimum.
User avatar
Volker Annuss
Posts: 180
Joined: Mon Sep 03, 2007 9:15 am

Re: Black magic bitboards

Post by Volker Annuss »

AlvaroBegue wrote:Since you have the infrastructure to test these things, would you be able to test one variation for me? (((occ & mask) * magic) + 0x8000000000000000ull) >> (64 - n)
That gave me a lookup table of size 90995*8. The sizes for the most compact per square tables without overlapping are 4767*8 for bishops and 101308*8 for rooks.

There are more gaps in the overlapped table than with the other methods.
niklasf
Posts: 42
Joined: Sat May 16, 2015 11:41 pm

Re: Black magic bitboards

Post by niklasf »

Very intresting idea. I tried shaving off a few bytes by spending more CPU time to find magics with minimal index range (and replacing them in your packing, or just tacking them on at the end).

It was barely worth it, saving only 519 entries, but here you go anyway:

Code: Select all

uint64_t lookup_table[87988];

bishop_magics[64] =
{
    { 0xa7020080601803d8ull, 60984 },
    { 0x13802040400801f1ull, 66046 },
    { 0x0a0080181001f60cull, 32910 },
    { 0x1840802004238008ull, 16369 },
    { 0xc03fe00100000000ull, 42115 },
    { 0x24c00bffff400000ull,   835 },
    { 0x0808101f40007f04ull, 18910 },
    { 0x100808201ec00080ull, 25911 },
    { 0xffa2feffbfefb7ffull, 63301 },
    { 0x083e3ee040080801ull, 16063 },
    { 0xc0800080181001f8ull, 17481 },
    { 0x0440007fe0031000ull, 59361 },
    { 0x2010007ffc000000ull, 18735 },
    { 0x1079ffe000ff8000ull, 61249 },
    { 0x3c0708101f400080ull, 68938 },
    { 0x080614080fa00040ull, 61791 },
    { 0x7ffe7fff817fcff9ull, 21893 },
    { 0x7ffebfffa01027fdull, 62068 },
    { 0x53018080c00f4001ull, 19829 },
    { 0x407e0001000ffb8aull, 26091 },
    { 0x201fe000fff80010ull, 15815 },
    { 0xffdfefffde39ffefull, 16419 },
    { 0xcc8808000fbf8002ull, 59777 },
    { 0x7ff7fbfff8203fffull, 16288 },
    { 0x8800013e8300c030ull, 33235 },
    { 0x0420009701806018ull, 15459 },
    { 0x7ffeff7f7f01f7fdull, 15863 },
    { 0x8700303010c0c006ull, 75555 },
    { 0xc800181810606000ull, 79445 },
    { 0x20002038001c8010ull, 15917 },
    { 0x087ff038000fc001ull,  8512 },
    { 0x00080c0c00083007ull, 73069 },
    { 0x00000080fc82c040ull, 16078 },
    { 0x000000407e416020ull, 19168 },
    { 0x00600203f8008020ull, 11056 },
    { 0xd003fefe04404080ull, 62544 },
    { 0xa00020c018003088ull, 80477 },
    { 0x7fbffe700bffe800ull, 75049 },
    { 0x107ff00fe4000f90ull, 32947 },
    { 0x7f8fffcff1d007f8ull, 59172 },
    { 0x0000004100f88080ull, 55845 },
    { 0x00000020807c4040ull, 61806 },
    { 0x00000041018700c0ull, 73601 },
    { 0x0010000080fc4080ull, 15546 },
    { 0x1000003c80180030ull, 45243 },
    { 0xc10000df80280050ull, 20333 },
    { 0xffffffbfeff80fdcull, 33402 },
    { 0x000000101003f812ull, 25917 },
    { 0x0800001f40808200ull, 32875 },
    { 0x084000101f3fd208ull,  4639 },
    { 0x080000000f808081ull, 17077 },
    { 0x0004000008003f80ull, 62324 },
    { 0x08000001001fe040ull, 18159 },
    { 0x72dd000040900a00ull, 61436 },
    { 0xfffffeffbfeff81dull, 57073 },
    { 0xcd8000200febf209ull, 61025 },
    { 0x100000101ec10082ull, 81259 },
    { 0x7fbaffffefe0c02full, 64083 },
    { 0x7f83fffffff07f7full, 56114 },
    { 0xfff1fffffff7ffc1ull, 57058 },
    { 0x0878040000ffe01full, 58912 },
    { 0x945e388000801012ull, 22194 },
    { 0x0840800080200fdaull, 70880 },
    { 0x100000c05f582008ull, 11140 },
};

rook_magics[64] =
{
    { 0x80280013ff84ffffull, 10890 },
    { 0x5ffbfefdfef67fffull, 50579 },
    { 0xffeffaffeffdffffull, 62020 },
    { 0x003000900300008aull, 67322 },
    { 0x0050028010500023ull, 80251 },
    { 0x0020012120a00020ull, 58503 },
    { 0x0030006000c00030ull, 51175 },
    { 0x0058005806b00002ull, 83130 },
    { 0x7fbff7fbfbeafffcull, 50430 },
    { 0x0000140081050002ull, 21613 },
    { 0x0000180043800048ull, 72625 },
    { 0x7fffe800021fffb8ull, 80755 },
    { 0xffffcffe7fcfffafull, 69753 },
    { 0x00001800c0180060ull, 26973 },
    { 0x4f8018005fd00018ull, 84972 },
    { 0x0000180030620018ull, 31958 },
    { 0x00300018010c0003ull, 69272 },
    { 0x0003000c0085ffffull, 48372 },
    { 0xfffdfff7fbfefff7ull, 65477 },
    { 0x7fc1ffdffc001fffull, 43972 },
    { 0xfffeffdffdffdfffull, 57154 },
    { 0x7c108007befff81full, 53521 },
    { 0x20408007bfe00810ull, 30534 },
    { 0x0400800558604100ull, 16548 },
    { 0x0040200010080008ull, 46407 },
    { 0x0010020008040004ull, 11841 },
    { 0xfffdfefff7fbfff7ull, 21112 },
    { 0xfebf7dfff8fefff9ull, 44214 },
    { 0xc00000ffe001ffe0ull, 57925 },
    { 0x4af01f00078007c3ull, 29574 },
    { 0xbffbfafffb683f7full, 17309 },
    { 0x0807f67ffa102040ull, 40143 },
    { 0x200008e800300030ull, 64659 },
    { 0x0000008780180018ull, 70469 },
    { 0x0000010300180018ull, 62917 },
    { 0x4000008180180018ull, 60997 },
    { 0x008080310005fffaull, 18554 },
    { 0x4000188100060006ull, 14385 },
    { 0xffffff7fffbfbfffull,     0 },
    { 0x0000802000200040ull, 38091 },
    { 0x20000202ec002800ull, 25122 },
    { 0xfffff9ff7cfff3ffull, 60083 },
    { 0x000000404b801800ull, 72209 },
    { 0x2000002fe03fd000ull, 67875 },
    { 0xffffff6ffe7fcffdull, 56290 },
    { 0xbff7efffbfc00fffull, 43807 },
    { 0x000000100800a804ull, 73365 },
    { 0x6054000a58005805ull, 76398 },
    { 0x0829000101150028ull, 20024 },
    { 0x00000085008a0014ull,  9513 },
    { 0x8000002b00408028ull, 24324 },
    { 0x4000002040790028ull, 22996 },
    { 0x7800002010288028ull, 23213 },
    { 0x0000001800e08018ull, 56002 },
    { 0xa3a80003f3a40048ull, 22809 },
    { 0x2003d80000500028ull, 44545 },
    { 0xfffff37eefefdfbeull, 36072 },
    { 0x40000280090013c1ull,  4750 },
    { 0xbf7ffeffbffaf71full,  6014 },
    { 0xfffdffff777b7d6eull, 36054 },
    { 0x48300007e8080c02ull, 78538 },
    { 0xafe0000fff780402ull, 28745 },
    { 0xee73fffbffbb77feull,  8555 },
    { 0x0002000308482882ull,  1009 },
};
Just in case they are useful for anyone: Here are all the dense magics (of which I ended up using only a few).

Code: Select all

// width without overlapping: 4606
bishops[64] = {
    0xa7020080601803d8ull,
    0x13802040400801f1ull,
    0x0a0080181001f60cull,
    0x1840802004238008ull,
    0x581fc40182094560ull,
    0xa0a01f0100824001ull,
    0x3ae7f04002008258ull,
    0x200c10101ec00080ull,
    0xb69d010040104803ull,
    0x833c802040102401ull,
    0xc0800080181001f8ull,
    0x0442008020048040ull,
    0x3253203c01880043ull,
    0xccf84020010081e4ull,
    0x3c0708101f400080ull,
    0x0d0614080fa00040ull,
    0xf300c0007e806002ull,
    0x41bea00040102002ull,
    0x53018080c00f4001ull,
    0xb00020820080104dull,
    0x0060200100080842ull,
    0x2100100021c58023ull,
    0xcc8808000fbf8002ull,
    0x1870060008300060ull,
    0x8800013e8300c030ull,
    0x0420009701806018ull,
    0x020003fa00802080ull,
    0x8700303010c0c006ull,
    0xc800181810606000ull,
    0x2a002038201c8010ull,
    0x20007004b80fc00eull,
    0x90c80c0c10083003ull,
    0x73600080cd02c040ull,
    0xdb60004066416020ull,
    0x83a80101f6010010ull,
    0x008b001c70080080ull,
    0xa00020c018003088ull,
    0x99001c600e001e30ull,
    0x84002007e0d7f078ull,
    0x0400003007f3f008ull,
    0x0800004100f90080ull,
    0x0220002080748040ull,
    0x2ba40080818300c0ull,
    0x1e23040080fb4080ull,
    0xa800003c80180030ull,
    0xc10000df80280050ull,
    0x8b8000200813f024ull,
    0x398000200405f812ull,
    0xc20000203e808182ull,
    0xc538000fa04032a5ull,
    0x800404000f808010ull,
    0x0005802808003f62ull,
    0xd0880001001fe05bull,
    0x72dd000040900a00ull,
    0x55003ee010200804ull,
    0xcd8000200febf209ull,
    0x20c000101ec10062ull,
    0x0a5f7c00101f4029ull,
    0x22048800000f8080ull,
    0x4a0c81005008003full,
    0x090a010001002020ull,
    0x945e388000801012ull,
    0xe73c000040400826ull,
    0xa500008060400fd8ull,
};

// width without overlapping: 97178
rooks[64] = {
    0x80500104d8000230ull,
    0x8018007400860002ull,
    0x8030010086000090ull,
    0x006006008120006cull,
    0x0050028010500023ull,
    0x003000c001900030ull,
    0x00300060003000c0ull,
    0x0058005806b00002ull,
    0x508028010114000aull,
    0x08000c0086000300ull,
    0x0000080401020008ull,
    0x0000200200200400ull,
    0x10003001c030000full,
    0x99001800c0180060ull,
    0x4f8018005fd00018ull,
    0x3580180030620018ull,
    0x10300018010c0008ull,
    0x2906000c00830004ull,
    0x8002000804010008ull,
    0x1804002002002004ull,
    0x0102002001002002ull,
    0x2211001000801040ull,
    0x4000004040008001ull,
    0x612c8007c007f020ull,
    0x1040200010080008ull,
    0x0100080010040011ull,
    0xa004010008020008ull,
    0x0000040020200200ull,
    0x2320020020010020ull,
    0x4af01f00078007c3ull,
    0x0a00008020200040ull,
    0x240fc00040000080ull,
    0x200008e800300031ull,
    0x0000080400100011ull,
    0x3400400880410010ull,
    0x4000200400200200ull,
    0x8800200100200200ull,
    0xa8001fc080070007ull,
    0x4000200080200040ull,
    0x2000802000200040ull,
    0x1c000202ec002800ull,
    0x0183000086000c00ull,
    0x703ffdc008004010ull,
    0x0020040002002020ull,
    0x290000302e803003ull,
    0x0400010000802020ull,
    0x140000100800a804ull,
    0x6054000a58005805ull,
    0x0829000101150028ull,
    0x1d00008a00850014ull,
    0x2b40007ec6030018ull,
    0xc020003da01e0020ull,
    0x0020002030018030ull,
    0x4080001800df8018ull,
    0xa3a80003f3a40048ull,
    0x9843d80000500028ull,
    0x000000804110102eull,
    0x52000080086eff41ull,
    0x150000564012ff8bull,
    0x024000100c080fd2ull,
    0x48300007e8080c02ull,
    0x2200000370040802ull,
    0x7100000400437802ull,
    0x00a2000828884302ull,
};
syzygy
Posts: 5608
Joined: Tue Feb 28, 2012 11:56 pm

Re: Black magic bitboards

Post by syzygy »

I have just implemented this in Cfish:
https://github.com/syzygy1/Cfish/commit ... 067c1c4995

Unfortunately it is slightly slower on my machine (i7-3930K Sandybridge) than the plain magic approach based on Volker's white magics. This is a bit surprising, since the array size goes down from 89524 to 87988 entries.

But now I tried on my laptop, and black magics wins there. Interesting.
Volker Annuss wrote:Now I have a table size of 88316*8 for black magics and 88772*8 for white ones.
Then I am interested in your latest white magics. Did you post them somewhere, or would you mind posting them here?

Results on my Kabylake laptop for "cfish bench 128 1 17 >/dev/null" (nps):

Code: Select all

fancy magics   2321090
white magics   2343727
black magics   2353732
plain bmi2     2355310
fancy bmi2     2364420
These are all the best results after several tries (and with cpu governor set to performance and the process bound to a particular cpu). Gaussian distribution does not apply here.

The fancy magics and plain bmi2 are essentially those from Stockfish. The fancy bmi2 implementation uses 200K rook tables.
syzygy
Posts: 5608
Joined: Tue Feb 28, 2012 11:56 pm

Re: Black magic bitboards

Post by syzygy »

syzygy wrote:Results on my Kabylake laptop for "cfish bench 128 1 17 >/dev/null" (nps):

Code: Select all

fancy magics   2321090
white magics   2343727
black magics   2353732
plain bmi2     2355310
fancy bmi2     2364420
Redirecting stderr to a file seems to remove some further disturbances. I now get this:

Code: Select all

fancy magics   2337686
white magics   2362632
black magics   2362235
plain bmi2     2372801
fancy bmi2     2381846
Again I keep the best results (least disturbed by random OS stuff).

It seems white magics and black magics are virtually indistinguishable in performance.

BMI2 is still better, but Volker's overlapping-table approach gets pretty close.