I had the same objections. My vague idea now is that "moving" the index of the empty board occupancy from zero (0) more to the center of the whole 12bit index range per square, 4096, makes it easier to overlap/pack the sparse arrays of different squares more efficiently ...AlvaroBegue wrote:Let's see. The difference between (occ & mask) and (occ & ~mask) is precisely ~mask. So the traditional formula is `(occ & mask) * magic' and the new one is `((occ & mask) + Constant) * black_magic'. I believe any magic multiplier will work as a black_magic multiplier and vice versa.
So there is virtually no difference with the traditional magics, as far as I can tell.
Black magic bitboards
Moderators: hgm, Harvey Williamson, bob
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.

 Posts: 2105
 Joined: Wed Mar 08, 2006 7:47 pm
 Location: Hattingen, Germany
Re: Black magic bitboards

 Posts: 163
 Joined: Mon Sep 03, 2007 7:15 am
Re: Black magic bitboards
Let's look at the smallest example from my magics found, a bishop on square 9 (usually this is b7, but possibly also b2, g2 or g7). The mask for this sqare is 0x0040201008040000.
With we get an index in the range {488..504}.
With we get a bad collision for occ == 0 and occ == 0x0040000000000000 (usually a piece at g2) at index == 0.
Your constant C is
The multiplication gives
Adding C giveswith an overflow into the top 9 bits.
I have not found any white magic factor that gives a lookup table as compact as 17 consecutive positions and i believe there is no such one. Maybe I could find one that uses the ranges {0..n} and {495+n..511}
The sum of the sizes of my most compact lookup tables isand I still used more CPU time on finding white magics than on finding black ones.
Unfortunately when combining the per square tables to the overlapping table, the advantage of black magics is much less. Now I have a table size of 88316*8 for black magics and 88772*8 for white ones.
With
Code: Select all
index = ((occ  ~mask) * 0x083e3ee040080801) >> 55
With
Code: Select all
index = ((occ & mask) * 0x083e3ee040080801) >> 55
Your constant C is
Code: Select all
0xFFBFDFEFF7FBFFFF*0x083e3ee040080801 == 0xF87FE0AF97F3F7FF
Code: Select all
0x0040000000000000*0x083e3ee040080801 == 0x0040000000000000
Code: Select all
0x0040000000000000+0xF87FE0AF97F3F7FF == 0xF8BFE0AF97F3F7FF
I have not found any white magic factor that gives a lookup table as compact as 17 consecutive positions and i believe there is no such one. Maybe I could find one that uses the ranges {0..n} and {495+n..511}
The sum of the sizes of my most compact lookup tables is
Code: Select all
black magic rook 99511*8
white magic rook 113490*8
black magic bishop 4625*8
white magic bishop 5773*8
Unfortunately when combining the per square tables to the overlapping table, the advantage of black magics is much less. Now I have a table size of 88316*8 for black magics and 88772*8 for white ones.

 Posts: 884
 Joined: Tue Mar 09, 2010 2:46 pm
 Location: New York
Re: Black magic bitboards
OK, now I finally understand the point. So it's a microoptimization that will not change anything in practical terms for chess programming, but it's somewhat entertaining for some people as a challenge.
Since you have the infrastructure to test these things, would you be able to test one variation for me? (((occ & mask) * magic) + 0x8000000000000000ull) >> (64  n)
Whether a magic multiplier works or not doesn't depend on the presence of the constant, but the resulting offsets might be in a narrower range, because we have shifted 0 to the middle of the range.
Since you have the infrastructure to test these things, would you be able to test one variation for me? (((occ & mask) * magic) + 0x8000000000000000ull) >> (64  n)
Whether a magic multiplier works or not doesn't depend on the presence of the constant, but the resulting offsets might be in a narrower range, because we have shifted 0 to the middle of the range.

 Posts: 2105
 Joined: Wed Mar 08, 2006 7:47 pm
 Location: Hattingen, Germany
Re: Black magic bitboards
Ok, thanks. So with black magics it is easier to find factors with smaller index range (maxindex  minindex) per square, mainly because minindex is greater than zero opposed to white magics with zero for empty boards. However, since the smaller black table is denser or has smaller gaps at it tails, it is harder to merge/overlap with other square tables, so the overall win is less than the individual gains have promised.
How did you come to the idea of using black magics? Was it already used in Hermann or more recently tried in Arminius?
How did you come to the idea of using black magics? Was it already used in Hermann or more recently tried in Arminius?

 Posts: 163
 Joined: Mon Sep 03, 2007 7:15 am
Re: Black magic bitboards
There were three things that came together for the idea.Gerd Isenberg wrote:How did you come to the idea of using black magics? Was it already used in Hermann or more recently tried in Arminius?
1st I recently installed openSUSE 64 bit on my raspberry pi 3 and I wanted to test how fast my magic generator is on that machine. Optimizing the overlapping heavyly relies on testing if vectors of 64 bit numbers fit together. A set bit stands for a used table entry, a not set bit stands for an unused entry.
2nd there was this thread asking for dense magics that made me think about how magics work again.
And last but not least holidays, giving me some time to try it out. The changes to the optimizer were not that big, but it took some time to get rid of old optimized code that relied on the assumption that index == 0 is always used.
The changes are in Arminius since yesterday.

 Posts: 163
 Joined: Mon Sep 03, 2007 7:15 am
Re: Black magic bitboards
It was intended to get a smaller table, but since the size of the table does not drop significantly any more, I think you are right.[/quote]AlvaroBegue wrote:OK, now I finally understand the point. So it's a microoptimization that will not change anything in practical terms for chess programming, but it's somewhat entertaining for some people as a challenge.
The test of your variation is running. First results indicate, that it gives less good collisions to exploit and less big gaps of unused entries. Interesting enough to use some more CPU time on it to come closer to the optimum.Since you have the infrastructure to test these things, would you be able to test one variation for me? (((occ & mask) * magic) + 0x8000000000000000ull) >> (64  n)
Whether a magic multiplier works or not doesn't depend on the presence of the constant, but the resulting offsets might be in a narrower range, because we have shifted 0 to the middle of the range.

 Posts: 163
 Joined: Mon Sep 03, 2007 7:15 am
Re: Black magic bitboards
That gave me a lookup table of size 90995*8. The sizes for the most compact per square tables without overlapping are 4767*8 for bishops and 101308*8 for rooks.AlvaroBegue wrote:Since you have the infrastructure to test these things, would you be able to test one variation for me? (((occ & mask) * magic) + 0x8000000000000000ull) >> (64  n)
There are more gaps in the overlapped table than with the other methods.
Re: Black magic bitboards
Very intresting idea. I tried shaving off a few bytes by spending more CPU time to find magics with minimal index range (and replacing them in your packing, or just tacking them on at the end).
It was barely worth it, saving only 519 entries, but here you go anyway:
Just in case they are useful for anyone: Here are all the dense magics (of which I ended up using only a few).
It was barely worth it, saving only 519 entries, but here you go anyway:
Code: Select all
uint64_t lookup_table[87988];
bishop_magics[64] =
{
{ 0xa7020080601803d8ull, 60984 },
{ 0x13802040400801f1ull, 66046 },
{ 0x0a0080181001f60cull, 32910 },
{ 0x1840802004238008ull, 16369 },
{ 0xc03fe00100000000ull, 42115 },
{ 0x24c00bffff400000ull, 835 },
{ 0x0808101f40007f04ull, 18910 },
{ 0x100808201ec00080ull, 25911 },
{ 0xffa2feffbfefb7ffull, 63301 },
{ 0x083e3ee040080801ull, 16063 },
{ 0xc0800080181001f8ull, 17481 },
{ 0x0440007fe0031000ull, 59361 },
{ 0x2010007ffc000000ull, 18735 },
{ 0x1079ffe000ff8000ull, 61249 },
{ 0x3c0708101f400080ull, 68938 },
{ 0x080614080fa00040ull, 61791 },
{ 0x7ffe7fff817fcff9ull, 21893 },
{ 0x7ffebfffa01027fdull, 62068 },
{ 0x53018080c00f4001ull, 19829 },
{ 0x407e0001000ffb8aull, 26091 },
{ 0x201fe000fff80010ull, 15815 },
{ 0xffdfefffde39ffefull, 16419 },
{ 0xcc8808000fbf8002ull, 59777 },
{ 0x7ff7fbfff8203fffull, 16288 },
{ 0x8800013e8300c030ull, 33235 },
{ 0x0420009701806018ull, 15459 },
{ 0x7ffeff7f7f01f7fdull, 15863 },
{ 0x8700303010c0c006ull, 75555 },
{ 0xc800181810606000ull, 79445 },
{ 0x20002038001c8010ull, 15917 },
{ 0x087ff038000fc001ull, 8512 },
{ 0x00080c0c00083007ull, 73069 },
{ 0x00000080fc82c040ull, 16078 },
{ 0x000000407e416020ull, 19168 },
{ 0x00600203f8008020ull, 11056 },
{ 0xd003fefe04404080ull, 62544 },
{ 0xa00020c018003088ull, 80477 },
{ 0x7fbffe700bffe800ull, 75049 },
{ 0x107ff00fe4000f90ull, 32947 },
{ 0x7f8fffcff1d007f8ull, 59172 },
{ 0x0000004100f88080ull, 55845 },
{ 0x00000020807c4040ull, 61806 },
{ 0x00000041018700c0ull, 73601 },
{ 0x0010000080fc4080ull, 15546 },
{ 0x1000003c80180030ull, 45243 },
{ 0xc10000df80280050ull, 20333 },
{ 0xffffffbfeff80fdcull, 33402 },
{ 0x000000101003f812ull, 25917 },
{ 0x0800001f40808200ull, 32875 },
{ 0x084000101f3fd208ull, 4639 },
{ 0x080000000f808081ull, 17077 },
{ 0x0004000008003f80ull, 62324 },
{ 0x08000001001fe040ull, 18159 },
{ 0x72dd000040900a00ull, 61436 },
{ 0xfffffeffbfeff81dull, 57073 },
{ 0xcd8000200febf209ull, 61025 },
{ 0x100000101ec10082ull, 81259 },
{ 0x7fbaffffefe0c02full, 64083 },
{ 0x7f83fffffff07f7full, 56114 },
{ 0xfff1fffffff7ffc1ull, 57058 },
{ 0x0878040000ffe01full, 58912 },
{ 0x945e388000801012ull, 22194 },
{ 0x0840800080200fdaull, 70880 },
{ 0x100000c05f582008ull, 11140 },
};
rook_magics[64] =
{
{ 0x80280013ff84ffffull, 10890 },
{ 0x5ffbfefdfef67fffull, 50579 },
{ 0xffeffaffeffdffffull, 62020 },
{ 0x003000900300008aull, 67322 },
{ 0x0050028010500023ull, 80251 },
{ 0x0020012120a00020ull, 58503 },
{ 0x0030006000c00030ull, 51175 },
{ 0x0058005806b00002ull, 83130 },
{ 0x7fbff7fbfbeafffcull, 50430 },
{ 0x0000140081050002ull, 21613 },
{ 0x0000180043800048ull, 72625 },
{ 0x7fffe800021fffb8ull, 80755 },
{ 0xffffcffe7fcfffafull, 69753 },
{ 0x00001800c0180060ull, 26973 },
{ 0x4f8018005fd00018ull, 84972 },
{ 0x0000180030620018ull, 31958 },
{ 0x00300018010c0003ull, 69272 },
{ 0x0003000c0085ffffull, 48372 },
{ 0xfffdfff7fbfefff7ull, 65477 },
{ 0x7fc1ffdffc001fffull, 43972 },
{ 0xfffeffdffdffdfffull, 57154 },
{ 0x7c108007befff81full, 53521 },
{ 0x20408007bfe00810ull, 30534 },
{ 0x0400800558604100ull, 16548 },
{ 0x0040200010080008ull, 46407 },
{ 0x0010020008040004ull, 11841 },
{ 0xfffdfefff7fbfff7ull, 21112 },
{ 0xfebf7dfff8fefff9ull, 44214 },
{ 0xc00000ffe001ffe0ull, 57925 },
{ 0x4af01f00078007c3ull, 29574 },
{ 0xbffbfafffb683f7full, 17309 },
{ 0x0807f67ffa102040ull, 40143 },
{ 0x200008e800300030ull, 64659 },
{ 0x0000008780180018ull, 70469 },
{ 0x0000010300180018ull, 62917 },
{ 0x4000008180180018ull, 60997 },
{ 0x008080310005fffaull, 18554 },
{ 0x4000188100060006ull, 14385 },
{ 0xffffff7fffbfbfffull, 0 },
{ 0x0000802000200040ull, 38091 },
{ 0x20000202ec002800ull, 25122 },
{ 0xfffff9ff7cfff3ffull, 60083 },
{ 0x000000404b801800ull, 72209 },
{ 0x2000002fe03fd000ull, 67875 },
{ 0xffffff6ffe7fcffdull, 56290 },
{ 0xbff7efffbfc00fffull, 43807 },
{ 0x000000100800a804ull, 73365 },
{ 0x6054000a58005805ull, 76398 },
{ 0x0829000101150028ull, 20024 },
{ 0x00000085008a0014ull, 9513 },
{ 0x8000002b00408028ull, 24324 },
{ 0x4000002040790028ull, 22996 },
{ 0x7800002010288028ull, 23213 },
{ 0x0000001800e08018ull, 56002 },
{ 0xa3a80003f3a40048ull, 22809 },
{ 0x2003d80000500028ull, 44545 },
{ 0xfffff37eefefdfbeull, 36072 },
{ 0x40000280090013c1ull, 4750 },
{ 0xbf7ffeffbffaf71full, 6014 },
{ 0xfffdffff777b7d6eull, 36054 },
{ 0x48300007e8080c02ull, 78538 },
{ 0xafe0000fff780402ull, 28745 },
{ 0xee73fffbffbb77feull, 8555 },
{ 0x0002000308482882ull, 1009 },
};
Code: Select all
// width without overlapping: 4606
bishops[64] = {
0xa7020080601803d8ull,
0x13802040400801f1ull,
0x0a0080181001f60cull,
0x1840802004238008ull,
0x581fc40182094560ull,
0xa0a01f0100824001ull,
0x3ae7f04002008258ull,
0x200c10101ec00080ull,
0xb69d010040104803ull,
0x833c802040102401ull,
0xc0800080181001f8ull,
0x0442008020048040ull,
0x3253203c01880043ull,
0xccf84020010081e4ull,
0x3c0708101f400080ull,
0x0d0614080fa00040ull,
0xf300c0007e806002ull,
0x41bea00040102002ull,
0x53018080c00f4001ull,
0xb00020820080104dull,
0x0060200100080842ull,
0x2100100021c58023ull,
0xcc8808000fbf8002ull,
0x1870060008300060ull,
0x8800013e8300c030ull,
0x0420009701806018ull,
0x020003fa00802080ull,
0x8700303010c0c006ull,
0xc800181810606000ull,
0x2a002038201c8010ull,
0x20007004b80fc00eull,
0x90c80c0c10083003ull,
0x73600080cd02c040ull,
0xdb60004066416020ull,
0x83a80101f6010010ull,
0x008b001c70080080ull,
0xa00020c018003088ull,
0x99001c600e001e30ull,
0x84002007e0d7f078ull,
0x0400003007f3f008ull,
0x0800004100f90080ull,
0x0220002080748040ull,
0x2ba40080818300c0ull,
0x1e23040080fb4080ull,
0xa800003c80180030ull,
0xc10000df80280050ull,
0x8b8000200813f024ull,
0x398000200405f812ull,
0xc20000203e808182ull,
0xc538000fa04032a5ull,
0x800404000f808010ull,
0x0005802808003f62ull,
0xd0880001001fe05bull,
0x72dd000040900a00ull,
0x55003ee010200804ull,
0xcd8000200febf209ull,
0x20c000101ec10062ull,
0x0a5f7c00101f4029ull,
0x22048800000f8080ull,
0x4a0c81005008003full,
0x090a010001002020ull,
0x945e388000801012ull,
0xe73c000040400826ull,
0xa500008060400fd8ull,
};
// width without overlapping: 97178
rooks[64] = {
0x80500104d8000230ull,
0x8018007400860002ull,
0x8030010086000090ull,
0x006006008120006cull,
0x0050028010500023ull,
0x003000c001900030ull,
0x00300060003000c0ull,
0x0058005806b00002ull,
0x508028010114000aull,
0x08000c0086000300ull,
0x0000080401020008ull,
0x0000200200200400ull,
0x10003001c030000full,
0x99001800c0180060ull,
0x4f8018005fd00018ull,
0x3580180030620018ull,
0x10300018010c0008ull,
0x2906000c00830004ull,
0x8002000804010008ull,
0x1804002002002004ull,
0x0102002001002002ull,
0x2211001000801040ull,
0x4000004040008001ull,
0x612c8007c007f020ull,
0x1040200010080008ull,
0x0100080010040011ull,
0xa004010008020008ull,
0x0000040020200200ull,
0x2320020020010020ull,
0x4af01f00078007c3ull,
0x0a00008020200040ull,
0x240fc00040000080ull,
0x200008e800300031ull,
0x0000080400100011ull,
0x3400400880410010ull,
0x4000200400200200ull,
0x8800200100200200ull,
0xa8001fc080070007ull,
0x4000200080200040ull,
0x2000802000200040ull,
0x1c000202ec002800ull,
0x0183000086000c00ull,
0x703ffdc008004010ull,
0x0020040002002020ull,
0x290000302e803003ull,
0x0400010000802020ull,
0x140000100800a804ull,
0x6054000a58005805ull,
0x0829000101150028ull,
0x1d00008a00850014ull,
0x2b40007ec6030018ull,
0xc020003da01e0020ull,
0x0020002030018030ull,
0x4080001800df8018ull,
0xa3a80003f3a40048ull,
0x9843d80000500028ull,
0x000000804110102eull,
0x52000080086eff41ull,
0x150000564012ff8bull,
0x024000100c080fd2ull,
0x48300007e8080c02ull,
0x2200000370040802ull,
0x7100000400437802ull,
0x00a2000828884302ull,
};
Re: Black magic bitboards
I have just implemented this in Cfish:
https://github.com/syzygy1/Cfish/commit ... 067c1c4995
Unfortunately it is slightly slower on my machine (i73930K Sandybridge) than the plain magic approach based on Volker's white magics. This is a bit surprising, since the array size goes down from 89524 to 87988 entries.
But now I tried on my laptop, and black magics wins there. Interesting.
Results on my Kabylake laptop for "cfish bench 128 1 17 >/dev/null" (nps):
These are all the best results after several tries (and with cpu governor set to performance and the process bound to a particular cpu). Gaussian distribution does not apply here.
The fancy magics and plain bmi2 are essentially those from Stockfish. The fancy bmi2 implementation uses 200K rook tables.
https://github.com/syzygy1/Cfish/commit ... 067c1c4995
Unfortunately it is slightly slower on my machine (i73930K Sandybridge) than the plain magic approach based on Volker's white magics. This is a bit surprising, since the array size goes down from 89524 to 87988 entries.
But now I tried on my laptop, and black magics wins there. Interesting.
Then I am interested in your latest white magics. Did you post them somewhere, or would you mind posting them here?Volker Annuss wrote:Now I have a table size of 88316*8 for black magics and 88772*8 for white ones.
Results on my Kabylake laptop for "cfish bench 128 1 17 >/dev/null" (nps):
Code: Select all
fancy magics 2321090
white magics 2343727
black magics 2353732
plain bmi2 2355310
fancy bmi2 2364420
The fancy magics and plain bmi2 are essentially those from Stockfish. The fancy bmi2 implementation uses 200K rook tables.
Re: Black magic bitboards
Redirecting stderr to a file seems to remove some further disturbances. I now get this:syzygy wrote:Results on my Kabylake laptop for "cfish bench 128 1 17 >/dev/null" (nps):Code: Select all
fancy magics 2321090 white magics 2343727 black magics 2353732 plain bmi2 2355310 fancy bmi2 2364420
Code: Select all
fancy magics 2337686
white magics 2362632
black magics 2362235
plain bmi2 2372801
fancy bmi2 2381846
It seems white magics and black magics are virtually indistinguishable in performance.
BMI2 is still better, but Volker's overlappingtable approach gets pretty close.