A 16 year quest to outperform Magic Bitboards finally ends

Mike Sherwin · Post by **Mike Sherwin** » Thu Mar 16, 2023 8:07 pm

With a little help from martinn I have finally done it! My Kindergarten Super SISSY bitboards are now faster than Black Magic bitboards. The result depends on which compiler is used and which exact CPU it runs on, so there might be some room for debate. Here are some pertinent test results.

martinn's test
Modified KGSS 102.305105 16672 [130kb] imul64 no
https://www.talkchess.com/forum3/viewto ... 4&start=30
Black Magic BB - Fixed shift 90.846201 88891 [694kb] imul64 no
Onno Garms and Volker Annuss https://www.chessprogramming.org/Magic_ ... hift_Fancy

Daniel's latest test - before martinn's improvements
Kindergarten Super SISSY Bitboards 500.132535 16640 [130kb] imul64 no Michael Sherwin https://www.talkchess.com/forum3/viewto ... 4&start=30
Black Magic BB - Fixed shift 511.605299 88891 [694kb] imul64 no Onno Garms and Volker Annuss https://www.chessprogramming.org/Magic_ ... hift_Fancy

It seems clear that if Daniel were to incorporate martinn's changes, martinn's Modified KGSS would finally surpass Black Magic in Daniel's test!

martinn · Post by **martinn** » Sat Mar 18, 2023 8:49 pm

Hello Mike,
a few days ago I tried changing it to use the constexpr specifier. I would appreciate it if you could try it on your computer. On my computer it is slightly faster.

https://gist.github.com/martinnovaak/b6 ... cb45d4beab

It has too many characters so I can't post code here.

Back then, after the success with the lookup in hSubset, I got the idea to use a lookup in vSubset as well: vSubset[sq][(((occ & vMask[sq]) * multiplier[sq]) >> 58)]. So I changed the function Rook to:

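// Rook attacks = rank attacks + file attacks (the two sets share no squares).
static constexpr uint64_t Rook(int sq, uint64_t occ)
{
	// Rank part: shift by a looked-up amount and keep six occupancy bits as the index.
	return hSubset[sq][(occ >> horizontal_shift_table[sq]) & 63] +
		// File part: mask the file, multiply by a looked-up constant, use the top six bits as the index.
		vSubset[sq][(((occ & vMask[sq]) * multiplier[sq]) >> 58)];
}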

This replaces the & and >> operations with two lookups, but unfortunately in my tests it ended up slower than the & and >> version. Could you check on your computer whether you also get worse results? So this is just a note that a direct lookup is not always faster.

https://gist.github.com/martinnovaak/91 ... fcc22e564c
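To make the lookup scheme above concrete, here is a minimal, self-contained sketch. The table names (hSubset, vSubset, vMask, multiplier, horizontal_shift_table) come from the post; everything else is an assumption for illustration: the way the tables are filled, the constant 0x0080402010080400 used to derive multiplier, and the hypothetical helpers ray, SlowRook, InitTables and MatchesReference. The tables here are filled at run time rather than as constexpr data, so this is not the code behind the gist links and it says nothing about speed.

```cpp
#include <cstdint>
#include <cassert>   // only needed if you assert on the self-check below

// Table names follow the post; how they are filled is an assumption.
static uint64_t hSubset[64][64];        // rank attacks by 6-bit inner rank occupancy
static uint64_t vSubset[64][64];        // file attacks by 6-bit hashed file occupancy
static uint64_t vMask[64];              // inner six squares of the square's file
static uint64_t multiplier[64];         // gathers the masked file bits into bits 58..63
static int horizontal_shift_table[64];  // shift that brings the inner rank bits to bits 0..5

// Hypothetical helper: walk one ray from sq in direction (df, dr) up to the first blocker.
static uint64_t ray(int sq, int df, int dr, uint64_t occ)
{
    uint64_t attacks = 0;
    int f = (sq & 7) + df, r = (sq >> 3) + dr;
    while (f >= 0 && f < 8 && r >= 0 && r < 8) {
        uint64_t bit = 1ULL << (r * 8 + f);
        attacks |= bit;
        if (occ & bit) break;
        f += df; r += dr;
    }
    return attacks;
}

// Slow reference used to fill and to check the tables.
static uint64_t SlowRook(int sq, uint64_t occ)
{
    return ray(sq, 1, 0, occ) | ray(sq, -1, 0, occ) |
           ray(sq, 0, 1, occ) | ray(sq, 0, -1, occ);
}

static void InitTables()
{
    for (int sq = 0; sq < 64; ++sq) {
        int file = sq & 7, rank = sq >> 3;
        horizontal_shift_table[sq] = rank * 8 + 1;
        vMask[sq] = 0x0001010101010100ULL << file;
        multiplier[sq] = 0x0080402010080400ULL >> file;   // assumed constant
        for (uint64_t bits = 0; bits < 64; ++bits) {
            // Rank: the six inner squares of the rank are the index directly.
            uint64_t hOcc = bits << horizontal_shift_table[sq];
            hSubset[sq][bits] = ray(sq, 1, 0, hOcc) | ray(sq, -1, 0, hOcc);
            // File: spread the six bits over ranks 2..7, then hash them to an index.
            uint64_t vOcc = 0;
            for (int i = 0; i < 6; ++i)
                if (bits & (1ULL << i)) vOcc |= 1ULL << ((i + 1) * 8 + file);
            vSubset[sq][(vOcc * multiplier[sq]) >> 58] =
                ray(sq, 0, 1, vOcc) | ray(sq, 0, -1, vOcc);
        }
    }
}

// Same shape as the Rook function in the post, minus constexpr.
static uint64_t Rook(int sq, uint64_t occ)
{
    return hSubset[sq][(occ >> horizontal_shift_table[sq]) & 63] +
           vSubset[sq][((occ & vMask[sq]) * multiplier[sq]) >> 58];
}

// Compare the lookup with the slow reference on pseudo-random occupancies.
static bool MatchesReference()
{
    uint64_t x = 0x9E3779B97F4A7C15ULL;   // xorshift64 state, arbitrary seed
    for (int i = 0; i < 100000; ++i) {
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
        int sq = i & 63;
        if (Rook(sq, x) != SlowRook(sq, x)) return false;
    }
    return true;
}
```

Call InitTables() once before the first Rook() call; MatchesReference() is a quick sanity check that the two lookups agree with the ray-walking reference.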

Mike Sherwin · Post by **Mike Sherwin** » Sat Mar 18, 2023 10:21 pm

I got almost identical performance (429) for both links and for the version before the latest attempt. What did give a little more performance was changing the '+' op to the '|' op in the rook and bishop functions: it went from 429 to 435. I ran each test several times. VSVC 2022, R9 3950X at 4.2 GHz, and 3600 MHz latency-18 memory.
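The '+' to '|' swap is safe because the two attack sets being combined never share a square, so no carries can occur and both operators give the same bitboard. A small sketch of that property for the rook case, using hypothetical empty-board helpers RankAttacks and FileAttacks (not names from the thread); any blocked attack set is a subset of these, so the same argument should carry over to arbitrary occupancies:

```cpp
#include <cstdint>
#include <cassert>

// Hypothetical helpers for illustration: rook attack sets on an empty board.
constexpr uint64_t RankAttacks(int sq)
{
    return (0xFFULL << (sq & 56)) ^ (1ULL << sq);                // whole rank minus sq
}

constexpr uint64_t FileAttacks(int sq)
{
    return (0x0101010101010101ULL << (sq & 7)) ^ (1ULL << sq);   // whole file minus sq
}

// True if, for every square, the two sets are disjoint and '+' equals '|'.
constexpr bool PlusEqualsOr()
{
    for (int sq = 0; sq < 64; ++sq) {
        uint64_t h = RankAttacks(sq), v = FileAttacks(sq);
        if ((h & v) != 0) return false;       // a shared square would break '+'
        if (h + v != (h | v)) return false;
    }
    return true;
}

static_assert(PlusEqualsOr(), "rank and file attack sets must be disjoint");
```

Whether '|' is actually faster than '+' depends on the compiler and CPU, as the 429 versus 435 numbers above were measured on one specific setup.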
