A 16 year quest to outperform Magic Bitboards finally ends

Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Post by Mike Sherwin »

With a little help from martinn I have finally done it! My Kindergarten Super SISSY bitboards are now faster than Black Magic bitboards. The result depends on the compiler used and the exact CPU it runs on, so there may be some room for debate. Here are some pertinent test results.

martinn's test
Modified KGSS  102.305105  16672 [130kb]  imul64  no  https://www.talkchess.com/forum3/viewto ... 4&start=30
Black Magic BB - Fixed shift  90.846201  88891 [694kb]  imul64  no  Onno Garms and Volker Annuss  https://www.chessprogramming.org/Magic_ ... hift_Fancy

Daniel's latest test - before martinn's improvements
Kindergarten Super SISSY Bitboards  500.132535  16640 [130kb]  imul64  no  Michael Sherwin  https://www.talkchess.com/forum3/viewto ... 4&start=30
Black Magic BB - Fixed shift  511.605299  88891 [694kb]  imul64  no  Onno Garms and Volker Annuss  https://www.chessprogramming.org/Magic_ ... hift_Fancy

It looks obvious that if Daniel were to incorporate martinn's changes, martinn's Modified KGSS would finally surpass Black Magic in Daniel's test! :D
martinn
Posts: 20
Joined: Fri Mar 10, 2023 9:33 am
Full name: Martin Novák

Post by martinn »

Mike Sherwin wrote: Thu Mar 16, 2023 8:07 pm With a little help from martinn I have finally done it! My Kindergarten Super SISSY bitboards are now faster than Black Magic bitboards. It depends on what compiler is used and what exact cpu it runs on. So there might be some room for debate. Here are some pertinent test results.

Hello Mike,
a few days ago I tried changing it to use the constexpr specifier. I would appreciate it if you could try it on your computer. On my computer it is slightly faster.

https://gist.github.com/martinnovaak/b6 ... cb45d4beab

The code has too many characters, so I can't post it here.

After the success with the lookup in hSubset, I got the idea to use a lookup in vSubset as well (vSubset[sq][(((occ & vMask[sq]) * multiplier[sq]) >> 58)]). So I changed the Rook function to:

Code:

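// Rook attacks combine the rank attacks and the file attacks for the given occupancy.
static constexpr uint64_t Rook(int sq, uint64_t occ)
{
	// Rank: shift the occupancy down and keep six bits as the hSubset index.
	return hSubset[sq][(occ >> horizontal_shift_table[sq]) & 63] +
		// File: mask the file, multiply, and take the top six bits as the vSubset index.
		vSubset[sq][(((occ & vMask[sq]) * multiplier[sq]) >> 58)];
}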

This replaces the & and >> operations with two table lookups, but unfortunately in my tests it ended up slower than the & and >> operations. Could you check on your computer whether you also get worse results? It's just a note that a direct lookup is not always faster.
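To make the two variants of the vSubset index concrete, here is a minimal sketch. It is not the code from the gists. The constants kFileA2A7 and kDiagC2H7, the way vMask and multiplier are filled, the (sq & 56) + 1 rank shift, and all function names are assumptions chosen for illustration, following the usual Kindergarten-style file-to-index multiplication.

```cpp
#include <cassert>
#include <cstdint>

// Assumed constants for this sketch (not taken from the linked gists).
constexpr uint64_t kFileA2A7 = 0x0001010101010100ULL; // squares a2..a7
constexpr uint64_t kDiagC2H7 = 0x0080402010080400ULL; // squares c2..h7

// Variant A: shift the file onto the a-file, mask it, multiply by one shared constant.
constexpr unsigned fileIndexComputed(int sq, uint64_t occ)
{
    return static_cast<unsigned>((((occ >> (sq & 7)) & kFileA2A7) * kDiagC2H7) >> 58);
}

// Variant B: per-square mask and multiplier tables, built at compile time.
struct FileTables {
    uint64_t vMask[64];
    uint64_t multiplier[64];
};

constexpr FileTables makeFileTables()
{
    FileTables t{};
    for (int sq = 0; sq < 64; ++sq) {
        t.vMask[sq] = kFileA2A7 << (sq & 7);      // inner six squares of sq's file
        t.multiplier[sq] = kDiagC2H7 >> (sq & 7); // pre-shifted multiplier
    }
    return t;
}

constexpr FileTables kFileTables = makeFileTables();

constexpr unsigned fileIndexLookup(int sq, uint64_t occ)
{
    return static_cast<unsigned>(((occ & kFileTables.vMask[sq]) * kFileTables.multiplier[sq]) >> 58);
}

// Rank index: shift the rank down and keep its six inner squares (files b..g).
constexpr unsigned rankIndex(int sq, uint64_t occ)
{
    return static_cast<unsigned>((occ >> ((sq & 56) + 1)) & 63);
}

// Rook on d1 (sq 3), blocker on d4 (sq 27): both variants give the same index.
static_assert(fileIndexLookup(3, 1ULL << 27) == 8, "lookup variant");
static_assert(fileIndexComputed(3, 1ULL << 27) == 8, "computed variant");
```

Both variants produce the same index for every square and occupancy, so any speed difference comes only from trading a shift for two extra memory loads, which is why the outcome can differ between compilers and CPUs.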

https://gist.github.com/martinnovaak/91 ... fcc22e564c

Post by Mike Sherwin »

martinn wrote: Sat Mar 18, 2023 8:49 pm [...] I would like to ask you if you could check it on your computer if you also get worse results.

I got almost identical performance (429) for both links and for the version before the latest try. What did gain a little performance was changing the '+' operator to the '|' operator in the rook and bishop functions: it went from 429 to 435. I ran each test several times. Setup: Visual Studio 2022 (MSVC), Ryzen 9 3950X at 4.2 GHz, and 3600 MHz CL18 memory.
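The swap from '+' to '|' is safe because the two attack sets being combined never share a bit, so the addition produces no carries. A small sketch of that property, using empty-board rook rays as a stand-in for the table entries (rankRay, fileRay and plusEqualsOr are hypothetical helper names, not functions from the KGSS code):

```cpp
#include <cassert>
#include <cstdint>

// Empty-board rook rays, excluding the rook's own square (illustrative helpers).
constexpr uint64_t rankRay(int sq)
{
    return (0xFFULL << (sq & 56)) & ~(1ULL << sq);
}

constexpr uint64_t fileRay(int sq)
{
    return (0x0101010101010101ULL << (sq & 7)) & ~(1ULL << sq);
}

// For disjoint bit sets, addition and bitwise OR give the same result.
constexpr bool plusEqualsOr(int sq)
{
    return (rankRay(sq) & fileRay(sq)) == 0
        && rankRay(sq) + fileRay(sq) == (rankRay(sq) | fileRay(sq));
}

static_assert(plusEqualsOr(0), "a1");
static_assert(plusEqualsOr(27), "d4");
```

The same argument applies to the bishop, since a diagonal and an anti-diagonal meet only on the piece's own square. This shows only that the results are identical; whether '|' is actually faster depends on the compiler and CPU.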