With a little help from martinn I have finally done it! My Kindergarten Super SISSY bitboards are now faster than Black Magic bitboards. The result depends on which compiler is used and on the exact CPU it runs on, so there is some room for debate. Here are some pertinent test results.
Daniel's latest test - before martinn's improvements
Kindergarten Super SISSY Bitboards | 500.132535 | 16640 [130kb] | imul64 | no | Michael Sherwin | https://www.talkchess.com/forum3/viewto ... 4&start=30
Black Magic BB - Fixed shift | 511.605299 | 88891 [694kb] | imul64 | no | Onno Garms and Volker Annuss | https://www.chessprogramming.org/Magic_ ... hift_Fancy
It looks obvious that if Daniel were to incorporate martinn's changes, martinn's Modified KGSS would finally surpass Black Magic in Daniel's test!
Mike Sherwin wrote: ↑Thu Mar 16, 2023 8:07 pm
Hello Mike,
A few days back I tried changing it to use the constexpr specifier. I would appreciate it if you could try it on your computer. On my computer it is slightly faster.
It has too many characters so I can't post code here.
Back then, after the success with the lookup in hSubset, I got the idea to use a lookup in vSubset as well: vSubset[sq][((occ & vMask[sq]) * multiplier[sq]) >> 58]. So I changed the Rook function to:
Instead of the & and >> operations there are two lookups, but unfortunately in my tests it ended up slower than the & and >> operations. Could you check on your computer whether you also get worse results? So it's just a note that a direct lookup is not always faster.
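To make the comparison concrete, here is a rough sketch of the two ways to compute the vSubset index. It is not the code from the post. The constants kFileA2A7 and kDiagC2H7, the shift by (sq & 7), and the per-square values returned by vMaskFor and multiplierFor are all assumptions chosen so that both forms yield the same 6-bit index; the real vMask and multiplier tables may hold different values.

```cpp
#include <cstdint>

using u64 = std::uint64_t;

// Assumed constants: ranks 2-7 of the a-file, and the c2-h7 diagonal.
constexpr u64 kFileA2A7 = 0x0001010101010100ULL;
constexpr u64 kDiagC2H7 = 0x0080402010080400ULL;

// Shift-and-mask form: move the rook's file onto the a-file, keep ranks 2-7,
// then multiply so the six occupancy bits land in the top six bits.
constexpr int vIndexShiftMask(int sq, u64 occ) {
    return static_cast<int>((((occ >> (sq & 7)) & kFileA2A7) * kDiagC2H7) >> 58);
}

// Hypothetical stand-ins for the vMask[sq] and multiplier[sq] tables.
// In a real engine these would be precomputed arrays indexed by square.
constexpr u64 vMaskFor(int sq)      { return kFileA2A7 << (sq & 7); }
constexpr u64 multiplierFor(int sq) { return kDiagC2H7 >> (sq & 7); }

// Lookup form: mask the file in place and use a per-square multiplier.
constexpr int vIndexLookup(int sq, u64 occ) {
    return static_cast<int>(((occ & vMaskFor(sq)) * multiplierFor(sq)) >> 58);
}

// True when both forms agree on every square for the given occupancy.
constexpr bool sameIndexEverywhere(u64 occ) {
    for (int sq = 0; sq < 64; ++sq) {
        if (vIndexShiftMask(sq, occ) != vIndexLookup(sq, occ)) return false;
    }
    return true;
}

static_assert(sameIndexEverywhere(0ULL), "empty board");
static_assert(sameIndexEverywhere(~0ULL), "full board");
static_assert(sameIndexEverywhere(0x123456789ABCDEF0ULL), "mixed occupancy");
```

Under these assumptions the two forms are arithmetically identical, so any speed difference comes from the two extra memory loads versus one shift and one AND on constants, which fits the observation that the lookup version was not faster.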
I got almost identical performance (429) for both links and for the way it was before the latest try. What did give a little more performance was changing the '+' op to the '|' op in the rook and bishop functions. It went from 429 to 435. I ran each test several times. Visual Studio 2022 (MSVC) and a Ryzen 9 3950X at 4.2 GHz with 3600 MHz CL18 memory.
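For anyone wondering why '+' and '|' are interchangeable here: the two subsets being combined share no bits, and a + b equals a | b whenever (a & b) == 0. A small sketch, assuming the horizontal and vertical attack sets both exclude the rook's own square (rankAttacks and fileAttacks are hypothetical helpers that only model the empty board, not the real hSubset/vSubset tables):

```cpp
#include <cstdint>

using u64 = std::uint64_t;

// Empty-board rook rays from sq, excluding the rook's own square.
constexpr u64 rankAttacks(int sq) {
    return (0xFFULL << (sq & 56)) & ~(1ULL << sq);
}
constexpr u64 fileAttacks(int sq) {
    return (0x0101010101010101ULL << (sq & 7)) & ~(1ULL << sq);
}

constexpr u64 rookWithPlus(int sq) { return rankAttacks(sq) + fileAttacks(sq); }
constexpr u64 rookWithOr(int sq)   { return rankAttacks(sq) | fileAttacks(sq); }

// The rank and file sets never overlap, so '+' cannot carry and matches '|'.
constexpr bool plusEqualsOrEverywhere() {
    for (int sq = 0; sq < 64; ++sq) {
        if ((rankAttacks(sq) & fileAttacks(sq)) != 0) return false;
        if (rookWithPlus(sq) != rookWithOr(sq)) return false;
    }
    return true;
}

static_assert(plusEqualsOrEverywhere(), "'+' and '|' agree on disjoint sets");
```

Any blocked attack set is a subset of these rays, so the same disjointness holds with pieces on the board. Which operator the compiler turns into faster code is a separate question and depends on the compiler and CPU.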