Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Mergi · Post by **Mergi** » Tue Dec 14, 2021 10:39 pm

dangi12012 wrote: ↑Tue Dec 14, 2021 9:16 pm
Mergi wrote: ↑Tue Dec 14, 2021 8:01 pm
dangi12012 wrote: ↑Tue Dec 14, 2021 7:21 pm So 88064 is the current record for variable shift fancy magic?
No idea, i'm using the publicly available magic numbers from https://www.chessprogramming.org/Best_Magics_so_far

Unless there were some new added in the past few months or i missed some when i was originally making the generator, 88064 for rooks and 4800 for bishops should be the smallest for fancy magics.
Ah the performance is slow because:
Code: Select all
((occ & m.inner_mask) * m.magic_num) >> m.shift
That is plain magic with variable shift

Can anyone here on this forum point me to the sourcecode of a fancy magic with fixed shift implementation?
It would look like that:
Code: Select all
((occ | m.inner_mask) * m.magic_num) >> (64-12)

I think you are confusing some things. Are you perhaps refering to the so called black magics https://www.chessprogramming.org/Magic_ ... ShiftFancy ? Both approaches are called fancy magic. Just one is with fixed shift and the other dynamic. It's not hard to implement, the normal magics code can be adjusted easily. The gains, if any, are absolutely negligible. I'll adjust it for the black magic numbers and send you the code in a moment.

Mergi · Post by **Mergi** » Tue Dec 14, 2021 11:02 pm

Here you go.

magic numbers taken from here http://www.talkchess.com/forum3/viewtopic.php?t=64790

dangi12012 · Post by **dangi12012** » Wed Dec 15, 2021 12:53 am

Mergi wrote: ↑Tue Dec 14, 2021 11:02 pm Here you go.

magic numbers taken from here http://www.talkchess.com/forum3/viewtopic.php?t=64790

There we go:

Code: Select all

Megalookups/s:
Exploading:     122.45MOps
Reference:      123.94MOps
KoggeStone:     176.53MOps
RotatedBoard:   132.81MOps
QBB Algo:       209.38MOps
BobMike:        333.72MOps
SlideArithm:    364.08MOps
XorRookSub:     476.93MOps
Hash Variable:  614.93MOps
Hash Fixed:     653.76MOps
Hash Fancy:     759.88MOps
Pext  :         899.04MOps
HyperCube:      939.62MOps

Hash Variable: 614.93MOps= return m.attacks[((occ & m.inner_mask) * m.magic_num) >> m.shift];
Hash Fixed: 653.76MOps = return m.attacks[square][(((occupancy)&magicmoves_r_mask[square]) * magicmoves_r_magics[square]) >> (64 - 12)];
Hash Fancy: 759.88MOps = return m.attacks[((occ | m.inner_mask) * m.black_magic_num) >> (64 - 12)];

I think Fixed and Fancy should be really the same. But the code is not constexpr because its from 2004 or so.

I just realized I made a big benchmark mistake T is just a namespace alias for the current algorithm. opt is volatile.

Code: Select all

for (int i = 0; i < N; i++) {
     opt = T::Queen(i, occ);
 }

So with loop unrolling the compiler inlines all the lookups onto the square so this distorts many algorithms.
Now its a real dilemma because I wanted these benchmarks to be representative - but as in real live its difficult - you engine or movegen might be written in a loop over squares - then this is representative for you. If the compiler doesnt know the square (which is also likely) all algos need the first lookup also and this order above looks very different.
So depending on your own style some movegenerators can skip one or two lookups.
Depends also on the hardware. While pext looks good - a pre 2018 cpu or any gpu certainly wouldnt be fastest with it.

So I will just release soon and i think at this point its easiest if your engine is written in a way where you can replace the sliding lookup code and test yourselfes which algorithm performs best for your engine/movegen.
I will also cleanup the code for release and you can be certain that everyone who wants to take a look at it can expect this interface:
In the topmost header just do:

Code: Select all

namespace Lookup = Chess_Lookup::Lookup_Hyper; //or whatever algorithm you want out of this up to date c++20 assortment
//Existing methods in any and all interfaces
//Lookup::Init()
//Lookup::Bishop(int sq, uint64_t occ)
//Lookup::Rook(int sq, uint64_t occ)
//Lookup::Queen(int sq, uint64_t occ)
//Lookup::Bishop_Xray(int sq, uint64_t occ)
//Lookup::Rook_Xray(int sq, uint64_t occ)

//Algos that are incremental: (Hypercube, Rotated Bitboards etc.) Same as above but occ is omitted
//+ Lookup::Move(int from, int to)

dangi12012 · Post by **dangi12012** » Wed Dec 15, 2021 12:41 pm

Update: Now with size information and caviat from above:
square loop:

Code: Select all

Megalookups/s:
Exploading:     123.55MOps      6 kB
Reference:      125.63MOps      8 kB
KoggeStone:     174.42MOps      0 kB
RotatedBoard:   132.11MOps      14 kB
QBB Algo:       208.23MOps      0 kB
BobMike:        336.06MOps      8 kB
SlideArithm:    340.07MOps      2 kB
XorRookSub:     500.24MOps      2 kB
Hash Variable:  615.44MOps      730 kB
Hash Fixed:     653.41MOps      2306 kB
Hash Fancy:     740.12MOps      696 kB
Pext  :         899.68MOps      843 kB
HyperCube:      942.97MOps      841 kB

unknown square loop:

Code: Select all

Megalookups/s:
Exploading:     68.92MOps       6 kB
Reference:      62.08MOps       8 kB
KoggeStone:     158.67MOps      0 kB
RotatedBoard:   77.65MOps       14 kB
QBB Algo:       177.80MOps      0 kB
BobMike:        265.23MOps      8 kB
SlideArithm:    246.21MOps      2 kB
XorRookSub:     336.63MOps      2 kB
Hash Variable:  471.06MOps      730 kB
Hash Fixed:     366.56MOps      2306 kB
Hash Fancy:     454.02MOps      696 kB
Pext  :         603.67MOps      843 kB
HyperCube:      657.90MOps      841 kB

Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]