Faster than Fancy magic: Hypercube Slider lookup [TEASER]

dangi12012 · Post by **dangi12012** » Thu Dec 09, 2021 3:44 am

So I want to tell you that I found a promising algorithm that seems to perform better than fancy and pext slider lookups among all test machines i have and across all 3 major compiler (gcc clang msvc)! (Intel, Zen2 AMD, Zen3 AMD)
It is losely related to one lookup method - https://www.chessprogramming.org/Rotated_Indices but works much more streamlined and has nothing to do with rotation.

I want to call it hypercube because I imagined the pext indices to "come out of the board" in a 3d way and how they are calculated - then how that caluclation can be done during make unmake without using pext. It would look like a 3d cube with electrical wires in between 8x8x32 cells. (still a cube with 32===8 elements you will see what i mean).
I played around with cross referencing bitfields for all 64 squares - but one night I had the idea that that is not needed at all.

Because you can attach your code to the boardstate and with 1 small branchless function you get all new 128 possible pext values and the preliminary tests and implementation of the prototype look very very promising.

Code: Select all

Megalookups/s:
Reference: 103.93MOps
Fancy: 818.23MOps
PEXT: 848.86MOps
HYPER: 1045.74MOps

Of course you cant verify without sourcecode so this is just a teaser until I am sure that I cannot optimize more. For now I can only show the beauty of the assembly thats behind it:

During makemove 1 jumptable operation is needed. No branches - no ifs. Works with bitboards and mailslot algorithms (the board representation does not matter in fact) - you only need from/to squares in 0..63 index form:

This is the output for -O1 with -O3 its smaller (and uglier - but faster)

Code: Select all

        or      DWORD PTR cfg[rip+68], 1024
        or      DWORD PTR cfg[rip+72], 1024
        or      DWORD PTR cfg[rip+76], 1024
        or      DWORD PTR cfg[rip+80], 1024
        or      DWORD PTR cfg[rip+84], 1024
        or      DWORD PTR cfg[rip+88], 1024
        or      DWORD PTR cfg[rip+32], 64
        or      DWORD PTR cfg[rip+96], 64
        or      DWORD PTR cfg[rip+128], 128
        or      DWORD PTR cfg[rip+160], 256
        or      DWORD PTR cfg[rip+192], 512
        or      DWORD PTR cfg[rip+36], 2097152
        or      DWORD PTR cfg[rip+100], 2097152
        or      DWORD PTR cfg[rip+136], 8388608
        or      DWORD PTR cfg[rip+172], 8388608
        or      DWORD PTR cfg[rip+208], 2097152

Here i omitted the arguments to not leak my code early:

Code: Select all

Rook/Bishop(int):
        movsx
        movzx
        mov
        mov

Now to make it clear: The 20 or/andnot operations during makemove replace any and all slider calculations for the current position! No hardware instructions needed like pext - and no imul64 multiplication like for fancy magic. Just OR and a very similar amount of memory to both fast methods.

I am looking forward to sharing my code and I am happy to answer if you have questions!

AndrewGrant · Post by **AndrewGrant** » Thu Dec 09, 2021 4:07 am

Is this solution going to create a 25MB binary and take 3.5 hours to compile?

dangi12012 · Post by **dangi12012** » Thu Dec 09, 2021 1:20 pm

AndrewGrant wrote: ↑Thu Dec 09, 2021 4:07 am Is this solution going to create a 25MB binary and take 3.5 hours to compile?

Fast to compile!

no loops - no ifs - no complex computer instruction - no 64 bit multiplication.

Fabio Gobbato · Post by **Fabio Gobbato** » Thu Dec 09, 2021 2:50 pm

Is it good also for arm cpu?

tcusr · Post by **tcusr** » Thu Dec 09, 2021 3:59 pm

how big are the lookup tables?
how does it work both for bitboards and mailbox?
did you use ESPRESSO?

looking forward for the release

dangi12012 · Post by **dangi12012** » Thu Dec 09, 2021 5:36 pm

tcusr wrote: ↑Thu Dec 09, 2021 3:59 pm how big are the lookup tables?
how does it work both for bitboards and mailbox?
did you use ESPRESSO?

looking forward for the release

lookup tables are same size as pext. So bigger than fancy magic - but still fits easily into L2 cache (thats important).
It works for both because it hooks into make/unmake. And just needs the from/to square integer to maintain its own state.

This could be solved directly so in a sense the solution didnt need espresso.
Assembly looks similarly good in arm so I am looking forward to release soon!

AndrewGrant · Post by **AndrewGrant** » Thu Dec 09, 2021 11:08 pm

dangi12012 wrote: ↑Thu Dec 09, 2021 5:36 pm
tcusr wrote: ↑Thu Dec 09, 2021 3:59 pm how big are the lookup tables?
how does it work both for bitboards and mailbox?
did you use ESPRESSO?

looking forward for the release
lookup tables are same size as pext. So bigger than fancy magic - but still fits easily into L2 cache (thats important).
It works for both because it hooks into make/unmake. And just needs the from/to square integer to maintain its own state.

This could be solved directly so in a sense the solution didnt need espresso.
Assembly looks similarly good in arm so I am looking forward to release soon!

The real question is how badly does this obfuscate the board. The point of bitboards is hardly for movegen, it (used to be, thanks to NNUE) is for the ease of computing extremely complex ideas for evaluation easily; and for SEE.

R. Tomasi · Post by **R. Tomasi** » Thu Dec 09, 2021 11:25 pm

AndrewGrant wrote: ↑Thu Dec 09, 2021 11:08 pm
dangi12012 wrote: ↑Thu Dec 09, 2021 5:36 pm
tcusr wrote: ↑Thu Dec 09, 2021 3:59 pm how big are the lookup tables?
how does it work both for bitboards and mailbox?
did you use ESPRESSO?

looking forward for the release
lookup tables are same size as pext. So bigger than fancy magic - but still fits easily into L2 cache (thats important).
It works for both because it hooks into make/unmake. And just needs the from/to square integer to maintain its own state.

This could be solved directly so in a sense the solution didnt need espresso.
Assembly looks similarly good in arm so I am looking forward to release soon!
The real question is how badly does this obfuscate the board. The point of bitboards is hardly for movegen, it (used to be, thanks to NNUE) is for the ease of computing extremely complex ideas for evaluation easily; and for SEE.

How I understand the rather sparse details given by Daniel is that it does not touch the board at all. Some sort of lookup table that incrementally updates with the move making/unmaking.... (?)

AndrewGrant · Post by **AndrewGrant** » Thu Dec 09, 2021 11:46 pm

R. Tomasi wrote: ↑Thu Dec 09, 2021 11:25 pm
AndrewGrant wrote: ↑Thu Dec 09, 2021 11:08 pm
dangi12012 wrote: ↑Thu Dec 09, 2021 5:36 pm
tcusr wrote: ↑Thu Dec 09, 2021 3:59 pm how big are the lookup tables?
how does it work both for bitboards and mailbox?
did you use ESPRESSO?

looking forward for the release
lookup tables are same size as pext. So bigger than fancy magic - but still fits easily into L2 cache (thats important).
It works for both because it hooks into make/unmake. And just needs the from/to square integer to maintain its own state.

This could be solved directly so in a sense the solution didnt need espresso.
Assembly looks similarly good in arm so I am looking forward to release soon!
The real question is how badly does this obfuscate the board. The point of bitboards is hardly for movegen, it (used to be, thanks to NNUE) is for the ease of computing extremely complex ideas for evaluation easily; and for SEE.
How I understand the rather sparse details given by Daniel is that it does not touch the board at all. Some sort of lookup table that incrementally updates with the move making/unmaking.... (?)

My expectation is that there will be no practical value to any of this, based on the previous program from him and the way he interacts on these forums. However, I'll reserve judgement. I could be dumb.

Harald · Post by **Harald** » Fri Dec 10, 2021 12:16 am

dangi12012 wrote

It is losely related to one lookup method - https://www.chessprogramming.org/Rotated_Indices but works much more streamlined and has nothing to do with rotation.

For comparison: I found an old post from 2007-08-25: Re: BitBoard Tests (Rotated Indices)
with a source code snippet bb_rotated_indices.cpp
Rotated indices were the fastest method in my tests.
https://www.talkchess.com/forum3/viewto ... es#p140155
See also
https://www.talkchess.com/forum3/viewto ... es#p140420
https://www.talkchess.com/forum3/viewto ... es#p140111

Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]