Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Discussion of chess software programming and technical issues.

Moderator: Ras

dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Post by dangi12012 »

So I want to tell you that I found a promising algorithm that seems to perform better than fancy and pext slider lookups among all test machines i have and across all 3 major compiler (gcc clang msvc)! (Intel, Zen2 AMD, Zen3 AMD)
It is losely related to one lookup method - https://www.chessprogramming.org/Rotated_Indices but works much more streamlined and has nothing to do with rotation.

I want to call it hypercube because I imagined the pext indices to "come out of the board" in a 3d way and how they are calculated - then how that caluclation can be done during make unmake without using pext. It would look like a 3d cube with electrical wires in between 8x8x32 cells. (still a cube with 32===8 elements you will see what i mean).
I played around with cross referencing bitfields for all 64 squares - but one night I had the idea that that is not needed at all.

Because you can attach your code to the boardstate and with 1 small branchless function you get all new 128 possible pext values and the preliminary tests and implementation of the prototype look very very promising.

Code: Select all

Megalookups/s:
Reference: 103.93MOps
Fancy: 818.23MOps
PEXT: 848.86MOps
HYPER: 1045.74MOps
Of course you cant verify without sourcecode so this is just a teaser until I am sure that I cannot optimize more. For now I can only show the beauty of the assembly thats behind it:

During makemove 1 jumptable operation is needed. No branches - no ifs. Works with bitboards and mailslot algorithms (the board representation does not matter in fact) - you only need from/to squares in 0..63 index form:

This is the output for -O1 with -O3 its smaller (and uglier - but faster)

Code: Select all

        or      DWORD PTR cfg[rip+68], 1024
        or      DWORD PTR cfg[rip+72], 1024
        or      DWORD PTR cfg[rip+76], 1024
        or      DWORD PTR cfg[rip+80], 1024
        or      DWORD PTR cfg[rip+84], 1024
        or      DWORD PTR cfg[rip+88], 1024
        or      DWORD PTR cfg[rip+32], 64
        or      DWORD PTR cfg[rip+96], 64
        or      DWORD PTR cfg[rip+128], 128
        or      DWORD PTR cfg[rip+160], 256
        or      DWORD PTR cfg[rip+192], 512
        or      DWORD PTR cfg[rip+36], 2097152
        or      DWORD PTR cfg[rip+100], 2097152
        or      DWORD PTR cfg[rip+136], 8388608
        or      DWORD PTR cfg[rip+172], 8388608
        or      DWORD PTR cfg[rip+208], 2097152
Here i omitted the arguments to not leak my code early:

Code: Select all

Rook/Bishop(int):
        movsx
        movzx
        mov
        mov
Now to make it clear: The 20 or/andnot operations during makemove replace any and all slider calculations for the current position! No hardware instructions needed like pext - and no imul64 multiplication like for fancy magic. Just OR and a very similar amount of memory to both fast methods.

I am looking forward to sharing my code and I am happy to answer if you have questions!
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
AndrewGrant
Posts: 1957
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Post by AndrewGrant »

Is this solution going to create a 25MB binary and take 3.5 hours to compile?
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Post by dangi12012 »

AndrewGrant wrote: Thu Dec 09, 2021 4:07 am Is this solution going to create a 25MB binary and take 3.5 hours to compile?
Fast to compile! :)

no loops - no ifs - no complex computer instruction - no 64 bit multiplication. :mrgreen:
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
Fabio Gobbato
Posts: 219
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Post by Fabio Gobbato »

Is it good also for arm cpu?
tcusr
Posts: 325
Joined: Tue Aug 31, 2021 10:32 pm
Full name: tcusr

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Post by tcusr »

how big are the lookup tables?
how does it work both for bitboards and mailbox?
did you use ESPRESSO?

looking forward for the release
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Post by dangi12012 »

tcusr wrote: Thu Dec 09, 2021 3:59 pm how big are the lookup tables?
how does it work both for bitboards and mailbox?
did you use ESPRESSO?

looking forward for the release
lookup tables are same size as pext. So bigger than fancy magic - but still fits easily into L2 cache (thats important).
It works for both because it hooks into make/unmake. And just needs the from/to square integer to maintain its own state.

This could be solved directly so in a sense the solution didnt need espresso.
Assembly looks similarly good in arm so I am looking forward to release soon!
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
AndrewGrant
Posts: 1957
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Post by AndrewGrant »

dangi12012 wrote: Thu Dec 09, 2021 5:36 pm
tcusr wrote: Thu Dec 09, 2021 3:59 pm how big are the lookup tables?
how does it work both for bitboards and mailbox?
did you use ESPRESSO?

looking forward for the release
lookup tables are same size as pext. So bigger than fancy magic - but still fits easily into L2 cache (thats important).
It works for both because it hooks into make/unmake. And just needs the from/to square integer to maintain its own state.

This could be solved directly so in a sense the solution didnt need espresso.
Assembly looks similarly good in arm so I am looking forward to release soon!
The real question is how badly does this obfuscate the board. The point of bitboards is hardly for movegen, it (used to be, thanks to NNUE) is for the ease of computing extremely complex ideas for evaluation easily; and for SEE.
R. Tomasi
Posts: 307
Joined: Wed Sep 01, 2021 4:08 pm
Location: Germany
Full name: Roland Tomasi

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Post by R. Tomasi »

AndrewGrant wrote: Thu Dec 09, 2021 11:08 pm
dangi12012 wrote: Thu Dec 09, 2021 5:36 pm
tcusr wrote: Thu Dec 09, 2021 3:59 pm how big are the lookup tables?
how does it work both for bitboards and mailbox?
did you use ESPRESSO?

looking forward for the release
lookup tables are same size as pext. So bigger than fancy magic - but still fits easily into L2 cache (thats important).
It works for both because it hooks into make/unmake. And just needs the from/to square integer to maintain its own state.

This could be solved directly so in a sense the solution didnt need espresso.
Assembly looks similarly good in arm so I am looking forward to release soon!
The real question is how badly does this obfuscate the board. The point of bitboards is hardly for movegen, it (used to be, thanks to NNUE) is for the ease of computing extremely complex ideas for evaluation easily; and for SEE.
How I understand the rather sparse details given by Daniel is that it does not touch the board at all. Some sort of lookup table that incrementally updates with the move making/unmaking.... (?)
AndrewGrant
Posts: 1957
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Post by AndrewGrant »

R. Tomasi wrote: Thu Dec 09, 2021 11:25 pm
AndrewGrant wrote: Thu Dec 09, 2021 11:08 pm
dangi12012 wrote: Thu Dec 09, 2021 5:36 pm
tcusr wrote: Thu Dec 09, 2021 3:59 pm how big are the lookup tables?
how does it work both for bitboards and mailbox?
did you use ESPRESSO?

looking forward for the release
lookup tables are same size as pext. So bigger than fancy magic - but still fits easily into L2 cache (thats important).
It works for both because it hooks into make/unmake. And just needs the from/to square integer to maintain its own state.

This could be solved directly so in a sense the solution didnt need espresso.
Assembly looks similarly good in arm so I am looking forward to release soon!
The real question is how badly does this obfuscate the board. The point of bitboards is hardly for movegen, it (used to be, thanks to NNUE) is for the ease of computing extremely complex ideas for evaluation easily; and for SEE.
How I understand the rather sparse details given by Daniel is that it does not touch the board at all. Some sort of lookup table that incrementally updates with the move making/unmaking.... (?)
My expectation is that there will be no practical value to any of this, based on the previous program from him and the way he interacts on these forums. However, I'll reserve judgement. I could be dumb.
Harald
Posts: 318
Joined: Thu Mar 09, 2006 1:07 am

Re: Faster than Fancy magic: Hypercube Slider lookup [TEASER]

Post by Harald »

dangi12012 wrote
It is losely related to one lookup method - https://www.chessprogramming.org/Rotated_Indices but works much more streamlined and has nothing to do with rotation.
For comparison: I found an old post from 2007-08-25: Re: BitBoard Tests (Rotated Indices)
with a source code snippet bb_rotated_indices.cpp
Rotated indices were the fastest method in my tests.
https://www.talkchess.com/forum3/viewto ... es#p140155
See also
https://www.talkchess.com/forum3/viewto ... es#p140420
https://www.talkchess.com/forum3/viewto ... es#p140111