Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release

dangi12012 · Post by **dangi12012** » Sun Oct 10, 2021 9:40 pm

Update: Youtube review from maksim -

maksimKorzh wrote: ↑Sun Oct 10, 2021 7:18 pmHere's a video covering your project I made:

Great to have a youtube channel on chessprogramming!

Look · Post by **Look** » Mon Oct 11, 2021 2:50 pm

It would be interesting to see "Gigantua" used in some C++ chess engines and measure the speed and Elo difference.

dangi12012 · Post by **dangi12012** » Fri Oct 22, 2021 12:16 am

Can anyone help me with a initial TT implementation? Maybe just write a comment if your engine uses a fast hash lookup. I would really like to see some good implementations

j.t. · Post by **j.t.** » Fri Oct 22, 2021 12:27 am

dangi12012 wrote: ↑Fri Oct 22, 2021 12:16 am Can anyone help me with an initial TT implementation? Maybe just write a comment if your engine uses a fast hash lookup. I would really like to see some good implementations

What's a fast hash lookup? TT lookups are just slow, you can't do anything about it, except buying a CPU with huge cache. The concrete implementation doesn't matter too much at this point. If you're just starting out, I would suggest using a simple always-replace strategy. Works well enough (sometimes even better than complex replacement strategies), is easy to implement, and can be improved easily when necessary.

dangi12012 · Post by **dangi12012** » Sat Oct 23, 2021 7:55 pm

j.t. wrote: ↑Fri Oct 22, 2021 12:27 am
dangi12012 wrote: ↑Fri Oct 22, 2021 12:16 am Can anyone help me with an initial TT implementation? Maybe just write a comment if your engine uses a fast hash lookup. I would really like to see some good implementations
What's a fast hash lookup? TT lookups are just slow, you can't do anything about it, except buying a CPU with huge cache. The concrete implementation doesn't matter too much at this point. If you're just starting out, I would suggest using a simple always-replace strategy. Works well enough (sometimes even better than complex replacement strategies), is easy to implement, and can be improved easily when necessary.

Well TT implementations will vary greatly in performance dependent by the programmer. While all should still use Zobrist. Maybe there is another Hash for Bitboards thats faster? - since you have 12 * 64 bits already + 6 bits metadata that need to be hashed into a single 64 bit key.

While moving you have from and to as 2x 64 bit to maybe enable some sort of incremental Hash. (not square centric zobrist)

https://github.com/abulmo/MPerft
https://github.com/kongsian/melee-perft

Desperado · Post by **Desperado** » Mon Nov 08, 2021 8:52 pm

AndrewGrant wrote: ↑Sat Oct 09, 2021 8:55 pm https://github.com/ankan-ban/perft_gpu

Interesting! Many years back that was a faster perft move generator already. I'm not sure if the statement is only for cpu code,
but Worlds Fastest Chess Move Generator - 2 Billion Moves/s per thread! is simply not true then, alhough it is pretty fast of course. Do i miss something?

And if there are faster solutions that might help to generate Perft 16 results, your solution seems to have a very limited usage.
Can it help real chess engines to become faster ?! or does it have some interesting attributes from implementation perspective,
that makes the code of an engine cleaner or easier?

dangi12012 · Post by **dangi12012** » Mon Nov 08, 2021 10:18 pm

Desperado wrote: ↑Mon Nov 08, 2021 8:52 pm
AndrewGrant wrote: ↑Sat Oct 09, 2021 8:55 pm https://github.com/ankan-ban/perft_gpu
Interesting! Many years back that was a faster perft move generator already. I'm not sure if the statement is only for cpu code,
but Worlds Fastest Chess Move Generator - 2 Billion Moves/s per thread! is simply not true then, alhough it is pretty fast of course. Do i miss something?

And if there are faster solutions that might help to generate Perft 16 results, your solution seems to have a very limited usage.
Can it help real chess engines to become faster ?! or does it have some interesting attributes from implementation perspective,
that makes the code of an engine cleaner or easier?

Many points to your post:
Yes the template metaprogramming can speed up any engine - this is the first to make Enpassant/Castling use zero cpu cycles when not possible
The visitor pattern enables you to get away without any type of movelist - callback recursion at the point of expansion for free with templates
This is not even incremental engine (as it turns out every attempt to remebering calculated moves is slower) because some things are so cheap with bitboards - the cost of looking into a 64 entry array is 3x slower.
I fully solved pins and checks with bitboards to the point where both only take 2 instructions + the sliderattack lookup for each piece during runtime for a fully legal movegen.

Of course this could help with Perft16 - or even the 8 Man Tablebases.
There are no faster perft programs per Thread. All others use multiple threads or run on the gpu (which is also multiple threads). Technically cuda would be a warp which is 32 threads all executing the same instruction or stalling.
Its easiest to compare 1 thread to 1 thread since you can scale up threads to whichever hardware you go to.

If someone is seriously interested in Perft16 - I would port my code to CUDA. It would not be too hard since its quite small 5 files - really there are 2 and the functions are simple (which helps the compiler) and it could even count castlings checks, double checks without any code changes at all.

My current first interest is changing this into a full engine - and since I am not interested in using existing stuff I start implementing dozends of gametree visitors from scratch and finding the best of the best implementations - like I did for this movegen.

Desperado · Post by **Desperado** » Mon Nov 08, 2021 11:19 pm

dangi12012 wrote: ↑Mon Nov 08, 2021 10:18 pm
Desperado wrote: ↑Mon Nov 08, 2021 8:52 pm
AndrewGrant wrote: ↑Sat Oct 09, 2021 8:55 pm https://github.com/ankan-ban/perft_gpu
Interesting! Many years back that was a faster perft move generator already. I'm not sure if the statement is only for cpu code,
but Worlds Fastest Chess Move Generator - 2 Billion Moves/s per thread! is simply not true then, alhough it is pretty fast of course. Do i miss something?

And if there are faster solutions that might help to generate Perft 16 results, your solution seems to have a very limited usage.
Can it help real chess engines to become faster ?! or does it have some interesting attributes from implementation perspective,
that makes the code of an engine cleaner or easier?
Many points to your post:
Yes the template metaprogramming can speed up any engine - this is the first to make Enpassant/Castling use zero cpu cycles when not possible
The visitor pattern enables you to get away without any type of movelist - callback recursion at the point of expansion for free with templates
This is not even incremental engine (as it turns out every attempt to remebering calculated moves is slower) because some things are so cheap with bitboards - the cost of looking into a 64 entry array is 3x slower.
I fully solved pins and checks with bitboards to the point where both only take 2 instructions + the sliderattack lookup for each piece during runtime for a fully legal movegen.

Of course this could help with Perft16 - or even the 8 Man Tablebases.
There are no faster perft programs per Thread. All others use multiple threads or run on the gpu (which is also multiple threads). Technically cuda would be a warp which is 32 threads all executing the same instruction or stalling.
Its easiest to compare 1 thread to 1 thread since you can scale up threads to whichever hardware you go to.

If someone is seriously interested in Perft16 - I would port my code to CUDA. It would not be too hard since its quite small 5 files - really there are 2 and the functions are simple (which helps the compiler) and it could even count castlings checks, double checks without any code changes at all.

My current first interest is changing this into a full engine - and since I am not interested in using existing stuff I start implementing dozends of gametree visitors from scratch and finding the best of the best implementations - like I did for this movegen.

Well, i am not sure what you want to say. You might have a very fast cpu solution per thread, that does not mean you have the fastest perft generator at all. The bitboard solution on a gpu can be scaled too like the thread solution. So, does your algorithm ported to a gpu will be faster than the perft generator that Andrew linked to? Do you have already a gpu solution?

Sorry if i am bit annoying, but if you are table tennis champion you are not tennis champion at the same time.
IMHO a solution that provides 27 Billion nodes per second on a GPU when there are already much faster gpus (guess 2 to 4 times faster) available today looks faster to me.

Really, seriously, there is no doubt that you did a very good job, but if someone needs for some reason the fastest perft generator,
would it be really yours?

Another viewpoint might be to compare the solutions based on the amount a a certain budget. If you have a budget of X, would the gpu
solution be faster than your thread solution? The idea is, that the budget would limit the hardware scaling and so you might compare things better.

Your headline is "Worlds fastest Chess Move Generator - ..." that suggests that is the fastest in general which is doing 2 Billion Moves/s per thread.
And not that is the fastest generator cpu based. Sorry for being so picky, but that is a very strong claim you made and as far as i can see it is not true (yet).

Can you provide a simple AlphaBeta engine that can achieve an engine Elo of 2300 (not 3300) with a pure material/mobility evaluation (alternatively PST). Based on that speed, you should easily reach that level. My guess is that you are severely underestimating the requirements for a chess engine, but you can prove me wrong.

Steve Maughan · Post by **Steve Maughan** » Mon Nov 08, 2021 11:47 pm

Desperado wrote: ↑Mon Nov 08, 2021 11:19 pm Your headline is "Worlds fastest Chess Move Generator - ..." that suggests that is the fastest in general which is doing 2 Billion Moves/s per thread.
And not that is the fastest generator cpu based. Sorry for being so picky, but that is a very strong claim you made and as far as i can see it is not true (yet).

I disagree. Perft speed is a valid measure of move generation speed (and little else). I don't have any problems at all with the claim (as long as it's true). There is no implied claim that it's a full chess program or the "fastest in general" — the claim is limited to the move generator.

Steve

Desperado · Post by **Desperado** » Tue Nov 09, 2021 12:12 am

Steve Maughan wrote: ↑Mon Nov 08, 2021 11:47 pm
Desperado wrote: ↑Mon Nov 08, 2021 11:19 pm Your headline is "Worlds fastest Chess Move Generator - ..." that suggests that is the fastest in general which is doing 2 Billion Moves/s per thread.
And not that is the fastest generator cpu based. Sorry for being so picky, but that is a very strong claim you made and as far as i can see it is not true (yet).
I disagree. Perft speed is a valid measure of move generation speed (and little else). I don't have any problems at all with the claim (as long as it's true). There is no implied claim that it's a full chess program or the "fastest in general" — the claim is limited to the move generator.

Steve

Hello Steve,

i fully understand that it is meant to be the fastest perft move generator. But i question exactly this!
If you follow the link of Andrew, there is a perft generator (gpu based) generating 27 Billion nodes per second, which is "a little bit" faster than 2 Billion, not?

My point on that is, that the claim has no limitation on cpus, and i am not sure if he has a gpu version that is faster than the one on the provided link.
So, does he really has the "Worlds fastest Chess Move Generator", based on a single threaded performance of 2 Billion Moves/s which is scalable of course. Well, gpu power is scalable too, or not?

Claiming to have the Worlds fastest Chess Move Generator means also comparing the hardware solution on which the solution runs.
At the end (so far) there is a faster chess move generator, although it is a gpu solution. That's my point.

Until he can port his solution successfully, there is at least one faster solution available to generate perft numbers.
That is what i mean with "general", a chess move generator that is faster but is based on a different technology.

Sorry, i don't see why we disagree. There is no point in which i say something contrary to you.

Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release

Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release

Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release

Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release

Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release

Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release

Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release

Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release

Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release

Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release

Re: Gigantua: 2 Gigamoves per Second per Core move generator - Sourcecode Release