Gigantua: 1.5 Giganodes per Second per Core move generator
Moderator: Ras
-
- Posts: 28387
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Gigantua: 1.5 Giganodes per Second per Core move generator
Note that the quoted speed is not nodes/sec, but (bulk-counted) moves/sec. A node is about 50 moves in KiwiPete, so that makes it 30 Mnps. Still pretty fast, of course.
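To make the distinction concrete, here is a minimal C++ sketch of perft with bulk counting. ToyPosition, PerftStats, perft_bulk and run_perft are hypothetical names, and the toy position simply has the same number of moves everywhere instead of real chess rules. It tracks both the leaves that perft reports and the positions for which moves were actually generated.

```cpp
#include <cstdint>

// Toy stand-in for a chess position (an assumption for illustration):
// every position has exactly `Branching` legal moves.
template <int Branching>
struct ToyPosition {
    static constexpr int move_count() { return Branching; }
    constexpr ToyPosition make_move(int /*move*/) const { return *this; }
};

struct PerftStats {
    std::uint64_t leaves = 0;     // the number perft reports
    std::uint64_t generated = 0;  // positions where moves were generated
};

// Perft with bulk counting: at depth 1 the moves are only counted,
// they are never made on the board.
template <class Pos>
constexpr void perft_bulk(const Pos& pos, int depth, PerftStats& s) {
    if (depth == 0) { ++s.leaves; return; }
    ++s.generated;
    if (depth == 1) { s.leaves += pos.move_count(); return; }
    for (int m = 0; m < pos.move_count(); ++m) {
        perft_bulk(pos.make_move(m), depth - 1, s);
    }
}

template <class Pos>
constexpr PerftStats run_perft(const Pos& pos, int depth) {
    PerftStats s{};
    perft_bulk(pos, depth, s);
    return s;
}
```

With 50 moves per position, run_perft(ToyPosition<50>{}, 3) reports 125000 leaves but generates moves in only 2551 positions, a ratio of about 49. That ratio is the factor between moves/sec and nodes/sec.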
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Gigantua: 1.5 Giganodes per Second per Core move generator
This is because the visitor template (which is just nodecount++) gets inlined by the compiler, so it won't divide performance by 50. That holds especially when the visitor is *(movelist++) = move.
It's not a traditional movegen. Even removing bulk counting, which forces every move to be expanded into a position, yields:
No Bulk counting + No Hashtable + No multithreading + Collecting count stats per type of move (pawn, ep, castle, rook moves etc)
Perft Kiwi 6 - no bulk counting: 8031647685 10136ms 792.316 MNodes/s
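The visitor idea can be sketched roughly as follows. This is not Gigantua's code: CountingVisitor, for_each_target, lowest_bit_index and count_moves are hypothetical names, and a real generator would use a hardware bit-scan instead of the portable loop used here. The point is only that the per-move callback is a template parameter, so the compiler can inline it and no move list is needed.

```cpp
#include <cstdint>

// Index of the lowest set bit; the caller must pass a non-zero bitboard.
// A portable loop for illustration, real engines use a bit-scan intrinsic.
constexpr int lowest_bit_index(std::uint64_t bb) {
    int index = 0;
    while ((bb & 1ULL) == 0) { bb >>= 1; ++index; }
    return index;
}

// A visitor that only counts, i.e. the "nodecount++" case.
struct CountingVisitor {
    std::uint64_t nodes = 0;
    constexpr void on_move(int /*from*/, int /*to*/) { ++nodes; }
};

// Calls visitor.on_move once per set bit in `targets`.
// Because Visitor is a template parameter, on_move can be inlined.
template <class Visitor>
constexpr void for_each_target(int from, std::uint64_t targets, Visitor& visitor) {
    while (targets != 0) {
        visitor.on_move(from, lowest_bit_index(targets));
        targets &= targets - 1;  // clear the lowest set bit
    }
}

constexpr std::uint64_t count_moves(int from, std::uint64_t targets) {
    CountingVisitor visitor{};
    for_each_target(from, targets, visitor);
    return visitor.nodes;
}
```

A visitor that writes *(movelist++) = move, or one that recurses into the next ply, would plug into the same for_each_target without changing it.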
Gigantua no Hash:
giga "r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -" 7
Depth: 7 - r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -
Perft 1: 48 0ms 2.66667 MNodes/s
Perft 2: 2039 0ms 47.4186 MNodes/s
Perft 3: 97862 0ms 543.678 MNodes/s
Perft 4: 4085603 3ms 1299.91 MNodes/s
Perft 5: 193690690 114ms 1684.31 MNodes/s
Perft 6: 8031647685 5202ms 1543.77 MNodes/s
Perft 7: 374190009323 224256ms 1668.58 MNodes/s
Quick Perft
Perft mode: Hash-table size = 256MB, bulk counting in horizon nodes
perft( 1)= 48 ( 0.000 sec)
perft( 2)= 2039 ( 0.000 sec)
perft( 3)= 97862 ( 0.001 sec)
perft( 4)= 4085603 ( 0.018 sec)
perft( 5)= 193690690 ( 0.388 sec)
perft( 6)= 8031647685 (10.400 sec)
perft( 7)= 374190009323 (270.706 sec)
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 28387
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Gigantua: 1.5 Giganodes per Second per Core move generator
A node is a position for which you generate all moves. I don't think you are doing that here.
I admit that 'generating moves' might be difficult to pinpoint with incremental updates. The criterion I used in 'the mailbox trials' was that I counted something as a node when enough work was done to decide whether moves would have to be searched from that position. This amounted to identifying all non-futile captures. I would say the perft equivalent is counting the number of moves in that position.
-
- Posts: 179
- Joined: Tue Jun 15, 2021 8:11 pm
- Full name: Emanuel Torres
Re: Gigantua: 1.5 Giganodes per Second per Core move generator
So to try this out, you expect us to run some executable, from a random stranger, who first posted on this forum today? And who claims that you can execute 16 instructions per clock cycle per thread. Hmmmm... no thanks.
[Moderation warning] This signature violated the rule against commercial exhortations.
-
- Posts: 1524
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Gigantua: 1.5 Giganodes per Second per Core move generator
dangi12012 wrote: ↑Wed Sep 22, 2021 9:04 pm
Gigantua no Hash:
Perft 7: 374190009323 224256ms 1668.58 MNodes/s
Quick Perft
Perft mode: Hash-table size = 256MB, bulk counting in horizon nodes
perft( 7)= 374190009323 (270.706 sec)
Interesting. I see your Perft is a bit faster than QPerft. QPerft has been considered the fastest one in the mailbox world.
For better understanding, can you post the Perft result of Stockfish on your hardware? SF uses magic bitboards (which may be slow and not highly optimized for Perft) and doesn't use a hash table for Perft either. On my computer, SF is a bit faster than QPerft. You may try several ways to compile SF for better results.
SF Perft doesn't measure elapsed time; you may follow this post to measure it: forum3/viewtopic.php?f=7&t=76773&p=8858 ... ft#p885879
https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager
-
- Posts: 260
- Joined: Sat Mar 11, 2006 8:31 am
- Location: Malmö, Sweden
- Full name: Bo Persson
Re: Gigantua: 1.5 Giganodes per Second per Core move generator
dangi12012 wrote: ↑Wed Sep 22, 2021 8:05 pm
Agner Fog Instruction Tables
https://www.agner.org/optimize/instruction_tables.pdf
During writing of this I used this table extensively. In the 3rd column you have the "reciprocal throughput" listed which is 0.25 for most chess relevant instructions. Meaning one clock of 4.5 ghz can do 4 at once per Execution Unit. Also modern processors have more than one Execution Unit per thread.
I think you read a bit too much into the numbers. As I read it, a reciprocal throughput of 0.25 rather means that the core has 4 execution units each doing 1 instruction per clock, not 4 instructions each.
Agner Fog wrote: The values listed are the reciprocals of the throughputs, i.e. the average number of clock cycles per instruction when the instructions are not part of a limiting dependency chain. For example, a reciprocal throughput of 2 for FMUL means that a new FMUL instruction can start executing 2 clock cycles after a previous FMUL. A reciprocal throughput of 0.33 for ADD means that the execution units can handle 3 integer additions per clock cycle.
I read this as "the execution units" do this work combined, not each of them individually.
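As a worked example of that reading, the arithmetic can be written out in a few lines of C++. The function names are made up for illustration. The 4.5 GHz clock is the figure from the quoted post, and it is only an assumption that the benchmark machine ran at that clock.

```cpp
// Reciprocal throughput is cycles per instruction for the core as a whole,
// so its inverse is instructions per cycle (all execution units combined).
constexpr double instructions_per_cycle(double reciprocal_throughput) {
    return 1.0 / reciprocal_throughput;
}

// Upper bound on instructions per second for one instruction type,
// assuming no dependency chains limit the pipeline.
constexpr double peak_instructions_per_second(double reciprocal_throughput,
                                              double clock_hz) {
    return instructions_per_cycle(reciprocal_throughput) * clock_hz;
}

// Average clock cycles available per counted move at a given rate.
constexpr double cycles_per_move(double clock_hz, double moves_per_second) {
    return clock_hz / moves_per_second;
}
```

Under these assumptions, a reciprocal throughput of 0.25 gives 4 instructions per cycle, or at most 18 billion simple instructions per second at 4.5 GHz, and 1.5 billion bulk-counted moves per second leaves about 3 cycles per counted move.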
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Gigantua: 1.5 Giganodes per Second per Core move generator
phhnguyen wrote: ↑Thu Sep 23, 2021 4:58 am
Interesting. I see your Perft is a bit faster than QPerft. QPerft has been considered the fastest one in the mailbox world.
For better understanding, can you post the Perft result of Stockfish on your hardware? SF uses magic bitboard (may be slow and not highly optimized for Perft) and doesn't use hash table for Perft too. On my computer, SF is a bit faster than QPerft. You may try several ways to compile SF for better results.
Nah, it's even better than you think. It beats multithreaded juddperft with a single thread. Without any hashing, it beats qperft with hash enabled.
My original idea was to remove the movelist and still have an inlineable callback for each move type.
This means the function that visits every move is programmable, inlined, and the entry point for further recursion (heuristics could stop expanding almost for free).
Also, White and Black are a template parameter and not a runtime boolean, so code like if constexpr (White) (Pawns >> 8) doesn't have any ifs at runtime.
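A minimal sketch of the side-to-move-as-template-parameter idea, not taken from Gigantua: pawn_single_push and Bitboard are hypothetical names, and the square mapping (a1 = bit 0, White moving toward higher bits) is an assumption. The post above writes Pawns >> 8 for White, so the real layout may be mirrored.

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

// Single pawn pushes for the side given at compile time.
// Assumed mapping: a1 = bit 0, so White pushes toward higher bits.
template <bool IsWhite>
constexpr Bitboard pawn_single_push(Bitboard pawns, Bitboard empty) {
    if constexpr (IsWhite) {
        return (pawns << 8) & empty;  // only this branch exists for White
    } else {
        return (pawns >> 8) & empty;  // only this branch exists for Black
    }
}
```

Each instantiation contains just one shift, so there is no runtime test of the side to move. The price is that every function using it is compiled twice, once per color.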
Code will be on GitHub soon! First I want to update the whitepaper and finish my article on CodeProject. I don't know if I should even make the code public - it was 2 years of thinking work.
Execute with Powershell: Measure-Command {./stockfish_14_x64_popcnt.exe go perft 7}
TotalMilliseconds : 9491.0305
Execute with cmd:
giga "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1" 7
Depth: 7 - rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Perft 1: 20 0ms 1.25 MNodes/s
Perft 2: 400 0ms 30.7692 MNodes/s
Perft 3: 8902 0ms 189.404 MNodes/s
Perft 4: 197281 0ms 590.662 MNodes/s
Perft 5: 4865609 5ms 886.429 MNodes/s
Perft 6: 119060324 128ms 929.077 MNodes/s
Perft 7: 3195901860 3118ms 1024.74 MNodes/s
Why is Perft 7 here 50% slower in terms of nps than KiwiPete? Because of the many, many en passant cases. The en passant code gets compiled away by "if constexpr" if no target exists.
-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Gigantua: 1.5 Giganodes per Second per Core move generator
dangi12012 wrote: ↑Thu Sep 23, 2021 11:11 am
Code will be on github soon! First I want to update the whitepaper and finish my article on codeproject! Idk if I even should make the code public - it was 2 years of thinking work.
What else would you want to do with it?
I guess most chess-programmers here will appreciate your innovation more if they can see the source and make their own builds to test.
-
- Posts: 1524
- Joined: Wed Apr 21, 2010 4:58 am
- Location: Australia
- Full name: Nguyen Hong Pham
Re: Gigantua: 1.5 Giganodes per Second per Core move generator
dangi12012 wrote: ↑Thu Sep 23, 2021 11:11 am
Nah its even better than you think. It beats multithreaded juddperft with a single thread. It beats qperft with hash enabled without hashing.
My original Idea was how to remove the movelist and still have a inlineable callback for each movetype.
Frankly speaking, I don't think you can get much of a speedup just by removing the movelist and/or some callback/inline/trick optimisations. Of course that is from my experience, and your program may be different. So I will wait for more data, information and code!
dangi12012 wrote: ↑Thu Sep 23, 2021 11:11 am
Which means the function that visits every move is programmable and inlined and the entry point for more recursion (heuristics could stop expanding almost for free)
Also White and Black is a Template parameter and not a boolean -> So if constexpr White (Pawns >> 8) dont have any ifs at runtime.
Code will be on github soon! First I want to update the whitepaper and finish my article on codeproject! Idk if I even should make the code public - it was 2 years of thinking work.
Execute with Powershell: Measure-Command {./stockfish_14_x64_popcnt.exe go perft 7}
TotalMilliseconds : 9491.0305
Execute with cmd:
giga "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1" 7
Depth: 7 - rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Perft 1: 20 0ms 1.25 MNodes/s
Perft 2: 400 0ms 30.7692 MNodes/s
Perft 3: 8902 0ms 189.404 MNodes/s
Perft 4: 197281 0ms 590.662 MNodes/s
Perft 5: 4865609 5ms 886.429 MNodes/s
Perft 6: 119060324 128ms 929.077 MNodes/s
Perft 7: 3195901860 3118ms 1024.74 MNodes/s
Why is Perft7 50% slower in terms of nps than kiwipete? Because of the many many Enpassants. These get "if constexpr" compiled away if no target exists.
Impressive speed!
Can you confirm whether the above Perft 7 result used:
* hash?
* multi threads (how many threads)?
If it used hash and/or more than 1 thread, please post a new result of 1 thread, no hash, just for comparison. Thanks
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Gigantua: 1.5 Giganodes per Second per Core move generator
phhnguyen wrote:
Impressed speed!
Can you confirm with above Perft 7 result if your program used:
* hash?
* multi threads (how many threads)?
If it used hash and/or more than 1 thread, please post a new result of 1 thread, no hash, just for comparison. Thanks
I can confirm: Single thread
I can confirm: No hashtable
For the single thread, I guess you can also check in Task Manager.

You can also verify both statements once the source code is out.
Think about how a human plays chess. If something changes on the board, you don't recalculate every move of an unaffected piece. Also, if you can't castle, you won't think about castling at all. But most engines spend at least an if and an & instruction on that.
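One way to picture the castling remark, as a hedged sketch rather than Gigantua's actual code: the castling right becomes a template parameter, so the instantiation for "cannot castle" contains no castling test at all. count_white_king_moves, the square constants, and the a1 = bit 0 mapping are all assumptions made for illustration.

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

// Assumed mapping: a1 = bit 0, so e1, f1 and g1 are bits 4, 5 and 6.
constexpr Bitboard E1 = 1ULL << 4;
constexpr Bitboard F1 = 1ULL << 5;
constexpr Bitboard G1 = 1ULL << 6;

// Counts White king moves, given the number of ordinary king moves.
// The castling right is a compile-time flag, not a runtime check.
template <bool CanCastleKingside>
constexpr int count_white_king_moves(int ordinary_moves,
                                     Bitboard occupied,
                                     Bitboard attacked) {
    int count = ordinary_moves;
    if constexpr (CanCastleKingside) {
        // This block is not compiled at all when the right is gone.
        const bool path_empty = (occupied & (F1 | G1)) == 0;
        const bool path_safe = (attacked & (E1 | F1 | G1)) == 0;
        if (path_empty && path_safe) ++count;
    }
    return count;
}
```

The caller has to pick the right instantiation when castling rights change, which is where the approach trades code size for the removed runtime checks. The same pattern would apply to a compile-time "en passant target exists" flag.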