In writing a chess engine for mostly the first time, I finally have my move generation somewhat complete and so I threw together a Perft test to compare my results with others. (Thanks, by the way for those who were nice enough to post node counts for others to compare.) Here are some of my timings for tests up to 6 ply and I was hoping that someone could weigh in on whether I am on the right track or whether I still have a lot of optimization or redesign to do. Currently, I am writing my engine in C#. It is single threaded running on an AMD Ryzen 5 4500U processor. My design currently uses a simple 64 element array as the board, and I am using piece lists to iterate through the pieces during move generation. For simplicity's sake, I am currently dynamically allocating a list to hold all of the generated moves, and this happens on every call to GenerateMoves. I suspect this needs to be changed to use a single array that is shared by all moves to eliminate the heap allocations. But I don't have concrete evidence of that yet.
Results:
1: Elapsed = 00:00:00.0000166
2: Elapsed = 00:00:00.0002136
3: Elapsed = 00:00:00.0286717
4: Elapsed = 00:00:00.1139569
5: Elapsed = 00:00:02.8700864
6: Elapsed = 00:01:01.1239183
While the node counts are all "correct" for these tests, I have to admit I am surprised by the exponential explosion in timings, especially when going from 4 to 5 or 6 ply. All constructive comments welcome.
My Perft Results
Moderators: hgm, Rebel, chrisw
-
- Posts: 245
- Joined: Mon Aug 26, 2019 4:34 pm
- Location: Clearwater, Florida USA
- Full name: JoAnn Peeler
-
- Posts: 868
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: My Perft Results
Your progression of times looks normal because after the first couple depths the branching factor looks to be between 30 and 40 which is what it should be.
Perft measurement itself has its mysteries. Many authors report n/s without specifying which speedup tricks were used or not used.
You are using C# and from what I understand about C# is that if it is poorly written it can run extremely slow. On the other hand lithander (Thomas Jahn) got node rates in his engine Leorik that some of us did not believe possible using C#. I read recently that some C#'s do not have a JIT compiler and a couple of newer ones do.
The bottom line though is that 1 minute for perft 6 is in the ballpark of 120 times slower than the fastest claims. My perft 6 takes about 3.5 seconds. It is written in C. And it does not use the fastest move generator, magic. Mine uses no tricks and another's that uses no tricks runs 50% faster than mine, but he uses magic. Crafty also uses magic and does perft 6 in 2 seconds. I don't think Bob uses any tricks because his perft has always cared about accuracy and not speed.
Good luck with your engine!
Perft measurement itself has its mysteries. Many authors report n/s without specifying which speedup tricks were used or not used.
You are using C# and from what I understand about C# is that if it is poorly written it can run extremely slow. On the other hand lithander (Thomas Jahn) got node rates in his engine Leorik that some of us did not believe possible using C#. I read recently that some C#'s do not have a JIT compiler and a couple of newer ones do.
The bottom line though is that 1 minute for perft 6 is in the ballpark of 120 times slower than the fastest claims. My perft 6 takes about 3.5 seconds. It is written in C. And it does not use the fastest move generator, magic. Mine uses no tricks and another's that uses no tricks runs 50% faster than mine, but he uses magic. Crafty also uses magic and does perft 6 in 2 seconds. I don't think Bob uses any tricks because his perft has always cared about accuracy and not speed.
Good luck with your engine!
-
- Posts: 245
- Joined: Mon Aug 26, 2019 4:34 pm
- Location: Clearwater, Florida USA
- Full name: JoAnn Peeler
Re: My Perft Results
Thank you for the reply! And also, wow! It seems I have a lot of catching up to do! While I did put some effort into the design leveraging as many techniques as I could to speed things up, I'm sure I have a lot of tricks to learn. Being able to process 6 ply in 3.5 seconds is awesome!! I do believe that if I avoid some of the pitfalls of using a garbage collected language, I should be able to approach the speed of a compiled language like C/C++. Well, at least in comparison to a typical developer. I don't expect to compete with some of the code geniuses I've run into. But luckily if I can get my results approaching something reasonable, I'll be happy. My main focus (or tentative goal I should probably say) will be developing a self-learning, 2nd or 3rd order evaluation function that uses a genetic algorithm to implement self-optimization. I don't know much about neural networks at the moment, so I prefer to stay clear of them for now.
I guess I'm going to try some simple optimizations on my move generation for now without resorting to throwing everything away and redesign using bitboards. But hey, who knows? I could end up there when all is said and done. For now, I'm going to refactor to encode my moves into an integer and to introduce an object pool to handle my move lists. Hopefully, I can see some improvement. Thanks again for your comments. It's better to know you have your work cut out for you than not!
I guess I'm going to try some simple optimizations on my move generation for now without resorting to throwing everything away and redesign using bitboards. But hey, who knows? I could end up there when all is said and done. For now, I'm going to refactor to encode my moves into an integer and to introduce an object pool to handle my move lists. Hopefully, I can see some improvement. Thanks again for your comments. It's better to know you have your work cut out for you than not!
-
- Posts: 2488
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: My Perft Results
Similarly for my engine except for 10x12 mailbox and the ability to check the legality for non-king, non-check-evasion moves without making them. Perft 6 for the starting position in 2.08s on a 4700U, single-threaded in C, not using hash tables for perft. The exponential explosion is normal - perft 7 takes me 54.6s already.
The only place for heap allocation in a chess engine should be during initialisation or configuration, in particular for the hash tables. During engine run, there should be no heap alloc/free at all. I use a fixed array length for the moves, allocated on the stack.For simplicity's sake, I am currently dynamically allocating a list to hold all of the generated moves
Rasmus Althoff
https://www.ct800.net
https://www.ct800.net
-
- Posts: 245
- Joined: Mon Aug 26, 2019 4:34 pm
- Location: Clearwater, Florida USA
- Full name: JoAnn Peeler
Re: My Perft Results
That sounds fantastic! I am far from there at the moment. I am hoping my next round of optimizations will show some improvements; however, I am making these changes based on experience not on the results of a performance profiler. I'll need to zero in on my real bottle necks or Amdahl's Law is going to bite me in my (well you know).Ras wrote: ↑Fri Nov 04, 2022 12:44 am Similarly for my engine except for 10x12 mailbox and the ability to check the legality for non-king, non-check-evasion moves without making them. Perft 6 for the starting position in 2.08s on a 4700U, single-threaded in C, not using hash tables for perft. The exponential explosion is normal - perft 7 takes me 54.6s already.
I hear ya. I knew that when I started, but at that time I was favoring correctness over performance. But now seeing how far I'm off from where I want to be it looks like I made the wrong choice. Live and learn as they say. I am still going to make a little bit of a compromise here by creating a pool of lists that can be checked in and out of the pool as needed, but they will never be deallocated. Since I won't ever need more lists than the number of ply I'm searching I probably only need 20 lists in the pool each capable of storing 100 moves. And since my moves will now be encoded into a 32-bit integer that's only around 400 bytes per list or about 8K. This won't be as efficient as using one 8K sized array and maintaining indices into the array for each ply, but it should be a lot better than what I have now. Tiny steps... tiny steps...Ras wrote: ↑Fri Nov 04, 2022 12:44 am The only place for heap allocation in a chess engine should be during initialisation or configuration, in particular for the hash tables. During engine run, there should be no heap alloc/free at all. I use a fixed array length for the moves, allocated on the stack.
-
- Posts: 3
- Joined: Fri Oct 28, 2022 6:33 am
- Full name: Keith Downes
Re: My Perft Results
The good news is c# can achieve ~90% of the speed of c++ when working only with the stack and pointers, so don’t worry about the language right now. When you’ve done all you can in c# speed-wise, the code should almost copy/paste into c++. I got a ray tracer within 6% of c++ in c# (no PGO in both) and my chess engine within 10% (PGO in c#). .net 7 is 6.2% faster than .net 6 in my chess engine so that’ll be another speed bump for you when it’s released
-
- Posts: 27833
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: My Perft Results
Qperft uses mailbox, and it calculates perft(6) in 0.7 sec on my 3.2GHz i7.Mike Sherwin wrote: ↑Thu Nov 03, 2022 10:56 pmMy perft 6 takes about 3.5 seconds. It is written in C. And it does not use the fastest move generator, magic. Mine uses no tricks and another's that uses no tricks runs 50% faster than mine, but he uses magic. Crafty also uses magic and does perft 6 in 2 seconds. I don't think Bob uses any tricks because his perft has always cared about accuracy and not speed.
-
- Posts: 1565
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: My Perft Results
Nightmare uses C++ with bit-boards and magics, it calculates perft(6) in 0.79 seconds on my 3 GHz. i9-10980XE.
This is with the standard move generator and move logic that it also uses during the search, so it does some extra work that is not needed to calculate Perft().
This is with the standard move generator and move logic that it also uses during the search, so it does some extra work that is not needed to calculate Perft().
Code: Select all
r n b q k b n r
* * * * * * * *
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
+ + + + + + + +
R N B Q K B N R
....................
perft: 6
moves: 119060324 ok!
time: 0.790838 sec.
mps: 150549555
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: My Perft Results
Optimized Bitboard approach:
Perft 6: 119060324 88ms
Perft 7: 3195901860 2225ms
Perft 6: 119060324 88ms
Perft 7: 3195901860 2225ms
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 1565
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: My Perft Results
Ankan's GPU Perft is still faster than this, and that is on a mere GTX-780.dangi12012 wrote: ↑Fri Nov 04, 2022 2:32 pm Optimized Bitboard approach:
Perft 6: 119060324 88ms
Perft 7: 3195901860 2225ms
http://www.talkchess.com/forum3/viewtopic.php?t=48387
I wonder what it would do on a RTX-4090Ti.