My Perft Results

JoAnnP38 · Post by **JoAnnP38** » Thu Nov 03, 2022 7:44 pm

In writing a chess engine for mostly the first time, I finally have my move generation somewhat complete and so I threw together a Perft test to compare my results with others. (Thanks, by the way for those who were nice enough to post node counts for others to compare.) Here are some of my timings for tests up to 6 ply and I was hoping that someone could weigh in on whether I am on the right track or whether I still have a lot of optimization or redesign to do. Currently, I am writing my engine in C#. It is single threaded running on an AMD Ryzen 5 4500U processor. My design currently uses a simple 64 element array as the board, and I am using piece lists to iterate through the pieces during move generation. For simplicity's sake, I am currently dynamically allocating a list to hold all of the generated moves, and this happens on every call to GenerateMoves. I suspect this needs to be changed to use a single array that is shared by all moves to eliminate the heap allocations. But I don't have concrete evidence of that yet.

Results:
1: Elapsed = 00:00:00.0000166
2: Elapsed = 00:00:00.0002136
3: Elapsed = 00:00:00.0286717
4: Elapsed = 00:00:00.1139569
5: Elapsed = 00:00:02.8700864
6: Elapsed = 00:01:01.1239183

While the node counts are all "correct" for these tests, I have to admit I am surprised by the exponential explosion in timings, especially when going from 4 to 5 or 6 ply. All constructive comments welcome.

Mike Sherwin · Post by **Mike Sherwin** » Thu Nov 03, 2022 10:56 pm

Your progression of times looks normal because after the first couple depths the branching factor looks to be between 30 and 40 which is what it should be.

Perft measurement itself has its mysteries. Many authors report n/s without specifying which speedup tricks were used or not used.

You are using C# and from what I understand about C# is that if it is poorly written it can run extremely slow. On the other hand lithander (Thomas Jahn) got node rates in his engine Leorik that some of us did not believe possible using C#. I read recently that some C#'s do not have a JIT compiler and a couple of newer ones do.

The bottom line though is that 1 minute for perft 6 is in the ballpark of 120 times slower than the fastest claims. My perft 6 takes about 3.5 seconds. It is written in C. And it does not use the fastest move generator, magic. Mine uses no tricks and another's that uses no tricks runs 50% faster than mine, but he uses magic. Crafty also uses magic and does perft 6 in 2 seconds. I don't think Bob uses any tricks because his perft has always cared about accuracy and not speed.

Good luck with your engine!

JoAnnP38 · Post by **JoAnnP38** » Fri Nov 04, 2022 12:11 am

Thank you for the reply! And also, wow! It seems I have a lot of catching up to do! While I did put some effort into the design leveraging as many techniques as I could to speed things up, I'm sure I have a lot of tricks to learn. Being able to process 6 ply in 3.5 seconds is awesome!! I do believe that if I avoid some of the pitfalls of using a garbage collected language, I should be able to approach the speed of a compiled language like C/C++. Well, at least in comparison to a typical developer. I don't expect to compete with some of the code geniuses I've run into. But luckily if I can get my results approaching something reasonable, I'll be happy. My main focus (or tentative goal I should probably say) will be developing a self-learning, 2nd or 3rd order evaluation function that uses a genetic algorithm to implement self-optimization. I don't know much about neural networks at the moment, so I prefer to stay clear of them for now.

I guess I'm going to try some simple optimizations on my move generation for now without resorting to throwing everything away and redesign using bitboards. But hey, who knows? I could end up there when all is said and done. For now, I'm going to refactor to encode my moves into an integer and to introduce an object pool to handle my move lists. Hopefully, I can see some improvement. Thanks again for your comments. It's better to know you have your work cut out for you than not!

Ras · Post by **Ras** » Fri Nov 04, 2022 12:44 am

JoAnnP38 wrote: ↑Thu Nov 03, 2022 7:44 pmMy design currently uses a simple 64 element array as the board, and I am using piece lists to iterate through the pieces during move generation.

Similarly for my engine except for 10x12 mailbox and the ability to check the legality for non-king, non-check-evasion moves without making them. Perft 6 for the starting position in 2.08s on a 4700U, single-threaded in C, not using hash tables for perft. The exponential explosion is normal - perft 7 takes me 54.6s already.

For simplicity's sake, I am currently dynamically allocating a list to hold all of the generated moves

The only place for heap allocation in a chess engine should be during initialisation or configuration, in particular for the hash tables. During engine run, there should be no heap alloc/free at all. I use a fixed array length for the moves, allocated on the stack.

JoAnnP38 · Post by **JoAnnP38** » Fri Nov 04, 2022 1:42 am

Ras wrote: ↑Fri Nov 04, 2022 12:44 am Similarly for my engine except for 10x12 mailbox and the ability to check the legality for non-king, non-check-evasion moves without making them. Perft 6 for the starting position in 2.08s on a 4700U, single-threaded in C, not using hash tables for perft. The exponential explosion is normal - perft 7 takes me 54.6s already.

That sounds fantastic! I am far from there at the moment. I am hoping my next round of optimizations will show some improvements; however, I am making these changes based on experience not on the results of a performance profiler. I'll need to zero in on my real bottle necks or Amdahl's Law is going to bite me in my (well you know).

Ras wrote: ↑Fri Nov 04, 2022 12:44 am The only place for heap allocation in a chess engine should be during initialisation or configuration, in particular for the hash tables. During engine run, there should be no heap alloc/free at all. I use a fixed array length for the moves, allocated on the stack.

I hear ya. I knew that when I started, but at that time I was favoring correctness over performance. But now seeing how far I'm off from where I want to be it looks like I made the wrong choice. Live and learn as they say.

I am still going to make a little bit of a compromise here by creating a pool of lists that can be checked in and out of the pool as needed, but they will never be deallocated. Since I won't ever need more lists than the number of ply I'm searching I probably only need 20 lists in the pool each capable of storing 100 moves. And since my moves will now be encoded into a 32-bit integer that's only around 400 bytes per list or about 8K. This won't be as efficient as using one 8K sized array and maintaining indices into the array for each ply, but it should be a lot better than what I have now. Tiny steps... tiny steps...

Iketh · Post by **Iketh** » Fri Nov 04, 2022 7:34 am

The good news is c# can achieve ~90% of the speed of c++ when working only with the stack and pointers, so don’t worry about the language right now. When you’ve done all you can in c# speed-wise, the code should almost copy/paste into c++. I got a ray tracer within 6% of c++ in c# (no PGO in both) and my chess engine within 10% (PGO in c#). .net 7 is 6.2% faster than .net 6 in my chess engine so that’ll be another speed bump for you when it’s released

hgm · Post by **hgm** » Fri Nov 04, 2022 9:19 am

Mike Sherwin wrote: ↑Thu Nov 03, 2022 10:56 pmMy perft 6 takes about 3.5 seconds. It is written in C. And it does not use the fastest move generator, magic. Mine uses no tricks and another's that uses no tricks runs 50% faster than mine, but he uses magic. Crafty also uses magic and does perft 6 in 2 seconds. I don't think Bob uses any tricks because his perft has always cared about accuracy and not speed.

Qperft uses mailbox, and it calculates perft(6) in 0.7 sec on my 3.2GHz i7.

Joost Buijs · Post by **Joost Buijs** » Fri Nov 04, 2022 10:13 am

Nightmare uses C++ with bit-boards and magics, it calculates perft(6) in 0.79 seconds on my 3 GHz. i9-10980XE.
This is with the standard move generator and move logic that it also uses during the search, so it does some extra work that is not needed to calculate Perft().

Code: Select all

 r n b q k b n r
 * * * * * * * *
 . . . . . . . .
 . . . . . . . .
 . . . . . . . .
 . . . . . . . .
 + + + + + + + +
 R N B Q K B N R

....................
perft: 6
moves: 119060324 ok!
time: 0.790838 sec.
mps: 150549555

dangi12012 · Post by **dangi12012** » Fri Nov 04, 2022 2:32 pm

Optimized Bitboard approach:

Perft 6: 119060324 88ms
Perft 7: 3195901860 2225ms

Joost Buijs · Post by **Joost Buijs** » Fri Nov 04, 2022 3:04 pm

dangi12012 wrote: ↑Fri Nov 04, 2022 2:32 pm Optimized Bitboard approach:

Perft 6: 119060324 88ms
Perft 7: 3195901860 2225ms

Ankan's GPU Perft is still faster than this, and that is on a mere GTX-780.

http://www.talkchess.com/forum3/viewtopic.php?t=48387

I wonder what it would do on a RTX-4090Ti.

My Perft Results

My Perft Results

Re: My Perft Results

Re: My Perft Results

Re: My Perft Results

Re: My Perft Results

Re: My Perft Results

Re: My Perft Results

Re: My Perft Results

Re: My Perft Results

Re: My Perft Results