lithander wrote: ↑Sun Sep 19, 2021 9:29 pm
It's not that much work since the original engine I based this on was already quite small and I stripped everything from the source that isn't strictly needed for perft.
Would it be worth it? I think so. I think it's interesting to see how the bitboard algorithms look implemented in different languages. If we port the code into more languages this could become something like a Rosetta Code task but with a workload that's much more relevant for chess programmers and with a focus on performance.
I'll take a look at it, if someone doesn't beat me to it. I've just finally picked up Rustic's development again, after a hectic month at work, and a month of kitten wrangling. I'm finally testing and integrating some of the refactors I wrote almost three months ago. (More consistent nomenclature/naming for the TT, shortening/un-C-ing the code a bit where possible, and converting it to full fail-soft instead of a strange mostly-fail-soft-but-fail-hard-in-practice mix.) Then I'd finally be happy with the code, so I can write the PST-tuner. (And then a board->FEN converter, and dump EPDs to tune on my own data in the future.)
If I think this port doesn't distract me for six weeks, I may try it.
Which explains my interest in the topic! But my results indicate that I don't shoot myself in the knee too much by sticking with C# so that's nice.
Don't mix up memes and proverbs. You either take an arrow in the knee, or shoot yourself in the foot.
mvanthoor wrote: ↑Sun Sep 19, 2021 9:21 pm
Oh, and you're now in the camp of optimizing code for speed. You'll probably write another engine in C#, find out it's about 70 Elo stronger than MinimalChess 0.6 with the same feature set, and THEN you'll wonder how strong it could have been if you had written it in C or Rust....
Sounds like the story of my life. The only thing that's missing is the final step: once you've written the C version, you will wonder how strong it could have been had you not written it in a hurry, and rewrite it from scratch.
lithander wrote: ↑Sun Sep 19, 2021 9:11 pm
I've made 3 more commits with optimizations and dedicated a release on GitHub to each, with an explanation of what changed, a link to the commit and my performance measurement of that version. You can find all the details here: https://github.com/lithander/QBB-Perft/releases
If you're just interested in a summary:
Version 1.0: 60555ms to complete, about 22M NPS.
Version 1.1: 54599ms to complete, about 25M NPS.
Version 1.2: 40298ms to complete, about 33M NPS.
Version 1.3: 31351ms to complete, about 42M NPS.
Version 1.4: 28132ms to complete, about 48M NPS.
The C reference takes 16837ms to complete at about 70M NPS. So, going by NPS, the C# code as of version 1.4 is only about 30% slower.
I'm also quite happy that I didn't have to resort to any dirty tricks, yet.
...and of course it doesn't have to stop here! For example R.Tomasi contributed the idea of using preallocated arrays for the movelist. My own ideas (stackalloc & Span<T>) weren't quite as fast and I'm glad I had his version to set the bar for what should theoretically be possible and inspire the final implementation. Together I'm sure we can find more things to improve upon.
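To illustrate the preallocated movelist idea mentioned above, here is a minimal sketch in Java. It is not the code from QBB-Perft or from R.Tomasi's contribution, neither of which is shown in this thread. The names `PreallocatedMoveList`, `MAX_PLY`, `MAX_MOVES` and `generate` are made up for the example, and the move generator is a dummy that produces a fixed number of moves per node. The point is only that one buffer per ply is allocated up front and reused, so nothing is allocated inside the recursion.

```java
class PreallocatedMoveList {
    // Assumed limits for this sketch; a real engine would pick its own.
    static final int MAX_PLY = 8;
    static final int MAX_MOVES = 256;

    // One move buffer per ply, allocated once and reused for every node at that ply.
    static final int[][] MOVES = new int[MAX_PLY][MAX_MOVES];

    // Stand-in for a real move generator (hypothetical): writes `branching`
    // dummy moves into the buffer and returns how many were written.
    static int generate(int[] buffer, int branching) {
        int n = 0;
        for (int m = 0; m < branching; m++) {
            buffer[n++] = m;
        }
        return n;
    }

    // Perft-style leaf count over a toy tree with a fixed branching factor.
    static long perft(int depth, int ply, int branching) {
        if (depth == 0) {
            return 1;
        }
        int[] buffer = MOVES[ply]; // reuse this ply's buffer, no allocation here
        int n = generate(buffer, branching);
        long nodes = 0;
        for (int i = 0; i < n; i++) {
            // A real engine would make buffer[i] here and unmake it afterwards.
            nodes += perft(depth - 1, ply + 1, branching);
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Toy tree: 3 moves per node, 4 plies deep.
        System.out.println(perft(4, 0, 3));
    }
}
```

Indexing the buffers by ply keeps a parent's move list intact while its children are being searched, which is what makes the reuse safe.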
Very impressive. But from the comments on the release page, I see that with the last two commits, you have now moved from optimizing the data structures and other language-dependent things, to changing some of the algorithms and how they function. So unless the C compiler really does optimize these functions to the same level, it's no longer a fair comparison, if the C version has to do extra work compared to C#.
klx wrote: ↑Sun Sep 19, 2021 5:30 pm
Just ported the code to Java. ~1.8x slower than C for now. Will share the code in a bit so you can add it to your repo, Thomas. Now it's someone else's turn to do Rust and Go.
How much work is this? I might take a look for a Rust-port, but I'm not sure if it's worth it.
In my case about 1 hour for the port (making it compile, basically), and another half hour for fixing some bugs to make the numbers match (e.g. a right shift >> on an unsigned value in C needs to be the triple shift >>> in Java). The final Java code is very similar to the C version though; if you have to do a lot of syntactic changes it might take a little longer.
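For anyone hitting the same shift bug, here is a small self-contained illustration (not taken from the actual port; the class and method names are made up). Java has no unsigned 64-bit type, so a bitboard lives in a signed `long`. On a `long`, `>>` is an arithmetic shift that copies the sign bit, while `>>>` is the logical shift that fills with zeros, which is what `>>` does on a `uint64_t` in C.

```java
class ShiftDemo {
    // Arithmetic shift: the sign bit is copied into the vacated high bits.
    static long arithmeticShift(long bb, int n) {
        return bb >> n;
    }

    // Logical shift: the vacated high bits are filled with zeros,
    // matching >> on an unsigned 64-bit integer in C.
    static long logicalShift(long bb, int n) {
        return bb >>> n;
    }

    public static void main(String[] args) {
        long topBit = 0x8000000000000000L; // bit 63 set, so the long is negative

        long wrong = arithmeticShift(topBit, 8);
        long right = logicalShift(topBit, 8);

        // The arithmetic shift smears the sign bit, so extra bits appear.
        System.out.println(Long.bitCount(wrong) + " bits set after >>");
        System.out.println(Long.bitCount(right) + " bits set after >>>");
    }
}
```

The two operators only differ when bit 63 is set, which is why a bug like this can hide until the numbers stop matching.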
lithander wrote: ↑Sun Sep 19, 2021 9:11 pm
I've made 3 more commits with optimizations and dedicated a release on GitHub to each, with an explanation of what changed, a link to the commit and my performance measurement of that version. You can find all the details here: https://github.com/lithander/QBB-Perft/releases
If you're just interested in a summary:
Version 1.0: 60555ms to complete, about 22M NPS.
Version 1.1: 54599ms to complete, about 25M NPS.
Version 1.2: 40298ms to complete, about 33M NPS.
Version 1.3: 31351ms to complete, about 42M NPS.
Version 1.4: 28132ms to complete, about 48M NPS.
The C reference takes 16837ms to complete at about 70M NPS. So, going by NPS, the C# code as of version 1.4 is only about 30% slower.
I'm also quite happy that I didn't have to resort to any dirty tricks, yet.
...and of course it doesn't have to stop here! For example R.Tomasi contributed the idea of using preallocated arrays for the movelist. My own ideas (stackalloc & Span<T>) weren't quite as fast and I'm glad I had his version to set the bar for what should theoretically be possible and inspire the final implementation. Together I'm sure we can find more things to improve upon.
Very impressive. But from the comments on the release page, I see that with the last two commits, you have now moved from optimizing the data structures and other language-dependent things, to changing some of the algorithms and how they function. So unless the C compiler really does optimize these functions to the same level, it's no longer a fair comparison, if the C version has to do extra work compared to C#.
Even if I'm on team C# on this one, I will change the reference C version to get the same changes. This redundant LSB thingy is still bugging me. I suspect it's an optimization that only shows a strong effect on some CPUs, so I'd like to have it. To make the comparison fair, it should also be done in the C version. So when I'm onto that, I'll implement the algorithmic changes in the C version while I'm at it.
Thing is, I'm on vacation and my gf won't be too happy if I spend the whole day coding, so bear with me if it might take me a day or two.