Steve Maughan wrote: I decided to explore Maverick's code through the lens of a profiler. My native tongue (Delphi) isn't strong in this area, so this is new ground for me.
The output seemed a little odd and I wonder if a compiler / profiling Ninja may be able to shed some light.
Here's the output:
[...]
Most of the procedures look familiar. But there are two which, as far as I know, are not in my code - these are "_mcount_private" and "__fentry__". And they are sucking up a lot of CPU juice.
Does anyone know what they are? And the obvious follow-on question, is there any way to reduce their load?
Many thanks,
Steve
These two functions are obviously internal functions of gprof. "_mcount_private" definitely is, and "__fentry__" looks like one as well. gprof uses instrumentation as well as sampling, where instrumentation means that extra code is added to the program to implement the profiling. That extra code is not free: you will always see a more or less significant overhead when using an instrumenting profiler, and an overhead of more than 30% is not unusual. To obtain your "real" numbers you have to filter out those internal functions by subtracting their time from the overall total and then recalculating all relevant percentages.
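The recalculation can be sketched as follows. This is only an illustration: apart from "_mcount_private" and "__fentry__", the function names and all self-time values are made up, since the actual profile is not reproduced here. It also only rescales the percentages; it does not remove the overhead that the instrumentation adds inside each profiled function.

```python
# Names of the profiler-internal entries mentioned in the post.
PROFILER_INTERNALS = {"_mcount_private", "__fentry__"}

def adjusted_percentages(self_times, internals=PROFILER_INTERNALS):
    """Drop profiler-internal entries and recompute each share of the rest."""
    kept = {name: t for name, t in self_times.items() if name not in internals}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {name: 100.0 * t / total for name, t in kept.items()}

# Hypothetical flat profile: function name -> self time in seconds.
# These numbers are invented for illustration only.
profile = {
    "generate_moves": 3.0,
    "is_in_check_after_move": 2.5,
    "make_move": 1.5,
    "_mcount_private": 2.0,
    "__fentry__": 1.0,
}

adjusted = adjusted_percentages(profile)
for name, pct in sorted(adjusted.items(), key=lambda kv: -kv[1]):
    print(f"{name:26s} {pct:5.1f}%")
```

In this made-up profile the two internal functions account for 3.0 of 10.0 seconds, so every remaining percentage is computed against 7.0 seconds instead.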
Regarding your profiling results, I think you might want to have a look at your legality check implementation. Your numbers come from profiling "perft", so they are not fully representative of an actual search (which would be most relevant), but one can already see that your legality checking (presumably covered by a function named something like "is_in_check_after_move()") consumes almost as much time as the whole move generation.
So my advice would be:
1. Do everything that needs to be done to get a stable engine first, of course.
2. Then do everything to get a strong engine ...
3. And then, when you actually want to start optimizing the performance of your engine, consider improving your legality checking as follows.
a) You can safely skip legality testing of pseudo-legal moves that belong to none of these categories:
- check evasions,
- king moves (including castles),
- en passant captures,
- moves of a pinned piece.
Only these can ever be illegal, and statistically they are quite rare in total, so you can actually save a lot of redundant legality testing.
b) For moves from the categories above, it is not always necessary to perform a "makeMove - legality test - unmakeMove" sequence. In some cases a "static" legality test is possible, which saves the make/unmake overhead.
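Points a) and b) can be sketched roughly like this. Everything here is hypothetical and not taken from Maverick: the Move record, the 0..63 square numbering (a1 = 0, h8 = 63), and the assumption that the engine already knows whether the side to move is in check and which of its pieces are pinned. The static test shown covers only one case, a pinned piece when the king is not in check and the move is not an en passant capture: such a move is legal exactly when it stays on the line through the king and the piece.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Move:
    # Hypothetical move record; squares are 0..63 with a1 = 0, h8 = 63.
    from_sq: int
    to_sq: int
    is_king_move: bool = False    # includes castling
    is_en_passant: bool = False

def needs_legality_test(move, in_check, pinned_squares):
    """a) True only for the categories of moves that can be illegal."""
    return (in_check                      # every move is a check evasion
            or move.is_king_move
            or move.is_en_passant
            or move.from_sq in pinned_squares)

def on_same_line(a, b, c):
    """True if the three squares are collinear on the board."""
    ax, ay = a % 8, a // 8
    bx, by = b % 8, b // 8
    cx, cy = c % 8, c // 8
    # The cross product of (b - a) and (c - a) is zero for collinear points.
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax) == 0

def pinned_move_is_legal(move, king_sq):
    """b) Static test for a pinned piece; assumes the side to move is not
    in check and the move is not an en passant capture."""
    return on_same_line(king_sq, move.from_sq, move.to_sq)

# Example: white king on e1 (4), white rook on e2 (12) pinned along the e-file.
KING_E1 = 4
along_pin = Move(from_sq=12, to_sq=36)    # e2-e5 stays on the e-file
off_pin = Move(from_sq=12, to_sq=8)       # e2-a2 leaves the e-file
print(pinned_move_is_legal(along_pin, KING_E1))   # True
print(pinned_move_is_legal(off_pin, KING_E1))     # False
```

The remaining categories (check evasions, king moves, en passant) would either need their own static tests or fall back to the make/unmake sequence.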
Please note, however, that any optimization at that level, especially in the area of basic operations like move generation/legality checking, tends to produce the following two typical outcomes:
1. introduction of new bugs,
2. a performance improvement of only very few percent for the real search.
One reason for the latter is that move generation itself usually consumes only a small fraction of the overall search time (10% or less might be a typical number), so any relative improvement of that component gets divided by a factor of 10 or more. Compare 10/100 to 8/98: that is what you can expect from a 20% speedup of the move generator, i.e. a total run time that is only about 2% shorter. (All numbers are just examples.)
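The same arithmetic as a small sketch, using only the example numbers above (a component that takes 10% of the total time and becomes 20% faster); the function names are made up for illustration.

```python
def overall_gain(fraction, component_speedup):
    """Relative reduction of the total run time when a component that takes
    `fraction` of the total has its own time cut by `component_speedup`.
    Both arguments are in the range 0..1."""
    return fraction * component_speedup

def new_share(fraction, component_speedup):
    """Share of the (shorter) total that the component takes afterwards."""
    remaining = fraction * (1.0 - component_speedup)
    new_total = 1.0 - fraction * component_speedup
    return remaining / new_total

gain = overall_gain(0.10, 0.20)    # 10/100 -> 8/98: total shrinks by about 2%
share = new_share(0.10, 0.20)      # 8/98, about 8.2% of the new total
print(f"overall gain: {gain:.1%}, new share of move generation: {share:.1%}")
```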
Sven