Sesse wrote: ↑Wed Apr 08, 2020 12:52 pm
Are you sure it's not being inlined?
Any profiler worth its salt should allow you to go in and look at individual lines or instructions these days, so you should be able to peek there.
Hi

I'm going to investigate this by lowering the optimization settings. Rust is VERY aggressive with inlining when you have maximum optimizations set. I'll first disable LTO, and then, if necessary, start dropping opt-level down from 3. I'm assuming that if a non-optimized function pinpoints line X as being the slowest, it'll also be the slowest in the optimized function.
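For anyone wanting to try the same thing, this is roughly what that looks like in Cargo.toml. It is only a sketch of a release profile, not my exact settings: the keys are standard Cargo profile options, the values are just a starting point to step down from.

```toml
# Sketch of a release profile for profiling; the values are assumptions.
[profile.release]
opt-level = 3   # step this down to 2, then 1, if the profile is still unreadable
lto = "off"     # disable link-time optimization entirely
debug = true    # keep debug info so the profiler can map samples to source lines
```

With debug info kept in the release build, a profiler has a better chance of attributing time to individual lines, although inlined code can still be attributed to the caller.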
I've discovered a few things though, yesterday evening. "if"-statements in Rust can be very expensive. For example:
Code:
let x = <do some calculation and comparisons that seem to be very heavy... and might deliver 0 as the end result>
After some thinking, I reach the conclusion that I need to do the calculation only in a certain condition; otherwise, it will always be 0 as a result. So I then write:
Code:
let mut x = 0;
if <some condition here> {
    x = <heavy calculation>;
}
Result: slower code. (One of those constructions increased my perft 7 time from 114 seconds to 120...) I have seen this in more than one place. Very often, at least in Rust, it seems faster to just perform X (possibly for nothing) than to first check whether you should do it. The check is more expensive than actually performing X for nothing.
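To make the pattern concrete, here is a minimal sketch of the two variants. This is not code from my engine: `overlap_count_branchy` and `overlap_count_plain` are made-up names, and the "heavy calculation" is replaced by a cheap bit count purely for illustration. The point is only that both return the same value, because the calculation already yields 0 whenever the condition is false. Which one is faster depends on the calculation and on how predictable the branch is, so it has to be measured.

```rust
/// Variant with the guard: only do the work when the result can be non-zero.
/// (Hypothetical example; `a` and `b` stand in for two bitboards.)
pub fn overlap_count_branchy(a: u64, b: u64) -> u32 {
    let mut x = 0;
    if a & b != 0 {
        x = (a & b).count_ones();
    }
    x
}

/// Variant without the guard: always do the work.
/// When `a & b` is 0, count_ones() returns 0 anyway, so the check adds nothing.
pub fn overlap_count_plain(a: u64, b: u64) -> u32 {
    (a & b).count_ones()
}
```

If the compiler cannot remove the comparison by itself, the second version saves a compare and a possibly mispredicted jump in a loop that runs millions of times.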
So, I went through some of the functions in the move generator and regrouped some things to minimize if-checks, and then also inlined square_attacked(). (In Rust, on the highest optimization settings, the compiler decides on inlining, unless you explicitly state always/never for a function.)
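For reference, the always/never hints are attributes on the function. The sketch below shows both; the function bodies and signatures are invented for the example (a plain bitboard test), they are not my real square_attacked(). Note that even `#[inline(always)]` is a strong hint rather than a guarantee.

```rust
/// Hypothetical stand-in for square_attacked(): is `square` set in `attackers`?
/// Assumes `square` is in 0..64.
#[inline(always)] // ask the compiler to inline this at every call site
pub fn square_attacked_sketch(attackers: u64, square: u8) -> bool {
    attackers & (1u64 << square) != 0
}

/// Kept out of line so it still shows up as its own entry in a profiler.
#[inline(never)]
pub fn count_attacked_squares(attackers: u64) -> u32 {
    (0..64u8)
        .filter(|&sq| square_attacked_sketch(attackers, sq))
        .count() as u32
}
```

Marking a function `#[inline(never)]` temporarily can also help while profiling, because its cost is then no longer folded into the caller.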
This regrouping and inlining dropped perft 7 from the starting position from 114 to 100 seconds. So that's about 12% less run time right there, without actually changing any logic. square_attacked() and add_move() completely dropped off the profiler's radar.
When I first got Perft working, it ran at 277 seconds for perft 7. (Without hash, without bulk-counting, and without special move generators.) Now, some time later, it runs at 100 seconds, still without hash, bulk-counting or special move generators. That's 64% less run time, without actually changing any logic or adding tricks or functionality; it's just a matter of replacing technique X with Y, regrouping, inlining a function, and cutting out superfluous/double if-checks.
I'll start having a look at less strongly optimized code to see if I can get some more information from the profiler.
Some of the progress was made through direct help from this community; even where there wasn't help like 'try this or that', posts often contained usable hints or ideas in the right direction. Thanks, people.
