Can we learn anything from PGO compiled executables?
Does the compilation process produce any revised source code? Can we decompile a PGO generated executable and see what changes the compiler made and use this for future projects?
-
- Posts: 484
- Joined: Wed Nov 18, 2009 1:09 am
-
- Posts: 778
- Joined: Sat Jul 01, 2006 7:11 am
Re: Can we learn anything from PGO compiled executables?
mhalstern wrote: Can we learn anything from PGO compiled executables?
Does the compilation process produce any revised source code? Can we decompile a PGO generated executable and see what changes the compiler made and use this for future projects?
I believe that the main point of PGO is to make the program more cache friendly, e.g. moving code around so that function entries and jump targets for the most used code are cache aligned and do not map to the same cache locations. I would much rather let the compiler do this.
-
- Posts: 388
- Joined: Sun Dec 21, 2008 6:57 pm
- Location: Washington, DC
Re: Can we learn anything from PGO compiled executables?
I have no answers, but while on the subject of PGO, I'm curious about people's experience with this...
Has anyone had significant (measurable) benefit from PGO? If so, what compiler were you using, and what was your procedure for running the instrumented build to generate the profile data?
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Can we learn anything from PGO compiled executables?
mhalstern wrote: Can we learn anything from PGO compiled executables?
Does the compilation process produce any revised source code? Can we decompile a PGO generated executable and see what changes the compiler made and use this for future projects?
PGO is only about branches. The idea is this.
Given this C statement:
if (condition) {
}
The issue becomes, is "condition" true most of the time or not? If it is usually true, then the above code is optimal. If it is usually false, then you want to change it to this:
if (condition) goto boondocks;
return_from_boondocks:
and at label boondocks you place the block of code from inside the braces, followed by a jump back to return_from_boondocks.
The idea is this. If the branch is true most of the time, you execute that code most of the time, and since it is in sequential memory addresses, cache block fills will be correctly pre-fetching things you actually need. But if it is false most of the time, you fetch that code (in the original example) and then skip around it. By moving that code elsewhere, and only jumping to it in the less common case where the condition is true, you increase cache hits. Yes, you can see that if you ask for assembly output. I don't think you can get the modified C, however, but I might be wrong since I have not tried in years.
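To make the layout idea concrete, here is a minimal C sketch of the same transformation done by hand. The function names and the "clamp negative values" task are hypothetical, chosen only for illustration; the assumption is that negative entries are rare. Both versions compute the same result, and an optimizing compiler is free to re-lay out either one, so this shows the idea rather than what any particular compiler emits.

```c
#include <stddef.h>

/* Version A: the rarely executed block sits inline in the hot path. */
long sum_inline(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        int x = v[i];
        if (x < 0) {            /* assumed to be rare */
            x = 0;
        }
        total += x;
    }
    return total;
}

/* Version B: the rare block is moved "to the boondocks" by hand,
   so the common path falls straight through. */
long sum_out_of_line(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        int x = v[i];
        if (x < 0) goto boondocks;
return_from_boondocks:
        total += x;
        continue;
boondocks:
        x = 0;                  /* the block that used to be inside the braces */
        goto return_from_boondocks;
    }
    return total;
}
```

Compiling a file like this with -S (gcc or clang) prints the assembly, which is where the actual block placement can be inspected.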
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Can we learn anything from PGO compiled executables?
Greg Strong wrote: I have no answers, but while on the subject of PGO, I'm curious about people's experience with this...
Has anyone had significant (measurable) benefit from PGO? If so, what compiler were you using, and what was your procedure for running the instrumented build to generate the profile data?
I have had excellent results with Intel's C++ compiler. gcc produces good results when it works, but I have had many versions of gcc that would simply crash and burn when profiling Crafty, particularly when I try to profile a parallel search run so that all the parallel stuff gets optimized as well. This often produces corrupted PGO profile data files, which breaks the compiler.
-
- Posts: 351
- Joined: Sat Apr 01, 2006 2:03 am
Re: Can we learn anything from PGO compiled executables?
bob wrote: The idea is this. If the branch is true most of the time, you execute that code most of the time, and since it is in sequential memory addresses, cache block fills will be correctly pre-fetching things you actually need. But if it is false most of the time, you fetch that code (in the original example) and then skip around it. By moving that code elsewhere, and only jumping to it in the less common case where the condition is true, you increase cache hits. Yes, you can see that if you ask for assembly output. I don't think you can get the modified C, however, but I might be wrong since I have not tried in years.
Right. One other thing worth mentioning is that modern instruction pipelines rarely wait for a branch condition to be evaluated. The CPU will instead predict taken or not taken and do a rollback if the branch was predicted incorrectly. Thus, predicting a branch incorrectly entails a costly rollback.
PGO helps moderate this problem by doing a statistical analysis of branches, as Bob points out.
-
- Posts: 1243
- Joined: Sat Dec 13, 2008 7:00 pm
Re: Can we learn anything from PGO compiled executables?
PGO also allows the compiler to inline those few functions where it really matters.
This saves call overhead (when inlining) but also I-cache space (when *not* inlining because it's not needed).
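Without PGO, roughly the same decisions can be spelled out by hand. The following is a minimal sketch assuming GCC or Clang, whose always_inline, noinline and cold function attributes are used here; the helper names are hypothetical and other compilers need different spellings.

```c
/* GCC/Clang attribute spellings; plain fallbacks elsewhere. */
#if defined(__GNUC__)
#  define HOT_INLINE    static inline __attribute__((always_inline))
#  define COLD_NOINLINE static __attribute__((noinline, cold))
#else
#  define HOT_INLINE    static inline
#  define COLD_NOINLINE static
#endif

/* Hypothetical hot helper: tiny and called constantly, so inlining it
   saves the call overhead. */
HOT_INLINE int file_bits(int sq)
{
    return sq & 7;
}

/* Hypothetical cold helper: a rare error path that is kept out of line
   so it does not take up I-cache space in the hot code. */
COLD_NOINLINE int report_bad_square(int sq)
{
    (void)sq;
    return -1;
}

/* Returns the file (0..7) of a square 0..63, or -1 for an invalid square. */
int file_of(int sq)
{
    if (sq < 0 || sq > 63)
        return report_bad_square(sq);
    return file_bits(sq);
}
```

The difference with PGO is that the compiler picks the hot and cold functions from measured call counts instead of relying on the programmer's guess.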
-
- Posts: 484
- Joined: Wed Nov 18, 2009 1:09 am
Re: Can we learn anything from PGO compiled executables?
This is interesting:
What if I have a condition that is only met if there are more than 2 queens on the board? This happens rarely. What if, while profiling, I gave the engine 100 test positions with 4 queens on the board? Would the compiled code assume that the condition is met initially, and have to take the time to roll back?
In any event, if not using PGO, are there always options to disable these branch predictions?
-
- Posts: 351
- Joined: Sat Apr 01, 2006 2:03 am
Re: Can we learn anything from PGO compiled executables?
mhalstern wrote: This is interesting:
What if I have a condition that is only met if there are more than 2 queens on the board. This happens rarely. What if profiling the compile, I gave then engine 100 test positions with 4 queens on the board. Would the compiled code assume that the condition were met initially, and have to take the time to rollback?
In any event, if not using PGO, are there always options to disable these branch predictions?
For the first question, I think the answer is yes. If pathological data was fed in during the profile phase, the code will be laid out for the wrong case and the CPU will have to roll back more often. As C/C++ is compiled to fixed machine code, you are forced to live with the worse branch prediction. However, the impact may not be that bad, as the code that checks whether there are more than 2 queens on the board is probably executed rarely, and thus you will waste CPU cycles only in rare cases.
That being said, there is hardware-level dynamic branch prediction available on modern CPUs which can be leveraged to improve over badly profiled PGO data. An example is the "branch whether hint" on the Itanium processor. These are not used often yet.
If you do not use PGO, the compiler typically assumes that every branch has a 50/50 likelihood. There are also optimizations in this case for loops, string comparisons etc., but the typical branch is assumed to be taken about 50% of the time.
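When there is no profile, the programmer can state the expected outcome directly. Below is a minimal sketch for the queens case, assuming GCC or Clang, where __builtin_expect is available; the function name and the scoring values are made up for illustration. The hint only influences code layout and static prediction, so the result is identical either way.

```c
/* __builtin_expect is a GCC/Clang builtin; fall back to a plain test elsewhere. */
#if defined(__GNUC__)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define UNLIKELY(x) (x)
#endif

/* Hypothetical evaluation term: 9 points per queen of difference, plus a
   small adjustment in the rare positions with more than 2 queens in total. */
int queen_term(int white_queens, int black_queens)
{
    int score = 9 * (white_queens - black_queens);

    if (UNLIKELY(white_queens + black_queens > 2)) {
        /* Rare path: the compiler is told this is seldom executed. */
        if (white_queens > black_queens)
            score += 1;
        else if (white_queens < black_queens)
            score -= 1;
    }
    return score;
}
```

If such a hint is wrong for the positions actually searched, the situation is the same as with a badly chosen profiling set: the code still works, it is just laid out for the wrong case.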
-
- Posts: 351
- Joined: Sat Apr 01, 2006 2:03 am
Re: Can we learn anything from PGO compiled executables?
I meant to add in my previous post that branch prediction is done in hardware anyway (with a small branch prediction cache). Thus, if the PGO seed data is pathological, the hardware branch predictor could come to the rescue, but one should not depend on it.
PGO normally helps optimize the hardware branch predictor, not the other way round.