Can we learn anything from PGO compiled executables?

Post by **bob** » Wed Dec 02, 2009 8:01 am

shiv wrote:I meant to add in my previous post that branch prediction is done in hardware anyway (with a small branch prediction cache). Thus, if the pathological PGO seed data is bad, the hardware branch predictor could come to the rescue, but one should not depend on it.

PGO normally helps optimize for the hardware branch predictor, not the other way round.

It doesn't really help with branch prediction. It tries to move code out of the sequential access pattern if you skip it frequently, so that it doesn't get prefetched into a cache block and then not used. PGO is really an attempt to determine how branches are executed, and then is used to re-organize the machine code as mentioned above. Branch prediction is what it is, and what it is is _very_ good on today's processors. But memory bandwidth and latency are big issues, and pre-fetching using 32/64 byte cache blocks is a way to reduce that effect. But only if you are prefetching data and/or instructions that are actually needed. PGO helps with the instruction part. The programmer has to handle the data part and organize things to take advantage of prefetching.

And I am _not_ talking about the prefetch opcode on many processors that doesn't stall the CPU waiting for data, but does cause cache to start reading the data before it is actually referenced. That is not what this is about.
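To make the code-layout point concrete, here is a minimal sketch in C of the kind of hot/cold separation described above, written by hand rather than derived from a profile. It assumes GCC or Clang: `__builtin_expect` and `__attribute__((cold))` are extensions of those compilers, and the function names (`sum_nonnegative`, `report_bad_value`) are hypothetical. With PGO (for example GCC's `-fprofile-generate` followed by `-fprofile-use`), the compiler would collect the branch frequencies itself instead of relying on such annotations.

```c
#include <stddef.h>
#include <stdio.h>

/* GCC/Clang extensions; on other compilers the hints compile away. */
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#define COLD __attribute__((cold, noinline))
#else
#define UNLIKELY(x) (x)
#define COLD
#endif

/* Rarely executed error path. Marking it cold and out of line asks the
 * compiler to keep it away from the hot loop, so its instructions are
 * less likely to share cache blocks with code that runs all the time. */
COLD static int report_bad_value(size_t index, int value)
{
    fprintf(stderr, "bad value %d at index %zu\n", value, index);
    return -1;
}

/* Hypothetical hot loop: sums the values, bailing out on a negative one.
 * The UNLIKELY hint says the error branch is almost never taken, so the
 * fall-through (sequential) path is the common case. */
int sum_nonnegative(const int *values, size_t count, long *out_sum)
{
    long sum = 0;
    for (size_t i = 0; i < count; i++) {
        if (UNLIKELY(values[i] < 0))
            return report_bad_value(i, values[i]);
        sum += values[i];
    }
    *out_sum = sum;
    return 0;
}
```

Whether the compiler actually moves the cold code depends on the compiler version and optimization level, so treat the annotations as hints, not guarantees.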

Post by **shiv** » Thu Dec 03, 2009 11:21 pm

bob wrote: It doesn't really help with branch prediction. It tries to move code out of the sequential access pattern if you skip it frequently, so that it doesn't get prefetched into a cache block and then not used. PGO is really an attempt to determine how branches are executed, and then is used to re-organize the machine code as mentioned above. Branch prediction is what it is, and what it is is _very_ good on today's processors. But memory bandwidth and latency are big issues, and pre-fetching using 32/64 byte cache blocks is a way to reduce that effect. But only if you are prefetching data and/or instructions that are actually needed. PGO helps with the instruction part. The programmer has to handle the data part and organize things to take advantage of prefetching.

And I am _not_ talking about the prefetch opcode on many processors that doesn't stall the CPU waiting for data, but does cause cache to start reading the data before it is actually referenced. That is not what this is about.

Thanks for the detailed explanation. I meant that PGO helps branch prediction indirectly, but that was an imprecise way to state it: branch prediction may benefit, but PGO does not directly help it. A good PGO blog is at http://blogs.msdn.com/vcblog/archive/20 ... /pogo.aspx

A question I have: does PGO interact with basic block chaining (does the ability to chain blocks get better after profiling actual runs)? I.e., if block A goes to block B 80% of the time, would it optimize for this case? I remember playing with the source code of some emulators that implemented block chaining from scratch (e.g. qemu), and I am now curious how this would work in practice with compilers/PGO.
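For readers unfamiliar with the term, here is a toy sketch in C of what block chaining means in an emulator: the first time a translated block finishes, its successor is found through a slow lookup, and the result is cached in the block so later executions jump straight to it. Everything here (`Block`, `Emulator`, `lookup_block`, `run_blocks`) is hypothetical and heavily simplified; it is not qemu's implementation, and it says nothing about what a compiler's PGO does with static code layout.

```c
#include <stddef.h>

/* A "translated block". next_id stands in for the guest address of the
 * successor; chained is the cached direct link, NULL until first use. */
typedef struct Block {
    int next_id;
    struct Block *chained;
} Block;

typedef struct {
    Block blocks[2];
    unsigned long slow_lookups; /* how often the slow path was taken */
} Emulator;

/* Two blocks that jump to each other: 0 -> 1 -> 0 -> ... */
void emulator_init(Emulator *emu)
{
    emu->blocks[0].next_id = 1;
    emu->blocks[0].chained = NULL;
    emu->blocks[1].next_id = 0;
    emu->blocks[1].chained = NULL;
    emu->slow_lookups = 0;
}

/* Stand-in for the expensive lookup (a hash table search in a real
 * dynamic translator). */
static Block *lookup_block(Emulator *emu, int id)
{
    emu->slow_lookups++;
    return &emu->blocks[id];
}

/* Runs `steps` block transitions and returns the number of slow lookups.
 * After each block has been chained once, no further lookups occur. */
unsigned long run_blocks(Emulator *emu, int start_id, unsigned long steps)
{
    Block *current = lookup_block(emu, start_id);
    for (unsigned long i = 0; i < steps; i++) {
        /* The block body would execute here. */
        if (current->chained == NULL)
            current->chained = lookup_block(emu, current->next_id);
        current = current->chained;
    }
    return emu->slow_lookups;
}
```

In this toy model, running 1000 transitions from block 0 takes only three slow lookups: one for the starting block and one for the first exit of each of the two blocks.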
