I consistently get a binary that is about 3% faster if I replace -fprofile-generate and -fprofile-use with -fprofile-arcs and -fbranch-probabilities, respectively, in the supplied makefile:
Not at my computer, so I can't check your patch, but I've gotten a small speedup by adding -march=native -mtune=native, which lets GCC choose extra optimization flags based on the user's machine.
However, it probably isn't a good patch, because all the release binaries would be optimized for whatever CPU the abrok.eu servers run on.
Joerg Oster wrote:
Well, I cannot confirm.
Trying with latest dev, default profile-build is slightly faster than with your modification.
Also running Linux and g++ 4.8.x.
Interesting. Thanks for testing this. Did you have Turbo Boost (or Turbo Core) disabled? I only just realized the impossibility of benchmarking accurately with Turbo Boost on.
ZirconiumX wrote: I've gotten a small speedup by adding -march=native -mtune=native, which lets GCC choose extra optimization flags based on the user's machine.
I used to do the same. But my current testing seems to indicate that removing all invocations of -msse and -msse3 from the makefile and just using -O3 -fno-tree-pre is best, at least on my system with gcc version 4.8.1 (Ubuntu 4.8.1-2ubuntu1~12.04). Odd.
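To see what -march=native actually resolves to on a given machine, GCC can be asked to print its target options. The snippet below is only a sketch: the helper name native_arch is made up here, and the exact layout of the `-Q --help=target` output is an assumption that may differ between GCC versions.

```shell
#!/bin/sh
# native_arch: reads the output of `gcc -march=native -Q --help=target` on
# stdin and prints whatever follows "-march=" on the matching line.
# (Hypothetical helper name; the output layout is assumed, not guaranteed.)
native_arch() {
    sed -n 's/^[[:space:]]*-march=[[:space:]]*//p'
}

# Only query the compiler if gcc is actually installed.
if command -v gcc >/dev/null 2>&1; then
    gcc -march=native -Q --help=target 2>/dev/null | native_arch
fi
```

Comparing that value between the build machine and the machine that runs the binary shows quickly whether a native build is safe to distribute.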
Joerg Oster wrote:
Well, I cannot confirm.
Trying with latest dev, default profile-build is slightly faster than with your modification.
Also running Linux and g++ 4.8.x.
Interesting. Thanks for testing this. Did you have Turbo Boost (or Turbo Core) disabled? I only just realized the impossibility of benchmarking accurately with Turbo Boost on.
Of course.
Turbo Boost (or Turbo Core) is an absolute no-go in serious engine testing.
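On Linux, the turbo state can often be checked from sysfs before a benchmark run. This is a rough sketch, assuming the kernel uses either the intel_pstate driver or acpi-cpufreq with boost support; on other setups the files may not exist and the setting has to be changed in the BIOS instead. The function name turbo_state and its optional directory argument are made up for this example.

```shell
#!/bin/sh
# turbo_state [SYSFS_CPU_DIR]
# Prints "on", "off", or "unknown" depending on what the kernel reports.
# SYSFS_CPU_DIR defaults to /sys/devices/system/cpu; the parameter exists only
# so the function can be pointed at a different directory for testing.
turbo_state() {
    cpu_dir="${1:-/sys/devices/system/cpu}"
    if [ -r "$cpu_dir/intel_pstate/no_turbo" ]; then
        # intel_pstate driver: 1 means turbo is disabled.
        case "$(cat "$cpu_dir/intel_pstate/no_turbo")" in
            1) echo off ;;
            0) echo on ;;
            *) echo unknown ;;
        esac
    elif [ -r "$cpu_dir/cpufreq/boost" ]; then
        # acpi-cpufreq driver: 0 means boost is disabled.
        case "$(cat "$cpu_dir/cpufreq/boost")" in
            0) echo off ;;
            1) echo on ;;
            *) echo unknown ;;
        esac
    else
        echo unknown
    fi
}

echo "Turbo: $(turbo_state)"
```

Changing the setting means writing to the same file as root, for example writing 1 to intel_pstate/no_turbo.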
I consistently get a binary that is about 3% faster if I replace -fprofile-generate and -fprofile-use with -fprofile-arcs and -fbranch-probabilities, respectively, in the supplied makefile:
which defaults to a single-threaded deterministic search.
Does anyone else get similar results using gcc? I'm running Linux.
I reported this for Crafty a few months back. The newer options (generate/use) seem to re-order data, but not particularly effectively. profile-arcs and profile-use are definitely better for me too.
bob wrote: profile-arcs and profile-use are definitely better for me too.
For me it has to be profile-arcs paired with branch-probabilities. If I replace the latter with profile-use, I lose the gain. According to the documentation, profile-use enables branch-probabilities, but it also enables a half-dozen other things (at least one of which seems to degrade the optimization).
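For anyone who wants to repeat the comparison, the two flag pairs can be laid out side by side. This is only a sketch: the function name pgo_flags and the scheme labels "new" and "old" are made up for illustration, and the supplied makefile passes other options that are not shown here.

```shell
#!/bin/sh
# pgo_flags SCHEME PHASE
# Prints the GCC profiling flag for one phase of a profile-guided build.
#   SCHEME: "new" (generate/use pair) or "old" (arcs/branch-probabilities pair)
#   PHASE:  "instrument" (first compile) or "optimize" (second compile)
# The labels "new" and "old" are hypothetical names used only in this sketch.
pgo_flags() {
    case "$1:$2" in
        new:instrument) echo "-fprofile-generate" ;;
        new:optimize)   echo "-fprofile-use" ;;
        old:instrument) echo "-fprofile-arcs" ;;
        old:optimize)   echo "-fbranch-probabilities" ;;
        *) echo "usage: pgo_flags new|old instrument|optimize" >&2; return 1 ;;
    esac
}

# Dry run: list the flag for each step without invoking GCC. Between the two
# steps, the instrumented binary has to be run once so that it writes its
# .gcda profile files.
for scheme in new old; do
    echo "$scheme scheme, first compile:  $(pgo_flags "$scheme" instrument)"
    echo "$scheme scheme, second compile: $(pgo_flags "$scheme" optimize)"
done
```

Keeping every other option identical between the two builds is what makes a difference of a few percent attributable to the profiling flags rather than to something else.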
Well ... 'profile-arcs' and 'branch-probabilities' (paired, or either of them alone) do not work with GCC 4.7.3 (internal compiler error: in edge_badness, at ipa-inline.c:793, make[2]: *** [ucioption.o] Error 1). With 4.8.2, 4.8.3 or 4.9.0, the two together do give a considerable speed gain on an i7-4770K: around 4% for 4.8.2 and around 3% for 4.8.3 and 4.9.0. That holds with Turbo Boost on, with the overclock (4.5 GHz) on, and with both off.
Krgp wrote: Well ... 'profile-arcs' and 'branch-probabilities' (paired, or either of them alone) do not work with GCC 4.7.3 (internal compiler error: in edge_badness, at ipa-inline.c:793, make[2]: *** [ucioption.o] Error 1).
I use those all the time with Crafty and gcc 4.7.3...
If you do any multi-threaded benchmarking, you do need to add
-fprofile-correction
on the final compile, because the threaded profiling apparently has a few issues with corruption in the .gcda file. The above fixes it.
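A build script could add the flag only when the profiling run was threaded. This is a minimal sketch: the helper name final_flags and its arguments are invented for this example, and it defaults to -fprofile-use for the final compile, which can be swapped for -fbranch-probabilities.

```shell
#!/bin/sh
# final_flags THREADS [BASE_FLAG]
# Prints the profiling flags for the final (optimizing) compile.
#   THREADS:   number of threads used during the profiling run
#   BASE_FLAG: flag that consumes the profile (default: -fprofile-use)
# When the profiling run used more than one thread, -fprofile-correction is
# appended so that GCC tolerates inconsistent counters in the .gcda files.
final_flags() {
    flags="${2:--fprofile-use}"
    if [ "$1" -gt 1 ]; then
        flags="$flags -fprofile-correction"
    fi
    echo "$flags"
}

echo "1 thread:  $(final_flags 1)"
echo "4 threads: $(final_flags 4)"
```

The flag only affects how GCC reads the profile data, so it belongs on the second compile and not on the instrumented one.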