Krgp wrote:Well ... 'profile-arcs' and/or 'branch-probabilities' (paired, or either one alone) do not work with GCC 4.7.3 (internal compiler error: in edge_badness, at ipa-inline.c:793, make[2]: *** [ucioption.o] Error 1). With 4.8.2, 4.8.3 or 4.9.0 these (both together) do give a considerable speed gain: around 4% (for 4.8.2) and around 3% (for 4.8.3 and 4.9.0) on an i7-4770k, whether 'turbo boost' and the OC (@ 4.5 GHz) are on or all of them are off.
I use those all the time with Crafty and gcc 4.7.3...
If you do any multi-threaded benchmarking, you do need to add
-fprofile-correction
on the final compile, because the threaded profiling apparently has a few issues with corruption in the .gcda file. The above fixes it.
It's due to a race condition on the profile data, since GCC makes the assumption the code is single-threaded, and does not insert (what it considers to be) useless file locking code into the program.
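To make the race concrete, here is a small sketch of how unsynchronised counter updates lose increments when several threads share them. It is only an analogy for what happens to the profile counters, not GCC's actual instrumentation; count_hits and its parameters are hypothetical names made up for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: every thread bumps one shared counter `iters` times.
// With exact == false the increment is split into a separate load and store.
// That mimics an unsynchronised "counter++" while staying free of undefined
// behaviour, and it lets one thread overwrite another thread's update.
long long count_hits(int threads, int iters, bool exact) {
    std::atomic<long long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iters, exact] {
            for (int i = 0; i < iters; ++i) {
                if (exact) {
                    // One indivisible read-modify-write: no update is lost.
                    counter.fetch_add(1, std::memory_order_relaxed);
                } else {
                    // Two steps: another thread can slip in between them.
                    long long seen = counter.load(std::memory_order_relaxed);
                    counter.store(seen + 1, std::memory_order_relaxed);
                }
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```

Calling count_hits(4, 100000, true) always returns 400000, while count_hits(4, 100000, false) can return anything up to that value and usually falls short on a multi-core machine. Counts that no longer add up are the kind of inconsistency -fprofile-correction is meant to smooth over when the profile is read back.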
Many thanks for the guidance ... it still doesn't work, however, even with -fprofile-correction: same error on 4.7.3 (at least for SF on Win 8.1 Pro 64). It works fine on 4.8.2 and 4.9.0, even without -fprofile-correction.
What happens if you remove -flto from the makefile?
Removing -flto does not cure it either: same error on 4.7.3.
Are you using the c11 version or the C++ version? It might be that this is too new and broken for 4.7.3...
C++ ... it may be because 4.7.3 has no native support for the i7-4770k (Haswell) architecture. 4.8.2 does not have 'haswell' either, but it does have 'core-avx2', and 4.9.0 supports 'haswell' natively. However, it has been confirmed that it also fails on a 'Gulftown' 970, so 'haswell' should not be the issue.
This appears to be a known bug when using LTO and profiling together.
Not much detail given though, so it's probably best to just give 4.7.3 a miss.
In addition to the above, can someone test if adding -Ofast gives a speedup?
Matthew:out
For me, -Ofast gives about a 0.5% reduction in speed.
About LTO bugs in gcc-4.7.3: Kiran indicated that removing -flto from the makefile did not resolve the issue for him. (I suggested that he try that, after I saw the same bug report you found.)
Same here ... -Ofast gives a 0.5 to 1% reduction in speed. BTW, 4.7.3 can build with the tweaks suggested by Zullil, but only with the 'recommended' optimization -O2; with -O3 it is not possible. The catch is that all the extra speed gain is lost, so there is practically no visible benefit. Builds on 4.8.2 with the tweaks and -O3 are 0.5% faster than those on 4.7.3 with -O2. So which is better: -fprofile-arcs and -fbranch-probabilities with -O3 on 4.8.2 or with -O2 on 4.7.3, or -fprofile-generate and -fprofile-use with -Ofast on 4.7.3? The speed of all three combinations is roughly equal, and a fourth, -fprofile-generate and -fprofile-use with -O3 on 4.7.3, is equal as well.