For me, with the exception of the inlined popcnt instruction, enabling sse or sse3 actually produced code that was a bit slower.
Indeed, -fprofile-arcs plus -fbranch-probabilities, with no link-time optimization, produces the fastest builds... but that requires GCC 4.8 or later. If it could be done on 4.7.3, there would be a further 3% speed gain on top of the 4% given by -fprofile-arcs and -fbranch-probabilities, a total of around 7%!
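For reference, a GCC build with these two flags is a two-pass process. First, compile and link with -fprofile-arcs. Next, run a representative workload so the instrumented binary writes its .gcda count files. Finally, recompile everything with -fbranch-probabilities. The sketch below assumes a flat directory of .cpp files, -O3, a binary named stockfish, and its bench command as the training run. The run and pgo_build helpers and the DRY_RUN switch are hypothetical and are not taken from the Stockfish makefile.

```sh
#!/bin/sh
# Sketch of a two-pass GCC profile-guided build (hypothetical helpers).
# Set DRY_RUN to a non-empty value to print the commands instead of running them.

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: pgo_build CXX SOURCE...   (assumes a flat directory of .cpp files)
pgo_build() {
    cxx=$1
    shift
    opt="-O3 -DNDEBUG"

    # Pass 1: instrumented build. Compiling writes a .gcno per object;
    # running the linked binary later writes the matching .gcda counts.
    for src in "$@"; do
        run "$cxx" $opt -fprofile-arcs -c -o "${src%.cpp}.o" "$src"
    done
    objs=$(for src in "$@"; do printf '%s ' "${src%.cpp}.o"; done)
    run "$cxx" -fprofile-arcs -o stockfish $objs

    # Training run: the workload whose branch behaviour gets recorded.
    run ./stockfish bench

    # Pass 2: rebuild every object using the recorded branch counts.
    for src in "$@"; do
        run "$cxx" $opt -fbranch-probabilities -c -o "${src%.cpp}.o" "$src"
    done
    run "$cxx" -o stockfish $objs
}
```

For example, `DRY_RUN=1 pgo_build g++-4.7 *.cpp` prints the seven-or-more command sequence without touching anything, which makes it easy to see where an extra step (such as deleting a bad .gcda) would go.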
There is a workaround: BYO by Brice Allenbrand (RW builds) at https://www.dropbox.com/sh/4rubami2nvld ... ft7y9a/BYO does it! Brice uses cpuz to get the profile and then compiles with -fbranch-probabilities; the result is the 7% speed gain on 4.7.3. I tried his script, modified it slightly (-O3 instead of the -Ofast Brice uses), and did get the 7% gain! If only the idea (of course without needing cpuz, I suppose) could be incorporated into the makefile...
The file at the URL you provided is some sort of bundled Windows executable, with the Stockfish source code and scripts contained within. Since I don't use Windows, I can't even open it to see what's there.
Well... the bundle contains three GCCs (473, 48b & 49c), cpuz and a batch file. Here is the script Brice uses:
Krgp wrote:
Brice uses cpuz to get the profile and then compiles with -fbranch-probabilities; the result is the 7% speed gain on 4.7.3. I tried his script, modified it slightly (-O3 instead of the -Ofast Brice uses), and did get the 7% gain! If only the idea (of course without needing cpuz, I suppose) could be incorporated into the makefile...
Well, I spent about three minutes with the script. It seems cpuz is simply used to determine which instruction sets the processor supports, so that flags can be set appropriately (e.g., -DUSE_POPCNT, -DUSE_PEXT).
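On Linux, the same check can be done without cpuz by reading the flags line of /proc/cpuinfo. The popcnt flag indicates the POPCNT instruction, and the bmi2 flag indicates BMI2, which includes PEXT. The sketch below assumes that this mapping is all that is needed. The cpu_defines helper is hypothetical and covers only the two defines mentioned above.

```sh
#!/bin/sh
# Hypothetical helper: map CPU feature names to the Stockfish defines
# mentioned in the thread. $1 is a whitespace-separated feature list,
# e.g. the "flags" line of /proc/cpuinfo.
cpu_defines() {
    defs=""
    for feature in $1; do
        case $feature in
            popcnt) defs="$defs -DUSE_POPCNT" ;;
            bmi2)   defs="$defs -DUSE_PEXT" ;;   # PEXT is part of BMI2
        esac
    done
    # Drop the leading space, if any.
    echo "${defs# }"
}

# On Linux, feed it the flags of the first CPU listed:
#   cpu_defines "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```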
The actual workaround for the gcc-4.7.3 "bug" appears to be the deletion of the files ucioption.gcda and ucioption.gcno prior to the final compilation; if I recall correctly the error you reported involved ucioption.o.
If I can get a copy of gcc-4.7.3, I might spend a few minutes, but there's nothing magical in the script.
Turns out I already had 4.7.3 installed. Indeed, there is an error in the final compilation:
g++-4.7 -Wall -Wcast-qual -fno-exceptions -fno-rtti -fbranch-probabilities -ansi -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -DIS_64BIT -DUSE_BSFQ -DUSE_POPCNT -c -o ucioption.o ucioption.cpp
ucioption.cpp:160:1: internal compiler error: in edge_badness, at ipa-inline.c:793
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.
The bug is not reproducible, so it is likely a hardware or OS problem.
make[2]: *** [ucioption.o] Error 1
make[2]: Leaving directory `/home/louis/Documents/Chess/Stockfish/src'
make[1]: *** [gcc-profile-use] Error 2
make[1]: Leaving directory `/home/louis/Documents/Chess/Stockfish/src'
make: *** [profile-build] Error 2
I note that a ucioption.gcda is not present at all, so for me removing this file is clearly not the workaround.
zullil wrote:
The actual workaround for the gcc-4.7.3 "bug" appears to be the deletion of the files ucioption.gcda and ucioption.gcno prior to the final compilation; if I recall correctly the error you reported involved ucioption.o.
If I can get a copy of gcc-4.7.3, I might spend a few minutes, but there's nothing magical in the script.
Robert Hyatt also mentioned 'corruption' (?!) of .gcda files ...
4.7.3 seems to be hard to get, but 4.7.4 has been released with bug fixes (http://gcc.gnu.org/gcc-4.7/); hopefully this bug is fixed. In the meantime, is there any other way to delete the .gcno and .gcda files prior to the final compilation? The additional 3% speed gain is too tempting...
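Outside the makefile, the deletion can simply be done by hand between the profiling run and the final compilation. The sketch below assumes that the .gcda/.gcno files sit in one build directory next to the objects. The drop_profile_data helper is hypothetical. Once the files are gone, gcc has no profile data for that one translation unit, while the other units still use their own .gcda files.

```sh
#!/bin/sh
# Hypothetical helper: delete the profile files of selected translation
# units before the -fbranch-probabilities pass.
# Usage: drop_profile_data DIR MODULE...   e.g. drop_profile_data src ucioption
drop_profile_data() {
    dir=$1
    shift
    for module in "$@"; do
        # -f: no error if a file is already absent.
        rm -f "$dir/$module.gcda" "$dir/$module.gcno"
    done
}
```

The helper would be called after `./stockfish bench` and before the final compile, e.g. `drop_profile_data . ucioption`.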
My problem with 4.7.3 was that it failed when no .gcda file was present, so deleting was not the fix. It seems that running only
to generate a .gcda, and then modified the makefile to complete the final compilation only.
In any case, I got only a 1% speed-up over gcc-4.8, so I'm not sure finding a good workaround is worth the effort. I may see if I can get 4.7.4 for my Linux system...
After more careful investigation:
ucioption.gcda is produced when the bench command is run during the profiling stage. But it must be "corrupt" in some manner; using it during the final build causes the gcc-4.7.3 error (and the file is deleted for some reason, which is why I thought it was never present).
If this corrupt .gcda is deleted prior to the final compilation, no error occurs. There's simply a message:
I am doing something similar at the moment, though I use Brice's batch file (after various modifications) and do get the additional 3% speed gain. One slight difference is that I can afford to use -march=native, so there is no need for the other flags. I had initially thought cpuz was used just for determining the basic architecture. However, the .txt file it generates is quite similar to the output of gcc -c -Q -march=native --help=target, so I suspected it was being used to get profile data (however, I could read it in the script itself). Anyway, thanks for your shell script; I will try it out and post feedback...
The script you posted invokes gcc -Q --help=target -march=native and writes the output to StockFishRW_BYO.txt.
I think all that does is document which compiler switches are enabled/disabled when -march=native is used during compilation.
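To see just the switches that -march=native turns on, the output of that command can be filtered for lines ending in [enabled]. The sketch below assumes gcc's usual two-column -Q --help=target layout, with the switch name first and its state last. The enabled_target_switches helper is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print the target switches that gcc reports as
# enabled. Reads the output of `gcc -Q --help=target ...` on stdin.
enabled_target_switches() {
    # First field is the switch name, last field its state.
    awk '$NF == "[enabled]" { print $1 }'
}

# Example (result depends on the host CPU and the gcc version):
#   gcc -Q --help=target -march=native | enabled_target_switches
```

Running it once with -march=native and once without, then diffing the two lists, shows exactly which switches -march=native adds.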