PGO improvement for Stockfish?

Discussion of chess software programming and technical issues.

Moderator: Ras

ZirconiumX
Posts: 1362
Joined: Sun Jul 17, 2011 11:14 am
Full name: Hannah Ravensloft

Re: PGO improvement for Stockfish?

Post by ZirconiumX »

bob wrote:
Krgp wrote:
Well ... '-fprofile-arcs' and/or '-fbranch-probabilities' (paired, or either one used alone) do not work with GCC 4.7.3 (internal compiler error: in edge_badness, at ipa-inline.c:793, make[2]: *** [ucioption.o] Error 1). With 4.8.2, 4.8.3 or 4.9.0, the two together do give a considerable speed gain: around 4% for 4.8.2 and around 3% for 4.8.3 and 4.9.0 on an i7-4770k, both with 'turbo boost' and overclocking (@ 4.5 GHz) on and with all of them off.

I use those all the time with Crafty and gcc 4.7.3...

If you do any multi-threaded benchmarking, you do need to add

-fprofile-correction

on the final compile, because the threaded profiling apparently has a few issues with corruption in the .gcda file. The above fixes it.

It's due to a race condition on the profile data, since GCC assumes the code is single-threaded and does not insert (what it considers to be) useless file locking code into the program.
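
To make the race concrete, here is a minimal sketch of how a counter can miss updates when several threads bump it without synchronisation. It is only an illustration: `run_counters` and `Result` are hypothetical names, and this is not GCC's actual instrumentation code. GCC's documentation for -fprofile-correction describes the inconsistencies in multi-threaded profiles as missed counter updates, which is the effect modelled here. The "split" counter is updated with a separate load and store (so the sketch stays free of undefined behaviour), the "exact" one with a single atomic increment.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Final values of the two counters after all threads have finished.
struct Result {
    long split;  // each update was a separate load followed by a store
    long exact;  // each update was one atomic fetch_add
};

// Hypothetical helper: start `threads` workers that each bump both
// counters `iters` times, then report what the counters ended up as.
Result run_counters(int threads, long iters) {
    assert(threads > 0 && iters >= 0);
    std::atomic<long> split{0};
    std::atomic<long> exact{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&split, &exact, iters] {
            for (long i = 0; i < iters; ++i) {
                // Read, add, write back: another thread may store in between,
                // and its increment is then overwritten (a missed update).
                long seen = split.load(std::memory_order_relaxed);
                split.store(seen + 1, std::memory_order_relaxed);
                // One indivisible read-modify-write: nothing can be lost.
                exact.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return {split.load(), exact.load()};
}
```

Calling `run_counters(4, 100000)` always leaves `exact` at 400000, while `split` may end up anywhere at or below that, depending on how the threads interleave. A profile built from counters like `split` can be slightly inconsistent, which is the kind of data -fprofile-correction is meant to smooth over.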

Matthew:out
tu ne cede malis, sed contra audentior ito
Krgp
Posts: 20
Joined: Mon Nov 04, 2013 6:18 am

Re: PGO improvement for Stockfish?

Post by Krgp »

Many thanks for the guidance ... however, it still doesn't work even after adding -fprofile-correction ... same error on 4.7.3 (at least for SF on Win 8.1 Pro 64) ... it works fine on 4.8.2 and 4.9.0 even without -fprofile-correction ...
KP
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: PGO improvement for Stockfish?

Post by zullil »

Krgp wrote:
Many thanks for the guidance ... it however still doesn't work even after ... -fprofile-correction ... same error on 4.7.3 (at least for SF & on Win 8.1 pro 64) ... works fine on 4.8.2, 4.9.0 even without -fprofile-correction ...

What happens if you remove -flto from the makefile?

Post by Krgp »

zullil wrote:
What happens if you remove -flto from the makefile?

Removing -flto does not cure it either ... same error on 4.7.3
KP
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: PGO improvement for Stockfish?

Post by bob »

Krgp wrote:
Removing -flto also does not cure it ... same error on 4.7.3

Are you using the c11 version or the C++ version? It might be that this is too new and broken for 4.7.3...

Post by Krgp »

bob wrote:
Are you using the c11 version or the C++ version? might be that this is too new and broken for 4.7.3...

C++ ... it may be because 4.7.3 does not have native support for the i7-4770k (Haswell) architecture ... 4.8.2 does not have it for 'haswell' either, but it does have it for 'core-avx2', and 4.9.0 natively supports 'haswell' ... however, it has been confirmed that it also fails on a 'Gulftown' 970 ... so 'haswell' should not be the issue ...
KP

Post by ZirconiumX »

Krgp wrote:
C++ ... it may be because 4.7.3 does not have native support for the i7-4770k (Haswell) architecture ... 4.8.2 does not have it for 'haswell' either, but it does have it for 'core-avx2', and 4.9.0 natively supports 'haswell' ... however, it has been confirmed that it also fails on a 'Gulftown' 970 ... so 'haswell' should not be the issue ...

This appears to be a known bug when using LTO and profiling together.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54417

Not much detail given though, so it's probably best to just give 4.7.3 a miss.

In addition to the above, can someone test if adding -Ofast gives a speedup?

Matthew:out
tu ne cede malis, sed contra audentior ito

Post by zullil »

ZirconiumX wrote:
This appears to be a known bug when using LTO and profiling together.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54417

Not much detail given though, so it's probably best to just give 4.7.3 a miss.

In addition to the above, can someone test if adding -Ofast gives a speedup?

For me, -Ofast gives about a 0.5% reduction in speed.

As for LTO bugs in gcc-4.7.3: Kiran indicated that removing -flto from the makefile did not resolve the issue for him. (I suggested that he try that after I saw the same bug report you found.)

Post by Krgp »

ZirconiumX wrote:
In addition to the above, can someone test if adding -Ofast gives a speedup?

Same here ... -Ofast gives a 0.5 to 1% reduction in speed. By the way, 4.7.3 can build with the tweaks suggested by Zullil, but only with the 'recommended' optimization level -O2; with -O3 it is not possible. The catch is that all of the extra speed gain is lost, so there is practically no visible benefit. Builds on 4.8.2 with the tweaks and -O3 are 0.5% faster than those on 4.7.3 with -O2.

So which is better: -fprofile-arcs and -fbranch-probabilities with -O3 on 4.8.2 or with -O2 on 4.7.3, or -fprofile-generate and -fprofile-use with -Ofast on 4.7.3? The speed of all three combinations is roughly equal, and a fourth, -fprofile-generate and -fprofile-use with -O3 on 4.7.3, is equal as well.
KP

Post by zullil »

Here are the changes from the default makefile that produce the fastest Stockfish binary for me:

Code:

louis@LZsT5610:~/Documents/Chess/Stockfish/src$ diff Makefile_new Makefile
274,275c274,275
< 		CXXFLAGS +=
< 		DEPENDFLAGS +=
---
> 		CXXFLAGS += -msse
> 		DEPENDFLAGS += -msse
288c288
< 	CXXFLAGS += -DUSE_POPCNT
---
> 	CXXFLAGS += -msse3 -DUSE_POPCNT
308c308
< 			CXXFLAGS +=
---
> 			CXXFLAGS += -flto
450c450
< 	EXTRACXXFLAGS='-fprofile-arcs' \
---
> 	EXTRACXXFLAGS='-fprofile-generate' \
456c456
< 	EXTRACXXFLAGS='-fbranch-probabilities' \
---
> 	EXTRACXXFLAGS='-fprofile-use' \
With gcc-4.8.1, this improves nps by about 4%, as measured using the standard Stockfish (single-threaded, deterministic) benchmark. I'm building with

Code:

make profile-build ARCH=x86-64-modern
For me, with the exception of the inlined popcnt instruction, enabling sse or sse3 actually produced code that was a bit slower.
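
For readers unfamiliar with what the inlined popcnt refers to, here is a minimal sketch of the idea. It is not Stockfish's actual code: the function names are hypothetical, and it assumes a GCC or Clang compiler, whose `__builtin_popcountll` can be lowered to a single popcnt instruction when the target supports it. Only the `USE_POPCNT` macro name comes from the makefile above.

```cpp
#include <cstdint>

// Portable fallback: clears the lowest set bit once per loop iteration,
// so the loop runs once for every bit that is set.
inline int popcount_loop(std::uint64_t b) {
    int n = 0;
    while (b) {
        b &= b - 1;
        ++n;
    }
    return n;
}

// When USE_POPCNT is defined (the makefile passes -DUSE_POPCNT), use the
// compiler builtin; otherwise fall back to the portable loop above.
inline int popcount(std::uint64_t b) {
#if defined(USE_POPCNT) && (defined(__GNUC__) || defined(__clang__))
    return __builtin_popcountll(b);
#else
    return popcount_loop(b);
#endif
}
```

Both paths return the same count for any 64-bit bitboard; the difference is only in how many instructions it takes to get there.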