PGO improvement for Stockfish?

zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

PGO improvement for Stockfish?

Post by zullil »

When building the latest Stockfish with g++-4.8 using

Code:

make profile-build ARCH=x86-64-modern
I consistently get a binary that is about 3% faster if I replace -fprofile-generate and -fprofile-use with -fprofile-arcs and -fbranch-probabilities, respectively, in the supplied makefile:

Code:

gcc-profile-make:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-fprofile-generate' \
	EXTRALDFLAGS='-lgcov' \
	all

gcc-profile-use:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-fprofile-use' \
	EXTRALDFLAGS='-lgcov' \
	all
Timing was done by disabling Turbo Boost and invoking the standard

Code:

./stockfish bench
which defaults to a single-threaded deterministic search.

Does anyone else get similar results using gcc? I'm running Linux.
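
For anyone who wants to try this, a minimal sketch of the two modified targets follows. It assumes the rest of the supplied makefile is unchanged and swaps only the two flags named above; it has not been checked against any particular Stockfish revision.

```makefile
# Instrumented build: -fprofile-arcs replaces -fprofile-generate.
gcc-profile-make:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-fprofile-arcs' \
	EXTRALDFLAGS='-lgcov' \
	all

# Optimized build: -fbranch-probabilities replaces -fprofile-use.
gcc-profile-use:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-fbranch-probabilities' \
	EXTRALDFLAGS='-lgcov' \
	all
```

The profiling run between the two steps stays the same, so the .gcda files written by the instrumented binary are the ones read back in the second step.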
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: PGO improvement for Stockfish?

Post by ZirconiumX »

Not at my computer, so I can't check your patch, but I've gotten a small speedup by adding -march=native -mtune=native, which lets GCC choose extra optimization flags based on the machine doing the compiling.

However, it probably isn't a good patch because all the release binaries would be optimized for whatever CPU abrok.eu uses to host its servers.

Matthew:out
Some believe in the almighty dollar.

I believe in the almighty printf statement.
Joerg Oster
Posts: 937
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany

Re: PGO improvement for Stockfish?

Post by Joerg Oster »

zullil wrote:When building the latest Stockfish with g++-4.8 using

Code:

make profile-build ARCH=x86-64-modern
I consistently get a binary that is about 3% faster if I replace -fprofile-generate and -fprofile-use with -fprofile-arcs and -fbranch-probabilities, respectively, in the supplied makefile:

Code:

gcc-profile-make:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-fprofile-generate' \
	EXTRALDFLAGS='-lgcov' \
	all

gcc-profile-use:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-fprofile-use' \
	EXTRALDFLAGS='-lgcov' \
	all
Timing was done by disabling Turbo Boost and invoking the standard

Code:

./stockfish bench
which defaults to a single-threaded deterministic search.

Does anyone else get similar results using gcc? I'm running Linux.
Well, I cannot confirm.
Trying with latest dev, default profile-build is slightly faster than with your modification.

Also running Linux and g++-4.8.x.
Jörg Oster
zullil

Re: PGO improvement for Stockfish?

Post by zullil »

Joerg Oster wrote: Well, I cannot confirm.
Trying with latest dev, default profile-build is slightly faster than with your modification.

Also running Linux and g++-4.8.x.
Interesting. Thanks for testing this. Did you have Turbo Boost (or Turbo Core) disabled? I only just realized the impossibility of benchmarking accurately with Turbo Boost on.
zullil

Re: PGO improvement for Stockfish?

Post by zullil »

ZirconiumX wrote: I've gotten a small speedup by adding -march=native -mtune=native, which lets GCC choose extra optimization flags based on the machine doing the compiling.

Matthew:out
I used to do the same. But my current testing seems to indicate that removing all invocations of -msse and -msse3 from the makefile and just using -O3 -fno-tree-pre is best. At least on my system using gcc version 4.8.1 (Ubuntu 4.8.1-2ubuntu1~12.04). Odd.
Joerg Oster

Re: PGO improvement for Stockfish?

Post by Joerg Oster »

zullil wrote:
Joerg Oster wrote: Well, I cannot confirm.
Trying with latest dev, default profile-build is slightly faster than with your modification.

Also running Linux and g++-4.8.x.
Interesting. Thanks for testing this. Did you have Turbo Boost (or Turbo Core) disabled? I only just realized the impossibility of benchmarking accurately with Turbo Boost on.
Of course. :D
Turbo Boost (or Turbo Core) is an absolute no-go in serious engine testing.
Jörg Oster
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: PGO improvement for Stockfish?

Post by bob »

zullil wrote:When building the latest Stockfish with g++-4.8 using

Code:

make profile-build ARCH=x86-64-modern
I consistently get a binary that is about 3% faster if I replace -fprofile-generate and -fprofile-use with -fprofile-arcs and -fbranch-probabilities, respectively, in the supplied makefile:

Code:

gcc-profile-make:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-fprofile-generate' \
	EXTRALDFLAGS='-lgcov' \
	all

gcc-profile-use:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-fprofile-use' \
	EXTRALDFLAGS='-lgcov' \
	all
Timing was done by disabling Turbo Boost and invoking the standard

Code:

./stockfish bench
which defaults to a single-threaded deterministic search.

Does anyone else get similar results using gcc? I'm running Linux.
I reported this for Crafty a few months back. The newer options (generate/use) seem to re-order data, but not particularly effectively. profile-arcs and profile-use are definitely better for me too.
zullil

Re: PGO improvement for Stockfish?

Post by zullil »

bob wrote: profile-arcs and profile-use are definitely better for me too.
For me it has to be profile-arcs paired with branch-probabilities. If I replace the latter with profile-use, I lose the gain. According to the documentation, profile-use enables branch-probabilities, but it also enables a half-dozen other things (at least one of which seems to degrade the optimization).
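
One way to narrow down which of those extra optimizations hurts is to keep -fprofile-use and switch them off one at a time. The target below is a hypothetical helper, not part of the Stockfish makefile, and NOFLAG is a made-up variable name. As far as the GCC 4.8 manual goes, -fprofile-use turns on -fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops and -ftracer; treat that list as an assumption and check it against the documentation for your compiler version.

```makefile
# Hypothetical bisection target (not in the supplied makefile).
# Pass exactly one -fno-... option per run, for example:
#   make gcc-profile-use-bisect NOFLAG=-fno-unroll-loops
# Candidates (assumed from the GCC 4.8 docs):
#   -fno-vpt  -fno-unroll-loops  -fno-peel-loops  -fno-tracer
gcc-profile-use-bisect:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-fprofile-use $(NOFLAG)' \
	EXTRALDFLAGS='-lgcov' \
	all
```

Running ./stockfish bench after each variant and comparing against the plain -fbranch-probabilities build should show which option, if any single one, accounts for the loss.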
Krgp
Posts: 20
Joined: Mon Nov 04, 2013 6:18 am

Re: PGO improvement for Stockfish?

Post by Krgp »

Well ... 'profile-arcs' and/or 'branch-probabilities' (paired, or either one alone) do not work with GCC 4.7.3 (internal compiler error: in edge_badness, at ipa-inline.c:793, make[2]: *** [ucioption.o] Error 1). With 4.8.2, 4.8.3 or 4.9.0, the two together do give a considerable speed gain on an i7-4770K: around 4% for 4.8.2 and around 3% for 4.8.3 and 4.9.0. That holds with Turbo Boost on, with the overclock (4.5 GHz) on, and with all of them off.
KP
bob

Re: PGO improvement for Stockfish?

Post by bob »

Krgp wrote: Well ... 'profile-arcs' and/or 'branch-probabilities' (paired, or either one alone) do not work with GCC 4.7.3 (internal compiler error: in edge_badness, at ipa-inline.c:793, make[2]: *** [ucioption.o] Error 1). With 4.8.2, 4.8.3 or 4.9.0, the two together do give a considerable speed gain on an i7-4770K: around 4% for 4.8.2 and around 3% for 4.8.3 and 4.9.0. That holds with Turbo Boost on, with the overclock (4.5 GHz) on, and with all of them off.
I use those all the time with Crafty and gcc 4.7.3...

If you do any multi-threaded benchmarking, you do need to add

-fprofile-correction

on the final compile, because profiling a threaded run apparently leaves some inconsistent counters in the .gcda files. That flag works around it.
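
A sketch of where the flag would go, assuming the stock gcc-profile-use target quoted earlier in the thread and a profiling run that actually uses more than one thread (the default ./stockfish bench is single-threaded, so it should not need this):

```makefile
# Final (optimized) compile only; the instrumented build is unchanged.
# -fprofile-correction lets GCC smooth over inconsistent counters
# left behind by a multi-threaded profiling run.
gcc-profile-use:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-fprofile-use -fprofile-correction' \
	EXTRALDFLAGS='-lgcov' \
	all
```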