PGO improvement for Stockfish?

zullil
Posts: 5178
Joined: Mon Jan 08, 2007 11:31 pm
Location: PA USA
Full name: Louis Zulli

Post by zullil » Thu Jun 05, 2014 12:05 am

When building the latest Stockfish with g++-4.8 using

make profile-build ARCH=x86-64-modern
I consistently get a binary that is about 3% faster if I replace -fprofile-generate and -fprofile-use with -fprofile-arcs and -fbranch-probabilities, respectively, in the supplied makefile:

gcc-profile-make:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-fprofile-generate' \
	EXTRALDFLAGS='-lgcov' \
	all

gcc-profile-use:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-fprofile-use' \
	EXTRALDFLAGS='-lgcov' \
	all
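zullil's change amounts to swapping two quoted flag values in the targets above. A minimal sketch of scripting it with sed, assuming the flags appear in the makefile exactly as the single-quoted '-fprofile-generate' and '-fprofile-use' shown in the excerpt (the function name swap_pgo_flags is made up for illustration):

```shell
# Hypothetical helper (not part of Stockfish): rewrite the PGO targets so the
# instrumented build uses -fprofile-arcs and the final build uses
# -fbranch-probabilities. The original file is kept as <file>.orig.
swap_pgo_flags() {
    mk=${1:-Makefile}
    if [ ! -f "$mk" ]; then
        echo "no such file: $mk" >&2
        return 1
    fi
    cp "$mk" "$mk.orig" || return 1
    # Match the quoted values only, so target names such as
    # "gcc-profile-use:" are left untouched.
    sed -e "s/'-fprofile-generate'/'-fprofile-arcs'/" \
        -e "s/'-fprofile-use'/'-fbranch-probabilities'/" \
        "$mk.orig" > "$mk"
}
```

Run it in the directory that holds the makefile (src/ in a Stockfish checkout), then build with make profile-build ARCH=x86-64-modern as before; copying Makefile.orig back restores the default flags.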
Timing was done by disabling Turbo Boost and invoking the standard

./stockfish bench
which defaults to a single-threaded deterministic search.
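To put a number like "about 3% faster" on two builds, one can compare the Nodes/second line of their bench summaries. A small sketch, assuming the summary contains a line of the form "Nodes/second    : 1234567" and that it may be printed to standard error (hence the 2>&1 in the example); both are assumptions about the bench output format, and the helper names and binary names are hypothetical:

```shell
# Extract the Nodes/second figure from captured bench output on stdin.
# Assumes a summary line like "Nodes/second    : 1234567".
bench_nps() {
    awk -F: '/Nodes\/second/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Percentage speedup of NEW over OLD, printed with one decimal place.
speedup_pct() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) * 100 / old }'
}

# Example (binary names are placeholders):
#   old=$(./stockfish-default bench 2>&1 | bench_nps)
#   new=$(./stockfish-arcs bench 2>&1 | bench_nps)
#   speedup_pct "$old" "$new"
```

Since the single-threaded bench searches the same number of nodes every run, the Nodes/second ratio mirrors the ratio of total times; running each binary several times and looking at the spread helps tell a 3% gain from run-to-run noise.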

Does anyone else get similar results using gcc? I'm running Linux.

ZirconiumX
Posts: 1327
Joined: Sun Jul 17, 2011 9:14 am

Re: PGO improvement for Stockfish?

Post by ZirconiumX » Thu Jun 05, 2014 6:35 am

Not at my computer, so I can't check your patch, but I've gotten a small speedup by adding -march=native -mtune=native, which lets GCC generate code for the instruction set of the machine doing the build and tune it for that CPU.

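However, it probably isn't a good patch, because all the release binaries would then be optimized for whatever CPU the abrok.eu build server happens to use, and could crash with illegal-instruction errors on CPUs that lack its instruction-set extensions.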
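Which CPU "native" actually resolves to can be checked without building anything, by asking GCC to print its target options. A sketch, assuming a GNU GCC on x86 (native_target is a made-up name):

```shell
# Print the -march/-mtune values GCC substitutes for "native" on this host.
# The anchored pattern skips the "Known valid arguments for -march=" help text.
native_target() {
    cc=${1:-gcc}
    "$cc" -march=native -mtune=native -Q --help=target |
        grep -E '^[[:space:]]+-m(arch|tune)=[[:space:]]'
}
```

On a build server this shows exactly which CPU the release binaries would be tied to. GCC documents -march=cpu-type as implying -mtune=cpu-type, so -march=native alone already covers the tuning part.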

Matthew:out
Some believe in the almighty dollar.

I believe in the almighty printf statement.

Joerg Oster
Posts: 665
Joined: Fri Mar 10, 2006 3:29 pm
Location: Germany

Post by Joerg Oster » Thu Jun 05, 2014 10:45 am

zullil wrote: I consistently get a binary that is about 3% faster if I replace -fprofile-generate and -fprofile-use with -fprofile-arcs and -fbranch-probabilities, respectively, in the supplied makefile. [...] Does anyone else get similar results using gcc? I'm running Linux.

Well, I cannot confirm.
Testing with the latest dev version, the default profile-build is slightly faster than with your modification.

I'm also running Linux and g++-4.8.x.
Jörg Oster

Post by zullil » Thu Jun 05, 2014 11:19 am

Joerg Oster wrote: Well, I cannot confirm.
Trying with latest dev, default profile-build is slightly faster than with your modification.

Also running Linux and g++-4.8.x.

Interesting. Thanks for testing this. Did you have Turbo Boost (or Turbo Core) disabled? I only just realized the impossibility of benchmarking accurately with Turbo Boost on.
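On Linux, turbo can usually be switched off through sysfs before a bench run. A sketch, assuming one of the two common cpufreq interfaces is present: intel_pstate/no_turbo (write 1) with Intel's intel_pstate driver, or cpufreq/boost (write 0) with acpi-cpufreq, which also covers AMD Turbo Core. The helper name and the SYSFS_CPU override are hypothetical, and writing the real files requires root:

```shell
# Hypothetical helper: disable CPU frequency boost before benchmarking.
#   intel_pstate driver:  intel_pstate/no_turbo  (1 = turbo disabled)
#   acpi-cpufreq driver:  cpufreq/boost          (0 = boost disabled)
# SYSFS_CPU can be overridden, e.g. to point at a scratch directory for testing.
disable_turbo() {
    base=${SYSFS_CPU:-/sys/devices/system/cpu}
    if [ -w "$base/intel_pstate/no_turbo" ]; then
        echo 1 > "$base/intel_pstate/no_turbo"
    elif [ -w "$base/cpufreq/boost" ]; then
        echo 0 > "$base/cpufreq/boost"
    else
        echo "no writable turbo/boost control under $base" >&2
        return 1
    fi
}
```

Checking the CPU frequency during a bench run (for example in /proc/cpuinfo) is a quick way to confirm the setting took effect.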

Post by zullil » Thu Jun 05, 2014 11:32 am

ZirconiumX wrote: I've gotten a small speedup by adding -march=native -mtune=native which lets GCC choose extra optimization flags based on the user's machine.

I used to do the same. But my current testing suggests that it is best to remove every -msse and -msse3 from the makefile and just use -O3 -fno-tree-pre, at least on my system with gcc version 4.8.1 (Ubuntu 4.8.1-2ubuntu1~12.04). Odd.
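Removing the SSE flags by hand is easy to get wrong, since -msse is a prefix of -msse3 and -msse4.1. A sketch that deletes only standalone -msse and -msse3 tokens (strip_sse_flags is a made-up name; it assumes each flag is separated from its neighbours by whitespace, as on a typical CXXFLAGS += line):

```shell
# Hypothetical helper: remove standalone -msse and -msse3 flags from a makefile,
# leaving e.g. -msse4.1 and -mpopcnt alone. The original is kept as <file>.orig.
strip_sse_flags() {
    mk=${1:-Makefile}
    if [ ! -f "$mk" ]; then
        echo "no such file: $mk" >&2
        return 1
    fi
    cp "$mk" "$mk.orig" || return 1
    # Loop (:a ... ta) until no "<space>-msse[3]<space>" remains on the line,
    # then drop a trailing "<space>-msse[3]" at end of line.
    sed -e ':a' \
        -e 's/[[:space:]]-msse3\{0,1\}\([[:space:]]\)/\1/' \
        -e 'ta' \
        -e 's/[[:space:]]-msse3\{0,1\}$//' \
        "$mk.orig" > "$mk"
}
```

The extra flags could then be passed through EXTRACXXFLAGS, e.g. make build ARCH=x86-64-modern EXTRACXXFLAGS='-fno-tree-pre', assuming that variable is appended to the compiler flags as its use in the profile targets suggests; whether -O3 is already among the makefile's defaults is worth checking first.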

Post by Joerg Oster » Thu Jun 05, 2014 2:55 pm

zullil wrote: Did you have Turbo Boost (or Turbo Core) disabled? I only just realized the impossibility of benchmarking accurately with Turbo Boost on.

Of course. :D
Turbo Boost (Intel) or Turbo Core (AMD) is an absolute no-go in serious engine testing.
Jörg Oster

bob
Posts: 20369
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Post by bob » Thu Jun 05, 2014 4:34 pm

zullil wrote: I consistently get a binary that is about 3% faster if I replace -fprofile-generate and -fprofile-use with -fprofile-arcs and -fbranch-probabilities, respectively, in the supplied makefile. [...]

I reported this for Crafty a few months back. The newer options (-fprofile-generate/-fprofile-use) seem to reorder data, but not particularly effectively. -fprofile-arcs and -fprofile-use are definitely better for me too.

Post by zullil » Thu Jun 05, 2014 8:28 pm

bob wrote: profile-arcs and profile-use are definitely better for me too.

For me it has to be -fprofile-arcs paired with -fbranch-probabilities. If I replace the latter with -fprofile-use, I lose the gain. According to the documentation, -fprofile-use enables -fbranch-probabilities, but it also enables a half-dozen other optimizations (at least one of which seems to hurt performance here).
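Which extra optimizations -fprofile-use switches on, relative to -fbranch-probabilities alone, can be listed from GCC itself via -Q --help=optimizers, which reports each optimizer flag as enabled or disabled for a given command line. A sketch; optimizer_diff is a hypothetical name, and the exact list printed depends on the GCC version:

```shell
# Print optimizer flags whose state differs between two GCC option sets.
# Usage: optimizer_diff "-O3 -fbranch-probabilities" "-O3 -fprofile-use"
optimizer_diff() {
    a=$(mktemp) b=$(mktemp)
    # Word splitting of $1/$2 is intentional: each holds a list of flags.
    # Flags go before --help=optimizers so the report reflects them.
    gcc $1 -Q --help=optimizers > "$a"
    gcc $2 -Q --help=optimizers > "$b"
    # Keep only the changed lines ("<" from the first set, ">" from the second).
    diff "$a" "$b" | grep '^[<>]'
    rm -f "$a" "$b"
}
```

The GCC manual of that era lists, for example, -fvpt, -funroll-loops, -fpeel-loops and -ftracer among the options enabled by -fprofile-use; each suspect could then be ruled in or out by adding its negative form, e.g. -fprofile-use -fno-unroll-loops.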

Krgp
Posts: 20
Joined: Mon Nov 04, 2013 5:18 am

Post by Krgp » Sun Jun 08, 2014 11:56 am

Well... -fprofile-arcs and/or -fbranch-probabilities (paired, or either one alone) do not work with GCC 4.7.3 (internal compiler error: in edge_badness, at ipa-inline.c:793, make[2]: *** [ucioption.o] Error 1). With 4.8.2, 4.8.3 or 4.9.0, the two together do give a considerable speed gain: around 4% with 4.8.2 and around 3% with 4.8.3 and 4.9.0 on an i7-4770K, whether Turbo Boost and overclocking (to 4.5 GHz) are on or everything is off.
KP

Post by bob » Sun Jun 08, 2014 2:53 pm

Krgp wrote: 'profile-arcs' &/or 'branch-probabilities' (paired or used either of them alone) do not work with GCC 473 (internal compiler error: in edge_badness, at ipa-inline.c:793, make[2]: *** [ucioption.o] Error 1) [...]

I use those all the time with Crafty and gcc 4.7.3...

If you do any multi-threaded benchmarking, you do need to add

-fprofile-correction

on the final compile, because the threaded profiling apparently has a few issues with corruption in the .gcda file. The above fixes it.
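bob's point can be tried on something much smaller than an engine. The toy below is entirely made up for illustration (the C program, its loop and pgo_threaded_demo are not from Crafty or Stockfish): it profiles a four-thread program with -fprofile-arcs, then rebuilds with -fbranch-probabilities -fprofile-correction. Per the GCC manual, -fprofile-correction makes GCC smooth out inconsistent counters from multi-threaded runs instead of stopping with an error; whether this particular toy ever produces an inconsistent profile depends on timing.

```shell
# Toy two-stage PGO build of a small threaded C program (hypothetical example).
# Stage 1 compiles with -fprofile-arcs and runs the program, which writes
# toy.gcda next to toy.o; stage 2 recompiles using that profile.
pgo_threaded_demo() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/toy.c" <<'EOF'
#include <pthread.h>
#include <stdio.h>

/* Branchy loop executed by every thread, so its counters are
   updated concurrently during the profiling run. */
static long count_hits(long n) {
    long i, hits = 0;
    for (i = 0; i < n; i++)
        if (i % 3 == 1)
            hits++;
    return hits;
}

static void *worker(void *arg) {
    *(long *)arg = count_hits(1000000);
    return NULL;
}

int main(void) {
    pthread_t t[4];
    long r[4], total = 0;
    int i;
    for (i = 0; i < 4; i++)
        if (pthread_create(&t[i], NULL, worker, &r[i]) != 0)
            return 1;
    for (i = 0; i < 4; i++) {
        pthread_join(t[i], NULL);
        total += r[i];
    }
    printf("%ld\n", total);
    return 0;
}
EOF
    (
        cd "$dir" &&
        gcc -O2 -pthread -fprofile-arcs -c toy.c -o toy.o &&
        gcc -pthread -fprofile-arcs toy.o -o toy &&
        ./toy > /dev/null &&
        rm -f toy.o toy &&
        gcc -O2 -pthread -fbranch-probabilities -fprofile-correction -c toy.c -o toy.o &&
        gcc -pthread toy.o -o toy &&
        ./toy
    )
    status=$?
    rm -rf "$dir"
    return "$status"
}
```

Compiling with -c and an explicit -o keeps the .gcda name tied to the object file, so both stages look for the same toy.gcda.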
