abulmo wrote:
-O4 does nothing more than -O3 with gcc 4.7
Well, on mine it does (see my detailed post).
I guess it is just a problem of accuracy.
I reproduced your experiment on Stockfish 2.3.1, except that I ran the bench 10 times and rebuilt the executable several times:
3612
3563
3590
3599
3619
3555
3603
3601
3547
3607
on average: 3590 +/- 25
My conclusion is that bench time fluctuates between runs and also between compilations. IMHO it is very hard to detect small speed enhancements accurately.
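As a quick sanity check on those figures, here is a throwaway C snippet (not from the original post) that computes the mean and sample standard deviation of the ten bench times; it reproduces the 3590 +/- 25 quoted above.
#include <stdio.h>
#include <math.h>   /* link with -lm */

int main(void)
{
    /* the ten bench times listed above */
    double t[] = { 3612, 3563, 3590, 3599, 3619, 3555, 3603, 3601, 3547, 3607 };
    int n = sizeof t / sizeof t[0];

    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += t[i];
    double mean = sum / n;

    double ss = 0.0;
    for (int i = 0; i < n; i++)
        ss += (t[i] - mean) * (t[i] - mean);
    double sd = sqrt(ss / (n - 1));   /* sample standard deviation */

    printf("mean %.1f, stddev %.1f\n", mean, sd);   /* about 3589.6 and 25.4 */
    return 0;
}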
If the question is whether there is any difference between gcc -O3 and gcc -O4, just look at the resulting binaries. If they are identical, there is no difference (at least not for that program). If they are not identical, there is a difference.
Bench time certainly fluctuates between runs, but not between compilations provided you use the same compiler and compiler options, and the same source code. The resulting binaries are simply the same. (Ok, it might fluctuate if you use profile guided optimization with different profiles, because then the resulting binaries might differ...)
syzygy wrote:If the question is whether there is any difference between gcc -O3 and gcc -O4, just look at the resulting binaries. If they are identical, there is no difference (at least not for that program). If they are not identical, there is a difference.
If I compile Stockfish twice with the same options I get different binaries, so I do not think it is conclusive.
syzygy wrote:Bench time certainly fluctuates between runs, but not between compilations provided you use the same compiler and compiler options, and the same source code. The resulting binaries are simply the same. (Ok, it might fluctuate if you use profile guided optimization with different profiles, because then the resulting binaries might differ...)
With -O3, gcc enables -fguess-branch-probability, which means (if I understand correctly) that it will choose some branch probabilities at random and produce non-deterministic binaries.
syzygy wrote:If the question is whether there is any difference between gcc -O3 and gcc -O4, just look at the resulting binaries. If they are identical, there is no difference (at least not for that program). If they are not identical, there is a difference.
If I compile Stockfish twice with the same options I get different binaries, so I do not think it is conclusive.
But doesn't stockfish offer a compilation mode with profile-guided optimisations?
syzygy wrote:Bench time certainly fluctuates between runs, but not between compilations provided you use the same compiler and compiler options, and the same source code. The resulting binaries are simply the same. (Ok, it might fluctuate if you use profile guided optimization with different profiles, because then the resulting binaries might differ...)
With -O3, gcc enables -fguess-branch-probability, which means (if I understand correctly) that it will choose some branch probabilities at random and produce non-deterministic binaries.
This surprises me, but the gcc documentation agrees with you. I have never noticed non-deterministic behaviour with -O3 or even -O6, though. Although I might turn out to be wrong, I am guessing that -fguess-branch-probability uses a random number generator seeded identically on every run of gcc (and is therefore deterministic).
abulmo wrote:If I compile twice stockfish with the same options I got different binaries.
This has been my experience. I suspect gcc embeds some kind of compile-time data that includes the timestamp. If that's the case, you'd want to disable anything like that before comparing binaries with a hash.
abulmo wrote:If I compile twice stockfish with the same options I got different binaries.
This has been my experience. I suspect gcc embeds some kind of compile-time data that includes the timestamp. If that's the case, you'd want to disable anything like that before comparing binaries with a hash.
Yes, even more so if macros like __DATE__ and __TIME__ are used in the program. -fguess-branch-probability does have an effect too: if I disable it, the size of the binary changes. Maybe the binary size is enough to establish that -O3 and -O4 are the same optimization level, as they produce two binaries of the same size.
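For reference, the __DATE__ / __TIME__ point is easy to see in isolation; the trivial C program below (not Stockfish's actual code) embeds the compilation timestamp, so the binary changes on every rebuild even though the source does not.
#include <stdio.h>

int main(void)
{
    /* __DATE__ and __TIME__ expand to string literals such as
       "Oct 12 2012" and "14:03:27", so the object code differs
       between two compilations of the exact same source.       */
    printf("built on " __DATE__ " at " __TIME__ "\n");
    return 0;
}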
abulmo wrote:If I compile twice stockfish with the same options I got different binaries.
This has been my experience. I suspect gcc embeds some kind of compile-time data that includes the timestamp. If that's the case, you'd want to disable anything like that before comparing binaries with a hash.
In my experience this is not the case. With gcc 4.4 at least, the md5sum of two successive compilations is exactly the same. I also tried -O3 and -O4 and the resulting files were exactly equal.
Of course, if you use PGO or things like __TIME__ the binaries may not be the same. I haven't checked whether Stockfish uses that.
abulmo wrote:If I compile Stockfish twice with the same options I get different binaries, so I do not think it is conclusive.
Actually, it looks like the -g flag (always set in Stockfish's Makefile) is the culprit, and the strip command cannot remove all of its "garbage": about 20 bytes of random data remain at addresses 0x24c - 0x260.
After correcting the Makefile by removing the -g flag, adding -s to the LDFLAGS to strip the executable automatically at link time, and removing the __DATE__ macro from Stockfish's code, I get the same executable across different compilations.
Don wrote:P.S. PGO was useful before 4.6 - at least for me. But since then it has worked extremely well for me.
Don wrote:
lucasart wrote:In my experience, nothing beats GCC 4.7. As for PGO, I have never found that to be faster: maybe it used to in earlier versions, but with -O4 -flto, it's as fast w/o PGO
Who needs ICC or Mickeysoft VC++ anymore
That is pretty odd; I found major benefits from PGO with Komodo. Maybe it is very program-specific then.
For well-written programs, PGO is not going to produce huge improvements. But it will improve things significantly. 10% to as much as 20% is certainly possible. But this is mainly about optimizing the direct instruction path of a program so that the cache doesn't load blocks of code that are rarely used. There's very little gain elsewhere. But with a lot of if statements, particularly if-then-else type structures, it will move the uncommon path out of the primary execution stream and cause cache prefetching (filling an entire block) to work better since it won't prefetch the blocks of code that are infrequently used.
I actually meant that before 4.6 it was USELESS - I could see no advantage at all. By mistake I said it was "useful" but that is not what I meant.
I do take some care to avoid conditional instructions as much as possible but any chess program is relatively heavy on logic. But complex nested if/then statements - are you saying that PGO does the best with them? I could easily believe that.
Think about what you would do if you knew the history of every branch in your program.
If you had code like this:
if (c) { statements }
but you knew that most of the time (> 50%) c is false, you would want to write it like this:
if (c) goto xxxx;
back_again:
...
and somewhere else you would do this:
xxxx:
{ statements }
goto back_again;
Now when that executes, the {statements} are not in the direct code path and don't get brought into cache, taking up space and time, as well as booting something else out.
That is about all PGO can do. Branch prediction is done in the hardware, so it can't help there. But if you have a ton of if statements, like a chess engine evaluation (for one place) then it can help. I see about a 10% improvement with icc. gcc has always been problematic for me and is unreliable when doing PGO. It either crashes or produces corrupt PGO temp files, particularly if I try to PGO everything including the threaded code...
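For what it's worth, gcc also lets you hand it this kind of branch information manually through __builtin_expect, without PGO and without writing the gotos yourself; it will then tend to lay the unlikely block out of the hot path. A minimal sketch, with made-up function and macro names:
#include <stdio.h>

/* common convention: wrap __builtin_expect in likely()/unlikely() macros */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static int search(int depth)
{
    if (unlikely(depth <= 0)) {
        /* rare case: gcc moves this block out of the straight-line path,
           much like the manual "goto xxxx / back_again" rewrite above   */
        return 0;
    }
    /* common case stays in the fall-through code */
    return search(depth - 1) + 1;
}

int main(void)
{
    printf("%d\n", search(10));
    return 0;
}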
Well things are more complex.
Suppose 'c' is just a variable that hasn't been changed for, say, 20 instructions before the branch.
I would guess some processors can already evaluate the branch before reaching it, or at least save some cycles that way.
Yet most CPU manufacturers do not exactly document how their branch prediction works, so it's not clear to us mortals how to write the C code in order to lose fewer cycles.
Wouldn't it be possible to build a small test to check this?
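One way to build such a small test, without assuming anything about a particular CPU, is to time the same loop once with a perfectly predictable branch and once with an essentially random one; the gap is roughly the misprediction cost. Note that at -O2/-O3 gcc may replace the branch with a conditional move or vectorize the loop and hide the effect, so compile with little or no optimization. A rough sketch:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000

/* sum the elements above 128; the if() is the branch under test */
static long run(const unsigned char *v)
{
    long sum = 0;
    for (int i = 0; i < N; i++)
        if (v[i] > 128)
            sum += v[i];
    return sum;
}

int main(void)
{
    static unsigned char pred[N], rnd[N];
    for (int i = 0; i < N; i++) {
        pred[i] = 200;              /* branch always taken: fully predictable   */
        rnd[i]  = rand() & 255;     /* taken about half the time, unpredictably */
    }

    clock_t t0 = clock();
    long s1 = run(pred);
    clock_t t1 = clock();
    long s2 = run(rnd);
    clock_t t2 = clock();

    printf("predictable: %.3fs (sum %ld)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, s1);
    printf("random:      %.3fs (sum %ld)\n", (double)(t2 - t1) / CLOCKS_PER_SEC, s2);
    return 0;
}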
bob wrote:
Think about what you would do if you knew the history of every branch in your program.
If you had code like this:
if (c) { statements }
but you knew that most of the time (> 50%) c is false, you would want to write it like this:
if (c) goto xxxx;
back_again:
...
and somewhere else you would do this:
xxxx:
{ statements }
goto back_again;
Now when that executes, the {statements} are not in the direct code path and don't get brought into cache, taking up space and time, as well as booting something else out.
That is about all PGO can do. Branch prediction is done in the hardware, so it can't help there. But if you have a ton of if statements, like a chess engine evaluation (for one place) then it can help. I see about a 10% improvement with icc. gcc has always been problematic for me and is unreliable when doing PGO. It either crashes or produces corrupt PGO temp files, particularly if I try to PGO everything including the threaded code...
PGO does more than that, at least in some compilers: