The best compiler for chess, Intel or gcc or something else?

Has GCC caught up with Intel with respect to performance?

Poll ended at Sun Oct 14, 2012 4:32 pm

Yes
15
60%
No
10
40%
 
Total votes: 25

diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: The best compiler for chess, Intel or gcc or something e

Post by diep »

ZirconiumX wrote:
velmarin wrote:You can download a free 30-day evaluation of the Intel compiler and a 90-day trial of Visual Studio, install them (in a virtual machine if need be), and run your own tests.

You would have to compare the two compilers.
Nothing beats doing it yourself.
Jose, ICC is free for non-commercial use - do a bit of digging and you should get a free license key for unlimited use - providing you don't make ANY money out of your executables.

MSVC is also free - I can't remember the name of the stripped down version though.

GCC is free for all uses - big plus for Don.

Matthew:out
MSVC and ICC are both paid compilers. There are no free versions anymore.

Diep has a lot of chess knowledge, so we can show that a massive number of branches is unavoidable.

if (generic pattern) {
    .. other patterns..
}
else {
    SKIP
}

A single branch can simply skip quite a lot of code and function calls. Skipping is always faster.

So being clever with branches gets really important for a compiler when it has to handle massive code. I haven't checked whether GCC 4.7.0 still totally messes up there; I had publicly posted examples for 4.5.

They had deliberately slowed down the 4.0 - 4.5 series so that it was a lot slower on AMD while losing just a little on Intel CPUs (P4 and later), yet objectively all those examples showed slow code.

I am 100% convinced that a lot of those GCC choices that objectively produce slower code are still inside the compiler, as it hardly sped up on AMD processors while it got a lot faster on Intel CPUs.

Yet you do need some chess knowledge, of course, to hit those cases :)
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: The best compiler for chess, Intel or gcc or something e

Post by abulmo »

lucasart wrote:In my experience, nothing beats GCC 4.7. As for PGO, I have never found that to be faster: maybe it used to be in earlier versions, but with -O4 -flto, it's as fast w/o PGO

[...]

Normal build -O4
instead of -O3, which is what the default Makefile uses (suboptimal).
-O4 does nothing more than -O3 with gcc 4.7

Code:

$ gcc -c -Q -O3 --help=optimizers > /tmp/O3-opts
$ gcc -c -Q -O4 --help=optimizers > /tmp/O4-opts
$ diff /tmp/O3-opts /tmp/O4-opts
shows no difference.

On the other hand, there is -Ofast, which enables unsafe math optimizations, but I guess it helps very little in a chess program.
Richard
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: The best compiler for chess, Intel or gcc or something e

Post by abulmo »

bob wrote: GCC has had so many PGO-related bugs I quit using it because it would crash when I tried to PGO a threaded program...
Did you try the -fprofile-correction flag?
Richard
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: The best compiler for chess, Intel or gcc or something e

Post by lucasart »

abulmo wrote:
lucasart wrote:In my experience, nothing beats GCC 4.7. As for PGO, I have never found that to be faster: maybe it used to be in earlier versions, but with -O4 -flto, it's as fast w/o PGO

[...]

Normal build -O4
instead of -O3, which is what the default Makefile uses (suboptimal).
-O4 does nothing more than -O3 with gcc 4.7

Code:

$ gcc -c -Q -O3 --help=optimizers > /tmp/O3-opts
$ gcc -c -Q -O4 --help=optimizers > /tmp/O4-opts
$ diff /tmp/O3-opts /tmp/O4-opts
shows no difference.

on the other hand there is -Ofast which enables unsafe math optimizations, but it helps very little in a chess program I guess.
Well, on mine it does (see my detailed post).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: The best compiler for chess, Intel or gcc or something e

Post by abulmo »

lucasart wrote:
abulmo wrote: -O4 does nothing more than -O3 with gcc 4.7
Well, on mine it does (see my detailed post).
I guess it is just a problem of measurement accuracy.
I reproduced your experiment on Stockfish 2.3.1, except that I ran the bench 10 times and rebuilt the executable several times:

1) Normal build "make build ARCH=x86-64" with -O3

Code:

3696
3607
3661
3597
3649
3628
3648
3600
3648
3606
i.e. on average 3634 +/- 32 (1 standard deviation)
2) Normal build with -O4

Code:

3622
3571
3609
3606
3596
3608
3609
3604
3605
3569
on average 3600 +/- 17
3) Another normal build with -O3

Code:

3612
3563
3590
3599
3619
3555
3603
3601
3547
3607
on average 3590 +/- 25
My conclusion is that bench time fluctuates between runs and also between compilations. IMHO it is very hard to detect small speed improvements accurately.
Richard
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The best compiler for chess, Intel or gcc or something e

Post by bob »

Don wrote:
bob wrote:
Don wrote:P.S. PGO was useful before 4.6 - at least for me. But since then it has worked extremely well for me.
Don wrote:
lucasart wrote:In my experience, nothing beats GCC 4.7. As for PGO, I have never found that to be faster: maybe it used to in earlier versions, but with -O4 -flto, it's as fast w/o PGO

Who needs ICC or Mickeysoft VC++ anymore :wink:
That is pretty odd, I found major benefits from PGO with Komodo. Maybe it is very program specific then.
For well-written programs, PGO is not going to produce huge improvements. But it will improve things significantly. 10% to as much as 20% is certainly possible. But this is mainly about optimizing the direct instruction path of a program so that the cache doesn't load blocks of code that are rarely used. There's very little gain elsewhere. But with a lot of if statements, particularly if-then-else type structures, it will move the uncommon path out of the primary execution stream and cause cache prefetching (filling an entire block) to work better since it won't prefetch the blocks of code that are infrequently used.
I actually meant that before 4.6 it was USELESS - I could see no advantage at all. By mistake I said it was "useful" but that is not what I meant.

I do take some care to avoid conditional instructions as much as possible but any chess program is relatively heavy on logic. But complex nested if/then statements - are you saying that PGO does the best with them? I could easily believe that.
Think about what you would do if you knew the history of every branch in your program.

If you had code like this:

if (c) { statements }

but you knew that most of the time (> 50%) c is false, you would want to write it like this:

if (c) goto xxxx;
back_again:
...


and somewhere else you would do this:

xxxx:
{ statements }
goto back_again;


Now when that executes, the {statements} are not in the direct code path, so they don't get brought into cache, where they would take up space and time and boot something else out.

That is about all PGO can do. Branch prediction is done in the hardware, so it can't help there. But if you have a ton of if statements, like a chess engine evaluation (for one place) then it can help. I see about a 10% improvement with icc. gcc has always been problematic for me and is unreliable when doing PGO. It either crashes or produces corrupt PGO temp files, particularly if I try to PGO everything including the threaded code...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The best compiler for chess, Intel or gcc or something e

Post by bob »

jdart wrote:As I have reported before, my experience is that, even though GCC PGO is broken, recent GCC versions w/o PGO are at least as good as Intel compiles with PGO. That is on my code, and results are likely to vary depending on what your code is like.

On Windows I do not use GCC, I use MSVC with PGO. But there too I have not seen better results with the Intel compiler.

--Jon
Here are my results for three test positions. In each pair, the first entry is icc + PGO and the second is gcc with no PGO.

log.001: time=24.93 mat=0 n=59567345 fh=94% nps=2.4M
log.002: time=28.55 mat=0 n=59567345 fh=94% nps=2.1M

log.001: time=21.76 mat=0 n=57099213 fh=94% nps=2.6M
log.002: time=22.77 mat=0 n=57099213 fh=94% nps=2.5M

log.001: time=52.53 mat=0 n=115343692 fh=92% nps=2.2M
log.002: time=55.53 mat=0 n=115343692 fh=92% nps=2.1M


As you can see, icc wins every time, by roughly 5% or more in run time (about 15% on the first position)...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The best compiler for chess, Intel or gcc or something e

Post by bob »

abulmo wrote:
bob wrote: GCC has had so many PGO-related bugs I quit using it because it would crash when I tried to PGO a threaded program...
Did you try the -fprofile-correction flag?
No, but that seems like an amateurish way of fixing something, does it not?

"If your program crashes, then try this option to see if it will cure the problem..." Not exactly something I am interested in doing...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The best compiler for chess, Intel or gcc or something e

Post by bob »

abulmo wrote:
lucasart wrote:
abulmo wrote: -O4 does nothing more than -O3 with gcc 4.7
Well, on mine it does (see my detailed post).
I guess it is just a problem of measurement accuracy.
I reproduced your experiment on Stockfish 2.3.1, except that I ran the bench 10 times and rebuilt the executable several times:

[...]
My conclusion is that bench time fluctuates between runs and also between compilations. IMHO it is very hard to detect small speed improvements accurately.
Here's what 10 consecutive test runs look like on my machine...

log.001: time=20.18 mat=0 n=59567345 fh=94% nps=3.0M
log.002: time=20.21 mat=0 n=59567345 fh=94% nps=2.9M
log.003: time=20.22 mat=0 n=59567345 fh=94% nps=2.9M
log.004: time=20.21 mat=0 n=59567345 fh=94% nps=2.9M
log.005: time=20.27 mat=0 n=59567345 fh=94% nps=2.9M
log.006: time=20.21 mat=0 n=59567345 fh=94% nps=2.9M
log.007: time=20.20 mat=0 n=59567345 fh=94% nps=2.9M
log.008: time=20.17 mat=0 n=59567345 fh=94% nps=3.0M
log.009: time=20.15 mat=0 n=59567345 fh=94% nps=3.0M
log.010: time=20.19 mat=0 n=59567345 fh=94% nps=3.0M

Smallest = 20.15 seconds, largest = 20.27 seconds: about 0.12 seconds of variability.

When I test optimization changes, I run for at least 60 seconds, which takes me down to < 0.2% variability.

BTW for each of those, I ran a different copy of crafty in between each run, using a hash size big enough to flush everything out of memory, so that each started off with nothing in memory, nothing in cache...
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: The best compiler for chess, Intel or gcc or something e

Post by abulmo »

bob wrote:
abulmo wrote:
bob wrote: GCC has had so many PGO-related bugs I quit using it because it would crash when I tried to PGO a threaded program...
Did you try the -fprofile-correction flag?
No, but that seems like an amateurish way of fixing something, does it not?

"If your program crashes, then try this option to see if it will cure the problem..." Not exactly something I am interested in doing...
This is indeed a temporary workaround, but it is also the only way to get PGO compilation with gcc to run to completion on multithreaded applications.
I agree that we may have some doubts about the quality of the information provided by this correction and, thus, about the quality of the optimization.
Richard