Compiler switches

jarkkop
Posts: 198
Joined: Thu Mar 09, 2006 2:44 am
Location: Helsinki, Finland

Post by jarkkop »

I was wondering which optimization level is the best and safest.

The problem is that with the Intel compiler, /O3 gives the fastest executable in terms of NPS, but it does not always find the solution as quickly as a build made with the /fast switch.
What kind of tricks does the compiler use? The search results are not the same with different switches either.

Jarkko
Vempele

Re: Compiler switches

Post by Vempele »

jarkkop wrote: The problem is that with the Intel compiler, /O3 gives the fastest executable in terms of NPS, but it does not always find the solution as quickly as a build made with the /fast switch.
Are you, by any chance, compiling Glaurung or another MP engine? Glaurung uses 2 threads (even if you only have one CPU it still does it) by default, which makes the analysis unpredictable.
jarkkop

Re: Compiler switches

Post by jarkkop »

Toga II 1.3x4 was the program in question.
abik
Posts: 822
Joined: Fri Dec 01, 2006 10:46 pm
Location: Mountain View, CA, USA
Full name: Aart Bik

Re: Compiler switches

Post by abik »

Dear Jarkko,
Ignoring non-determinism caused by timing functions or multi-threading (and ignoring floating-point differences due to e.g. re-association, which are less relevant in the chess context), all compiler optimization levels should yield the same results; otherwise there is a compiler bug. Are you able to show a difference under deterministic circumstances? If so, we would like to hear about it, for example at the Intel software forum (http://softwarecommunity.intel.com/isn/ ... us/Forums/).
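To illustrate the re-association point: floating-point addition is not associative, so an optimizer that is allowed to regroup a sum can change the result. A minimal sketch in C (the function names are made up for illustration, and the values are chosen so that the effect shows up in double precision):

```c
/* Hypothetical helpers: the same three numbers summed in two groupings. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;   /* grouping as written in the source */
}

double sum_right(double a, double b, double c)
{
    return a + (b + c);   /* grouping a re-associating optimizer might pick */
}
```

With a = 1e20, b = -1e20 and c = 1.0, and as long as the compiler itself does not regroup these sums, sum_left gives 1.0 while sum_right gives 0.0, because the 1.0 is lost when it is added to -1e20 first. An engine whose evaluation is purely integer is not exposed to this.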
Aart Bik
http://www.aartbik.com/
abik

Re: Compiler switches

Post by abik »

… and, just to be clear, with “same results” I mean that different compiler optimizations should yield binaries that compute the same score, principal variation, and number of nodes visited (but not the same nodes-per-second rating, which depends on execution time; if execution times did not change across optimizations, there would not be much use for them). Also, the program must be deterministic, so no time-dependent decisions such as:

void search() {
    /* ... */
    if (time_is_up()) return;   /* result now depends on the clock */
    /* ... */
}
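For comparing builds, one way to get such a deterministic run is to stop on a fixed node budget instead of the clock. The sketch below is a toy and not Toga's search: toy_search, the node limit and the branching factor of 3 are all assumptions made up for illustration.

```c
#include <stdint.h>

/* Hypothetical toy search: walks a fixed-shape tree and stops on a node
   budget rather than on elapsed time, so every run visits the same nodes. */
static uint64_t nodes_visited;
static uint64_t node_limit;

static int budget_exhausted(void)
{
    return nodes_visited >= node_limit;   /* no clock involved */
}

static void toy_search_rec(int depth)
{
    if (budget_exhausted()) return;
    nodes_visited++;
    if (depth == 0) return;
    for (int move = 0; move < 3; move++)   /* pretend every node has 3 moves */
        toy_search_rec(depth - 1);
}

/* Returns the number of nodes visited for the given depth and budget. */
uint64_t toy_search(int depth, uint64_t limit)
{
    nodes_visited = 0;
    node_limit = limit;
    toy_search_rec(depth);
    return nodes_visited;
}
```

Because the stop condition depends only on the node count, two binaries built with different switches should report identical node counts here; a fixed search depth serves the same purpose.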

Floating-point computations may yield different results under different optimization switches. Most numerical programmers are aware of this and use “stable” algorithms to guard against those differences. Less experienced programmers sometimes rely on exact outcomes, as in the example below:

double x = ….;
if (x == 42.0) {
    play_decent_chess();
}
else {
    resign_immediately();
}

Such programming constructs may work for years using the same compiler, platform and optimization levels, until one day a different compiler yields x == 41.999999999999 and the rating of the program drops dramatically….
:wink:
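A common guard against this kind of surprise is to compare with a tolerance instead of with ==. The helper below is only a sketch: the name nearly_equal and both tolerance parameters are assumptions, and suitable tolerances depend on the computation.

```c
/* Absolute value without pulling in the math library. */
static double abs_d(double v)
{
    return v < 0 ? -v : v;
}

/* Hypothetical helper: 1 if a and b agree to within rel_tol relative
   error, with abs_tol as a floor for values close to zero. */
int nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    double diff  = abs_d(a - b);
    double scale = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);
    double bound = rel_tol * scale;

    if (bound < abs_tol)
        bound = abs_tol;
    return diff <= bound;
}
```

With rel_tol = 1e-9, the values 41.999999999999 and 42.0 compare as equal, while 41.9 and 42.0 do not.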
hgm
Posts: 28206
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Compiler switches

Post by hgm »

This is not entirely true. Some optimizations at the higher optimization levels are 'unsafe', because they make certain assumptions that might not be fulfilled (e.g. that different pointers do not point to the same object, or that pointers will never point to simple variables, so that after

p = iniptr(&i); k = 23*i + 1; *p++ = 10; n = 23*i + 1;

the optimizer still trusts that the first and the second i have the same value, so that it can assign the same value to k and n without knowing what the function iniptr(), defined in another file, does).
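A compilable version of that statement sequence makes the aliasing visible. The body of iniptr() below is an assumption (the post does not show it); it simply returns its argument, which is the case the example worries about.

```c
/* Stand-in for the iniptr() of the post. In the scenario described it would
   live in another file, so the optimizer could not see that it returns its
   argument. */
int *iniptr(int *q)
{
    return q;
}

/* Runs the four statements from the post and reports k and n. */
void alias_demo(int *k_out, int *n_out)
{
    int i = 1;
    int *p, k, n;

    p = iniptr(&i);
    k = 23 * i + 1;   /* i is 1 here */
    *p++ = 10;        /* writes to i through the alias */
    n = 23 * i + 1;   /* i is 10 here */

    *k_out = k;
    *n_out = n;
}
```

A correct build yields k = 24 and n = 231. A build that reused the first value of 23*i + 1 for n would report 24 twice.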
abik

Re: Compiler switches

Post by abik »

I guess you are referring to switches like -Qansi-alias and keywords like “restrict”. Note, however, that these are user assertions, not actual optimization levels. For example, -Qansi-alias asserts that the program complies with the ANSI aliasing rules (and the program may then be optimized accordingly). But you are right that if the programmer gives incorrect assertions, results may vary.
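As a small illustration of such a user assertion, here is a C99 function using restrict. The function name is made up; the point is only that the qualifier is a promise made by the programmer, which the compiler does not verify.

```c
/* The restrict qualifiers are the programmer's promise that dst and src
   never overlap, so the compiler may load *src only once. */
void add_twice(int *restrict dst, const int *restrict src)
{
    *dst += *src;
    *dst += *src;
}
```

Called with two distinct objects, the result is the same at every optimization level. Calling it as add_twice(&x, &x) breaks the promise; the behavior is then undefined, and the value left in x may well differ between builds.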
Denis P. Mendoza
Posts: 415
Joined: Fri Dec 15, 2006 9:46 pm
Location: Philippines

Re: Compiler switches

Post by Denis P. Mendoza »

jarkkop wrote:
I was wondering which optimization level is the best and safest.

The problem is that with the Intel compiler, /O3 gives the fastest executable in terms of NPS, but it does not always find the solution as quickly as a build made with the /fast switch.
What kind of tricks does the compiler use? The search results are not the same with different switches either.

Jarkko
Jarkko,

Aart gave a very informative explanation, which added to my knowledge too (thanks!). But to answer your question about the switches, there is an article on our home site which may help:
http://www.superchessengine.com/programming.htm

I'm not an expert, but the switches for Intel and MSVC are almost the same. I took some notes from the page, as shown below.
Under Microsoft VS Express (once you have a makefile :D ):

Go to the MS compiler command prompt, then type these two commands.

For a C engine:

cl /c *.c
cl /Feyourprogram.exe *.obj

or for a C++ engine:

cl /c *.cpp
cl /Feyourprogram.exe *.obj

or

1) open cmd.exe
2) run vcvars32.bat (shipped with the VC++ files) to set the path and other variables
3) run nmake from the command line

To optimize your .exe

Bryan Hoffman: "I compile from the command line, with no makefiles. Start out with this (note that you will need a compiler that is able to do profile-guided optimization (PGO))."

for language c


cl /O2 /GL /c *.c
link /LTCG:PGI /PGD:yourprogram.pgd *.obj /RELEASE /out:yourprogram.exe

Now run your program through about 25 positions and analyze each for 10 seconds.

link /LTCG:PGO /PGD:yourprogram.pgd /RELEASE *.obj /out:yourprogram.exe

for language C++

cl /O2 /GL /c *.cpp
link /LTCG:PGI /PGD:yourprogram.pgd *.obj /RELEASE /out:yourprogram.exe

Now run your program through about 25 positions and analyze each for 10 seconds.

link /LTCG:PGO /PGD:yourprogram.pgd /RELEASE *.obj /out:yourprogram.exe



Here's a layman's guide which may help:

When compiling Toga/Fruit engines with the Intel C++ compiler (if you have a GUI), these are sample switches comparable to the ones above, though not exactly the same:

/nologo /ML /W3 /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /Fp"Release/TogaII.pch" /YX /Fo"Release/" /Fd"Release/" /FD /GL /Qprof_gen /c

Compile an executable in "Release" mode.

Then play a couple of games, or just run EPD tests, for profiling.

After gathering enough *.dyn files, change /Qprof_gen to /Qprof_use.

Finally, compile the executable again using the profiling data.

There you have it: your optimized TogaII engine.

There are still some additional flags you can add to further improve your compile. It is a matter of trial and error on my part, but understanding how the switches relate to the code will likely help to make it work better.
jarkkop

Re: Compiler switches

Post by jarkkop »

Compiler: 9.1.034
Processor: Intel Core 2 Duo E4300 @ 2.6 GHz, 2 GB RAM
Source code: Toga 1.2.1a

There are differences in the PV. Any comments?
Below are test runs with different combinations of switches.
-----------------------------------------------------------------------------

icl /O1 /G7 /QxT /Zm1000 /W1 *.cpp
info multipv 1 depth 15 seldepth 31 score cp 11 time 9250 nodes 8185985 pv g1f3 g8f6 d2d4 e7e6 e2e3 b8c6 f1d3 d7d5 e1g1 f8d6 b1c3 e8g8 c3b5 d6b4 c1d2



icl /O2 /G7 /QxT /Zm1000 /W1 *.cpp
info multipv 1 depth 15 seldepth 31 score cp 11 time 8204 nodes 8185985 pv g1f3 g8f6 d2d4 e7e6 e2e3 b8c6 f1d3 d7d5 e1g1 f8d6 b1c3 e8g8 c3b5 d6b4 c1d2



icl /O3 /G7 /QxT /Zm1000 /W1 *.cpp
info multipv 1 depth 15 seldepth 31 score cp 11 time 8625 nodes 8185985 pv g1f3 g8f6 d2d4 e7e6 e2e3 b8c6 f1d3 d7d5 e1g1 f8d6 b1c3 e8g8 c3b5 d6b4 c1d2



icl /O3 /G7 /QxT /GA /GF /Gs /Zm1000 /W1 /Qunroll /Qprof_gen *.cpp
icl /O3 /G7 /QxT /GA /GF /Zm1000 /W1 /Qunroll /Qprof_use /Qipo *.cpp
info multipv 1 depth 15 seldepth 31 score cp 11 time 6266 nodes 8185985 pv g1f3 g8f6 d2d4 e7e6 e2e3 b8c6 f1d3 d7d5 e1g1 f8d6 b1c3 e8g8 c3b5 d6b4 c1d2



icl /fast /G7 /QxT /GA /GF /Gs /Zm1000 /W1 /Qunroll /Qprof_gen *.cpp
icl /fast /G7 /QxT /GA /GF /Zm1000 /W1 /Qunroll /Qprof_use /Qipo *.cpp
info multipv 1 depth 15 seldepth 44 score cp 16 time 17047 nodes 21493535 pv b1c3 g8f6 d2d4 d7d5 c1f4 c7c5 e2e3 c5d4 e3d4 d8b6 d1d3 c8d7 e1c1 b8a6 d3f3 a6b4 c1b1



icl /fast /G7 /GA /GF /Gs /Zm1000 /W1 /Qunroll /Qprof_gen *.cpp
icl /fast /G7 /GA /GF /Gs /Zm1000 /W1 /Qunroll /Qprof_use *.cpp
info multipv 1 depth 15 seldepth 44 score cp 16 time 16890 nodes 21493535 pv b1c3 g8f6 d2d4 d7d5 c1f4 c7c5 e2e3 c5d4 e3d4 d8b6 d1d3 c8d7 e1c1 b8a6 d3f3 a6b4 c1b1



icl /fast /G7 /GA /GF /Zm1000 /W1 /Qunroll /Qipo *.cpp
info multipv 1 depth 15 seldepth 44 score cp 16 time 20250 nodes 21493535 pv b1c3 g8f6 d2d4 d7d5 c1f4 c7c5 e2e3 c5d4 e3d4 d8b6 d1d3 c8d7 e1c1 b8a6 d3f3 a6b4 c1b1



icl /O3 /G7 /GA /GF /Zm1000 /W1 /Qunroll /Qipo *.cpp
info multipv 1 depth 15 seldepth 44 score cp 16 time 19187 nodes 21493535 pv b1c3 g8f6 d2d4 d7d5 c1f4 c7c5 e2e3 c5d4 e3d4 d8b6 d1d3 c8d7 e1c1 b8a6 d3f3 a6b4 c1b1
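When comparing many runs like these, it can help to check the deterministic fields mechanically. The sketch below compares two UCI info lines on score, node count and PV while ignoring the time field. The helper names are made up, and it assumes each info line is passed as a single unwrapped string in the format shown above.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the text following the given key in an info line, or NULL. */
static const char *field(const char *info, const char *key)
{
    const char *p = strstr(info, key);
    return p ? p + strlen(key) : NULL;
}

/* Hypothetical helper: 1 if two info lines agree on score, node count and
   principal variation; the time field is deliberately ignored. */
int same_search(const char *a, const char *b)
{
    const char *sa = field(a, " score cp "), *sb = field(b, " score cp ");
    const char *na = field(a, " nodes "),    *nb = field(b, " nodes ");
    const char *pa = field(a, " pv "),       *pb = field(b, " pv ");

    if (!sa || !sb || !na || !nb || !pa || !pb)
        return 0;
    return strtol(sa, NULL, 10) == strtol(sb, NULL, 10)
        && strtoull(na, NULL, 10) == strtoull(nb, NULL, 10)
        && strcmp(pa, pb) == 0;
}
```

Applied to the runs above, the /O1 and /O2 lines compare as the same search even though their times differ, while any line from the first group compared with one from the second group does not.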
abik

Re: Compiler switches

Post by abik »

Without actually downloading and debugging the source code, I am obviously forced to guess. Since the results are consistent within each of the two sets of runs, I assume all non-determinism (threads, timers, transposition tables, etc.) has been taken care of. My first guess in any context other than chess would be that floating-point optimizations cause the difference. Perhaps someone familiar with the source could comment on whether it contains floating-point operations that may modify behavior? Also, uninitialized variables (i.e. programmer error) have been known to change behavior between optimization levels. Finally, there could be a real compiler bug of course. If getting the source is easy and without any legal restrictions, I don’t mind briefly combining my personal hobby and my job to have a look at this. Please send me instructions off-line. But I cannot get to this until Monday and, as I hope is understandable, cannot spend too much time on it at this point either.
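To make the uninitialized-variable case concrete, here is a sketch of the kind of code where it tends to hide. The function is hypothetical and not taken from Toga; the comment marks the line where the bug would sit.

```c
#include <limits.h>

/* Hypothetical helper: best score among n candidate moves. */
int best_score(const int *scores, int n)
{
    int best = INT_MIN;   /* writing just "int best;" here would be the bug:
                             the first comparison would read an indeterminate
                             value, which may differ between builds */
    for (int i = 0; i < n; i++)
        if (scores[i] > best)
            best = scores[i];
    return best;
}
```

With the initializer in place the result does not depend on the optimization level. Without it, a build that happens to find a small value in that stack slot or register can look correct for a long time.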