
Re: gcc4.8 outperforming gcc5, gcc6, gcc7

Posted: Sun Nov 26, 2017 7:36 pm
by syzygy
D Sceviour wrote:
syzygy wrote:That higher optimisation levels can turn "minor bugs" into crashes is not a reason not to use optimisation. Such "minor bugs" simply need to be fixed. If use of an uninitialised variable does not crash the program, it will likely make it produce incorrect results (which can be almost impossible to notice in a chess engine, except for a measurable decrease in playing strength).
The gcc warning "<variable> may be used uninitialized" gives unpredictable, bogus results. I visually inspect each flagged variable and ignore the warning if there is nothing wrong. Of course, something else may be triggering the warning.
Yes, gcc sometimes produces bogus warnings (the most annoying being array-out-of-bounds warnings where no array is accessed out of bounds), but where the warning is right, the proper solution is to fix the code, not to accept it as a "minor bug" and compile without optimisation hoping there is no harm ;-)
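A minimal C sketch of the pattern being discussed, using a hypothetical helper `best_score` (not taken from any engine in this thread): a variable that is only assigned inside a loop is read uninitialised when the loop body never runs. gcc's `-Wmaybe-uninitialized` may or may not flag it depending on the optimisation level, but the fix is the same either way: give the variable a defined value.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: highest score in a list of move scores.
 *
 * A version that declares `int best;` and only assigns it inside the
 * loop returns an indeterminate value when n == 0. Whether gcc warns
 * about that (-Wmaybe-uninitialized) depends on the optimisation level,
 * which is part of why the warning feels unpredictable.
 * Giving the variable a defined starting value removes the bug and the
 * warning together. */
int best_score(const int *scores, size_t n)
{
    int best = INT_MIN;   /* defined result for an empty list */
    for (size_t i = 0; i < n; i++)
        if (scores[i] > best)
            best = scores[i];
    return best;
}
```

For an empty list this now returns `INT_MIN` instead of whatever happened to be in the register or stack slot.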

If there is a bug, a crash is the best thing that can happen since it ensures that the bug is detected. The problem with some of these bugs is that the crash (not the bug) goes away when compiling in debug mode. That makes it harder to locate them. But nowadays it is pretty easy to locate them using -fsanitize.
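As an illustration of the kind of bug `-fsanitize` locates, here is a sketch using signed integer overflow; the function names are hypothetical. Built with `gcc -g -O2 -fsanitize=undefined`, the overflowing addition in the commented-out version is reported at runtime, whereas an ordinary optimised build may silently assume it cannot happen. (`-fsanitize=address` plays a similar role for out-of-bounds memory accesses.)

```c
#include <limits.h>
#include <stdbool.h>

/* Buggy version, kept as a comment: signed overflow is undefined, so
 * an optimising compiler may fold this test to `false` and the check
 * never fires.
 *
 *     bool add_overflows_bad(int a, int b) { return b > 0 && a + b < a; }
 *
 * A build with -fsanitize=undefined reports the overflowing addition
 * at runtime, which makes the culprit easy to find. */

/* Well-defined version: compare against the limits before adding,
 * so no overflowing expression is ever evaluated. */
bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}
```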

Re: gcc4.8 outperforming gcc5, gcc6, gcc7

Posted: Sun Nov 26, 2017 8:02 pm
by AndrewGrant

Code:

// Using Linked makefile
BENCH DEPTH |  COMP  | NPS
     20     | GCC4.8 | 3592466
     20     | GCC5.4 | 3367175
     20     | GCC6.4 | 3366372
     20     | GCC7.2 | 3246964
     
// Using Linked makefile plus -flto
BENCH DEPTH |  COMP  | NPS
     20     | GCC4.8 | 3611056
     20     | GCC5.4 | 3498027
     20     | GCC6.4 | 3550826
     20     | GCC7.2 | 3323299
     
GCC4.8 with    -flto BENCH 22 NPS=3553729

GCC4.8 without -flto BENCH 22 NPS=3587155
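To put the numbers above in relative terms, a trivial helper (the name `speedup_percent` is hypothetical) that computes how much faster one NPS figure is than another:

```c
/* Percentage by which `fast_nps` exceeds `slow_nps`.
 * Example: 110 vs 100 nodes/s gives 10.0. */
double speedup_percent(double fast_nps, double slow_nps)
{
    return (fast_nps / slow_nps - 1.0) * 100.0;
}
```

Applied to the depth-20 rows, GCC4.8 over GCC7.2 is about 10.6% (3592466 vs 3246964) with the plain makefile and about 8.7% (3611056 vs 3323299) with -flto. For GCC4.8 alone, -flto is about 0.5% faster at depth 20 but about 0.9% slower at depth 22, so differences of that size are hard to separate from run-to-run variation.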

Re: gcc4.8 outperforming gcc5, gcc6, gcc7

Posted: Sun Nov 26, 2017 8:15 pm
by AndrewGrant


Re: gcc4.8 outperforming gcc5, gcc6, gcc7

Posted: Sun Nov 26, 2017 10:16 pm
by Ras
syzygy wrote: He posted a link to the Makefile (he compiles with -O3).
It is well known that -O3 can produce slower binaries than -O2, so the comparison should also be run with -O2.
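When comparing an -O2 build against an -O3 build (or one compiler against another), single bench runs can vary from run to run. One simple approach, offered as a suggestion rather than anything from this thread, is to run the bench several times per build and compare the median NPS. A minimal C helper, with the hypothetical name `median_nps`:

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles (ascending). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n NPS samples. Works on a copy so the caller's array is
 * left untouched; returns -1.0 if n == 0 or allocation fails. */
double median_nps(const double *samples, size_t n)
{
    if (n == 0)
        return -1.0;
    double *tmp = malloc(n * sizeof *tmp);
    if (!tmp)
        return -1.0;
    memcpy(tmp, samples, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_double);
    double m = (n % 2) ? tmp[n / 2]
                       : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
    free(tmp);
    return m;
}
```

The median is less sensitive than the mean to a single run disturbed by background load.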

Re: gcc4.8 outperforming gcc5, gcc6, gcc7

Posted: Tue Nov 28, 2017 4:08 am
by Dann Corbit
I definitely get the best performance with gcc 7.2. It beats all my other compilers handily.

Re: gcc4.8 outperforming gcc5, gcc6, gcc7

Posted: Tue Nov 28, 2017 4:55 pm
by AndrewGrant
What OS?
What CPU?
What Flags?

Re: gcc4.8 outperforming gcc5, gcc6, gcc7

Posted: Tue Nov 28, 2017 5:39 pm
by Dann Corbit
AndrewGrant wrote:What OS?
Windows 10, Windows 2012 Server

What CPU?
Intel and AMD

What Flags?
-O3, PGO and the typical flag set. Something I always add that I rarely see others use is:

Code:

ifeq ($(comp),mingw)
    CXXFLAGS += -mtune=native
endif
and I always link statically in case someone wants a copy of the binary.

Re: gcc4.8 outperforming gcc5, gcc6, gcc7

Posted: Tue Nov 28, 2017 6:14 pm
by AndrewGrant
Final question. GCC or G++?

I tried doing a PGO build, and actually got a different bench... which really confuses me.

Re: gcc4.8 outperforming gcc5, gcc6, gcc7

Posted: Tue Nov 28, 2017 7:15 pm
by Dann Corbit
AndrewGrant wrote:Final question. GCC or G++?
Both.

I tried doing a PGO build, and actually got a different bench... which really confuses me.
I guess that you have undefined behavior in your code.
My recommendation is to use both GCC and Clang with warnings turned up to the crazy maximum and to examine each and every one (expect thousands).
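One classic source of undefined behavior in bitboard engines, shown here only as an example (nothing in the thread says this code has it): building a square mask with a 32-bit `1`. The names `Bitboard`, `square_bb` and `square_bb_bad` are hypothetical. Because the optimizer may do anything with UB, the result can change with inlining decisions, which PGO alters, so a bench can differ between builds. `-fsanitize=undefined` (which includes `-fsanitize=shift`) reports such shifts at runtime.

```c
#include <stdint.h>

typedef uint64_t Bitboard;

/* Buggy pattern, kept as a comment: `1` is a 32-bit int, so shifting
 * it by 31 or more is undefined behavior. Depending on compiler, flags
 * and inlining, different builds can silently produce different values.
 *
 *     Bitboard square_bb_bad(int sq) { return 1 << sq; }
 */

/* Correct: shift a 64-bit one. Valid for sq in [0, 63]. */
Bitboard square_bb(int sq)
{
    return (Bitboard)1 << sq;
}
```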

Now, I assume by bench you mean something that should be reproducible, like perft or perhaps a single-threaded search. I would not expect a multi-threaded search to give the same result when repeated, even on the same machine with the same binary. It is also possible for a single-threaded search bench to come out slightly different and still be correct: the generated code differs slightly, with things like inlining instead of function calls. Most of the time these changes make no difference.

If you have floating point anywhere in your program, that can do all kinds of wacky things. For instance, the total of a long column of floating-point numbers that vary greatly in size will be different if you sum them forwards or backwards, and different again if you sort them first. Even Kahan's compensated summation won't completely fix that sort of thing; it only reduces the effect.
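A small C sketch of the floating-point effect described above, assuming IEEE-754 double arithmetic as on x86-64 with SSE2, and no `-ffast-math` (which lets the compiler simplify the compensation away). The function names are hypothetical. It sums the same values forwards, backwards and with Kahan (compensated) summation.

```c
#include <stddef.h>

/* Plain left-to-right sum. */
double naive_sum(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Same values, summed right-to-left. */
double naive_sum_reverse(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = n; i-- > 0; )
        s += x[i];
    return s;
}

/* Kahan summation: `c` carries the low-order bits lost when a term
 * is added to the running sum, and feeds them back into the next term. */
double kahan_sum(const double *x, size_t n)
{
    double s = 0.0, c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;
        double t = s + y;
        c = (t - s) - y;
        s = t;
    }
    return s;
}
```

With 1e16 followed by ten 1.0s, the forward naive sum stays at 1e16 (each 1.0 is lost to rounding), the backward sum gives 1e16 + 10, and Kahan summation recovers 1e16 + 10. With {1.0, 1e16, -1e16}, however, Kahan summation still returns 0 instead of 1, because the bits that get lost belong to the running sum rather than the new term. That is the sense in which it reduces the effect without completely fixing it; Neumaier's variant handles that particular case.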


What exactly does your bench do?

Re: gcc4.8 outperforming gcc5, gcc6, gcc7

Posted: Tue Nov 28, 2017 7:28 pm
by AndrewGrant
I already compile with gcc -Wall -Wextra -Wshadow, but I'll look for more flags. I currently get no warnings when I compile.

I have a bench that works the same way Stockfish's does: a depth-13 search on a set of positions, single-threaded. I have no problem reproducing the bench on any non-PGO build, across the 7+ machines I've run it on.

I'll see what I can do tonight, when I get back to my computer with Clang.