Re: gcc4.8 outperforming gcc5, gcc6, gcc7
Posted: Sun Nov 26, 2017 7:36 pm
by syzygy
D Sceviour wrote:syzygy wrote:That higher optimisation levels can turn "minor bugs" into crashes is not a reason not to use optimisation. Such "minor bugs" simply need to be fixed. If use of an uninitialised variable does not crash the program, it will likely make it produce incorrect results (which can be almost impossible to notice in a chess engine, except for a measurable decrease in playing strength).
The gcc warning "<variable> may be used uninitialized" gives unpredictable, bogus results. I visually inspect each flagged variable and then ignore the warning if there is nothing wrong. Of course, something else may be triggering the warning.
Yes, gcc sometimes produces bogus warnings (the most annoying being "array out of bounds" where no array is accessed out of bounds), but where it is right, the proper solution is to fix the code, not to accept it as a "minor bug" and compile without optimisation hoping there is no harm.
If there is a bug, a crash is the best thing that can happen since it ensures that the bug is detected. The problem with some of these bugs is that the crash (not the bug) goes away when compiling in debug mode. That makes it harder to locate them. But nowadays it is pretty easy to locate them using -fsanitize.
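As a minimal sketch of the kind of bug under discussion (the function name `best_score` is hypothetical and not taken from any engine in this thread): the comment marks where an uninitialised local would make the result depend on whatever happened to be on the stack. As far as I know, gcc's `-fsanitize=address,undefined` reports out-of-bounds accesses and many forms of undefined behaviour, while reads of uninitialised memory are what clang's `-fsanitize=memory` or Valgrind report.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Returns the largest value in scores[0..n-1]; n must be at least 1. */
int best_score(const int *scores, size_t n) {
    assert(scores != NULL && n > 0);

    /* Buggy variant: "int best;" with no initialiser. The first comparison
       would then read an indeterminate value, so the result could change
       between -O0 and -O3 builds without any crash. */
    int best = INT_MIN;

    for (size_t i = 0; i < n; i++) {
        if (scores[i] > best)
            best = scores[i];
    }
    return best;
}
```

With the initialiser in place, `best_score` on the values -5, -2, -9 returns -2 at every optimisation level.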
Re: gcc4.8 outperforming gcc5, gcc6, gcc7
Posted: Sun Nov 26, 2017 8:02 pm
by AndrewGrant
Code:
// Using Linked makefile
BENCH DEPTH | COMP   | NPS
20          | GCC4.8 | 3592466
20          | GCC5.4 | 3367175
20          | GCC6.4 | 3366372
20          | GCC7.2 | 3246964
// Using Linked makefile plus -flto
BENCH DEPTH | COMP   | NPS
20          | GCC4.8 | 3611056
20          | GCC5.4 | 3498027
20          | GCC6.4 | 3550826
20          | GCC7.2 | 3323299
GCC4.8 with -flto BENCH 22 NPS=3553729
GCC4.8 without -flto BENCH 22 NPS=3587155
Re: gcc4.8 outperforming gcc5, gcc6, gcc7
Posted: Sun Nov 26, 2017 10:16 pm
by Ras
syzygy wrote:He posted a link to the Makefile (he compiles with -O3).
It is well known that -O3 can produce slower binaries than -O2. The results should also be compared with an -O2 build.
Re: gcc4.8 outperforming gcc5, gcc6, gcc7
Posted: Tue Nov 28, 2017 4:08 am
by Dann Corbit
I definitely get best performance with gcc 7.2. It beats all my other compilers handily.
Re: gcc4.8 outperforming gcc5, gcc6, gcc7
Posted: Tue Nov 28, 2017 4:55 pm
by AndrewGrant
What OS?
What CPU?
What Flags?
Re: gcc4.8 outperforming gcc5, gcc6, gcc7
Posted: Tue Nov 28, 2017 5:39 pm
by Dann Corbit
AndrewGrant wrote:What OS?
Windows 10, Windows 2012 Server
What CPU?
Intel and AMD
What Flags?
-O3, PGO, and the typical flag set. Something I always add that I rarely see others use is:
Code:
ifeq ($(comp),mingw)
CXXFLAGS += -mtune=native
endif
and I always link statically in case someone wants a copy of the binary.
Re: gcc4.8 outperforming gcc5, gcc6, gcc7
Posted: Tue Nov 28, 2017 6:14 pm
by AndrewGrant
Final question. GCC or G++?
I tried doing a PGO build, and actually got a different bench... which really confuses me.
Re: gcc4.8 outperforming gcc5, gcc6, gcc7
Posted: Tue Nov 28, 2017 7:15 pm
by Dann Corbit
AndrewGrant wrote:Final question. GCC or G++?
Both.
I tried doing a PGO build, and actually got a different bench... which really confuses me.
I guess that you have undefined behavior in your code.
My recommendation is to use both GCC and CLANG with warnings turned up to crazy maximum and examine each and every one. (Expect thousands).
Now, I assume by bench you mean something that should be reproducible, like perft or perhaps a single-threaded search. I would not expect a multi-threaded search to give the same result when repeated, even on the same machine with the same binary.
A single-threaded search bench that differs somewhat can still be correct. What I mean is that the generated code is slightly different, with things like inlining instead of function calls. Most of the time these changes make no difference.
If you have floating point anywhere in your program, that can do all kinds of wacky things. For instance, the total of a long column of floating-point numbers that vary greatly in size will be different if you sum them forwards or backwards, and different again if you sort them first. Even Kahan summation won't completely fix that sort of thing. It only reduces the effect.
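The floating-point remark can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function names are made up for the example, and it assumes default IEEE double arithmetic (no `-ffast-math`, which would allow the compiler to reorder the additions).

```c
#include <assert.h>
#include <stddef.h>

/* Plain sum, first element to last. */
double sum_forward(const double *x, size_t n) {
    assert(x != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* Same values, added last element to first. */
double sum_backward(const double *x, size_t n) {
    assert(x != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = n; i > 0; i--)
        sum += x[i - 1];
    return sum;
}

/* Kahan (compensated) summation, first element to last. */
double sum_kahan(const double *x, size_t n) {
    assert(x != NULL || n == 0);
    double sum = 0.0;
    double c = 0.0;               /* running estimate of the rounding error */
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;      /* apply the correction from the last step */
        double t = sum + y;       /* low-order bits of y may be lost here */
        c = (t - sum) - y;        /* recover what was lost, with its sign */
        sum = t;
    }
    return sum;
}
```

For the three values 1.0, 1e30, -1e30 in that order, `sum_forward` returns 0.0 and `sum_backward` returns 1.0. `sum_kahan` in the same forward order also returns 0.0, because the compensation term cannot capture a small running sum that is swallowed by a much larger addend. That is consistent with the point that compensated summation reduces the effect without removing it.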
What exactly does your bench do?
Re: gcc4.8 outperforming gcc5, gcc6, gcc7
Posted: Tue Nov 28, 2017 7:28 pm
by AndrewGrant
I already do gcc -Wall -Wextra -Wshadow, but I'll look for more flags. I currently get no warnings when I compile.
I have a bench that works the same way Stockfish's does: a depth 13 search on a set of positions, single-threaded. I have no problem reproducing the bench on any non-PGO compile, across the 7+ machines I've run it on.
I'll see what I can do tonight, when I get back to my computer with clang.
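A bench of the kind described above can be sketched as follows. This is not the engine's actual code: `toy_search` is a hypothetical stand-in that walks a fixed-shape tree, and `run_bench`, `BenchResult`, and `nodes_per_second` are names invented for the example. The point it shows is that the total node count acts as a signature that should match between builds, while NPS is just nodes divided by elapsed time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a real search: visits a deterministic tree
   derived from the seed and returns the number of nodes visited. */
uint64_t toy_search(uint64_t seed, int depth) {
    if (depth <= 0)
        return 1;
    uint64_t nodes = 1;
    int branching = 2 + (int)(seed % 3);          /* 2 to 4 children */
    for (int i = 0; i < branching; i++) {
        /* Unsigned arithmetic wraps, so this step is well defined. */
        uint64_t child = seed * 6364136223846793005ULL
                       + 1442695040888963407ULL + (uint64_t)i;
        nodes += toy_search(child, depth - 1);
    }
    return nodes;
}

typedef struct {
    uint64_t nodes;     /* bench signature: must be identical across builds */
    double seconds;     /* CPU time spent, varies from run to run */
} BenchResult;

/* Searches every position to the same depth and totals the node counts. */
BenchResult run_bench(const uint64_t *seeds, int count, int depth) {
    assert(seeds != NULL && count > 0);
    clock_t start = clock();
    uint64_t total = 0;
    for (int i = 0; i < count; i++)
        total += toy_search(seeds[i], depth);
    BenchResult r;
    r.nodes = total;
    r.seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return r;
}

/* Nodes per second; returns 0 if the run was too short to time. */
double nodes_per_second(BenchResult r) {
    return r.seconds > 0.0 ? (double)r.nodes / r.seconds : 0.0;
}
```

If two builds of the same source report different `nodes` for the same positions and depth, the search itself took a different path, which is a different kind of problem from one build simply being slower.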