How much speedup do you get by compiler optimizations?
Moderator: Ras
-
- Posts: 3040
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Zeta Dva v0402* gets an x2 NPS speedup from -O3 and CFish 13 an x3 speedup, both compiled with GCC on an Intel i5-6500 with AVX2.
*Acknowledging that it is quite a simple engine compared to the fish.
--
Srdja
-
- Posts: 445
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: How much speedup do you get by compiler optimizations?
As I am playing with my Othello engine (Edax), here is the impact of various compilation options.
I show the time taken to run a benchmark, rescaled so that clang -O3 is 100 (lower is faster).
Code: Select all
clang -O3 100.0 ± 0.2
clang -O2 99.9 ± 0.2
clang -Os 100.6 ± 0.2
clang -O1 102.2 ± 0.2
clang -O0 1041.3 ± 0.2
gcc -O3 108.0 ± 0.2
gcc -O2 109.4 ± 0.2
gcc -O1 121.5 ± 0.2
gcc -Os 123.3 ± 0.2
gcc -O0 747.3 ± 0.2
So, for this program and clang, -O3 is 10× faster than -O0; -O2 is as fast as -O3, and -Os & -O1 are slightly slower.
gcc is 8% slower than clang. With gcc, -O3 is 7× faster than -O0.
Other options: -flto has no impact for clang and a slightly negative one for gcc, which is expected as all the C files are gathered into a single one. Disabling PGO costs 3% for clang and 0.4% for gcc.
The -march option has a big impact too, as the engine contains dedicated code & algorithms depending on the targeted CPU:
Code: Select all
x86-64-v3 100.0 (avx2)
x86-64-v2 113.5 (popcount, ...)
x86-64 119.6 (sse)
Richard Delorme
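For anyone who wants to check the ratios quoted in the post above, here is a minimal sketch that turns the relative timings into speedup factors. The numbers are copied from the optimization-level table; the dictionary layout and the `speedup` helper are only illustrative names, not anything from Edax.

```python
# Relative benchmark times from the table above (clang -O3 = 100, lower is faster).
times = {
    ("clang", "-O3"): 100.0,
    ("clang", "-O2"): 99.9,
    ("clang", "-Os"): 100.6,
    ("clang", "-O1"): 102.2,
    ("clang", "-O0"): 1041.3,
    ("gcc", "-O3"): 108.0,
    ("gcc", "-O2"): 109.4,
    ("gcc", "-O1"): 121.5,
    ("gcc", "-Os"): 123.3,
    ("gcc", "-O0"): 747.3,
}


def speedup(slow, fast):
    """Return how many times faster `fast` is than `slow` (ratio of run times)."""
    return times[slow] / times[fast]


clang_speedup = speedup(("clang", "-O0"), ("clang", "-O3"))
gcc_speedup = speedup(("gcc", "-O0"), ("gcc", "-O3"))
# Extra time gcc -O3 needs compared to clang -O3, in percent.
gcc_penalty_pct = (times[("gcc", "-O3")] / times[("clang", "-O3")] - 1) * 100

print(f"clang -O3 vs -O0: {clang_speedup:.1f}x")   # 10.4x
print(f"gcc   -O3 vs -O0: {gcc_speedup:.1f}x")     # 6.9x
print(f"gcc -O3 vs clang -O3: +{gcc_penalty_pct:.1f}% time")  # +8.0% time
```

Since the table reports times rather than NPS, the speedup is the slower time divided by the faster one.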
-
- Posts: 3040
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: How much speedup do you get by compiler optimizations?
Just retested with an explicit -O0 flag in gcc and arch native, via the bench command (single thread):
-O0
Code: Select all
Zeta Dva 0402: 510K NPS
CFish 13: 504K NPS
-O3
Code: Select all
Zeta Dva 0402: 1,76M NPS
CFish 13: 1,763M NPS
With my machine and setup Zeta Dva gets an x3.45 NPS speedup from -O3 and CFish a similar x3.49.
--
Srdja
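As a rough check of those factors, the NPS figures can be parsed and divided. This is only a sketch: `parse_nps` is a made-up helper, and it assumes the comma in "1,76M" is a decimal comma and that K and M mean thousand and million.

```python
def parse_nps(text):
    """Parse strings like '510K' or '1,76M' (decimal comma) into nodes per second."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    text = text.strip()
    suffix = text[-1].upper()
    number = text[:-1] if suffix in multipliers else text
    factor = multipliers.get(suffix, 1)
    # Assumption: the comma is a decimal separator, as in German notation.
    return float(number.replace(",", ".")) * factor


# Figures copied from the bench results above.
results = {
    "Zeta Dva 0402": {"-O0": "510K", "-O3": "1,76M"},
    "CFish 13": {"-O0": "504K", "-O3": "1,763M"},
}

speedups = {
    engine: parse_nps(nps["-O3"]) / parse_nps(nps["-O0"])
    for engine, nps in results.items()
}

for engine, factor in speedups.items():
    print(f"{engine}: x{factor:.2f}")
```

With these inputs the ratios come out at about 3.45 and 3.498, so the x3.49 quoted for CFish looks truncated rather than rounded.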
-
- Posts: 12143
- Joined: Thu Mar 09, 2006 12:57 am
- Location: Birmingham UK
- Full name: Graham Laight
Re: How much speedup do you get by compiler optimizations?
I know it's futile to ask, but everyone should use commas for 000 separation.
Clear: 3,146,213.431
Ambiguous: 3.146.213.431 (1) 3 million or 3 billion? (2) looks like an IP address
Want to attract exceptional people? Be exceptional.
-
- Posts: 3040
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: How much speedup do you get by compiler optimizations?
In Germany we write 3.146.213,431, so dots and commas are swapped compared to en/US notation.
--
Srdja
-
- Posts: 445
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: How much speedup do you get by compiler optimizations?
Same in French. The English way is the exception, not the common way...
Richard Delorme
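To make the two notations concrete, here is a small sketch that prints the same number both ways. It uses plain string formatting and swaps the two separator characters; the variable names are only illustrative, and it does not rely on any system locale.

```python
value = 3146213.431

# en/US style: comma as thousands separator, dot as decimal separator.
en_us = f"{value:,.3f}"

# German style as described above: swap the roles of dot and comma.
swap_separators = str.maketrans(",.", ".,")
de_style = en_us.translate(swap_separators)

print(en_us)     # 3,146,213.431
print(de_style)  # 3.146.213,431
```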
-
- Posts: 539
- Joined: Tue Jul 03, 2018 10:19 am
- Full name: Folkert van Heusden
Re: How much speedup do you get by compiler optimizations?
Don't forget all the other interesting flags that are not always enabled when using -O3.
I *think* -mpopcnt, for example, is such a flag.
-
- Posts: 175
- Joined: Tue Jun 27, 2017 11:01 pm
- Location: Lubumbashi
- Full name: Yves De Billoëz
Re: How much speedup do you get by compiler optimizations?
"I know it's futile to ask, but everyone should use commas for 000 separation."
Well, it would be an error in most of the world. Why don't we impose IN notation, with lakh lacking in the rest of the world... That would be even more fun.
Yves De Billoëz @ macchess belofte chess
Once owner of a Mephisto I, II, challenger, ... chess computer.
-
- Posts: 529
- Joined: Sat Mar 02, 2013 11:31 pm
Re: How much speedup do you get by compiler optimizations?
I ran tests on Mayhem. To my surprise, -O1 was the fastest by a big margin, while -Os and -O0 were the slowest, as suspected.
I used the speed command, which runs long benchmarks. All runs were started at the same time.
Computer:
Code: Select all
ThinkPad-E14-Gen-2:~$ nicenux.py
KERNEL: 6.8.0-51-generic
OS: Linux Mint 22
ARCH: x86_64
CPU: AMD Ryzen 7 4800U with Radeon Graphics 16 @ 1.41GHz ( 2.70% )
RAM: 14.85 GiB / 4.44 GiB ( 32.50% )
DISK: 233.18 GiB / 83.04 GiB ( 35.61% )
Code: Select all
Mayhem optimizations:
1: -O1:
Result: 70 / 70
Nodes: 5370172030
Time(ms): 581425
NPS: 9236224
2: -O3:
Result: 70 / 70
Nodes: 4946284003
Time(ms): 581344
NPS: 8508359
3: -Ofast:
Result: 70 / 70
Nodes: 4939251519
Time(ms): 581686
NPS: 8491267
4: -O2
Result: 70 / 70
Nodes: 4802257250
Time(ms): 582135
NPS: 8249387
5: -Os
Result: 70 / 70
Nodes: 4021213865
Time(ms): 583850
NPS: 6887409
6: -O0
Result: 68 / 70
Nodes: 1376407130
Time(ms): 590548
NPS: 2330728
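For comparison with the other posts, here is a minimal sketch that recomputes NPS from the node counts and times above and expresses each build relative to -O0. It assumes NPS is nodes divided by elapsed seconds, rounded down, which appears to match the reported figures; the `runs` table and `nps` helper are illustrative names, not part of Mayhem.

```python
# (nodes, time in ms) per optimization flag, copied from the results above.
runs = {
    "-O1": (5370172030, 581425),
    "-O3": (4946284003, 581344),
    "-Ofast": (4939251519, 581686),
    "-O2": (4802257250, 582135),
    "-Os": (4021213865, 583850),
    "-O0": (1376407130, 590548),
}


def nps(nodes, time_ms):
    """Nodes per second, rounded down (assumed to mirror the engine's output)."""
    return nodes * 1000 // time_ms


baseline = nps(*runs["-O0"])
ranked = sorted(runs, key=lambda flag: nps(*runs[flag]), reverse=True)

for flag in ranked:
    rate = nps(*runs[flag])
    print(f"{flag:7s} {rate:>10,d} NPS  x{rate / baseline:.2f} vs -O0")
```

On these numbers -O1 comes out a little under four times faster than -O0 and roughly 8 to 9% ahead of -O3.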