How much speedup do you get by compiler optimizations?

smatovic
Posts: 3040
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

How much speedup do you get by compiler optimizations?

Post by smatovic »

Zeta Dva v0402* gets a 2x NPS speedup from -O3, CFish 13 3x, both with GCC on an Intel i5-6500 with AVX2.

*acknowledging that it is a quite simple engine compared to the fish.

--
Srdja
abulmo2
Posts: 445
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: How much speedup do you get by compiler optimizations?

Post by abulmo2 »

As I am playing with my Othello engine (Edax), here is the impact of various compilation options:

Code: Select all

clang -O3  100.0 ± 0.2
clang -O2   99.9 ± 0.2
clang -Os  100.6 ± 0.2
clang -O1  102.2 ± 0.2
clang -O0 1041.3 ± 0.2
gcc   -O3  108.0 ± 0.2
gcc   -O2  109.4 ± 0.2
gcc   -O1  121.5 ± 0.2
gcc   -Os  123.3 ± 0.2
gcc   -O0  747.3 ± 0.2
The numbers are the time needed to run a benchmark, rescaled so that clang -O3 is 100 (lower is faster).
So, for this program and clang, -O3 is 10× faster than -O0; -O2 is as fast as -O3; -Os and -O1 are slightly slower.
gcc is 8% slower than clang. With gcc, -O3 is about 7× faster than -O0.
Other options: -flto has no impact for clang and a slightly negative one for gcc, which is expected as all the C files are gathered into a single one. Disabling PGO costs 3% for clang and 0.4% for gcc.
The -march option has a big impact too, as the engine contains dedicated code and algorithms depending on the targeted CPU.

Code: Select all

x86-64-v3 100.0 (avx2)
x86-64-v2 113.5 (popcount, ...)
x86-64    119.6 (sse)
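
The ratios quoted for the optimization levels can be rechecked from the first table in this post. What follows is a minimal sketch in Python, assuming the values are benchmark times rescaled so that clang -O3 equals 100 (lower is faster). The names TIMES, speedup and slowdown_percent are made up for illustration and are not part of Edax.

```python
# Benchmark times from the optimization-level table, rescaled so that
# clang -O3 == 100. Lower is faster. All names here are illustrative.
TIMES = {
    ("clang", "-O3"): 100.0,
    ("clang", "-O2"): 99.9,
    ("clang", "-Os"): 100.6,
    ("clang", "-O1"): 102.2,
    ("clang", "-O0"): 1041.3,
    ("gcc", "-O3"): 108.0,
    ("gcc", "-O2"): 109.4,
    ("gcc", "-O1"): 121.5,
    ("gcc", "-Os"): 123.3,
    ("gcc", "-O0"): 747.3,
}


def speedup(compiler, slow_flag, fast_flag):
    """Return how many times faster fast_flag is than slow_flag."""
    return TIMES[(compiler, slow_flag)] / TIMES[(compiler, fast_flag)]


def slowdown_percent(slow, fast):
    """Return how much slower configuration `slow` is than `fast`, in percent."""
    return (TIMES[slow] / TIMES[fast] - 1.0) * 100.0


for compiler in ("clang", "gcc"):
    ratio = speedup(compiler, "-O0", "-O3")
    print(f"{compiler}: -O3 is {ratio:.1f}x faster than -O0")

gap = slowdown_percent(("gcc", "-O3"), ("clang", "-O3"))
print(f"gcc -O3 is {gap:.1f}% slower than clang -O3")
```

Because the table holds times rather than speeds, the speedup is the slower time divided by the faster one.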
Richard Delorme
smatovic
Posts: 3040
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: How much speedup do you get by compiler optimizations?

Post by smatovic »

Just retested with an explicit -O0 flag in gcc and arch native, via the bench command (single thread):

-O0

Code: Select all

Zeta Dva 0402: 510K NPS
CFish 13: 504K NPS
-O3

Code: Select all

Zeta Dva 0402: 1,76M NPS
CFish 13: 1,763M NPS
On my machine and setup, Zeta Dva gets a x3.45 NPS speedup from -O3, and CFish a similar x3.49.
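
The speedup figures follow directly from the NPS values above. Here is a small Python sketch of the arithmetic, assuming that the comma in "1,76M" is a decimal separator and that K and M mean thousand and million. The helper parse_nps and the dictionary layout are hypothetical, chosen only for this example.

```python
# Multipliers for the suffixes used in the bench output above (assumed meaning).
SUFFIX = {"K": 1_000, "M": 1_000_000}


def parse_nps(text):
    """Parse a value such as '510K' or '1,76M' into nodes per second.

    Assumption: a comma is a decimal separator, as in German notation.
    """
    text = text.strip().replace(",", ".")
    return float(text[:-1]) * SUFFIX[text[-1]]


# (NPS at -O0, NPS at -O3) as reported in the post.
RESULTS = {
    "Zeta Dva 0402": ("510K", "1,76M"),
    "CFish 13": ("504K", "1,763M"),
}

for engine, (o0, o3) in RESULTS.items():
    ratio = parse_nps(o3) / parse_nps(o0)
    print(f"{engine}: x{ratio:.3f} NPS speedup from -O3")
```

Printing three decimals shows that the last digit of a two-decimal figure depends on whether the ratio is rounded or truncated.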

--
Srdja
smatovic
Posts: 3040
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: How much speedup do you get by compiler optimizations?

Post by smatovic »

Code: Select all

Zeta Dva 0402: 1,76M NPS
CFish 13: 1,763M NPS
Aeh, German-En typo, dot instead of comma.

--
Srdja
towforce
Posts: 12143
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: How much speedup do you get by compiler optimizations?

Post by towforce »

smatovic wrote: Tue Jan 07, 2025 2:39 pm Aeh, German-En typo, dot instead of comma.

I know it's futile to ask, but everyone should use commas for 000 separation.

Clear: 3,146,213.431

Ambiguous: 3.146.213.431 (1) 3 million or 3 billion? (2) looks like an IP address
Want to attract exceptional people? Be exceptional.
smatovic
Posts: 3040
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: How much speedup do you get by compiler optimizations?

Post by smatovic »

towforce wrote: Tue Jan 07, 2025 4:53 pm Ambiguous: 3.146.213.431 (1) 3 million or 3 billion? (2) looks like an IP address
In Germany we write 3.146.213,431 so dots and commas are vice versa compared to en/us notation.
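
To make the swap concrete, here is a minimal Python sketch that converts a number written in German notation to en/US notation and parses it. The function names are made up for this example, and it deliberately avoids the locale module so that it does not depend on which locales are installed.

```python
def german_to_english(number_text):
    """Swap '.' and ',' so that 3.146.213,431 becomes 3,146,213.431."""
    return number_text.translate(str.maketrans(".,", ",."))


def parse_german(number_text):
    """Parse a German-formatted number: drop the dots, treat ',' as decimal point."""
    return float(number_text.replace(".", "").replace(",", "."))


text = "3.146.213,431"
print(german_to_english(text))
print(parse_german(text))
```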

--
Srdja
abulmo2
Posts: 445
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: How much speedup do you get by compiler optimizations?

Post by abulmo2 »

smatovic wrote: Tue Jan 07, 2025 5:25 pm
towforce wrote: Tue Jan 07, 2025 4:53 pm Ambiguous: 3.146.213.431 (1) 3 million or 3 billion? (2) looks like an IP address
In Germany we write 3.146.213,431 so dots and commas are vice versa compared to en/us notation.

--
Srdja
Same in French. The English way is the exception, not the common way... :wink:
Richard Delorme
flok
Posts: 539
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: How much speedup do you get by compiler optimizations?

Post by flok »

smatovic wrote: Sun Jan 05, 2025 9:21 am Zeta Dva v0402* gets a 2x NPS speedup from -O3, CFish 13 3x, both with GCC on an Intel i5-6500 with AVX2.

*acknowledging that it is a quite simple engine compared to the fish.

--
Srdja
Don't forget all the other interesting flags that are not necessarily enabled by -O3.
I *think* -mpopcnt, for example, is such a flag.
ydebilloez
Posts: 175
Joined: Tue Jun 27, 2017 11:01 pm
Location: Lubumbashi
Full name: Yves De Billoëz

Re: How much speedup do you get by compiler optimizations?

Post by ydebilloez »

I know it's futile to ask, but everyone should use commas for 000 separation.
Well, it would be an error in most of the world. Why don't we impose Indian notation, with the lakh that is lacking in the rest of the world... That would be even more fun.
Yves De Billoëz @ macchess belofte chess
Once owner of a Mephisto I, II, challenger, ... chess computer.
JohnWoe
Posts: 529
Joined: Sat Mar 02, 2013 11:31 pm

Re: How much speedup do you get by compiler optimizations?

Post by JohnWoe »

I ran tests on Mayhem. To my surprise, -O1 was the fastest by a big margin. But -Os and -O0 were the slowest, as expected.
I used the speed command, which runs long benchmarks. All runs were started at the same time.

Computer:

Code: Select all

ThinkPad-E14-Gen-2:~$ nicenux.py 
KERNEL: 6.8.0-51-generic
OS:     Linux Mint 22
ARCH:   x86_64
CPU:    AMD Ryzen 7 4800U with Radeon Graphics 16 @ 1.41GHz ( 2.70% )
RAM:    14.85 GiB / 4.44 GiB ( 32.50% )
DISK:   233.18 GiB / 83.04 GiB ( 35.61% )

Code: Select all

Mayhem optimizations:

1: -O1:
Result:   70 / 70
Nodes:    5370172030
Time(ms): 581425
NPS:      9236224

2: -O3:
Result:   70 / 70
Nodes:    4946284003
Time(ms): 581344
NPS:      8508359

3: -Ofast:
Result:   70 / 70
Nodes:    4939251519
Time(ms): 581686
NPS:      8491267

4: -O2
Result:   70 / 70
Nodes:    4802257250
Time(ms): 582135
NPS:      8249387

5: -Os
Result:   70 / 70
Nodes:    4021213865
Time(ms): 583850
NPS:      6887409

6: -O0
Result:   68 / 70
Nodes:    1376407130
Time(ms): 590548
NPS:      2330728
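
The NPS column can be rechecked from the node counts and times. The sketch below, in Python, assumes NPS is nodes divided by elapsed time and rounded down. That is inferred from the numbers above, not taken from Mayhem's source, and the names RUNS and nps are illustrative.

```python
# flag -> (nodes, time in ms, NPS as reported in the listing above)
RUNS = {
    "-O1": (5370172030, 581425, 9236224),
    "-O3": (4946284003, 581344, 8508359),
    "-Ofast": (4939251519, 581686, 8491267),
    "-O2": (4802257250, 582135, 8249387),
    "-Os": (4021213865, 583850, 6887409),
    "-O0": (1376407130, 590548, 2330728),
}


def nps(nodes, time_ms):
    """Nodes per second, rounded down (assumed to match the listing)."""
    return nodes * 1000 // time_ms


fastest = max(RUNS, key=lambda flag: nps(*RUNS[flag][:2]))
best = nps(*RUNS[fastest][:2])

for flag, (nodes, time_ms, reported) in RUNS.items():
    value = nps(nodes, time_ms)
    status = "matches" if value == reported else "differs from"
    print(f"{flag:7s} {value:>9d} NPS ({status} listing), "
          f"{100.0 * value / best:5.1f}% of {fastest}")
```

Expressing each run as a percentage of the fastest one makes the gap between -O1 and the other levels easier to read than the raw NPS values.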