How much speedup do you get by compiler optimizations?

smatovic · Post by **smatovic** » Sun Jan 05, 2025 9:21 am

Zeta Dva v0402* gets an x2 NPS speedup by -O3, CFish 13 x3, both with GCC and Intel i5-6500 with AVX2.

*acknowledging that it is a quite simple engine compared to the fish.

--
Srdja

abulmo2 · Post by **abulmo2** » Mon Jan 06, 2025 12:54 pm

As I am playing with my Othello engine (Edax), here is the impact of various compilation options:

clang -O3  100.0 ± 0.2
clang -O2   99.9 ± 0.2
clang -Os  100.6 ± 0.2
clang -O1  102.2 ± 0.2
clang -O0 1041.3 ± 0.2
gcc   -O3  108.0 ± 0.2
gcc   -O2  109.4 ± 0.2
gcc   -O1  121.5 ± 0.2
gcc   -Os  123.3 ± 0.2
gcc   -O0  747.3 ± 0.2

I display the time taken to run a benchmark, rescaled to 100 for clang -O3.
So, for this program and clang, -O3 is 10× faster than -O0; -O2 is as fast as -O3; -Os and -O1 are slightly slower.
gcc is 8% slower than clang. With gcc, the -O3 option is 7× faster than -O0.
Other options: -flto has no impact for clang and a slightly negative one for gcc, which is expected as all the C files are gathered into a single one. Disabling PGO costs 3% for clang and 0.4% for gcc.
The -march option has a big impact too, as the engine contains dedicated code and algorithms depending on the targeted CPU.

x86-64-v3 100.0 (avx2)
x86-64-v2 113.5 (popcount, ...)
x86-64    119.6 (sse)
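The ratios quoted in this post can be rechecked from the two tables. The following is a minimal sketch, not part of Edax: the names `OPT_TIMES`, `MARCH_TIMES`, `speedup` and `percent_slower` are made up for illustration, and the numbers are simply copied from the tables above.

```python
# Times copied from the two tables above (lower is better).
# The first table is rescaled so that clang -O3 = 100,
# the second so that x86-64-v3 = 100.
OPT_TIMES = {
    "clang": {"-O3": 100.0, "-O2": 99.9, "-Os": 100.6, "-O1": 102.2, "-O0": 1041.3},
    "gcc": {"-O3": 108.0, "-O2": 109.4, "-O1": 121.5, "-Os": 123.3, "-O0": 747.3},
}
MARCH_TIMES = {"x86-64-v3": 100.0, "x86-64-v2": 113.5, "x86-64": 119.6}


def speedup(slow_time: float, fast_time: float) -> float:
    """Return how many times faster the build with fast_time is."""
    return slow_time / fast_time


def percent_slower(time: float, reference_time: float) -> float:
    """Return how much longer `time` is than `reference_time`, in percent."""
    return (time / reference_time - 1.0) * 100.0


if __name__ == "__main__":
    for compiler, times in OPT_TIMES.items():
        ratio = speedup(times["-O0"], times["-O3"])
        print(f"{compiler}: -O3 is {ratio:.1f}x faster than -O0")

    gap = percent_slower(OPT_TIMES["gcc"]["-O3"], OPT_TIMES["clang"]["-O3"])
    print(f"gcc -O3 is {gap:.1f}% slower than clang -O3")

    for march, time in MARCH_TIMES.items():
        extra = percent_slower(time, MARCH_TIMES["x86-64-v3"])
        print(f"{march}: {extra:.1f}% slower than x86-64-v3")
```

Because the values are times and not NPS, the speedup is the slower time divided by the faster one.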

smatovic · Post by **smatovic** » Tue Jan 07, 2025 1:06 pm

Just retested with explicit -O0 flag in gcc and arch native via bench command (single thread):

-O0

Zeta Dva 0402: 510K NPS
CFish 13: 504K NPS

-O3

Zeta Dva 0402: 1,76M NPS
CFish 13: 1,763M NPS

With my machine and setup Zeta Dva gets x3.45 NPS speedup by -O3 and CFish similar x3.49.
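For anyone who wants to recompute these factors, here is a small sketch. It assumes the comma in the -O3 figures is a decimal separator (German notation), so "1,76M" means 1.76 million; `parse_nps` and `RESULTS` are hypothetical names used only for this example.

```python
def parse_nps(text: str) -> float:
    """Parse strings such as '510K' or '1,76M NPS' into nodes per second."""
    multipliers = {"K": 1e3, "M": 1e6}
    text = text.strip().upper().replace("NPS", "").strip()
    factor = 1.0
    if text[-1] in multipliers:
        factor = multipliers[text[-1]]
        text = text[:-1]
    # Assumption: a comma is a decimal separator here, not a thousands separator.
    return float(text.replace(",", ".")) * factor


# (NPS at -O0, NPS at -O3), copied from the bench results above.
RESULTS = {
    "Zeta Dva 0402": ("510K", "1,76M"),
    "CFish 13": ("504K", "1,763M"),
}

if __name__ == "__main__":
    for engine, (o0, o3) in RESULTS.items():
        ratio = parse_nps(o3) / parse_nps(o0)
        print(f"{engine}: -O3 gives {ratio:.2f}x the NPS of -O0")
```

Since these are NPS values (higher is better), the speedup is the -O3 figure divided by the -O0 figure.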

--
Srdja

smatovic · Post by **smatovic** » Tue Jan 07, 2025 2:39 pm

Zeta Dva 0402: 1,76M NPS
CFish 13: 1,763M NPS

Aeh, German-English typo: that should be a dot instead of a comma.

--
Srdja

towforce · Post by **towforce** » Tue Jan 07, 2025 4:53 pm

smatovic wrote: ↑Tue Jan 07, 2025 2:39 pm Aeh, German-En typo, dot instead comma.

I know it's futile to ask, but everyone should use commas for 000 separation.

Clear: 3,146,213.431

Ambiguous: 3.146.213.431 (1) 3 million or 3 billion? (2) looks like an IP address

smatovic · Post by **smatovic** » Tue Jan 07, 2025 5:25 pm

towforce wrote: ↑Tue Jan 07, 2025 4:53 pm Ambiguous: 3.146.213.431 (1) 3 million or 3 billion? (2) looks like an IP address

In Germany we write 3.146.213,431 so dots and commas are vice versa compared to en/us notation.

--
Srdja

abulmo2 · Post by **abulmo2** » Tue Jan 07, 2025 10:58 pm

smatovic wrote: ↑Tue Jan 07, 2025 5:25 pm
towforce wrote: ↑Tue Jan 07, 2025 4:53 pm Ambiguous: 3.146.213.431 (1) 3 million or 3 billion? (2) looks like an IP address
In Germany we write 3.146.213,431 so dots and commas are vice versa compared to en/us notation.

--
Srdja

Same in French. The English way is the exception, not the common way...

flok · Post by **flok** » Thu Jan 09, 2025 5:52 pm

smatovic wrote: ↑Sun Jan 05, 2025 9:21 am Zeta Dva v0402* gets an x2 NPS speedup by -O3, CFish 13 x3, both with GCC and Intel i5-6500 with AVX2.

*acknowledging that it is a quite simple engine compared to the fish.

--
Srdja

Don't forget all the other interesting flags that are not always enabled when using -O3.
I *think* -mpopcnt, for example, is such a flag.
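One way to check which target flags a given set of options actually turns on is GCC's `-Q --help=target` report. The sketch below only builds the command and parses a report; it is GCC-specific, the helper names are invented for this example, and the parser assumes report lines of the form `-mpopcnt   [enabled]` or `-mpopcnt   [disabled]`, which may differ between GCC versions.

```python
import re
import subprocess


def target_query_command(compiler="gcc", flags=("-O3",)):
    """Build a command asking GCC which target options are active for `flags`."""
    return [compiler, *flags, "-Q", "--help=target"]


def parse_target_options(report: str) -> dict:
    """Map option name -> True/False for lines like '-mpopcnt  [enabled]'.

    Assumption: on/off options are printed with a trailing [enabled] or
    [disabled]; lines that carry a value instead (e.g. -march=) are skipped.
    """
    options = {}
    for line in report.splitlines():
        match = re.match(r"\s*(-m[\w.=-]+)\s+\[(enabled|disabled)\]", line)
        if match:
            options[match.group(1)] = match.group(2) == "enabled"
    return options


def query_target_options(compiler="gcc", flags=("-O3",)) -> dict:
    """Run the compiler (must be installed) and return the parsed report."""
    result = subprocess.run(
        target_query_command(compiler, flags),
        capture_output=True, text=True, check=True,
    )
    return parse_target_options(result.stdout)
```

Comparing `query_target_options(flags=("-O3",)).get("-mpopcnt")` with the same call using `("-O3", "-march=native")` would show whether popcount depends on the -O level or on the -march setting on a given machine.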

ydebilloez · Post by **ydebilloez** » Fri Jan 10, 2025 8:17 am

I know it's futile to ask, but everyone should use commas for 000 separation.

Well, it would be an error in most of the world. Why don't we impose Indian (IN) notation, with the lakh that the rest of the world lacks? It would be even more fun.

JohnWoe · Post by **JohnWoe** » Wed Jan 15, 2025 11:27 pm

I ran tests on Mayhem. To my surprise, -O1 was the fastest by a big margin. But -Os and -O0 were the slowest, as expected.
I used the speed command, which runs long benchmarks. All builds were started at the same time.

Computer:

ThinkPad-E14-Gen-2:~$ nicenux.py 
KERNEL: 6.8.0-51-generic
OS:     Linux Mint 22
ARCH:   x86_64
CPU:    AMD Ryzen 7 4800U with Radeon Graphics 16 @ 1.41GHz ( 2.70% )
RAM:    14.85 GiB / 4.44 GiB ( 32.50% )
DISK:   233.18 GiB / 83.04 GiB ( 35.61% )

Mayhem optimizations:

1: -O1:
Result:   70 / 70
Nodes:    5370172030
Time(ms): 581425
NPS:      9236224

2: -O3:
Result:   70 / 70
Nodes:    4946284003
Time(ms): 581344
NPS:      8508359

3: -Ofast:
Result:   70 / 70
Nodes:    4939251519
Time(ms): 581686
NPS:      8491267

4: -O2:
Result:   70 / 70
Nodes:    4802257250
Time(ms): 582135
NPS:      8249387

5: -Os:
Result:   70 / 70
Nodes:    4021213865
Time(ms): 583850
NPS:      6887409

6: -O0:
Result:   68 / 70
Nodes:    1376407130
Time(ms): 590548
NPS:      2330728
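The NPS figures above follow from the node counts and times, and the builds can be compared against -O0 in the same way as earlier in the thread. This is a rough sketch with the numbers copied from the results; `RUNS`, `nps` and `relative_to` are hypothetical names, and the integer (truncating) division is an assumption about how Mayhem rounds, not something taken from its source.

```python
# (nodes, time in ms, reported NPS), copied from the results above.
RUNS = {
    "-O1": (5370172030, 581425, 9236224),
    "-O3": (4946284003, 581344, 8508359),
    "-Ofast": (4939251519, 581686, 8491267),
    "-O2": (4802257250, 582135, 8249387),
    "-Os": (4021213865, 583850, 6887409),
    "-O0": (1376407130, 590548, 2330728),
}


def nps(nodes: int, time_ms: int) -> int:
    """Nodes per second from a node count and a time in milliseconds."""
    return nodes * 1000 // time_ms


def relative_to(level: str, baseline: str = "-O0") -> float:
    """NPS of `level` divided by NPS of `baseline`."""
    nodes, time_ms, _ = RUNS[level]
    base_nodes, base_ms, _ = RUNS[baseline]
    return nps(nodes, time_ms) / nps(base_nodes, base_ms)


if __name__ == "__main__":
    for level, (nodes, time_ms, reported) in RUNS.items():
        computed = nps(nodes, time_ms)
        note = "matches" if computed == reported else "differs from"
        print(f"{level:7} {computed:>9} NPS ({note} reported value), "
              f"{relative_to(level):.2f}x vs -O0")
```

Note that the builds ran concurrently for roughly the same wall time, so the node count is what differs between them.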
