Performance loss when removing unused function

OliverBr
Posts: 865
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Performance loss when removing unused function

Post by OliverBr »

maksimKorzh wrote: Thu Oct 29, 2020 2:54 am
OliverBr wrote: Thu Oct 29, 2020 12:08 am there is a performance boost from

Nodes: 95905038 cs: 2975 knps: 4123
to

Nodes: 95905038 cs: 2922 knps: 4198
Oliver, would it be the same if you compile like this:
gcc -Ofast -fomit-frame-pointer olithink.c -o olithink
Hi :)
The effect is similar. With the dummy function+call it's faster than without:

Nodes: 95905038 cs: 3738 knps: 3281
increases to

Nodes: 95905038 cs: 3495 knps: 3510
And, as you can see, clang is much faster than gcc, but this has always been the case with my engine.
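To put the two before/after pairs on the same scale, here is a small sketch that turns the quoted centisecond timings into a relative speedup. The four `cs` values are copied from the post; labelling the first pair as the clang build is an assumption based on the remark that clang is faster, and the function name `speedup_percent` is only illustrative.

```python
# Centisecond timings quoted in the post for the same fixed-depth search.
# "before" is the slower run, "after" the faster one in each quoted pair.
# Assumption: the first pair comes from the clang build, the second from gcc.
TIMINGS_CS = {
    "clang (assumed)": {"before": 2975, "after": 2922},
    "gcc": {"before": 3738, "after": 3495},
}


def speedup_percent(before_cs, after_cs):
    """Relative speedup in percent when the run time drops from before_cs to after_cs."""
    return (before_cs / after_cs - 1.0) * 100.0


for build, cs in TIMINGS_CS.items():
    gain = speedup_percent(cs["before"], cs["after"])
    print(f"{build}: {cs['before']} cs -> {cs['after']} cs, about {gain:.2f}% faster")
```

On these numbers the gcc build gains several percent while the other build gains under two percent, which fits the observation that the size of the effect depends on the compiler and on how the code happens to be laid out.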
maksimKorzh wrote: And how do you test knps? By running a search from the starting position?
It's a FEN position and a fixed search depth (see below).
maksimKorzh wrote: How can I reproduce this behavior exactly on my side?
I am not sure, because it depends a lot on your hardware.
On an AMD EPYC 7502P 32-core processor the effect is much bigger than on other CPUs. Try adding some dummy code and you will see the performance vary. On this 32-core AMD it varies notably with every code change that changes alignment.
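Since the differences are only a few percent, a single run per build can be misleading. Below is a hedged sketch of a repeat-run harness for comparing two builds of the same engine. The helper names (`time_runs`, `summarize`, `engine_runner`) are hypothetical, and the only engine option used is the `-sd` fixed-depth option that appears in the command in this post.

```python
import statistics
import subprocess
import time


def time_runs(run, repeats=5):
    """Call run() `repeats` times and return the wall-clock seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return min, median and the max-vs-min spread (in percent) of a list of timings."""
    fastest = min(timings)
    return {
        "min": fastest,
        "median": statistics.median(timings),
        "spread_percent": (max(timings) - fastest) / fastest * 100.0,
    }


def engine_runner(binary, depth, fen):
    """Build a zero-argument callable that runs one fixed-depth search.

    Assumption: the binary accepts `-sd <depth> <fen>` as in the command
    quoted in this post; the engine output is discarded.
    """
    cmd = [binary, "-sd", str(depth), fen]
    return lambda: subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
```

A possible use is `summarize(time_runs(engine_runner("bin/olithink589d", 31, fen)))` for each build, with `fen` set to the test position. If the spread within one build is as large as the gap between the two builds, the difference is probably noise rather than an alignment effect.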

PS Here is the test position and the complete log:

FEN: 8/5k1p/1p1pRp2/3P4/PpP3Pp/6bP/6K1/8 w - - 0 2 bm c4c5

bin/olithink589d -sd 31 "8/5k1p/1p1pRp2/3P4/PpP3Pp/6bP/6K1/8 w - - 0 2 bm c4c5"
 1   167      0        14  e6e2 
 2   120      0        87  g2f3 g3e5 
 3   133      0       256  e6e2 g3e5 g2f3 
 4   115      0       722  e6e3 g3e5 g2f1 h7h6 
 5   128      0      1062  e6e3 g3e5 g2f1 h7h6 e3e2 
 6   112      0      2059  e6e3 g3e5 g2f3 f7g6 e3b3 e5c3 
 7   144      0      3352  e6e3 g3e5 g2f3 h7h6 e3b3 e5c3 b3b1 
 8   132      0      4243  e6e3 g3e5 g2f3 h7h6 e3b3 e5c3 b3b1 c3d2 
 9   132      0      5333  e6e3 g3e5 g2f3 h7h6 e3b3 e5c3 b3b1 c3d2 f3e4 
10   117      0     12009  e6e3 g3e5 g2f3 f7g6 e3b3 e5c3 b3b1 f6f5 g4f5 g6f5 b1g1 
11   112      0     16585  e6e3 g3e5 g2f3 f7g6 e3b3 e5c3 f3f4 c3d2 f4e4 h7h5 b3f3 
12   121      1     21616  e6e3 g3e5 g2f3 f7g6 e3b3 e5c3 f3f4 c3d2 f4e4 h7h5 b3b2 d2c3 b2f2 
13   121      1     28307  e6e3 g3e5 g2f3 f7g6 e3b3 e5c3 f3f4 c3d2 f4e4 d2c3 b3b1 g6g5 e4f3 f6f5 
14   118      1     39605  e6e3 g3e5 g2f3 f7g6 e3b3 e5c3 f3f4 c3d2 f4e4 d2c3 b3b1 g6g5 b1c1 h7h5 c1g1 
15   125      2     66413  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3e4 g6g5 b3b1 g5g6 e4f4 c3d2 f4f3 f6f5 b1d1 d2c3 
16   115      2     82161  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3e4 g6g5 b3b1 g5g6 e4f4 c3d2 f4f3 f6f5 b1d1 d2c3 d1f1 
17   124      3    100923  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3e4 g6g5 b3b1 g5g6 e4f4 c3d2 f4f3 f6f5 b1f1 d2c3 g4f5 g6f5 f3e3 f5g5 
18   124      3    121952  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3e4 g6g5 b3b1 g5g6 e4f4 c3d2 f4f3 f6f5 b1f1 d2c3 g4f5 g6f5 f3e3 f5g5 f1f7 
19   117      4    156887  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3e4 g6g5 b3b1 g5g6 e4f4 c3d2 f4f3 f6f5 b1f1 d2c3 g4f5 g6f5 f3e3 f5g5 f1g1 g5f5 e3f3 
20   114      6    245239  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3e4 g6g5 b3b1 g5g6 e4f4 c3d2 f4f3 f6f5 b1f1 d2c3 g4f5 g6f5 f3e3 f5g5 e3d3 g5h5 f1g1 
21   108     10    358956  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3e4 g6g5 b3b1 g5g6 e4f4 c3d2 f4f3 f6f5 b1f1 d2c3 g4f5 g6f5 f3e3 f5g5 e3d3 h7h5 f1g1 g5f4 g1f1 f4g3 
22   111     20    761523  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3f4 c3d2 f4e4 d2c3 b3b1 g6g5 e4d3 f6f5 g4f5 g5f5 b1f1 f5g6 d3c2 h7h5 c2b3 g6g5 f1f7 
23   110     22    840732  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3f4 c3d2 f4e4 d2c3 b3b1 g6g5 e4d3 g5f4 b1g1 h7h6 d3c2 f6f5 g1f1 f4g3 g4f5 g3h3 
24    99     25    958165  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3f4 c3d2 f4e4 d2c3 b3b1 g6g5 e4d3 g5f4 b1g1 h7h6 d3c2 c3d4 g1b1 d4c3 b1d1 f4g3 d1d3 g3f4 c2b3 f6f5 
25   108     33   1280198  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 b3b1 h7h5 g4h5 g6h5 f3e4 h5g6 b1g1 g6f7 e4d3 f6f5 d3c2 f7f6 c2b3 f5f4 g1g4 f6f5 g4h4 c3f6 
26   112     47   1817183  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3f4 c3d2 f4e4 d2c3 b3b1 g6g5 e4d3 f6f5 g4f5 g5f5 b1f1 f5g6 d3c2 h7h5 f1g1 g6f5 c2b3 f5f6 g1g8 f6f5 g8b8 f5e4 
27   125     81   3215364  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3f4 c3d2 f4e4 d2c3 b3b1 g6f7 e4d3 f7g6 b1f1 h7h5 g4h5 g6h5 d3c2 c3e5 c2b3 e5c3 f1f5 h5g6 f5f4 g6h5 f4f1 h5h6 f1g1 f6f5 
28   125     91   3647064  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3f4 c3d2 f4e4 d2c3 b3b1 g6f7 e4d3 f7g6 b1f1 h7h5 g4h5 g6h5 d3c2 c3e5 c2b3 e5c3 f1f5 h5g6 f5f4 g6h5 f4f1 h5h6 f1g1 f6f5 g1g2 
29   131    145   5898469  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3f4 c3d2 f4e4 d2c3 b3b1 h7h6 e4d3 f6f5 g4f5 g6f5 b1f1 f5g6 f1g1 g6h7 d3c2 h6h5 c2b3 c3d2 g1f1 h7g7 f1f2 d2c3 f2f4 g7g6 f4h4 c3f6 
30   134    196   8090947  e6e3 g3e5 e3b3 e5c3 g2f3 f7g6 f3f4 c3d2 f4e4 d2c3 b3b1 h7h6 e4d3 h6h5 b1g1 c3e5 g4h5 g6h5 d3c2 e5d4 g1g4 d4c5 g4g8 f6f5 c2b3 h5h6 g8h8 h6g5 h8h7 f5f4 h7g7 g5f5 
31   388   2922 122685877  c4c5 b4b3 e6e3 d6c5 e3b3 c5c4 b3b6 g3e1 b6b7 f7e8 b7c7 e1a5 c7c4 e8e7 c4c6 h7h6 c6a6 a5b4 a6e6 e7f7 e6b6 b4a5 b6b7 f7e8 d5d6 e8d8 b7h7 a5d2 g2f3 d2b4 d6d7 b4d6 
move c4c5
kibitz W: 388 Nodes: 95905038 QNodes: 26780839 Evals: 69142269 cs: 2922 knps: 4198
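For anyone comparing their own logs, the summary line can be parsed and the knps figure cross-checked. In this log, Nodes plus QNodes equals the node count shown on the depth-31 line (122685877), and dividing that sum by the elapsed time reproduces the printed knps. That relation is inferred from these numbers only, not taken from the OliThink source, and the function names below are illustrative.

```python
import re

# Summary line copied verbatim from the log above.
KIBITZ = ("kibitz W: 388 Nodes: 95905038 QNodes: 26780839 "
          "Evals: 69142269 cs: 2922 knps: 4198")


def parse_kibitz(line):
    """Return the 'name: integer' fields of a kibitz summary line as a dict."""
    return {key: int(value) for key, value in re.findall(r"(\w+): (-?\d+)", line)}


def knps_from_counts(fields):
    """Recompute knps as (Nodes + QNodes) / seconds / 1000.

    Assumption: 'cs' is the elapsed time in centiseconds and the engine
    counts both node types for its knps figure (inferred from this log).
    """
    total_nodes = fields["Nodes"] + fields["QNodes"]
    seconds = fields["cs"] / 100.0
    return total_nodes / seconds / 1000.0


fields = parse_kibitz(KIBITZ)
print(fields["Nodes"] + fields["QNodes"], "total nodes")
print(int(knps_from_counts(fields)), "knps recomputed vs", fields["knps"], "reported")
```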
OliThink GitHub: https://github.com/olithink
Nice article about OliThink: https://www.chessengeria.eu/post/olithink-oldie-goldie
Chess Engine OliThink Homepage: http://brausch.org/home/chess
maksimKorzh
Posts: 775
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: Performance loss when removing unused function

Post by maksimKorzh »

OliverBr wrote: Thu Oct 29, 2020 12:21 pm
Thanks for the explanations, now it's clear.