AMD makefile tweak for Ethereal

Discussion of chess software programming and technical issues.

Moderator: Ras

Dann Corbit
Posts: 12814
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: AMD makefile tweak for Ethereal

Post by Dann Corbit »

syzygy wrote: Fri Sep 04, 2020 11:43 pm
Dann Corbit wrote: Thu Sep 03, 2020 6:32 am I am not seeing a huge difference now.
I got, for 1 thread (26-ply search from the root node):
AVX:
nps 2364000
No popcount:
nps 1867000
Modern:
nps 2360000

I have a lot of things going on on this machine, like database servers, so I think I must have been fooled by something.
I got a 40% boost for Cfish.
I thought I saw a good boost for Ethereal, and when it started solving effectively a batch of difficult mates, I thought I had found something.
Sorry for the trouble.

Something I did differently with Cfish is that I did not use -march=native, but instead -mtune=native.
-march=native implies -mtune=native.

The simplest thing is:

make pgo
This compiles Cfish with profile-guided optimization and automatically selects the best flags for your CPU (and adds -march=native).
Yes, I know, but I tend to compile with -mtune=native, which will allow the program to run on other platforms.
However, having chosen AVX2, I have already severely limited the target platforms in this case, so I should probably just follow your advice.
I also do several platform-specific builds that have low requirements so that I can run it on old machines. I don't want to install compilers on all of them.

There is a caution with -march=native: the CPU reports BMI, and if BMI instructions are used on an AMD Threadripper, the results go straight in the crapper. So I will test a few configurations.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
syzygy
Posts: 5829
Joined: Tue Feb 28, 2012 11:56 pm

Re: AMD makefile tweak for Ethereal

Post by syzygy »

Dann Corbit wrote: Sat Sep 05, 2020 1:12 am Yes, I know, but I tend to compile with -mtune=native, which will allow the program to run on other platforms.
OK, I see now. -mtune instead of -march.
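For anyone who wants to check what a given set of flags actually turned on, here is a minimal sketch. It relies on the feature macros that GCC and Clang predefine (`__AVX2__`, `__POPCNT__`, `__BMI2__`); the function names are made up for illustration. With only -mtune=native these should all report 0 on a plain x86-64 target, because -mtune changes scheduling but not the allowed instruction set, while -march=native on a recent CPU should flip them to 1.

```c
/* Report which ISA extensions the compiler was allowed to use.
   The macros are predefined by GCC and Clang when the matching -m flag
   (or an -march value that implies it) is active.
   Function names are hypothetical, chosen for this example. */

int compiled_with_avx2(void) {
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}

int compiled_with_popcnt(void) {
#ifdef __POPCNT__
    return 1;
#else
    return 0;
#endif
}

int compiled_with_bmi2(void) {
#ifdef __BMI2__
    return 1;
#else
    return 0;
#endif
}
```

Print the three values from main() once with -mtune=native and once with -march=native to see the difference on your own machine.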
However, having chosen AVX2, I have already severely limited the target platforms in this case, so I should probably just follow your advice.
I also do several platform-specific builds that have low requirements so that I can run it on old machines. I don't want to install compilers on all of them.
To have a minimum number of builds covering almost everything, it seems best to have one AVX2 build and one regular x86-64 build (which enables SSE2). The SSE2 NNUE code in Cfish is pretty fast now. In between the two there could be a popcnt build.
There is a caution with -march=native: the CPU reports BMI, and if BMI instructions are used on an AMD Threadripper, the results go straight in the crapper. So I will test a few configurations.
As long as you prevent -DUSE_PEXT, I would be very surprised if gcc generated pext or pdep instructions. I don't think gcc can recognise situations where using them makes sense.

Perhaps I am wrong (I would have thought the same about popcnt, but gcc is able to compile a software popcount to the popcnt instruction). If I am wrong, I would be very interested in an example.
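To make the popcnt remark concrete, this is the kind of software popcount meant here. It is only a sketch (the name `popcount_sw` is made up), and whether a particular compiler version turns the loop into a single popcnt instruction is something to verify by compiling with -O2 -mpopcnt -S and reading the assembly, not something to take for granted.

```c
#include <stdint.h>

/* Software popcount: repeatedly clear the lowest set bit and count
   how many times that can be done. Recent GCC and Clang releases may
   recognise this idiom and emit popcnt when the target allows it. */
int popcount_sw(uint64_t b) {
    int n = 0;
    while (b) {
        b &= b - 1;   /* drop the lowest set bit */
        n++;
    }
    return n;
}
```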

Btw, the Cfish Makefile makes sure not to add -DUSE_PEXT on Zen if you do "make pgo" or "make pgo ARCH=auto".
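A rough sketch of what guarding on USE_PEXT looks like in source. This is not the Cfish code (as far as I know engines fall back to magic bitboards, not to a software pext); `extract_bits` and `pext_portable` are hypothetical names, and the portable loop is only there to show the semantics of the instruction.

```c
#include <stdint.h>
#ifdef USE_PEXT
#include <immintrin.h>   /* _pext_u64; requires compiling with -mbmi2 */
#endif

/* Portable bit extract: gather the bits of src selected by mask into
   the low bits of the result, lowest mask bit first (pext semantics). */
static uint64_t pext_portable(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    for (uint64_t bit = 1; mask; bit <<= 1) {
        uint64_t lowest = mask & -mask;   /* isolate lowest set mask bit */
        if (src & lowest)
            out |= bit;
        mask &= mask - 1;                 /* clear that mask bit */
    }
    return out;
}

/* Only touch the hardware instruction when the build asked for it. */
static uint64_t extract_bits(uint64_t src, uint64_t mask) {
#ifdef USE_PEXT
    return _pext_u64(src, mask);
#else
    return pext_portable(src, mask);
#endif
}
```

Without -DUSE_PEXT the intrinsic never appears in the translation unit, so the only way a pext could end up in the binary is if the compiler synthesised one on its own.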
Dann Corbit
Posts: 12814
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: AMD makefile tweak for Ethereal

Post by Dann Corbit »

syzygy wrote: Sat Sep 05, 2020 1:46 am To have a minimum number of builds covering almost everything, it seems best to have one AVX2 build and one regular x86-64 build (which enables SSE2). The SSE2 NNUE code in Cfish is pretty fast now. In between the two there could be a popcnt build.
Most of the time, that is exactly what I do.
I have one binary I tag with "crusty" for old CPUs, one called "modern" for popcount+sse and one for AVX2.

I do have one new Intel machine with BMI/PEXT, but I build for that machine on that machine (after all, profile builds are better).
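If you ship the three binaries side by side, a tiny launcher can pick the right one. The following is a sketch under assumptions: the build tags are just the ones named above, the function names are made up, and the CPU check uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, which only exist on x86 targets (hence the guard).

```c
/* Map CPU capabilities to one of the three build tags. */
const char *pick_build(int has_avx2, int has_popcnt) {
    if (has_avx2)
        return "avx2";
    if (has_popcnt)
        return "modern";   /* popcount + SSE */
    return "crusty";       /* plain x86-64 for old CPUs */
}

/* Ask the running CPU what it supports (GCC/Clang on x86 only);
   on any other compiler or target, fall back to the safest build. */
const char *pick_build_for_this_cpu(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return pick_build(__builtin_cpu_supports("avx2"),
                      __builtin_cpu_supports("popcnt"));
#else
    return pick_build(0, 0);
#endif
}
```

This deliberately does not look at BMI2/PEXT, since a Zen CPU reports it but runs it slowly; that build is better chosen by hand.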
jdart
Posts: 4420
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: AMD makefile tweak for Ethereal

Post by jdart »

Adding -mavx2 for Arasan gave about a 4% NPS increase (Linux, gcc 7.5, Intel 2690v3).

--Jon