AMD makefile tweak for Ethereal

Dann Corbit
Posts: 12808
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

AMD makefile tweak for Ethereal

Post by Dann Corbit »

It makes a big difference for modern AMD architecture.

# Ethereal is a UCI chess playing engine authored by Andrew Grant.
# <https://github.com/AndyGrant/Ethereal>     <andrew@grantnet.us>
#
# Ethereal is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Ethereal is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

CC      = gcc
SRC     = *.c pyrrhic/tbprobe.c
LIBS    = -lpthread -lm
EXE     = Ethereal

WFLAGS = -std=gnu11 -Wall -Wextra -Wshadow
CFLAGS = -O3 $(WFLAGS) -DNDEBUG -flto -march=native
RFLAGS = -O3 $(WFLAGS) -DNDEBUG -flto -static
TFLAGS = -O3 $(WFLAGS) -DNDEBUG -flto -march=native -fopenmp -DTUNE
PFLAGS = -O0 $(WFLAGS) -DNDEBUG -p -pg
DFLAGS = -O0 $(WFLAGS)

POPCNTFLAGS = -DUSE_POPCNT -msse3 -mpopcnt
AMDFLAGS    = -DUSE_POPCNT -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -mavx2
PEXTFLAGS   = $(POPCNTFLAGS) -DUSE_PEXT -mbmi2

ARMV8FLAGS  = -O3 $(WFLAGS) -DNDEBUG -flto -march=armv8-a -m64
ARMV7FLAGS  = -O3 $(WFLAGS) -DNDEBUG -flto -march=armv7-a -m32
ARMV7FLAGS += -mfloat-abi=softfp -mfpu=vfpv3-d16 -mthumb -Wl,--fix-cortex-a8

popcnt:
	$(CC) $(CFLAGS) $(SRC) $(LIBS) $(POPCNTFLAGS) -o $(EXE)

amd:
	$(CC) $(CFLAGS) $(SRC) $(LIBS) $(AMDFLAGS) -o $(EXE)

nopopcnt:
	$(CC) $(CFLAGS) $(SRC) $(LIBS) -o $(EXE)

pext:
	$(CC) $(CFLAGS) $(SRC) $(LIBS) $(PEXTFLAGS) -o $(EXE)

release:
	mkdir -p ../dist
	$(CC) $(RFLAGS) $(SRC) $(LIBS) -o ../dist/$(EXE)$(VER)-x64-nopopcnt.exe
	$(CC) $(RFLAGS) $(SRC) $(LIBS) $(POPCNTFLAGS) -o ../dist/$(EXE)$(VER)-x64-popcnt.exe
	$(CC) $(RFLAGS) $(SRC) $(LIBS) $(AMDFLAGS) -o ../dist/$(EXE)$(VER)-x64-avx2.exe
	$(CC) $(RFLAGS) $(SRC) $(LIBS) $(PEXTFLAGS) -o ../dist/$(EXE)$(VER)-x64-pext.exe

tune:
	$(CC) $(TFLAGS) $(SRC) $(LIBS) $(POPCNTFLAGS) -o $(EXE)

profile:
	$(CC) $(PFLAGS) $(SRC) $(LIBS) $(POPCNTFLAGS) -o $(EXE)

debug:
	$(CC) $(DFLAGS) $(SRC) $(LIBS) $(POPCNTFLAGS) -o $(EXE)

armv8:
	$(CC) $(ARMV8FLAGS) $(SRC) -lm -o $(EXE)

armv7:
	$(CC) $(ARMV7FLAGS) $(SRC) -lm -o $(EXE)
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

Re: AMD makefile tweak for Ethereal

Post by Dann Corbit »

The current Ethereal is really good at solving mates on my AMD 3970X.
It solved a cluster of difficult mates that several other famous solvers had problems with.
All of the mates are in the 11 to 25 ply range.
AndrewGrant
Posts: 1963
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: AMD makefile tweak for Ethereal

Post by AndrewGrant »

I get similar results as you with my Ryzens, but I'm confused. In the makefile I set "-march=native". According to "gcc -march=native -Q --help=target", this will enable/disable the following flags:

  -mavx                        [enabled]
  -mavx2                       [enabled]
  -msse                        [enabled]
  -msse2                       [enabled]
  -msse2avx                    [disabled]
  -msse3                       [enabled]
  -msse4                       [enabled]
  -msse4.1                     [enabled]
  -msse4.2                     [enabled]
  -msse4a                      [enabled]
  -msse5
  -msseregparm                 [disabled]
  -mssse3                      [enabled]
It appears that every single one of the flags you added is already enabled explicitly. What am I missing?
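
One way to settle the question is to check mechanically whether any of the extra flags is missing from that listing. The sketch below is an illustration and not part of Ethereal's build: `missing_flags` is a hypothetical helper that reads a `gcc -Q --help=target` listing on standard input and prints every flag named on its command line that the listing does not mark `[enabled]`.

```shell
#!/bin/sh
# missing_flags FLAG...  (hypothetical helper, not part of Ethereal)
# Reads a `gcc -Q --help=target` listing on stdin and prints each FLAG
# that the listing does not mark as [enabled].
missing_flags() {
    listing=$(cat)
    for flag in "$@"; do
        # Column 1 is the flag name, column 2 its status.
        state=$(printf '%s\n' "$listing" | awk -v f="$flag" '$1 == f { print $2 }')
        [ "$state" = "[enabled]" ] || printf '%s\n' "$flag"
    done
}

# Usage (requires gcc); empty output means the extra flags are redundant:
#   gcc -march=native -Q --help=target |
#       missing_flags -msse -msse2 -msse3 -mssse3 -msse4.1 -mavx2 -mpopcnt
```

A flag that is absent from the listing, or listed without a status, is reported as missing too, so any output at all is worth a closer look.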

Re: AMD makefile tweak for Ethereal

Post by AndrewGrant »

Dann, I'm trying to track down why this is with Cucumber. Can you tell us your CPU, and your gcc --version, as well as your OS?

Re: AMD makefile tweak for Ethereal

Post by AndrewGrant »

AndrewGrant wrote: Thu Sep 03, 2020 4:53 am I get similar results as you with my Ryzens
Wrong. The binaries are identical, whether I use mtune=native or not, and whether I toss in your flags or not.
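
A byte-for-byte comparison is a quick way to confirm that two flag sets produce the same executable. This is a minimal sketch: `same_binary` is a hypothetical helper, and the file names in the usage comment are made up for illustration.

```shell
#!/bin/sh
# same_binary A B  (hypothetical helper)
# Prints "identical" and succeeds if the two files match byte for byte;
# prints "different" and fails otherwise.
same_binary() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}

# Usage, building once per flag set and renaming each result
# (Ethereal-native and Ethereal-amd are illustrative names):
#   make popcnt && mv Ethereal Ethereal-native
#   make amd    && mv Ethereal Ethereal-amd
#   same_binary Ethereal-native Ethereal-amd
```

Identical files show that the extra flags changed nothing. Differing files would still need a benchmark to show that the difference matters.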

Is it possible you found what you were looking for here? And that there is no real difference, just a bias in observation or a low sample size?
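
A small sample can be guarded against by repeating the measurement and looking at the spread before comparing builds. The sketch below assumes the nps figures from several runs of one build have already been collected, one per line. `nps_summary` is a hypothetical helper, and how the figures are gathered is left open.

```shell
#!/bin/sh
# nps_summary  (hypothetical helper)
# Reads one nps figure per line on stdin; prints count, mean, min and max.
nps_summary() {
    awk 'NF {
             n++; sum += $1
             if (n == 1 || $1 < min) min = $1
             if (n == 1 || $1 > max) max = $1
         }
         END {
             if (n) printf "n=%d mean=%.0f min=%.0f max=%.0f\n", n, sum / n, min, max
         }'
}

# Usage, with runs.txt holding one nps value per line (illustrative name):
#   nps_summary < runs.txt
```

If the min-to-max ranges of two builds overlap, the gap between their means is probably within run-to-run noise.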

Re: AMD makefile tweak for Ethereal

Post by Dann Corbit »

I am not seeing a huge difference now.
With 1 thread, searching 26 ply from the root node, I got:
AVX: nps 2364000
No popcount: nps 1867000
Modern: nps 2360000
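
To put these figures in relative terms, a small helper can compute the percentage change between two nps values. `pct_change` is a hypothetical name used only for this sketch.

```shell
#!/bin/sh
# pct_change BASE CUR  (hypothetical helper)
# Prints the change from BASE to CUR as a percentage with one decimal.
pct_change() {
    awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (cur - base) / base * 100 }'
}

pct_change 1867000 2364000   # no popcount -> AVX:  prints 26.6
pct_change 2364000 2360000   # AVX -> Modern:       prints -0.2
```

So the popcount builds are roughly a quarter faster than the no-popcount build, while AVX and Modern are indistinguishable at this precision.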

I have a lot of other things running on this machine, like database servers, so I think I must have been fooled by something.
I got a 40% boost for Cfish.
I thought I saw a good boost for Ethereal, and when it started solving effectively a batch of difficult mates, I thought I had found something.
Sorry for the trouble.

Something I did differently with Cfish is that I did not use march=native, but instead mtune=native.
syzygy
Posts: 5823
Joined: Tue Feb 28, 2012 11:56 pm

Re: AMD makefile tweak for Ethereal

Post by syzygy »

Dann Corbit wrote: Thu Sep 03, 2020 6:32 am Something I did differently with Cfish is that I did not use march=native, but instead mtune=native.
-march=native implies -mtune=native.

The simplest thing is:

make pgo
This compiles Cfish with profile-guided optimization and automatically selects the best flags for your CPU (and adds -march=native).
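
For readers curious what such a target typically does, profile-guided optimization with gcc is a two-pass build: compile with `-fprofile-generate`, run a representative workload, then recompile with `-fprofile-use`. The sketch below is a generic illustration and not Cfish's actual makefile. `pgo_build`, the flag set, and the workload command are all assumptions.

```shell
#!/bin/sh
# pgo_build EXE WORKLOAD SRC...  (hypothetical helper, not Cfish's makefile)
# Two-pass profile-guided build. WORKLOAD is a shell command that exercises
# the instrumented binary. The compiler is taken from $CC (default: gcc).
pgo_build() {
    exe=$1 workload=$2
    shift 2
    cc=${CC:-gcc}
    # Pass 1: instrumented build that records execution counts at run time.
    "$cc" -O3 -march=native -fprofile-generate "$@" -lpthread -lm -o "$exe" || return 1
    # Run the workload so the profile data files get written.
    sh -c "$workload" >/dev/null || return 1
    # Pass 2: rebuild, letting gcc use the recorded profile.
    "$cc" -O3 -march=native -fprofile-use "$@" -lpthread -lm -o "$exe"
}

# Usage (assumes the engine accepts a "bench" argument):
#   pgo_build Ethereal './Ethereal bench' *.c pyrrhic/tbprobe.c
```

The source list and options must be the same in both passes so that gcc can match the recorded profile to the code it is compiling.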

Re: AMD makefile tweak for Ethereal

Post by AndrewGrant »

syzygy wrote: Fri Sep 04, 2020 11:43 pm -march=native implies -mtune=native.
So this actually has some caveats: https://lemire.me/blog/2018/07/25/it-is ... ch-in-gcc/

Re: AMD makefile tweak for Ethereal

Post by syzygy »

AndrewGrant wrote: Fri Sep 04, 2020 11:49 pm
syzygy wrote: Fri Sep 04, 2020 11:43 pm -march=native implies -mtune=native.
So this actually has some caveats: https://lemire.me/blog/2018/07/25/it-is ... ch-in-gcc/
Interesting!
So -march=xyz implies -mtune=xyz unless xyz=native. Very logical...

However, my gcc-10.2 on this laptop does set -march=skylake -mtune=skylake when I do -march=native.
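
The same `-Q --help=target` listing used earlier in the thread also shows which values gcc resolved for `-march=` and `-mtune=`, so this is easy to check on any machine. A minimal sketch, with `resolved_targets` as a hypothetical helper name:

```shell
#!/bin/sh
# resolved_targets  (hypothetical helper)
# Reads a `gcc -Q --help=target` listing on stdin and prints the values
# gcc resolved for -march= and -mtune=.
resolved_targets() {
    awk '$1 == "-march=" { print "march=" $2 }
         $1 == "-mtune=" { print "mtune=" $2 }'
}

# Usage (requires gcc):
#   gcc -march=native -Q --help=target | resolved_targets
```

If the two printed values differ, or the tune value is `generic`, then `-march=native` did not pick the same tuning target on that compiler and CPU.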

Re: AMD makefile tweak for Ethereal

Post by syzygy »

syzygy wrote: Sat Sep 05, 2020 12:06 am However, my gcc-10.2 on this laptop does set -march=skylake -mtune=skylake when I do -march=native.
It seems -march=native in fact does imply -mtune=native.
However, if gcc is too old to know about your CPU, -march=native may set arch to something at least close to your current CPU, whereas -mtune=native will set tune to "generic". This makes sense because the new cpu will normally support the instruction sets of the older cpu (and I guess gcc will explicitly check for this by checking the cpu flags), but the optimal scheduling rules for the new CPU cannot be known.

So -march=native is probably always equivalent to -march=native -mtune=native, at least on x86.

It seems gcc does not support -march on e.g. PowerPC. There you have to use -mcpu=native (which does seem to imply -mtune=native).

On ARM, it seems -march does not imply -mtune. But there you can do -mcpu which implies both -march and -mtune (unless those are set separately).
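
The per-architecture differences described in this post could be folded into a build script along these lines. This is a sketch based only on the behaviour reported in this thread, not on a survey of gcc versions. `native_flag` is a hypothetical helper and the machine-name patterns are assumptions.

```shell
#!/bin/sh
# native_flag [MACHINE]  (hypothetical helper)
# Prints the gcc option that selects the host CPU, following the behaviour
# reported in this thread. MACHINE defaults to the output of `uname -m`.
native_flag() {
    case "${1:-$(uname -m)}" in
        x86_64|amd64|i?86)  echo "-march=native" ;;  # reported to imply -mtune=native
        ppc*|powerpc*)      echo "-mcpu=native" ;;   # -march reported as unsupported
        arm*|aarch64*)      echo "-mcpu=native" ;;   # reported to set both arch and tune
        *)                  echo "" ;;               # unknown machine: add nothing
    esac
}

# Usage:
#   gcc -O3 $(native_flag) *.c -o engine
```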