Stockfish PGO and popcnt
-
zullil
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Stockfish PGO and popcnt
When I compile Stockfish using "make build" the resulting binary contains the popcntq instruction. But when I compile with "make profile-build" it doesn't. This is with gcc-4.8.1 on OS X.
Seems odd to me. Any ideas? Thanks.
-
mcostalba
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Stockfish PGO and popcnt
zullil wrote:
When I compile Stockfish using "make build" the resulting binary contains the popcntq instruction. But when I compile with "make profile-build" it doesn't. This is with gcc-4.8.1 on OS X.
Seems odd to me. Any ideas? Thanks.

Please write the full make command.
-
zullil
Re: Stockfish PGO and popcnt
mcostalba wrote:
Please write the full make command.

Hi Marco,
I'm using the source code from the c++11 branch, but I see the same behavior using the master branch.
My make command is
Code: Select all
make profile-build ARCH=osx-x86-64
Code: Select all
make build ARCH=osx-x86-64
Makefile:
Code: Select all
# Stockfish, a UCI chess playing engine derived from Glaurung 2.1
# Copyright (C) 2004-2008 Tord Romstad (Glaurung author)
# Copyright (C) 2008-2013 Marco Costalba, Joona Kiiski, Tord Romstad
#
# Stockfish is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Stockfish is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
### ==========================================================================
### Section 1. General Configuration
### ==========================================================================
### Establish the operating system name
UNAME = $(shell uname)
### Executable name
EXE = stockfish
### Installation dir definitions
PREFIX = /usr/local
# Haiku has a non-standard filesystem layout
ifeq ($(UNAME),Haiku)
PREFIX=/boot/common
endif
BINDIR = $(PREFIX)/bin
### Built-in benchmark for pgo-builds
PGOBENCH = ./$(EXE) bench 32 1 10 default depth
### Object files
OBJS = benchmark.o bitbase.o bitboard.o book.o endgame.o evaluate.o main.o \
material.o misc.o movegen.o movepick.o notation.o pawns.o position.o \
search.o thread.o timeman.o tt.o uci.o ucioption.o
### ==========================================================================
### Section 2. High-level Configuration
### ==========================================================================
#
# flag --- Comp switch --- Description
# ----------------------------------------------------------------------------
#
# debug = yes/no --- -DNDEBUG --- Enable/Disable debug mode
# optimize = yes/no --- (-O3/-fast etc.) --- Enable/Disable optimizations
# arch = (name) --- (-arch) --- Target architecture
# os = (name) --- --- Target operating system
# bits = 64/32 --- -DIS_64BIT --- 64-/32-bit operating system
# prefetch = yes/no --- -DUSE_PREFETCH --- Use prefetch x86 asm-instruction
# bsfq = yes/no --- -DUSE_BSFQ --- Use bsfq x86_64 asm-instruction (only
# with GCC and ICC 64-bit)
# popcnt = yes/no --- -DUSE_POPCNT --- Use popcnt x86_64 asm-instruction
# sse = yes/no --- -msse --- Use Intel Streaming SIMD Extensions
#
# Note that Makefile is space sensitive, so when adding new architectures
# or modifying existing flags, you have to make sure there are no extra spaces
# at the end of the line for flag values.
### 2.1. General
debug = no
optimize = yes
### 2.2 Architecture specific
# General-section
ifeq ($(ARCH),general-64)
arch = any
os = any
bits = 64
prefetch = no
bsfq = no
popcnt = no
sse = no
endif
ifeq ($(ARCH),general-32)
arch = any
os = any
bits = 32
prefetch = no
bsfq = no
popcnt = no
sse = no
endif
# x86-section
ifeq ($(ARCH),x86-64)
arch = x86_64
os = any
bits = 64
prefetch = yes
bsfq = yes
popcnt = no
sse = yes
endif
ifeq ($(ARCH),x86-64-modern)
arch = x86_64
os = any
bits = 64
prefetch = yes
bsfq = yes
popcnt = yes
sse = yes
endif
ifeq ($(ARCH),x86-32)
arch = i386
os = any
bits = 32
prefetch = yes
bsfq = no
popcnt = no
sse = yes
endif
ifeq ($(ARCH),x86-32-old)
arch = i386
os = any
bits = 32
prefetch = no
bsfq = no
popcnt = no
sse = no
endif
#arm section
ifeq ($(ARCH),armv7)
arch = armv7
os = any
bits = 32
prefetch = yes
bsfq = yes
popcnt = no
sse = no
endif
# osx-section
ifeq ($(ARCH),osx-ppc-64)
arch = ppc64
os = osx
bits = 64
prefetch = no
bsfq = no
popcnt = no
sse = no
endif
ifeq ($(ARCH),osx-ppc-32)
arch = ppc
os = osx
bits = 32
prefetch = no
bsfq = no
popcnt = no
sse = no
endif
ifeq ($(ARCH),osx-x86-64)
arch = x86_64
os = osx
bits = 64
prefetch = yes
bsfq = yes
popcnt = yes
sse = yes
endif
ifeq ($(ARCH),osx-x86-32)
arch = i386
os = osx
bits = 32
prefetch = yes
bsfq = no
popcnt = no
sse = yes
endif
### ==========================================================================
### Section 3. Low-level configuration
### ==========================================================================
### 3.1 Selecting compiler (default = gcc)
ifeq ($(COMP),)
COMP=gcc
endif
ifeq ($(COMP),mingw)
comp=mingw
CXX=g++
profile_prepare = gcc-profile-prepare
profile_make = gcc-profile-make
profile_use = gcc-profile-use
profile_clean = gcc-profile-clean
endif
ifeq ($(COMP),gcc)
comp=gcc
CXX=g++
profile_prepare = gcc-profile-prepare
profile_make = gcc-profile-make
profile_use = gcc-profile-use
profile_clean = gcc-profile-clean
endif
ifeq ($(COMP),icc)
comp=icc
CXX=icpc
profile_prepare = icc-profile-prepare
profile_make = icc-profile-make
profile_use = icc-profile-use
profile_clean = icc-profile-clean
endif
ifeq ($(COMP),clang)
comp=clang
CXX=clang++
profile_prepare = gcc-profile-prepare
profile_make = gcc-profile-make
profile_use = gcc-profile-use
profile_clean = gcc-profile-clean
endif
### 3.2 General compiler settings
CXXFLAGS = -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 $(EXTRACXXFLAGS)
ifeq ($(comp),gcc)
CXXFLAGS += -pedantic -Wno-long-long -Wextra -Wshadow
endif
ifeq ($(comp),mingw)
CXXFLAGS += -Wextra -Wshadow
endif
ifeq ($(comp),icc)
CXXFLAGS += -wd383,981,1418,1419,1476,10187,10188,11505,11503 -Wcheck -Wabi -Wdeprecated -strict-ansi
endif
ifeq ($(comp),clang)
CXXFLAGS += -pedantic -Wno-long-long -Wextra -Wshadow
endif
ifeq ($(os),osx)
CXXFLAGS +=
endif
### 3.3 General linker settings
LDFLAGS = $(EXTRALDFLAGS)
ifeq ($(comp),mingw)
LDFLAGS += -static-libstdc++ -static-libgcc
endif
### On mingw use Windows threads, otherwise POSIX
ifneq ($(comp),mingw)
# Haiku has pthreads in its libroot, so only link it in on other platforms
ifneq ($(UNAME),Haiku)
LDFLAGS += -lpthread
endif
endif
ifeq ($(os),osx)
LDFLAGS +=
endif
### 3.4 Debugging
ifeq ($(debug),no)
CXXFLAGS += -DNDEBUG
else
CXXFLAGS += -g
endif
### 3.5 Optimization
ifeq ($(optimize),yes)
ifeq ($(comp),gcc)
CXXFLAGS += -O3 -fno-tree-pre
ifeq ($(os),osx)
ifeq ($(arch),i386)
CXXFLAGS += -mdynamic-no-pic
endif
ifeq ($(arch),x86_64)
CXXFLAGS +=
endif
endif
ifeq ($(arch),armv7)
CXXFLAGS += -fno-gcse
endif
endif
ifeq ($(comp),mingw)
CXXFLAGS += -O3
endif
ifeq ($(comp),icc)
ifeq ($(os),osx)
CXXFLAGS += -fast -mdynamic-no-pic
else
CXXFLAGS += -O3
endif
endif
ifeq ($(comp),clang)
### -O4 requires a linker that supports LLVM's LTO
CXXFLAGS += -O3
ifeq ($(os),osx)
ifeq ($(arch),i386)
CXXFLAGS += -mdynamic-no-pic
endif
ifeq ($(arch),x86_64)
CXXFLAGS += -mdynamic-no-pic
endif
endif
endif
endif
### 3.6. Bits
ifeq ($(bits),64)
CXXFLAGS += -DIS_64BIT
endif
### 3.7 prefetch
ifeq ($(prefetch),yes)
ifeq ($(sse),yes)
CXXFLAGS += -msse
DEPENDFLAGS += -msse
endif
else
CXXFLAGS += -DNO_PREFETCH
endif
### 3.8 bsfq
ifeq ($(bsfq),yes)
CXXFLAGS += -DUSE_BSFQ
endif
### 3.9 popcnt
ifeq ($(popcnt),yes)
CXXFLAGS += -msse3 -DUSE_POPCNT
endif
### 3.10 Link Time Optimization, it works since gcc 4.5 but not on mingw.
### This is a mix of compile and link time options because the lto link phase
### needs access to the optimization flags.
ifeq ($(comp),gcc)
ifeq ($(optimize),yes)
ifeq ($(debug),no)
GCC_MAJOR := `$(CXX) -dumpversion | cut -f1 -d.`
GCC_MINOR := `$(CXX) -dumpversion | cut -f2 -d.`
ifeq (1,$(shell expr \( $(GCC_MAJOR) \> 4 \) \| \( $(GCC_MAJOR) \= 4 \& $(GCC_MINOR) \>= 5 \)))
CXXFLAGS += -flto
LDFLAGS += $(CXXFLAGS)
endif
endif
endif
endif
### ==========================================================================
### Section 4. Public targets
### ==========================================================================
help:
	@echo ""
	@echo "To compile stockfish, type: "
	@echo ""
	@echo "make target ARCH=arch [COMP=comp]"
	@echo ""
	@echo "Supported targets:"
	@echo ""
	@echo "build > Build unoptimized version"
	@echo "profile-build > Build PGO-optimized version"
	@echo "strip > Strip executable"
	@echo "install > Install executable"
	@echo "clean > Clean up"
	@echo "testrun > Make sample run"
	@echo ""
	@echo "Supported archs:"
	@echo ""
	@echo "x86-64 > x86 64-bit"
	@echo "x86-64-modern > x86 64-bit with runtime support for popcnt instruction"
	@echo "x86-32 > x86 32-bit excluding old hardware without SSE-support"
	@echo "x86-32-old > x86 32-bit including also very old hardware"
	@echo "osx-ppc-64 > PPC-Mac OS X 64 bit"
	@echo "osx-ppc-32 > PPC-Mac OS X 32 bit"
	@echo "osx-x86-64 > x86-Mac OS X 64 bit"
	@echo "osx-x86-32 > x86-Mac OS X 32 bit"
	@echo "armv7 > ARMv7 32 bit"
	@echo "general-64 > unspecified 64-bit"
	@echo "general-32 > unspecified 32-bit"
	@echo ""
	@echo "Supported comps:"
	@echo ""
	@echo "gcc > Gnu compiler (default)"
	@echo "icc > Intel compiler"
	@echo "mingw > Gnu compiler with MinGW under Windows"
	@echo "clang > LLVM Clang compiler"
	@echo ""
	@echo "Non-standard targets:"
	@echo ""
	@echo "make hpux > Compile for HP-UX. Compiler = aCC"
	@echo ""
	@echo "Examples. If you don't know what to do, you likely want to run: "
	@echo ""
	@echo "make profile-build ARCH=x86-64 (This is for 64-bit systems)"
	@echo "make profile-build ARCH=x86-32 (This is for 32-bit systems)"
	@echo ""
build:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) config-sanity
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) all
profile-build:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) config-sanity
	@echo ""
	@echo "Step 0/4. Preparing for profile build."
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) $(profile_prepare)
	@echo ""
	@echo "Step 1/4. Building executable for benchmark ..."
	@touch *.cpp *.h
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) $(profile_make)
	@echo ""
	@echo "Step 2/4. Running benchmark for pgo-build ..."
	@$(PGOBENCH) > /dev/null
	@echo ""
	@echo "Step 3/4. Building final executable ..."
	@touch *.cpp
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) $(profile_use)
	@echo ""
	@echo "Step 4/4. Deleting profile data ..."
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) $(profile_clean)
strip:
	strip $(EXE)
install:
	-mkdir -p -m 755 $(BINDIR)
	-cp $(EXE) $(BINDIR)
	-strip $(BINDIR)/$(EXE)
clean:
	$(RM) $(EXE) $(EXE).exe *.o .depend *~ core bench.txt *.gcda
testrun:
	@$(PGOBENCH)
default:
	help
### ==========================================================================
### Section 5. Private targets
### ==========================================================================
all: $(EXE) .depend
config-sanity:
	@echo ""
	@echo "Config:"
	@echo "debug: '$(debug)'"
	@echo "optimize: '$(optimize)'"
	@echo "arch: '$(arch)'"
	@echo "os: '$(os)'"
	@echo "bits: '$(bits)'"
	@echo "prefetch: '$(prefetch)'"
	@echo "bsfq: '$(bsfq)'"
	@echo "popcnt: '$(popcnt)'"
	@echo "sse: '$(sse)'"
	@echo ""
	@echo "Flags:"
	@echo "CXX: $(CXX)"
	@echo "CXXFLAGS: $(CXXFLAGS)"
	@echo "LDFLAGS: $(LDFLAGS)"
	@echo ""
	@echo "Testing config sanity. If this fails, try 'make help' ..."
	@echo ""
	@test "$(debug)" = "yes" || test "$(debug)" = "no"
	@test "$(optimize)" = "yes" || test "$(optimize)" = "no"
	@test "$(arch)" = "any" || test "$(arch)" = "x86_64" || test "$(arch)" = "i386" || \
	test "$(arch)" = "ppc64" || test "$(arch)" = "ppc" || test "$(arch)" = "armv7"
	@test "$(os)" = "any" || test "$(os)" = "osx"
	@test "$(bits)" = "32" || test "$(bits)" = "64"
	@test "$(prefetch)" = "yes" || test "$(prefetch)" = "no"
	@test "$(bsfq)" = "yes" || test "$(bsfq)" = "no"
	@test "$(popcnt)" = "yes" || test "$(popcnt)" = "no"
	@test "$(sse)" = "yes" || test "$(sse)" = "no"
	@test "$(comp)" = "gcc" || test "$(comp)" = "icc" || test "$(comp)" = "mingw" || test "$(comp)" = "clang"
$(EXE): $(OBJS)
	$(CXX) -o $@ $(OBJS) $(LDFLAGS)
gcc-profile-prepare:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) gcc-profile-clean
gcc-profile-make:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-fprofile-generate' \
	EXTRALDFLAGS='-lgcov' \
	all
gcc-profile-use:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-fprofile-use' \
	EXTRALDFLAGS='-lgcov' \
	all
gcc-profile-clean:
	@rm -rf *.gcda *.gcno bench.txt
icc-profile-prepare:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) icc-profile-clean
	@mkdir profdir
icc-profile-make:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-prof-gen=srcpos -prof_dir ./profdir' \
	all
icc-profile-use:
	$(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
	EXTRACXXFLAGS='-prof_use -prof_dir ./profdir' \
	all
icc-profile-clean:
	@rm -rf profdir bench.txt
.depend:
	-@$(CXX) $(DEPENDFLAGS) -MM $(OBJS:.o=.cpp) > $@ 2> /dev/null
-include .depend
### ==========================================================================
### Section 6. Non-standard targets
### ==========================================================================
hpux:
	$(MAKE) \
	CXX='/opt/aCC/bin/aCC -AA +hpxstd98 -mt +O3 -DNDEBUG -DNO_PREFETCH' \
	CXXFLAGS="" \
	LDFLAGS="" \
	all
Code: Select all
LZsMacPro-OSX6: ~/Documents/Chess/Stockfish/src] g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/opt/local/libexec/gcc/x86_64-apple-darwin10/4.8.1/lto-wrapper
Target: x86_64-apple-darwin10
Configured with: ../gcc-4.8.1/configure --prefix=/opt/local --build=x86_64-apple-darwin10 --enable-languages=c,c++,objc,obj-c++,lto,fortran,java --libdir=/opt/local/lib/gcc48 --includedir=/opt/local/include/gcc48 --infodir=/opt/local/share/info --mandir=/opt/local/share/man --datarootdir=/opt/local/share/gcc-4.8 --with-local-prefix=/opt/local --with-system-zlib --disable-nls --program-suffix=-mp-4.8 --with-gxx-include-dir=/opt/local/include/gcc48/c++/ --with-gmp=/opt/local --with-mpfr=/opt/local --with-mpc=/opt/local --with-ppl=/opt/local --with-cloog=/opt/local --enable-cloog-backend=isl --disable-cloog-version-check --enable-stage1-checking --disable-multilib --enable-lto --enable-libstdcxx-time --with-as=/opt/local/bin/as --with-ld=/opt/local/bin/ld --with-ar=/opt/local/bin/ar --with-bugurl=https://trac.macports.org/newticket --with-pkgversion='MacPorts gcc48 4.8.1_1'
Thread model: posix
gcc version 4.8.1 (MacPorts gcc48 4.8.1_1)
-
mcostalba
Re: Stockfish PGO and popcnt
Could you please post the full output from when you run:
make profile-build ARCH=osx-x86-64
-
zullil
Re: Stockfish PGO and popcnt
mcostalba wrote:
Could you please post the full output from when you run:
make profile-build ARCH=osx-x86-64

OK. I ran
Code: Select all
make profile-build ARCH=osx-x86-64 > makelog.txt
Code: Select all
make ARCH=osx-x86-64 COMP=gcc config-sanity
Config:
debug: 'no'
optimize: 'yes'
arch: 'x86_64'
os: 'osx'
bits: '64'
prefetch: 'yes'
bsfq: 'yes'
popcnt: 'yes'
sse: 'yes'
Flags:
CXX: g++
CXXFLAGS: -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto
LDFLAGS: -lpthread -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto
Testing config sanity. If this fails, try 'make help' ...
Step 0/4. Preparing for profile build.
make ARCH=osx-x86-64 COMP=gcc gcc-profile-prepare
make ARCH=osx-x86-64 COMP=gcc gcc-profile-clean
Step 1/4. Building executable for benchmark ...
make ARCH=osx-x86-64 COMP=gcc gcc-profile-make
make ARCH=osx-x86-64 COMP=gcc \
EXTRACXXFLAGS='-fprofile-generate' \
EXTRALDFLAGS='-lgcov' \
all
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o benchmark.o benchmark.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o bitbase.o bitbase.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o bitboard.o bitboard.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o book.o book.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o endgame.o endgame.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o evaluate.o evaluate.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o main.o main.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o material.o material.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o misc.o misc.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o movegen.o movegen.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o movepick.o movepick.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o notation.o notation.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o pawns.o pawns.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o position.o position.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o search.o search.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o thread.o thread.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o timeman.o timeman.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o tt.o tt.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o uci.o uci.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o ucioption.o ucioption.cpp
g++ -o stockfish benchmark.o bitbase.o bitboard.o book.o endgame.o evaluate.o main.o material.o misc.o movegen.o movepick.o notation.o pawns.o position.o search.o thread.o timeman.o tt.o uci.o ucioption.o -lgcov -lpthread -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto
Step 2/4. Running benchmark for pgo-build ...
Step 3/4. Building final executable ...
make ARCH=osx-x86-64 COMP=gcc gcc-profile-use
make ARCH=osx-x86-64 COMP=gcc \
EXTRACXXFLAGS='-fprofile-use' \
EXTRALDFLAGS='-lgcov' \
all
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o benchmark.o benchmark.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o bitbase.o bitbase.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o bitboard.o bitboard.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o book.o book.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o endgame.o endgame.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o evaluate.o evaluate.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o main.o main.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o material.o material.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o misc.o misc.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o movegen.o movegen.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o movepick.o movepick.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o notation.o notation.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o pawns.o pawns.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o position.o position.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o search.o search.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o thread.o thread.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o timeman.o timeman.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o tt.o tt.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o uci.o uci.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o ucioption.o ucioption.cpp
g++ -o stockfish benchmark.o bitbase.o bitboard.o book.o endgame.o evaluate.o main.o material.o misc.o movegen.o movepick.o notation.o pawns.o position.o search.o thread.o timeman.o tt.o uci.o ucioption.o -lgcov -lpthread -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto
Step 4/4. Deleting profile data ...
make ARCH=osx-x86-64 COMP=gcc gcc-profile-clean
-
zullil
Re: Stockfish PGO and popcnt --- compiler issue
If I use gcc-4.6 rather than gcc-4.8, the binary that results from
Code: Select all
make profile-build ARCH=osx-x86-64
does contain the popcnt instruction (and is a bit faster than the gcc-4.8 binary). Since gcc-4.6 doesn't support c++11, I switched to the master branch. I also needed to disable LTO, since I was getting an error at link time using gcc-4.6.
Code: Select all
LZsMacPro-OSX6: ~/Documents/Chess/Test/Stockfish/src] ./stockfish bench
===========================
Total time (ms) : 3578
Nodes searched : 4558173
Nodes/second : 1273944
Code: Select all
LZsMacPro-OSX6: ~/Documents/Chess/Test/Stockfish/src] otool -tvQ stockfish | grep popcnt
0000000100002923 popcnt %r8,%rsi
0000000100002b21 popcnt %r10,%rax
0000000100004650 popcnt %rax,%r9
0000000100007aac popcnt %rax,%rax
0000000100007ce3 popcnt %r10,%r10
0000000100008145 popcnt %rax,%rax
0000000100008474 popcnt %rax,%rax
00000001000085fe popcnt %rdx,%rdx
0000000100008820 popcnt %r8,%r8
0000000100008b56 popcnt %rdx,%r14
0000000100008d61 popcnt %rdx,%r9
0000000100008f07 popcnt %r9,%r9
00000001000096c7 popcnt %r15,%r13
00000001000096d3 popcnt %r8,%rdx
0000000100009b9f popcnt %r13,%r13
0000000100009ccb popcnt %rsi,%r8
0000000100009cdf popcnt %rcx,%rsi
0000000100009da9 popcnt %rcx,%rcx
0000000100009e82 popcnt %rcx,%rdx
000000010000a119 popcnt %rcx,%r14
000000010000a1a9 popcnt %rsi,%r14
000000010000a263 popcnt %rcx,%r12
000000010000a4a5 popcnt %r9,%r14
000000010000a4b7 popcnt %rdx,%r14
000000010000a52d popcnt %rcx,%rax
000000010000a7c9 popcnt %rcx,%rcx
000000010000a80d popcnt %rcx,%rax
000000010000aac5 popcnt %r8,%r8
000000010000aad0 popcnt %rax,%r15
000000010000abc5 popcnt %r10,%rdx
000000010000ac13 popcnt %r14,%rdx
000000010000aca6 popcnt %rcx,%r14
000000010000acbf popcnt %r9,%r9
000000010000ad36 popcnt %rcx,%r10
000000010000ad4d popcnt %r8,%r8
000000010000ada0 popcnt %rsi,%r11
000000010000ae3e popcnt %rdx,%r14
000000010000af74 popcnt %r10,%r10
000000010000b03f popcnt %rdx,%r11
000000010000b10a popcnt %rcx,%rcx
000000010000b17b popcnt %rdx,%rax
000000010000b1b5 popcnt %rcx,%rcx
000000010000b1de popcnt %rdi,%rdi
000000010000b2c9 popcnt %rcx,%rcx
000000010000b2f3 popcnt %r9,%r8
000000010000b305 popcnt %r10,%r10
000000010000b3af popcnt %rcx,%rsi
000000010000b463 popcnt %rcx,%rcx
000000010000ba09 popcnt %rdx,%rax
000000010000baac popcnt %rdx,%rax
000000010000bb89 popcnt %rsi,%rsi
000000010000bba6 popcnt %rax,%rax
000000010000bd5d popcnt %rcx,%r11
000000010000bd6f popcnt %rdx,%rdx
000000010000c083 popcnt %rcx,%rcx
000000010000c095 popcnt %r11,%r11
000000010000c119 popcnt %r15,%rax
000000010000c284 popcnt %rcx,%r9
000000010000c2a2 popcnt %rax,%rax
000000010000c30a popcnt %rsi,%r15
000000010000c420 popcnt %rcx,%rcx
000000010000c439 popcnt %rdx,%rdx
000000010000c5d3 popcnt %rcx,%r15
000000010000c5ec popcnt %rdx,%r15
000000010000c869 popcnt %rcx,%r13
000000010000c883 popcnt %r11,%r11
000000010000c901 popcnt %r13,%rax
000000010000ca74 popcnt %rcx,%r8
000000010000ca92 popcnt %rax,%rax
000000010000caf2 popcnt %rsi,%r11
000000010000cc4a popcnt %rsi,%r9
000000010000ccbc popcnt %rax,%rdi
000000010000cd2e popcnt %rcx,%r12
000000010000cdea popcnt %rsi,%r14
000000010000ce05 popcnt %rdi,%rdi
000000010000ce1d popcnt %rcx,%r9
000000010000ce43 popcnt %rdx,%rdx
000000010000cf95 popcnt %rcx,%r9
000000010000d006 popcnt %rdx,%r11
000000010000d072 popcnt %rcx,%rcx
000000010000d138 popcnt %r9,%r11
000000010000d153 popcnt %r10,%r10
000000010000d16d popcnt %rcx,%rcx
000000010000d192 popcnt %rdx,%rsi
00000001000163ee popcnt %r12,%r11
0000000100016684 popcnt %r15,%r12
000000010001694f popcnt %r15,%r15
0000000100016961 popcnt %rbp,%r13
00000001000169ac popcnt %r13,%rbp
00000001000169b4 popcnt %rbx,%r10
-
zullil
Re: Stockfish PGO and popcnt --- compiler issue
gcc-4.7 behaves like gcc-4.8 --- no popcnt in the resulting binary if PGO is used.
-
mcostalba
Re: Stockfish PGO and popcnt
I see nothing wrong in the log output; in particular, each file is compiled with the USE_POPCNT flag defined, as it should be:
Code: Select all
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto -c -o ucioption.o ucioption.cpp
Could you please post the output of:
Code: Select all
make build ARCH=osx-x86-64
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Stockfish PGO and popcnt --- compiler issue
zullil wrote:
gcc-4.7 behaves like gcc-4.8 --- no popcnt in the resulting binary if PGO is used.

For the record, are you talking about the PGO executable, or the re-compile after the PGO has completed?
Also, how does Stockfish access the popcnt instruction? An intrinsic, or inline asm à la Crafty?
-
zullil
Re: Stockfish PGO and popcnt --- compiler issue
bob wrote:
For the record, are you talking about the PGO executable, or the re-compile after the PGO has completed?
Also, how does Stockfish access the popcnt instruction? An intrinsic, or inline asm à la Crafty?

There is no popcnt instruction in the final executable, the one that results from compiling with -fprofile-use. I assume the popcnt instruction is present in the initial executable, but I'll check anyway.
Inline assembly:
Code: Select all
__asm__("popcnt %1, %0" : "=r" (b) : "r" (b));
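For anyone who wants to compare the two routes bob asks about, here is a minimal sketch, not taken from the Stockfish source. The helper names popcount_loop, popcount_hw and popcount_agrees are hypothetical, and the USE_POPCNT guard only mirrors the Makefile's define. As far as I know, GCC's __builtin_popcountll becomes a hardware popcnt only when the target enables it (for example with -mpopcnt), and -msse3 alone does not do that, whereas the inline-asm form names the instruction explicitly. This is not offered as an explanation of the gcc-4.7/4.8 PGO result.

```cpp
#include <cassert>
#include <cstdint>

// Portable reference: clear the lowest set bit until none remain.
inline int popcount_loop(std::uint64_t b) {
    int n = 0;
    for (; b; b &= b - 1)
        ++n;
    return n;
}

// Hypothetical helper: inline asm when USE_POPCNT is defined on x86-64
// (needs a CPU with popcnt), otherwise the GCC/Clang builtin.
inline int popcount_hw(std::uint64_t b) {
#if defined(USE_POPCNT) && defined(__x86_64__)
    // Same form as the snippet quoted in the post above.
    __asm__("popcnt %1, %0" : "=r"(b) : "r"(b));
    return static_cast<int>(b);
#else
    // Assumption: may compile to a library call or bit tricks unless
    // the target enables popcnt (e.g. -mpopcnt).
    return __builtin_popcountll(b);
#endif
}

// Returns true if both routes agree on a few sample bitboards.
inline bool popcount_agrees() {
    const std::uint64_t samples[] = {0ULL, 1ULL, 0xFFULL,
                                     0x8000000000000001ULL, ~0ULL};
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); ++i)
        if (popcount_hw(samples[i]) != popcount_loop(samples[i]))
            return false;
    return true;
}
```

Building a small program around these helpers once with -DUSE_POPCNT and once with -mpopcnt, then grepping the disassembly as in the otool command earlier in the thread, should show whether each route ends up as a popcnt instruction on a given compiler.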