Stockfish PGO and popcnt

zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Stockfish PGO and popcnt

Post by zullil »

When I compile Stockfish using "make build" the resulting binary contains the popcntq instruction. But when I compile with "make profile-build" it doesn't. This is with gcc-4.8.1 on OS X.

Seems odd to me. Any ideas? Thanks.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Stockfish PGO and popcnt

Post by mcostalba »

Please write the full make command.
zullil

Re: Stockfish PGO and popcnt

Post by zullil »

Hi Marco,

I'm using the source code from the c++11 branch, but I see the same behavior using the master branch.

My make command is

make profile-build ARCH=osx-x86-64
There is no popcnt instruction in the resulting binary, but there is if I use

make build ARCH=osx-x86-64

Makefile:

# Stockfish, a UCI chess playing engine derived from Glaurung 2.1
# Copyright (C) 2004-2008 Tord Romstad (Glaurung author)
# Copyright (C) 2008-2013 Marco Costalba, Joona Kiiski, Tord Romstad
#
# Stockfish is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Stockfish is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.


### ==========================================================================
### Section 1. General Configuration
### ==========================================================================

### Establish the operating system name
UNAME = $(shell uname)

### Executable name
EXE = stockfish

### Installation dir definitions
PREFIX = /usr/local
# Haiku has a non-standard filesystem layout
ifeq ($(UNAME),Haiku)
        PREFIX=/boot/common
endif
BINDIR = $(PREFIX)/bin

### Built-in benchmark for pgo-builds
PGOBENCH = ./$(EXE) bench 32 1 10 default depth

### Object files
OBJS = benchmark.o bitbase.o bitboard.o book.o endgame.o evaluate.o main.o \
        material.o misc.o movegen.o movepick.o notation.o pawns.o position.o \
        search.o thread.o timeman.o tt.o uci.o ucioption.o

### ==========================================================================
### Section 2. High-level Configuration
### ==========================================================================
#
# flag                --- Comp switch --- Description
# ----------------------------------------------------------------------------
#
# debug = yes/no      --- -DNDEBUG         --- Enable/Disable debug mode
# optimize = yes/no   --- (-O3/-fast etc.) --- Enable/Disable optimizations
# arch = (name)       --- (-arch)          --- Target architecture
# os = (name)         ---                  --- Target operating system
# bits = 64/32        --- -DIS_64BIT       --- 64-/32-bit operating system
# prefetch = yes/no   --- -DUSE_PREFETCH   --- Use prefetch x86 asm-instruction
# bsfq = yes/no       --- -DUSE_BSFQ       --- Use bsfq x86_64 asm-instruction (only
#                                              with GCC and ICC 64-bit)
# popcnt = yes/no     --- -DUSE_POPCNT     --- Use popcnt x86_64 asm-instruction
# sse = yes/no        --- -msse            --- Use Intel Streaming SIMD Extensions
#
# Note that Makefile is space sensitive, so when adding new architectures
# or modifying existing flags, you have to make sure there are no extra spaces
# at the end of the line for flag values.

### 2.1. General
debug = no
optimize = yes

### 2.2 Architecture specific

# General-section
ifeq ($(ARCH),general-64)
        arch = any
        os = any
        bits = 64
        prefetch = no
        bsfq = no
        popcnt = no
        sse = no
endif

ifeq ($(ARCH),general-32)
        arch = any
        os = any
        bits = 32
        prefetch = no
        bsfq = no
        popcnt = no
        sse = no
endif

# x86-section
ifeq ($(ARCH),x86-64)
        arch = x86_64
        os = any
        bits = 64
        prefetch = yes
        bsfq = yes
        popcnt = no
        sse = yes
endif

ifeq ($(ARCH),x86-64-modern)
        arch = x86_64
        os = any
        bits = 64
        prefetch = yes
        bsfq = yes
        popcnt = yes
        sse = yes
endif

ifeq ($(ARCH),x86-32)
        arch = i386
        os = any
        bits = 32
        prefetch = yes
        bsfq = no
        popcnt = no
        sse = yes
endif

ifeq ($(ARCH),x86-32-old)
        arch = i386
        os = any
        bits = 32
        prefetch = no
        bsfq = no
        popcnt = no
        sse = no
endif

#arm section
ifeq ($(ARCH),armv7)
        arch = armv7
        os = any
        bits = 32
        prefetch = yes
        bsfq = yes
        popcnt = no
        sse = no
endif

# osx-section
ifeq ($(ARCH),osx-ppc-64)
        arch = ppc64
        os = osx
        bits = 64
        prefetch = no
        bsfq = no
        popcnt = no
        sse = no
endif

ifeq ($(ARCH),osx-ppc-32)
        arch = ppc
        os = osx
        bits = 32
        prefetch = no
        bsfq = no
        popcnt = no
        sse = no
endif

ifeq ($(ARCH),osx-x86-64)
        arch = x86_64
        os = osx
        bits = 64
        prefetch = yes
        bsfq = yes
        popcnt = yes
        sse = yes
endif

ifeq ($(ARCH),osx-x86-32)
        arch = i386
        os = osx
        bits = 32
        prefetch = yes
        bsfq = no
        popcnt = no
        sse = yes
endif


### ==========================================================================
### Section 3. Low-level configuration
### ==========================================================================

### 3.1 Selecting compiler (default = gcc)
ifeq ($(COMP),)
        COMP=gcc
endif

ifeq ($(COMP),mingw)
        comp=mingw
        CXX=g++
        profile_prepare = gcc-profile-prepare
        profile_make = gcc-profile-make
        profile_use = gcc-profile-use
        profile_clean = gcc-profile-clean
endif

ifeq ($(COMP),gcc)
        comp=gcc
        CXX=g++
        profile_prepare = gcc-profile-prepare
        profile_make = gcc-profile-make
        profile_use = gcc-profile-use
        profile_clean = gcc-profile-clean
endif

ifeq ($(COMP),icc)
        comp=icc
        CXX=icpc
        profile_prepare = icc-profile-prepare
        profile_make = icc-profile-make
        profile_use = icc-profile-use
        profile_clean = icc-profile-clean
endif

ifeq ($(COMP),clang)
        comp=clang
        CXX=clang++
        profile_prepare = gcc-profile-prepare
        profile_make = gcc-profile-make
        profile_use = gcc-profile-use
        profile_clean = gcc-profile-clean
endif

### 3.2 General compiler settings
CXXFLAGS = -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 $(EXTRACXXFLAGS)

ifeq ($(comp),gcc)
        CXXFLAGS += -pedantic -Wno-long-long -Wextra -Wshadow
endif

ifeq ($(comp),mingw)
        CXXFLAGS += -Wextra -Wshadow
endif

ifeq ($(comp),icc)
        CXXFLAGS += -wd383,981,1418,1419,1476,10187,10188,11505,11503 -Wcheck -Wabi -Wdeprecated -strict-ansi
endif

ifeq ($(comp),clang)
        CXXFLAGS += -pedantic -Wno-long-long -Wextra -Wshadow
endif

ifeq ($(os),osx)
        CXXFLAGS +=
endif

### 3.3 General linker settings
LDFLAGS = $(EXTRALDFLAGS)

ifeq ($(comp),mingw)
        LDFLAGS += -static-libstdc++ -static-libgcc
endif

### On mingw use Windows threads, otherwise POSIX
ifneq ($(comp),mingw)
        # Haiku has pthreads in its libroot, so only link it in on other platforms
        ifneq ($(UNAME),Haiku)
                LDFLAGS += -lpthread
        endif
endif

ifeq ($(os),osx)
        LDFLAGS +=
endif

### 3.4 Debugging
ifeq ($(debug),no)
        CXXFLAGS += -DNDEBUG
else
        CXXFLAGS += -g 
endif

### 3.5 Optimization
ifeq ($(optimize),yes)

        ifeq ($(comp),gcc)
                CXXFLAGS += -O3 -fno-tree-pre

                ifeq ($(os),osx)
                        ifeq ($(arch),i386)
                                CXXFLAGS += -mdynamic-no-pic
                        endif
                        ifeq ($(arch),x86_64)
                                CXXFLAGS +=
                        endif
                endif

                ifeq ($(arch),armv7)
                        CXXFLAGS += -fno-gcse
                endif
        endif

        ifeq ($(comp),mingw)
                CXXFLAGS += -O3
        endif

        ifeq ($(comp),icc)
                ifeq ($(os),osx)
                        CXXFLAGS += -fast -mdynamic-no-pic
                else
                        CXXFLAGS += -O3
                endif
        endif

        ifeq ($(comp),clang)
                ### -O4 requires a linker that supports LLVM's LTO
                CXXFLAGS += -O3

                ifeq ($(os),osx)
                        ifeq ($(arch),i386)
                                CXXFLAGS += -mdynamic-no-pic
                        endif
                        ifeq ($(arch),x86_64)
                                CXXFLAGS += -mdynamic-no-pic
                        endif
                endif
        endif
endif

### 3.6. Bits
ifeq ($(bits),64)
        CXXFLAGS += -DIS_64BIT
endif

### 3.7 prefetch
ifeq ($(prefetch),yes)
        ifeq ($(sse),yes)
                CXXFLAGS += -msse
                DEPENDFLAGS += -msse
        endif
else
        CXXFLAGS += -DNO_PREFETCH
endif

### 3.8 bsfq
ifeq ($(bsfq),yes)
        CXXFLAGS += -DUSE_BSFQ
endif

### 3.9 popcnt
ifeq ($(popcnt),yes)
        CXXFLAGS += -msse3 -DUSE_POPCNT
endif

### 3.10 Link Time Optimization, it works since gcc 4.5 but not on mingw.
### This is a mix of compile and link time options because the lto link phase
### needs access to the optimization flags.
ifeq ($(comp),gcc)
        ifeq ($(optimize),yes)
        ifeq ($(debug),no)
                GCC_MAJOR := `$(CXX) -dumpversion | cut -f1 -d.`
                GCC_MINOR := `$(CXX) -dumpversion | cut -f2 -d.`
                ifeq (1,$(shell expr \( $(GCC_MAJOR) \> 4 \) \| \( $(GCC_MAJOR) \= 4 \& $(GCC_MINOR) \>= 5 \)))
                        CXXFLAGS += -flto
                        LDFLAGS += $(CXXFLAGS)
                endif
        endif
        endif
endif

### ==========================================================================
### Section 4. Public targets
### ==========================================================================

help:
        @echo ""
        @echo "To compile stockfish, type: "
        @echo ""
        @echo "make target ARCH=arch [COMP=comp]"
        @echo ""
        @echo "Supported targets:"
        @echo ""
        @echo "build                > Build unoptimized version"
        @echo "profile-build        > Build PGO-optimized version"
        @echo "strip                > Strip executable"
        @echo "install              > Install executable"
        @echo "clean                > Clean up"
        @echo "testrun              > Make sample run"
        @echo ""
        @echo "Supported archs:"
        @echo ""
        @echo "x86-64               > x86 64-bit"
        @echo "x86-64-modern        > x86 64-bit with runtime support for popcnt instruction"
        @echo "x86-32               > x86 32-bit excluding old hardware without SSE-support"
        @echo "x86-32-old           > x86 32-bit including also very old hardware"
        @echo "osx-ppc-64           > PPC-Mac OS X 64 bit"
        @echo "osx-ppc-32           > PPC-Mac OS X 32 bit"
        @echo "osx-x86-64           > x86-Mac OS X 64 bit"
        @echo "osx-x86-32           > x86-Mac OS X 32 bit"
        @echo "armv7                > ARMv7 32 bit"
        @echo "general-64           > unspecified 64-bit"
        @echo "general-32           > unspecified 32-bit"
        @echo ""
        @echo "Supported comps:"
        @echo ""
        @echo "gcc                  > Gnu compiler (default)"
        @echo "icc                  > Intel compiler"
        @echo "mingw                > Gnu compiler with MinGW under Windows"
        @echo "clang                > LLVM Clang compiler"
        @echo ""
        @echo "Non-standard targets:"
        @echo ""
        @echo "make hpux           >  Compile for HP-UX. Compiler = aCC"
        @echo ""
        @echo "Examples. If you don't know what to do, you likely want to run: "
        @echo ""
        @echo "make profile-build ARCH=x86-64    (This is for 64-bit systems)"
        @echo "make profile-build ARCH=x86-32    (This is for 32-bit systems)"
        @echo ""

build:
        $(MAKE) ARCH=$(ARCH) COMP=$(COMP) config-sanity
        $(MAKE) ARCH=$(ARCH) COMP=$(COMP) all

profile-build:
        $(MAKE) ARCH=$(ARCH) COMP=$(COMP) config-sanity
        @echo ""
        @echo "Step 0/4. Preparing for profile build."
        $(MAKE) ARCH=$(ARCH) COMP=$(COMP) $(profile_prepare)
        @echo ""
        @echo "Step 1/4. Building executable for benchmark ..."
        @touch *.cpp *.h
        $(MAKE) ARCH=$(ARCH) COMP=$(COMP) $(profile_make)
        @echo ""
        @echo "Step 2/4. Running benchmark for pgo-build ..."
        @$(PGOBENCH) > /dev/null
        @echo ""
        @echo "Step 3/4. Building final executable ..."
        @touch *.cpp
        $(MAKE) ARCH=$(ARCH) COMP=$(COMP) $(profile_use)
        @echo ""
        @echo "Step 4/4. Deleting profile data ..."
        $(MAKE) ARCH=$(ARCH) COMP=$(COMP) $(profile_clean)

strip:
        strip $(EXE)

install:
        -mkdir -p -m 755 $(BINDIR)
        -cp $(EXE) $(BINDIR)
        -strip $(BINDIR)/$(EXE)

clean:
        $(RM) $(EXE) $(EXE).exe *.o .depend *~ core bench.txt *.gcda

testrun:
        @$(PGOBENCH)

default:
        help

### ==========================================================================
### Section 5. Private targets
### ==========================================================================

all: $(EXE) .depend

config-sanity:
        @echo ""
        @echo "Config:"
        @echo "debug: '$(debug)'"
        @echo "optimize: '$(optimize)'"
        @echo "arch: '$(arch)'"
        @echo "os: '$(os)'"
        @echo "bits: '$(bits)'"
        @echo "prefetch: '$(prefetch)'"
        @echo "bsfq: '$(bsfq)'"
        @echo "popcnt: '$(popcnt)'"
        @echo "sse: '$(sse)'"
        @echo ""
        @echo "Flags:"
        @echo "CXX: $(CXX)"
        @echo "CXXFLAGS: $(CXXFLAGS)"
        @echo "LDFLAGS: $(LDFLAGS)"
        @echo ""
        @echo "Testing config sanity. If this fails, try 'make help' ..."
        @echo ""
        @test "$(debug)" = "yes" || test "$(debug)" = "no"
        @test "$(optimize)" = "yes" || test "$(optimize)" = "no"
        @test "$(arch)" = "any" || test "$(arch)" = "x86_64" || test "$(arch)" = "i386" || \
         test "$(arch)" = "ppc64" || test "$(arch)" = "ppc" || test "$(arch)" = "armv7"
        @test "$(os)" = "any" || test "$(os)" = "osx"
        @test "$(bits)" = "32" || test "$(bits)" = "64"
        @test "$(prefetch)" = "yes" || test "$(prefetch)" = "no"
        @test "$(bsfq)" = "yes" || test "$(bsfq)" = "no"
        @test "$(popcnt)" = "yes" || test "$(popcnt)" = "no"
        @test "$(sse)" = "yes" || test "$(sse)" = "no"
        @test "$(comp)" = "gcc" || test "$(comp)" = "icc" || test "$(comp)" = "mingw" || test "$(comp)" = "clang"

$(EXE): $(OBJS)
        $(CXX) -o $@ $(OBJS) $(LDFLAGS)

gcc-profile-prepare:
        $(MAKE) ARCH=$(ARCH) COMP=$(COMP) gcc-profile-clean

gcc-profile-make:
        $(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
        EXTRACXXFLAGS='-fprofile-generate' \
        EXTRALDFLAGS='-lgcov' \
        all

gcc-profile-use:
        $(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
        EXTRACXXFLAGS='-fprofile-use' \
        EXTRALDFLAGS='-lgcov' \
        all

gcc-profile-clean:
        @rm -rf *.gcda *.gcno bench.txt

icc-profile-prepare:
        $(MAKE) ARCH=$(ARCH) COMP=$(COMP) icc-profile-clean
        @mkdir profdir

icc-profile-make:
        $(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
        EXTRACXXFLAGS='-prof-gen=srcpos -prof_dir ./profdir' \
        all

icc-profile-use:
        $(MAKE) ARCH=$(ARCH) COMP=$(COMP) \
        EXTRACXXFLAGS='-prof_use -prof_dir ./profdir' \
        all

icc-profile-clean:
        @rm -rf profdir bench.txt

.depend:
        -@$(CXX) $(DEPENDFLAGS) -MM $(OBJS:.o=.cpp) > $@ 2> /dev/null

-include .depend


### ==========================================================================
### Section 6. Non-standard targets
### ==========================================================================

hpux:
        $(MAKE) \
        CXX='/opt/aCC/bin/aCC -AA +hpxstd98 -mt +O3 -DNDEBUG -DNO_PREFETCH' \
        CXXFLAGS="" \
        LDFLAGS="" \
        all

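For anyone who wants to try the LTO gate from section 3.10 of the Makefile outside of make, the same expr test can be run by hand. This is only a sketch: the function name gcc_supports_lto is made up for illustration, and the version numbers are passed in explicitly instead of being read from g++ -dumpversion.

```shell
# Hypothetical helper mirroring the expr test in section 3.10 of the Makefile.
# Usage: gcc_supports_lto MAJOR MINOR
# Succeeds (exit status 0) when MAJOR.MINOR is 4.5 or newer.
gcc_supports_lto() {
    major=$1
    minor=$2
    # expr prints 1 when the condition holds and 0 otherwise; "|| true"
    # hides the non-zero exit status expr returns when it prints 0.
    result=$(expr \( "$major" \> 4 \) \| \( "$major" = 4 \& "$minor" \>= 5 \) || true)
    [ "$result" = "1" ]
}
```

Calling gcc_supports_lto 4 8 succeeds, so by this test the Makefile adds -flto for gcc-4.8; the same is true for 4.6, while 4.4 would be rejected.
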
Compiler info:

LZsMacPro-OSX6: ~/Documents/Chess/Stockfish/src]  g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/opt/local/libexec/gcc/x86_64-apple-darwin10/4.8.1/lto-wrapper
Target: x86_64-apple-darwin10
Configured with: ../gcc-4.8.1/configure --prefix=/opt/local --build=x86_64-apple-darwin10 --enable-languages=c,c++,objc,obj-c++,lto,fortran,java --libdir=/opt/local/lib/gcc48 --includedir=/opt/local/include/gcc48 --infodir=/opt/local/share/info --mandir=/opt/local/share/man --datarootdir=/opt/local/share/gcc-4.8 --with-local-prefix=/opt/local --with-system-zlib --disable-nls --program-suffix=-mp-4.8 --with-gxx-include-dir=/opt/local/include/gcc48/c++/ --with-gmp=/opt/local --with-mpfr=/opt/local --with-mpc=/opt/local --with-ppl=/opt/local --with-cloog=/opt/local --enable-cloog-backend=isl --disable-cloog-version-check --enable-stage1-checking --disable-multilib --enable-lto --enable-libstdcxx-time --with-as=/opt/local/bin/as --with-ld=/opt/local/bin/ld --with-ar=/opt/local/bin/ar --with-bugurl=https://trac.macports.org/newticket --with-pkgversion='MacPorts gcc48 4.8.1_1'
Thread model: posix
gcc version 4.8.1 (MacPorts gcc48 4.8.1_1)
mcostalba

Re: Stockfish PGO and popcnt

Post by mcostalba »

Could you please post the full output from when you run:

make profile-build ARCH=osx-x86-64
zullil

Re: Stockfish PGO and popcnt

Post by zullil »

OK. I ran

make profile-build ARCH=osx-x86-64 > makelog.txt
Here's makelog.txt:

make ARCH=osx-x86-64 COMP=gcc config-sanity

Config:
debug: 'no'
optimize: 'yes'
arch: 'x86_64'
os: 'osx'
bits: '64'
prefetch: 'yes'
bsfq: 'yes'
popcnt: 'yes'
sse: 'yes'

Flags:
CXX: g++
CXXFLAGS: -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11  -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto
LDFLAGS:  -lpthread  -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11  -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto

Testing config sanity. If this fails, try 'make help' ...


Step 0/4. Preparing for profile build.
make ARCH=osx-x86-64 COMP=gcc gcc-profile-prepare
make ARCH=osx-x86-64 COMP=gcc gcc-profile-clean

Step 1/4. Building executable for benchmark ...
make ARCH=osx-x86-64 COMP=gcc gcc-profile-make
make ARCH=osx-x86-64 COMP=gcc \
        EXTRACXXFLAGS='-fprofile-generate' \
        EXTRALDFLAGS='-lgcov' \
        all
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o benchmark.o benchmark.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o bitbase.o bitbase.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o bitboard.o bitboard.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o book.o book.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o endgame.o endgame.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o evaluate.o evaluate.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o main.o main.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o material.o material.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o misc.o misc.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o movegen.o movegen.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o movepick.o movepick.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o notation.o notation.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o pawns.o pawns.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o position.o position.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o search.o search.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o thread.o thread.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o timeman.o timeman.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o tt.o tt.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o uci.o uci.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o ucioption.o ucioption.cpp
g++ -o stockfish benchmark.o bitbase.o bitboard.o book.o endgame.o evaluate.o main.o material.o misc.o movegen.o movepick.o notation.o pawns.o position.o search.o thread.o timeman.o tt.o uci.o ucioption.o -lgcov -lpthread  -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-generate -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto

Step 2/4. Running benchmark for pgo-build ...

Step 3/4. Building final executable ...
make ARCH=osx-x86-64 COMP=gcc gcc-profile-use
make ARCH=osx-x86-64 COMP=gcc \
        EXTRACXXFLAGS='-fprofile-use' \
        EXTRALDFLAGS='-lgcov' \
        all
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o benchmark.o benchmark.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o bitbase.o bitbase.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o bitboard.o bitboard.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o book.o book.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o endgame.o endgame.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o evaluate.o evaluate.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o main.o main.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o material.o material.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o misc.o misc.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o movegen.o movegen.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o movepick.o movepick.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o notation.o notation.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o pawns.o pawns.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o position.o position.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o search.o search.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o thread.o thread.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o timeman.o timeman.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o tt.o tt.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o uci.o uci.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o ucioption.o ucioption.cpp
g++ -o stockfish benchmark.o bitbase.o bitboard.o book.o endgame.o evaluate.o main.o material.o misc.o movegen.o movepick.o notation.o pawns.o position.o search.o thread.o timeman.o tt.o uci.o ucioption.o -lgcov -lpthread  -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto

Step 4/4. Deleting profile data ...
make ARCH=osx-x86-64 COMP=gcc gcc-profile-clean
zullil

Re: Stockfish PGO and popcnt --- compiler issue

Post by zullil »

If I use gcc-4.6 rather than gcc-4.8, the binary that results from

Code: Select all

make profile-build ARCH=osx-x86-64
does contain the popcnt instruction (and is a bit faster than the gcc-4.8 binary). Since gcc-4.6 does not accept -std=c++11, I switched to the master branch. I also had to disable LTO, because gcc-4.6 gave an error at link time.

Code: Select all

LZsMacPro-OSX6: ~/Documents/Chess/Test/Stockfish/src] ./stockfish bench

===========================
Total time (ms) : 3578
Nodes searched  : 4558173
Nodes/second    : 1273944

Code: Select all

LZsMacPro-OSX6: ~/Documents/Chess/Test/Stockfish/src] otool -tvQ stockfish | grep popcnt
0000000100002923	popcnt	%r8,%rsi
0000000100002b21	popcnt	%r10,%rax
0000000100004650	popcnt	%rax,%r9
0000000100007aac	popcnt	%rax,%rax
0000000100007ce3	popcnt	%r10,%r10
0000000100008145	popcnt	%rax,%rax
0000000100008474	popcnt	%rax,%rax
00000001000085fe	popcnt	%rdx,%rdx
0000000100008820	popcnt	%r8,%r8
0000000100008b56	popcnt	%rdx,%r14
0000000100008d61	popcnt	%rdx,%r9
0000000100008f07	popcnt	%r9,%r9
00000001000096c7	popcnt	%r15,%r13
00000001000096d3	popcnt	%r8,%rdx
0000000100009b9f	popcnt	%r13,%r13
0000000100009ccb	popcnt	%rsi,%r8
0000000100009cdf	popcnt	%rcx,%rsi
0000000100009da9	popcnt	%rcx,%rcx
0000000100009e82	popcnt	%rcx,%rdx
000000010000a119	popcnt	%rcx,%r14
000000010000a1a9	popcnt	%rsi,%r14
000000010000a263	popcnt	%rcx,%r12
000000010000a4a5	popcnt	%r9,%r14
000000010000a4b7	popcnt	%rdx,%r14
000000010000a52d	popcnt	%rcx,%rax
000000010000a7c9	popcnt	%rcx,%rcx
000000010000a80d	popcnt	%rcx,%rax
000000010000aac5	popcnt	%r8,%r8
000000010000aad0	popcnt	%rax,%r15
000000010000abc5	popcnt	%r10,%rdx
000000010000ac13	popcnt	%r14,%rdx
000000010000aca6	popcnt	%rcx,%r14
000000010000acbf	popcnt	%r9,%r9
000000010000ad36	popcnt	%rcx,%r10
000000010000ad4d	popcnt	%r8,%r8
000000010000ada0	popcnt	%rsi,%r11
000000010000ae3e	popcnt	%rdx,%r14
000000010000af74	popcnt	%r10,%r10
000000010000b03f	popcnt	%rdx,%r11
000000010000b10a	popcnt	%rcx,%rcx
000000010000b17b	popcnt	%rdx,%rax
000000010000b1b5	popcnt	%rcx,%rcx
000000010000b1de	popcnt	%rdi,%rdi
000000010000b2c9	popcnt	%rcx,%rcx
000000010000b2f3	popcnt	%r9,%r8
000000010000b305	popcnt	%r10,%r10
000000010000b3af	popcnt	%rcx,%rsi
000000010000b463	popcnt	%rcx,%rcx
000000010000ba09	popcnt	%rdx,%rax
000000010000baac	popcnt	%rdx,%rax
000000010000bb89	popcnt	%rsi,%rsi
000000010000bba6	popcnt	%rax,%rax
000000010000bd5d	popcnt	%rcx,%r11
000000010000bd6f	popcnt	%rdx,%rdx
000000010000c083	popcnt	%rcx,%rcx
000000010000c095	popcnt	%r11,%r11
000000010000c119	popcnt	%r15,%rax
000000010000c284	popcnt	%rcx,%r9
000000010000c2a2	popcnt	%rax,%rax
000000010000c30a	popcnt	%rsi,%r15
000000010000c420	popcnt	%rcx,%rcx
000000010000c439	popcnt	%rdx,%rdx
000000010000c5d3	popcnt	%rcx,%r15
000000010000c5ec	popcnt	%rdx,%r15
000000010000c869	popcnt	%rcx,%r13
000000010000c883	popcnt	%r11,%r11
000000010000c901	popcnt	%r13,%rax
000000010000ca74	popcnt	%rcx,%r8
000000010000ca92	popcnt	%rax,%rax
000000010000caf2	popcnt	%rsi,%r11
000000010000cc4a	popcnt	%rsi,%r9
000000010000ccbc	popcnt	%rax,%rdi
000000010000cd2e	popcnt	%rcx,%r12
000000010000cdea	popcnt	%rsi,%r14
000000010000ce05	popcnt	%rdi,%rdi
000000010000ce1d	popcnt	%rcx,%r9
000000010000ce43	popcnt	%rdx,%rdx
000000010000cf95	popcnt	%rcx,%r9
000000010000d006	popcnt	%rdx,%r11
000000010000d072	popcnt	%rcx,%rcx
000000010000d138	popcnt	%r9,%r11
000000010000d153	popcnt	%r10,%r10
000000010000d16d	popcnt	%rcx,%rcx
000000010000d192	popcnt	%rdx,%rsi
00000001000163ee	popcnt	%r12,%r11
0000000100016684	popcnt	%r15,%r12
000000010001694f	popcnt	%r15,%r15
0000000100016961	popcnt	%rbp,%r13
00000001000169ac	popcnt	%r13,%rbp
00000001000169b4	popcnt	%rbx,%r10
zullil

Re: Stockfish PGO and popcnt --- compiler issue

Post by zullil »

gcc-4.7 behaves like gcc-4.8 --- no popcnt in the resulting binary if PGO is used.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Stockfish PGO and popcnt

Post by mcostalba »

I see nothing wrong in the log output; in particular, each file is compiled with the USE_POPCNT flag defined, as it should be:

Code: Select all

g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -std=c++11 -fprofile-use -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre  -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT -flto   -c -o ucioption.o ucioption.cpp
Could you please post the output of:

Code: Select all

make build ARCH=osx-x86-64
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Stockfish PGO and popcnt --- compiler issue

Post by bob »

zullil wrote:gcc-4.7 behaves like gcc-4.8 --- no popcnt in the resulting binary if PGO is used.
For the record, are you talking about the PGO (instrumented) executable, or the recompile after the PGO run has completed?

Also, how does Stockfish access the popcnt instruction? An intrinsic, or inline asm à la Crafty?
zullil

Re: Stockfish PGO and popcnt --- compiler issue

Post by zullil »

bob wrote:
zullil wrote:gcc-4.7 behaves like gcc-4.8 --- no popcnt in the resulting binary if PGO is used.
For the record, are you talking about the PGO (instrumented) executable, or the recompile after the PGO run has completed?

Also, how does Stockfish access the popcnt instruction? An intrinsic, or inline asm à la Crafty?
There is no popcnt instruction in the final executable, the one that results from compiling with -fprofile-use. I assume the popcnt instruction is present in the initial executable, but I'll check anyway.

Inline assembly:

Code: Select all

  __asm__("popcnt %1, %0" : "=r" (b) : "r" (b));
See https://github.com/mcostalba/Stockfish/ ... bitcount.h
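
For readers following along, here is a minimal sketch (not Stockfish's actual bitcount.h) of the three usual ways to count the set bits of a 64-bit bitboard: inline assembly like the line quoted above, the GCC/Clang builtin `__builtin_popcountll`, and a portable SWAR fallback. All function names are hypothetical and chosen only for illustration. The inline-asm variant is compiled only when `USE_POPCNT` is defined on an x86-64 target, which is an assumption modelled on the `-DUSE_POPCNT` flag visible in the build log.

```cpp
// Sketch only: three ways to count set bits in a 64-bit bitboard.
// The names popcount_asm, popcount_builtin, popcount_swar and
// check_popcount are hypothetical; they are not taken from Stockfish.
#include <cassert>
#include <cstdint>

// 1. Inline assembly, in the spirit of the snippet quoted above.
//    Guarded by USE_POPCNT because the instruction is emitted
//    unconditionally and faults on CPUs that do not implement it.
#if defined(USE_POPCNT) && defined(__x86_64__)
inline int popcount_asm(std::uint64_t b) {
    __asm__("popcnt %1, %0" : "=r"(b) : "r"(b));
    return static_cast<int>(b);
}
#endif

// 2. Compiler builtin. Whether this becomes a single popcnt instruction
//    depends on the target options the compiler was given.
#if defined(__GNUC__)
inline int popcount_builtin(std::uint64_t b) {
    return __builtin_popcountll(b);
}
#endif

// 3. Portable SWAR fallback: sum bits in 2-, 4- and 8-bit groups, then
//    add the eight byte counts with one multiplication.
inline int popcount_swar(std::uint64_t b) {
    b -= (b >> 1) & 0x5555555555555555ULL;
    b = (b & 0x3333333333333333ULL) + ((b >> 2) & 0x3333333333333333ULL);
    b = (b + (b >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return static_cast<int>((b * 0x0101010101010101ULL) >> 56);
}

// Self-check: every variant that was compiled in must agree with the
// portable fallback for the given bitboard.
inline void check_popcount(std::uint64_t b) {
    const int expected = popcount_swar(b);
#if defined(__GNUC__)
    assert(popcount_builtin(b) == expected);
#endif
#if defined(USE_POPCNT) && defined(__x86_64__)
    assert(popcount_asm(b) == expected);
#endif
    (void)expected;  // silences the unused-variable warning under -DNDEBUG
}
```

With GCC, the builtin is turned into a popcnt instruction only when the target enables it (for example with -mpopcnt); otherwise the compiler uses a software routine. The inline-asm form does not depend on any -m flag, so looking at the disassembly with `otool -tvQ stockfish | grep popcnt`, as done earlier in the thread, is a reasonable way to see which path actually ended up in the binary.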