Stockfish PGO and popcnt


zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Completely baffling

Post by zullil »

syzygy wrote:
zullil wrote:No need for the snarky "thanks for wasting our time" either.
Well, you said you were "summarizing", not that you were trying another approach.

I still don't see how it can fail when using the __asm__ implementation of popcount.

What you could try is to include a line #warning USING POPCNT in bitcount.h, say between lines 99 and 100. If during compilation you don't see USING POPCNT warnings, then somehow the __asm__ is not being used.
OK, I was unclear.

Thanks for the suggestion. Will try it and see.
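A minimal sketch of the probe suggested above, assuming the hardware branch of bitcount.h is guarded by the USE_POPCNT macro (as the -DUSE_POPCNT build flag suggests). The helper popcnt_path_selected() is hypothetical and only there so the result can be checked in code; #warning itself is a compiler extension before C++23.

```cpp
// Compile-time probe: the diagnostic is emitted only when the USE_POPCNT
// branch is actually compiled. popcnt_path_selected() is a hypothetical
// helper, not part of Stockfish.
#if defined(USE_POPCNT)
#warning USING POPCNT
constexpr bool popcnt_path_selected() { return true; }
#else
constexpr bool popcnt_path_selected() { return false; }
#endif
```

With -DUSE_POPCNT on the command line, the compiler prints the warning once for every translation unit that includes the header; without it, the build stays silent.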
syzygy
Posts: 5869
Joined: Tue Feb 28, 2012 11:56 pm

Re: Completely baffling

Post by syzygy »

What you could also try as a last resort is removing all alternative implementations from bitcount.h:

Code: Select all

/*
  Stockfish, a UCI chess playing engine derived from Glaurung 2.1
  Copyright (C) 2004-2008 Tord Romstad (Glaurung author)
  Copyright (C) 2008-2013 Marco Costalba, Joona Kiiski, Tord Romstad

  Stockfish is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.


  Stockfish is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/

#if !defined(BITCOUNT_H_INCLUDED)
#define BITCOUNT_H_INCLUDED

#include <cassert>
#include "types.h"

enum BitCountType {
  CNT_64,
  CNT_64_MAX15,
  CNT_32,
  CNT_32_MAX15,
  CNT_HW_POPCNT
};

/// Determine at compile time the best popcount<> specialization according if
/// platform is 32 or 64 bits, to the maximum number of nonzero bits to count
/// and if hardware popcnt instruction is available.
const BitCountType Full  = CNT_HW_POPCNT;
const BitCountType Max15 = CNT_HW_POPCNT;

/// popcount() counts the number of nonzero bits in a bitboard
template<BitCountType type>
inline int popcount(Bitboard b) {
  __asm__("popcnt %1, %0" : "=r" (b) : "r" (b));
  return b;
}

#endif // !defined(BITCOUNT_H_INCLUDED)
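For comparison, here is a rough sketch of what the removed alternatives amount to, assuming a C++14 compiler. popcount_sw and popcount_any are hypothetical names, not Stockfish's; the software version is the usual SWAR bit count, and the hardware branch mirrors the inline asm above but is only compiled on x86-64 GCC/Clang builds that define USE_POPCNT.

```cpp
#include <cstdint>

// Portable SWAR bit count. Needs no special CPU instruction.
constexpr int popcount_sw(std::uint64_t b) {
  b = b - ((b >> 1) & 0x5555555555555555ULL);
  b = (b & 0x3333333333333333ULL) + ((b >> 2) & 0x3333333333333333ULL);
  b = (b + (b >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
  return static_cast<int>((b * 0x0101010101010101ULL) >> 56);
}

// Uses the popcnt instruction when the build asks for it, otherwise falls
// back to the software count. Running the hardware branch on a CPU without
// popcnt raises an illegal-instruction fault.
inline int popcount_any(std::uint64_t b) {
#if defined(USE_POPCNT) && defined(__x86_64__) && defined(__GNUC__)
  std::uint64_t r;
  __asm__("popcnt %1, %0" : "=r"(r) : "r"(b) : "cc");
  return static_cast<int>(r);
#else
  return popcount_sw(b);
#endif
}
```

Both paths should return the same value for any input, so a build that behaves correctly with only the asm version left in must be executing popcnt somewhere.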
zullil

Re: Completely baffling

Post by zullil »

It appears that popcnt is being used. The question now becomes why can't I find a popcnt instruction or opcode in the binary!

Code: Select all

LZsMacPro-OSX6: ~/Documents/Chess/Stockfish/src] make profile-build ARCH=osx-x86-64
make ARCH=osx-x86-64 COMP=gcc config-sanity

Config:
debug: 'no'
optimize: 'yes'
arch: 'x86_64'
os: 'osx'
bits: '64'
prefetch: 'yes'
bsfq: 'yes'
popcnt: 'yes'
sse: 'yes'

Flags:
CXX: g++
CXXFLAGS: -Wall -Wcast-qual -fno-exceptions -fno-rtti  -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT 
LDFLAGS:  -lpthread  -Wall -Wcast-qual -fno-exceptions -fno-rtti  -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT 

Testing config sanity. If this fails, try 'make help' ...


Step 0/4. Preparing for profile build.
make ARCH=osx-x86-64 COMP=gcc gcc-profile-prepare
make ARCH=osx-x86-64 COMP=gcc gcc-profile-clean

Step 1/4. Building executable for benchmark ...
make ARCH=osx-x86-64 COMP=gcc gcc-profile-make
make ARCH=osx-x86-64 COMP=gcc \
	EXTRACXXFLAGS='-fprofile-generate' \
	EXTRALDFLAGS='-lgcov' \
	all
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o benchmark.o benchmark.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o bitbase.o bitbase.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o bitboard.o bitboard.cpp
In file included from bitboard.cpp:25:0:
bitcount.h:100:2: warning: #warning is a GCC extension [enabled by default]
 #warning USING POPCNT
  ^
bitcount.h:100:2: warning: #warning USING POPCNT [-Wcpp]
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o book.o book.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o endgame.o endgame.cpp
In file included from endgame.cpp:24:0:
bitcount.h:100:2: warning: #warning is a GCC extension [enabled by default]
 #warning USING POPCNT
  ^
bitcount.h:100:2: warning: #warning USING POPCNT [-Wcpp]
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o evaluate.o evaluate.cpp
In file included from evaluate.cpp:25:0:
bitcount.h:100:2: warning: #warning is a GCC extension [enabled by default]
 #warning USING POPCNT
  ^
bitcount.h:100:2: warning: #warning USING POPCNT [-Wcpp]
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o main.o main.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o material.o material.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o misc.o misc.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o movegen.o movegen.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o movepick.o movepick.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o notation.o notation.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o pawns.o pawns.cpp
In file included from pawns.cpp:24:0:
bitcount.h:100:2: warning: #warning is a GCC extension [enabled by default]
 #warning USING POPCNT
  ^
bitcount.h:100:2: warning: #warning USING POPCNT [-Wcpp]
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o position.o position.cpp
In file included from position.cpp:27:0:
bitcount.h:100:2: warning: #warning is a GCC extension [enabled by default]
 #warning USING POPCNT
  ^
bitcount.h:100:2: warning: #warning USING POPCNT [-Wcpp]
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o search.o search.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o thread.o thread.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o timeman.o timeman.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o tt.o tt.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o uci.o uci.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o ucioption.o ucioption.cpp
g++ -o stockfish benchmark.o bitbase.o bitboard.o book.o endgame.o evaluate.o main.o material.o misc.o movegen.o movepick.o notation.o pawns.o position.o search.o thread.o timeman.o tt.o uci.o ucioption.o -lgcov -lpthread  -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-generate -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT 

Step 2/4. Running benchmark for pgo-build ...

Position: 1/16

Position: 2/16

Position: 3/16

Position: 4/16

Position: 5/16

Position: 6/16

Position: 7/16

Position: 8/16

Position: 9/16

Position: 10/16

Position: 11/16

Position: 12/16

Position: 13/16

Position: 14/16

Position: 15/16

Position: 16/16

===========================
Total time (ms) : 27141
Nodes searched  : 20189706
Nodes/second    : 743882

Step 3/4. Building final executable ...
make ARCH=osx-x86-64 COMP=gcc gcc-profile-use
make ARCH=osx-x86-64 COMP=gcc \
	EXTRACXXFLAGS='-fprofile-use' \
	EXTRALDFLAGS='-lgcov' \
	all
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o benchmark.o benchmark.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o bitbase.o bitbase.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o bitboard.o bitboard.cpp
In file included from bitboard.cpp:25:0:
bitcount.h:100:2: warning: #warning is a GCC extension [enabled by default]
 #warning USING POPCNT
  ^
bitcount.h:100:2: warning: #warning USING POPCNT [-Wcpp]
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o book.o book.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o endgame.o endgame.cpp
In file included from endgame.cpp:24:0:
bitcount.h:100:2: warning: #warning is a GCC extension [enabled by default]
 #warning USING POPCNT
  ^
bitcount.h:100:2: warning: #warning USING POPCNT [-Wcpp]
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o evaluate.o evaluate.cpp
In file included from evaluate.cpp:25:0:
bitcount.h:100:2: warning: #warning is a GCC extension [enabled by default]
 #warning USING POPCNT
  ^
bitcount.h:100:2: warning: #warning USING POPCNT [-Wcpp]
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o main.o main.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o material.o material.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o misc.o misc.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o movegen.o movegen.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o movepick.o movepick.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o notation.o notation.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o pawns.o pawns.cpp
In file included from pawns.cpp:24:0:
bitcount.h:100:2: warning: #warning is a GCC extension [enabled by default]
 #warning USING POPCNT
  ^
bitcount.h:100:2: warning: #warning USING POPCNT [-Wcpp]
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o position.o position.cpp
In file included from position.cpp:27:0:
bitcount.h:100:2: warning: #warning is a GCC extension [enabled by default]
 #warning USING POPCNT
  ^
bitcount.h:100:2: warning: #warning USING POPCNT [-Wcpp]
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o search.o search.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o thread.o thread.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o timeman.o timeman.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o tt.o tt.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o uci.o uci.cpp
g++ -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT    -c -o ucioption.o ucioption.cpp
g++ -o stockfish benchmark.o bitbase.o bitboard.o book.o endgame.o evaluate.o main.o material.o misc.o movegen.o movepick.o notation.o pawns.o position.o search.o thread.o timeman.o tt.o uci.o ucioption.o -lgcov -lpthread  -Wall -Wcast-qual -fno-exceptions -fno-rtti -fprofile-use -ansi -pedantic -Wno-long-long -Wextra -Wshadow  -DNDEBUG -O3 -fno-tree-pre -DIS_64BIT -msse -DUSE_BSFQ -msse3 -DUSE_POPCNT 

Step 4/4. Deleting profile data ...
make ARCH=osx-x86-64 COMP=gcc gcc-profile-clean
LZsMacPro-OSX6: ~/Documents/Chess/Stockfish/src] otool -tvQ stockfish | grep popcnt
LZsMacPro-OSX6: ~/Documents/Chess/Stockfish/src] 
zullil

Re: Completely baffling

Post by zullil »

syzygy wrote:What you could also try as a last resort is removing all alternative implementations from bitcount.h:

Yes, I tried that too. :D
syzygy

Re: Completely baffling

Post by syzygy »

zullil wrote:It appears that popcnt is being used. The question now becomes why can't I find a popcnt instruction or opcode in the binary!
My only explanation is that otool is somehow not seeing the code sections that have the popcnt instructions. It may not be a very good explanation, but I don't see any other.

On my (linux) system the equivalent tool is objdump. Maybe you have that too and you could try objdump -d ./stockfish | grep popcnt?
zullil

Re: Completely baffling

Post by zullil »

syzygy wrote:
zullil wrote:It appears that popcnt is being used. The question now becomes why can't I find a popcnt instruction or opcode in the binary!
My only explanation is that otool is somehow not seeing the code sections that have the popcnt instructions. It may not be a very good explanation, but I don't see any other.

On my (linux) system the equivalent tool is objdump. Maybe you have that too and you could try objdump -d ./stockfish | grep popcnt?
Yes, I considered doing that. But I did search the binary with a hex editor and couldn't locate the popcnt opcodes. (I had no problem locating them in the non-pgo'd binary.) Still mysterious.

Thanks for the help.
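The hex-editor search can be approximated in code. This is only a sketch, assuming C++14 and assuming the instructions of interest use the common encoding F3, an optional REX prefix, then 0F B8; the function name is hypothetical. It is a plain byte-pattern match rather than a disassembler, so data bytes that happen to match are counted too.

```cpp
#include <cstddef>

// Counts byte sequences shaped like an x86-64 popcnt encoding:
// F3, an optional REX prefix (0x40-0x4F), then 0F B8.
constexpr std::size_t count_popcnt_patterns(const unsigned char* p,
                                            std::size_t n) {
  std::size_t hits = 0;
  for (std::size_t i = 0; i + 2 < n; ++i) {
    if (p[i] != 0xF3) continue;
    std::size_t j = i + 1;
    if ((p[j] & 0xF0) == 0x40) ++j;  // skip the optional REX prefix
    if (j + 1 < n && p[j] == 0x0F && p[j + 1] == 0xB8) ++hits;
  }
  return hits;
}
```

Reading the executable into a buffer and passing it to this function gives a count that does not depend on how otool or objdump choose to print the instruction.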
syzygy

Re: Completely baffling

Post by syzygy »

zullil wrote:
syzygy wrote:What you could also try as a last resort is removing all alternative implementations from bitcount.h:

Yes, I tried that too. :D
If you've tried this and the executable worked correctly, then it must have the popcnt instruction.

Just for fun you could comment out the __asm__ line and confirm that stockfish crashes.
zullil

Re: Completely baffling

Post by zullil »

syzygy wrote:
zullil wrote:
syzygy wrote:What you could also try as a last resort is removing all alternative implementations from bitcount.h:

Yes, I tried that too. :D
If you've tried this and the executable worked correctly, then it must have the popcnt instruction.
Yes, I knew it had to be there. But I couldn't/can't find it! I even ran the binary on my Core 2 Duo MacBook, which complained about an "illegal instruction", since that CPU doesn't have popcnt.
syzygy

Re: Completely baffling

Post by syzygy »

To see where the popcnt instructions go, let's insert one in a place where we can easily find it again.

Add a few lines to main.cpp:

Code: Select all

(...)
#include "bitcount.h"

int main(int argc, char* argv[]) {

  std::cout << "popcount(123456) = " << popcount<Full>(123456) << std::endl;

  std::cout << engine_info() << std::endl;

(...)
Now compile and dump the disassembly from otool to a text file (otool -tv stockfish >assembly.txt).
Load the text file into an editor and look either for main or for $0x1e240 (which is 123456):

Code: Select all

000000000040b820 <main>:
  40b820:	55                   	push   %rbp
  40b821:	48 89 e5             	mov    %rsp,%rbp
  40b824:	41 57                	push   %r15
  40b826:	41 56                	push   %r14
  40b828:	41 55                	push   %r13
  40b82a:	41 54                	push   %r12
  40b82c:	53                   	push   %rbx
  40b82d:	bb 40 e2 01 00       	mov    $0x1e240,%ebx
  40b832:	f3 48 0f b8 db       	popcnt %rbx,%rbx
This is from objdump, so with otool it will look a bit different, but it shouldn't be hard to locate this part.

With make profile-build I get (gcc 4.8.1, on Linux):

Code: Select all

000000000040b1a0 <main>:
  40b1a0:	55                   	push   %rbp
  40b1a1:	48 89 e5             	mov    %rsp,%rbp
  40b1a4:	41 57                	push   %r15
  40b1a6:	41 56                	push   %r14
  40b1a8:	41 55                	push   %r13
  40b1aa:	41 54                	push   %r12
  40b1ac:	53                   	push   %rbx
  40b1ad:	bb 40 e2 01 00       	mov    $0x1e240,%ebx
  40b1b2:	f3 4c 0f b8 e3       	popcnt %rbx,%r12
So practically the same.
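The two popcnt lines differ only in the REX and ModRM bytes, which is why the destination register changes. A small sketch that decodes exactly this register-to-register 64-bit form, assuming C++14; the struct and function names are hypothetical, and register numbers follow the x86-64 encoding (3 = rbx, 12 = r12).

```cpp
// Operands of a register-to-register 64-bit popcnt instruction.
struct PopcntOperands {
  bool ok;  // false if the bytes are not "F3 REX.W 0F B8 /r" with mod == 3
  int src;  // source register number (ModRM.rm extended by REX.B)
  int dst;  // destination register number (ModRM.reg extended by REX.R)
};

constexpr PopcntOperands decode_popcnt(const unsigned char (&c)[5]) {
  // Expect: F3 prefix, REX with W set (0x48-0x4F), opcode 0F B8, mod == 11.
  if (c[0] != 0xF3 || (c[1] & 0xF8) != 0x48 || c[2] != 0x0F ||
      c[3] != 0xB8 || (c[4] >> 6) != 3)
    return PopcntOperands{false, -1, -1};
  return PopcntOperands{true,
                        (c[4] & 7) | ((c[1] & 1) << 3),          // rm + REX.B
                        ((c[4] >> 3) & 7) | ((c[1] & 4) << 1)};  // reg + REX.R
}
```

Fed the bytes f3 48 0f b8 db it reports rbx as both source and destination; fed f3 4c 0f b8 e3 it reports source rbx and destination r12, matching the two objdump listings.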
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Completely baffling

Post by bob »

syzygy wrote:
zullil wrote:It appears that popcnt is being used. The question now becomes why can't I find a popcnt instruction or opcode in the binary!
My only explanation is that otool is somehow not seeing the code sections that have the popcnt instructions. It may not be a very good explanation, but I don't see any other.

On my (linux) system the equivalent tool is objdump. Maybe you have that too and you could try objdump -d ./stockfish | grep popcnt?
Or you can manually type the gcc compile command (including -fprofile-use), replacing the -c flag with -S. Then you should get a formatted asm file where you can see if there are any popcnt's to be found...