fast mobility count through hashing

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: fast mobility count through hashing

Post by bob »

mcostalba wrote:
Pradu wrote:
bob wrote:Actually I believe it is supported on any processor that has the sse 4.2 flag set when you use the CPUID instruction.
There's even a seperate flag just for popcnt so that it can work with CPUs that have popcnt but not SSE4:

Code: Select all

#ifdef _MSC_VER
    #include <intrin.h>
#elif defined&#40; __GNUC__ )
    static __inline__ __attribute__(&#40;always_inline&#41;) void __cpuid&#40;int CPUInfo&#91;&#93;, const int InfoType&#41;
    &#123;
        __asm__ __volatile__
        (
            "cpuid"
            &#58; "=a" &#40;CPUInfo&#91;0&#93;), "=b" &#40;CPUInfo&#91;1&#93;), "=c" &#40;CPUInfo&#91;2&#93;), "=d" &#40;CPUInfo&#91;3&#93;)
            &#58; "a" &#40;InfoType&#41;
        );
    &#125;
#else
    #error Unsupported Compiler
#endif

...

int CPUInfo&#91;4&#93;;
__cpuid&#40;CPUInfo, 1&#41;;
int has_popcnt = &#40;CPUInfo&#91;2&#93;>>23&#41;&1;
I have one question regarding runtime check of CPU capabilities.

How this can be used for a chess engine that is supposed to be released as a binary and downloaded and used by anyone on any computer ?

I mean, I cannot use a function pointer to redirect at runtime on a fallback standard C code implementation if host CPU does not support POPCNT. That would be far too slow. This kind of functions must be inlined. So, when I check that host CPU has or not has the popcnt capability what can I do ?

Chess engine is not supposed to be compiled on _any_ pc that will run it, but is compiled once with the best optimization and distributed.

The only possibility I foreseen is two create two compiles, one for CPU with POPCNT and another for CPU without POPCNT, but considering that we need also another two versions for 32 and 64 bits we are ramping up fast on this combinatorial escalation.
I agree with you that this would not be useful inside the engine. But you could then release two binaries, one using hardware popcnt and one using the traditional c=0; while (p &= p-1) c++; type loop. Then you could, in the popcnt binary, at least verify that it will work properly as the popcnt instruction probably won't immediately crash your program on a non-popcnt machine, as opposed to just starting a search and going nuts due to no popcnt in hardware.

Much nicer for a program to tell the user he is running a program that needs popcnt, but is not using hardware that supports it, rather than just crashing and burning mysteriously.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: fast mobility count through hashing

Post by mcostalba »

bob wrote: I agree with you that this would not be useful inside the engine. But you could then release two binaries, one using hardware popcnt and one using the traditional c=0; while (p &= p-1) c++; type loop. Then you could, in the popcnt binary, at least verify that it will work properly as the popcnt instruction probably won't immediately crash your program on a non-popcnt machine, as opposed to just starting a search and going nuts due to no popcnt in hardware.
Yes, as a possible compromise between two separates binaries one with popcnt and the other without and your option, a bundle of the two binaries in one package we could think of a runtime check with redirection done at much higher level then the bit count routine.

As example in the same source we could have an evaluation function like this

Code: Select all

int evaluate&#40;)
&#123;
    return has_popcnt&#40;) ? evaluate_popcnt&#40;) &#58; evaluate_std&#40;);
&#125;
In this case the check is done at the evaluate function call level, so its cost is almost zero and below that you have two identical evaluate_xxxx() functions that differ only at the bottom level where one will call popcnt() intrinsic and the other the C version. It works...but is not nice...and personally I don't like to replicate 99% of code of evaluation in two functions....perhaps templetizing the evaluation you don't need to write in two places every time you change something in your evaluation:

Code: Select all

template<bool HasPopCnt>
int do_evaluate&#40;);

int evaluate&#40;)
&#123;
    return has_popcnt&#40;) ? do_evaluate<true>() &#58; do_evaluate<false>();
&#125;
But in the meantime I have, perhaps ;-) , found a way to quickly count rook mobility bitboards.

Stay tuned....
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: fast mobility count through hashing

Post by mcostalba »

Gerd Isenberg wrote: As a first exercise you may try to find a fast hash function in 32-bit mode simply for some 12 first rank attacks of a center rook.
Here we go....

This is for rooks attacks without taboo squares.

Code: Select all

#include <iostream>
#include <stdlib.h>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

typedef unsigned long long uint64;


// Core hashing function, here you can tweak the hashing formula

#define USE_32_BIT_MULTIPLICATIONS

int transform&#40;uint64 b, uint64 magic, int bits&#41; &#123;
#if defined&#40;USE_32_BIT_MULTIPLICATIONS&#41;
        return
                &#40;unsigned&#41;(&#40;int&#41;b*&#40;int&#41;magic ^ &#40;int&#41;&#40;b>>32&#41;*&#40;int&#41;&#40;magic>>32&#41;) >> &#40;32-bits&#41;;
#else
        return &#40;int&#41;(&#40;b * magic&#41; >> &#40;64 - bits&#41;);
#endif
&#125;


// Variables

struct Mobility &#123;
    Mobility&#40;uint64 key, int v&#41; &#58; b&#40;key&#41;, value&#40;v&#41; &#123;&#125;
    uint64 b;
    int value;
&#125;;

vector<Mobility> mobilityVec;
vector<int> hashes;
uint64 ClearMaskBB&#91;65&#93;, SetMaskBB&#91;65&#93;;


// Functions

uint64 random_uint64&#40;) &#123;
    uint64 u1, u2, u3, u4;
    u1 = &#40;uint64&#41;&#40;random&#40;)) & 0xFFFF; u2 = &#40;uint64&#41;&#40;random&#40;)) & 0xFFFF;
    u3 = &#40;uint64&#41;&#40;random&#40;)) & 0xFFFF; u4 = &#40;uint64&#41;&#40;random&#40;)) & 0xFFFF;
    return u1 | &#40;u2 << 16&#41; | &#40;u3 << 32&#41; | &#40;u4 << 48&#41;;
&#125;


uint64 random_uint64_fewbits&#40;) &#123;
    return random_uint64&#40;) & random_uint64&#40;) & random_uint64&#40;);
&#125;


int count_1s&#40;uint64 b&#41; &#123;
    int r;
    for&#40;r = 0; b; r++, b &= b - 1&#41; &#123;&#125;
    return r;
&#125;


void init_masks&#40;) &#123;

    for &#40;int s = 0; s <= 64; s++)
    &#123;
        SetMaskBB&#91;s&#93; = &#40;1ULL << s&#41;;
        ClearMaskBB&#91;s&#93; = ~SetMaskBB&#91;s&#93;;
    &#125;
&#125;


void clear_bit&#40;uint64 *b, int s&#41; &#123;
    *b &= ClearMaskBB&#91;s&#93;;
&#125;


void init_mobility_vector&#40;int sq&#41; &#123;

    int rk = sq/8, fl = sq%8;

    mobilityVec.clear&#40;);
    uint64 b1, b2, b3, b4, start;
    uint64 pivot1, pivot2, pivot3, pivot4;

    b1 = b2 = b3 = b4 = 0ULL;
    pivot4 = start = 1 << &#40;rk*8 + fl&#41;;

    for &#40;int rup = rk; rup < 8; rup++)
    &#123;
        b4 |= pivot4;
        pivot4 |= &#40;pivot4 << 8&#41;;
        b3 = b4;
        pivot3 = start;
        for &#40;int rdw = rk; rdw >= 0; rdw--)
        &#123;
            b3 |= pivot3;
            pivot3 |= &#40;pivot3 >> 8&#41;;
            b2 = b3;
            pivot2 = start;
            for &#40;int fr = fl; fr < 8; fr++)
            &#123;
                b2 |= pivot2;
                pivot2 |= &#40;pivot2 << 1&#41;;
                b1 = b2;
                pivot1 = start;
                for &#40;int fle = fl; fle >= 0; fle--)
                &#123;
                    b1 |= pivot1;
                    pivot1 |= &#40;pivot1 >> 1&#41;;

                    clear_bit&#40;&b1, sq&#41;; // Remove our square
                    if (!b1&#41;
                        continue;

                    // Finally save mobility bitboard
                    mobilityVec.push_back&#40;Mobility&#40;b1, count_1s&#40;b1&#41;));
                &#125;
            &#125;
        &#125;
    &#125;
&#125;


uint init&#40;int sq, int idx_size&#41; &#123;

    init_masks&#40;);
    init_mobility_vector&#40;sq&#41;;
    hashes.resize&#40;1 << idx_size&#41;;
    hashes.clear&#40;);
    return mobilityVec.size&#40;);
&#125;


uint64 find_magic&#40;int idx_size&#41; &#123;

  uint64 magic;
  int k, failed;
  uint i, longest = 0;

  for &#40;k = 0; k < 100000000; k++)
  &#123;
      magic = random_uint64_fewbits&#40;);

      hashes.clear&#40;);
      for &#40;i = 0, failed = 0; !failed && i < mobilityVec.size&#40;); i++)
      &#123;
          int hash = transform&#40;mobilityVec&#91;i&#93;.b, magic, idx_size&#41;;

          // Verify hash is new or belongs to mobilities with same bit count
          if (!hashes&#91;hash&#93;)
              hashes&#91;hash&#93; = mobilityVec&#91;i&#93;.value;
          else
              failed = &#40;hashes&#91;hash&#93; != mobilityVec&#91;i&#93;.value&#41;;
      &#125;
      if &#40;i > longest&#41;
      &#123;
          longest = i;
          cout << "Longest after " << k << " iterations is " << longest << endl;
      &#125;
      if (!failed&#41;
          return magic;
  &#125;
  return 0ULL;
&#125;

int main&#40;int argc, char *argv&#91;&#93;) &#123;

  int square, idx_size;
  uint64 magic;

  if &#40;argc != 3&#41;
  &#123;
      cout << "\nUsage&#58; mc <square> <index size>\n" << endl;
      return 0;
  &#125;
  istringstream is1&#40;argv&#91;1&#93;);
  istringstream is2&#40;argv&#91;2&#93;);
  is1 >> square;
  is2 >> idx_size;

  if &#40;square > 63 || idx_size > 63 || square < 0 || idx_size < 0&#41;
  &#123;
      cout << "\nSquare and index size must be positive < 64\n" << endl;
      return 0;
  &#125;

  uint vecSize = init&#40;square, idx_size&#41;;

  cout << "\nFinding magic for square " << square
       << " with index size " << idx_size << " vector size is "
       << vecSize << "...\n" << std&#58;&#58;endl;

  magic = find_magic&#40;idx_size&#41;;

  if &#40;magic&#41;
      cout << "\nMagic is 0x" << hex << magic << endl << endl;
  else
      cout << "\nSorry, magic not found." << endl << endl;

  return 0;
&#125;
Compile and test run...

Code: Select all

$ g++ main.cpp -O3 -o mc
$ ./mc 56 10

Finding magic for square 56 with index size 10 vector size is 64...

Longest after 0 iterations is 16
Longest after 34 iterations is 64

Magic is 0x12002020b001

$ ./mc 32 8

Finding magic for square 32 with index size 8 vector size is 160...

Longest after 0 iterations is 160

Magic is 0x4020100800c40401

$