Rewriting RomiChess from scratch for SMP

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 1:00 am
Location: WY, USA
Full name: Michael Sherwin

Rewriting RomiChess from scratch for SMP

Post by Michael Sherwin » Tue Apr 07, 2020 5:37 pm

Something I've wanted to do for 15 years but was unable to do for reasons that there is no need to rehash. Anyway, if I do not do it now I never will. I have at least 3 months of food on hand so I do not have to leave the house. The code will be much cleaner or so that is the goal. The move generation will have a different philosophy. It will be based on, 'search one ply then a quiescence search. This should have the benefits of not requiring a legality check for ply 1 gen_all_moves as the previous move is guaranteed to be legal and each move will have a score. This seems efficient as this has to be done on all leaf nodes regardless. Doing it for all internal nodes seems trivial especially since the scores can help in move ordering. It will be using SISSY bitboards (https://www.chessprogramming.org/SISSY_Bitboards. At first the slightly faster 7 lookup SISSY queen will drive Q,R and B, just because that is what I have working so far. Once the Bishop and fancy rook folding is implemented I hope that it will be quite competitive with fancy magic. The rook will then only be 4 lookups and the bishop 3 lookups. In Q and P endgames I believe it will already outperform fancy magic but that remains to be seen. I've just started. Once the rewrite is further along it will be put on Github or whatever. Also the first working version will be single threaded but will be setup for multithreaded from the start. This is what I have so far. Any comments appreciated!

Code: Select all

// King Hunter
// It is written for MSVC 2019 Community

#include <intrin.h>

#define s08 signed char;
#define u08 unsigned char
#define s32 int
#define u32 unsigned long
#define u64 unsigned long long

#define one 1ull
#define OOO 0

enum { BLACK, WHITE };

enum { OO, WP, WN, WB, WR, WQ, WC, WK, BP, BN, BB, BR, BQ, BC, BK };

enum { UNO, TWO, CEP };

struct threads {
  s32 wtm;
  u64 ebb;
  u64 pieces[2];
  u64 brdbb[64];
  s32 board[64];
};

struct moves {
  u08 fs;
  u08 ts;
  u08 ty;
  u08 id;
  s32 sc;
  s32 st;
};

threads t[32];

s32 wptyp[] = { OOO, OOO, OOO, OOO, OOO, OOO, OOO, OOO,
                TWO, TWO, TWO, TWO, TWO, TWO, TWO, TWO,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO,
                CEP, CEP, CEP, CEP, CEP, CEP, CEP, CEP,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO };

s32 bptyp[] = { OOO, OOO, OOO, OOO, OOO, OOO, OOO, OOO,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO,
                CEP, CEP, CEP, CEP, CEP, CEP, CEP, CEP,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO,
                TWO, TWO, TWO, TWO, TWO, TWO, TWO, TWO };

u64 wpmoves[64];
u64 wpcapts[64];
u64 bpmoves[64];
u64 bpcapts[64];
u64 ktmoves[64];
u64 kgmoves[64];
u64 above[64];
u64 below[64];

u64 qss[64][256][8];
u64 bob[64];
u64 rob[64];

u64 wcastle(s32 ti) {
  u64 bb;

  return bb;
};

u64 bcastle(s32 ti) {
  u64 bb;

  return bb;
};

void GenMoves(s32 ti) {
  u64 pieces, wpieces, bpieces, apieces, empty, notme; 
  u32 typ, fs, sq;

  pieces = t[ti].pieces[t[ti].wtm];
  wpieces = t[ti].pieces[WHITE];
  bpieces = t[ti].pieces[BLACK];
  apieces = wpieces | bpieces;
  empty = 0xffffffffffffffff ^ apieces;
  notme = 0xffffffffffffffff ^ pieces;

  do {
    _BitScanForward64(&fs, pieces);
    pieces ^= one << fs;
    typ = t[ti].board[fs];
    switch (typ) {
      case OO:
        break;
      case WP:
        typ = wptyp[fs];
        switch (typ) {
          case UNO:
            t[ti].brdbb[fs] = (wpmoves[fs] & empty)
                            | (wpcapts[fs] & bpieces);
            break;
          case TWO:
            _BitScanForward64(&sq, (wpmoves[fs] & apieces));
            t[ti].brdbb[fs] = (wpmoves[fs] & below[sq])
                            | (wpcapts[fs] & bpieces);
            break;
          case CEP:
            t[ti].brdbb[fs] = (wpmoves[fs] & empty)
                            | (wpcapts[fs] & bpieces | t[ti].ebb);
            break;
        } 
        break;
      case WN:
        t[ti].brdbb[fs] = ktmoves[fs] & notme;
        break;
      case WB:
        t[ti].brdbb[fs] = qss[fs][(apieces >> (fs & 56)) & 127][0]
                        & qss[fs][(apieces >> 8)  & 255][1]
                        & qss[fs][(apieces >> 16) & 255][2]
                        & qss[fs][(apieces >> 24) & 255][3]
                        & qss[fs][(apieces >> 32) & 255][4]
                        & qss[fs][(apieces >> 40) & 255][5]
                        & qss[fs][(apieces >> 48) & 255][6]
                        & bob[fs]
                        & notme;
        break;
      case WR:
        t[ti].brdbb[fs] = qss[fs][(apieces >> (fs & 56)) & 127][0]
                        & qss[fs][(apieces >> 8)  & 255][1]
                        & qss[fs][(apieces >> 16) & 255][2]
                        & qss[fs][(apieces >> 24) & 255][3]
                        & qss[fs][(apieces >> 32) & 255][4]
                        & qss[fs][(apieces >> 40) & 255][5]
                        & qss[fs][(apieces >> 48) & 255][6]
                        & rob[fs]
                        & notme;
        break;
      case WQ:
        t[ti].brdbb[fs] = qss[fs][(apieces >> (fs & 56)) & 127][0]
                        & qss[fs][(apieces >>  8) & 255][1]
                        & qss[fs][(apieces >> 16) & 255][2]
                        & qss[fs][(apieces >> 24) & 255][3]
                        & qss[fs][(apieces >> 32) & 255][4]
                        & qss[fs][(apieces >> 40) & 255][5]
                        & qss[fs][(apieces >> 48) & 255][6]
                        & notme;
        break;
      case WC:
        t[ti].brdbb[fs] = (kgmoves[fs] & notme) | wcastle(ti);
        break;
      case WK:
        t[ti].brdbb[fs] = (kgmoves[fs] & notme);
        break;
      case BP:
        typ = bptyp[fs];
        switch (typ) {
          case UNO:
            t[ti].brdbb[fs] = (bpmoves[fs] & empty)
                            | (bpcapts[fs] & bpieces);
            break;
          case TWO:
            _BitScanForward64(&sq, (bpmoves[fs] & apieces));
            t[ti].brdbb[fs] = (bpmoves[fs] & below[sq])
                            | (bpcapts[fs] & bpieces);
            break;
          case CEP:
            t[ti].brdbb[fs] = (bpmoves[fs] & empty)
                            | (bpcapts[fs] & bpieces | t[ti].ebb);
            break;
        }
        break;
      case BN:
        t[ti].brdbb[fs] = ktmoves[fs] & notme;
        break;
      case BB:
        t[ti].brdbb[fs] = qss[fs][(apieces >> (fs & 56)) & 127][0]
                        & qss[fs][(apieces >>  8) & 255][1]
                        & qss[fs][(apieces >> 16) & 255][2]
                        & qss[fs][(apieces >> 24) & 255][3]
                        & qss[fs][(apieces >> 32) & 255][4]
                        & qss[fs][(apieces >> 40) & 255][5]
                        & qss[fs][(apieces >> 48) & 255][6]
                        & bob[fs]
                        & notme;
        break;
      case BR:
        t[ti].brdbb[fs] = qss[fs][(apieces >> (fs & 56)) & 127][0]
                        & qss[fs][(apieces >>  8) & 255][1]
                        & qss[fs][(apieces >> 16) & 255][2]
                        & qss[fs][(apieces >> 24) & 255][3]
                        & qss[fs][(apieces >> 32) & 255][4]
                        & qss[fs][(apieces >> 40) & 255][5]
                        & qss[fs][(apieces >> 48) & 255][6]
                        & rob[fs]
                        & notme;
        break;
      case BQ:
        t[ti].brdbb[fs] = qss[fs][(apieces >> (fs & 56)) & 127][0]
                        & qss[fs][(apieces >>  8) & 255][1]
                        & qss[fs][(apieces >> 16) & 255][2]
                        & qss[fs][(apieces >> 24) & 255][3]
                        & qss[fs][(apieces >> 32) & 255][4]
                        & qss[fs][(apieces >> 40) & 255][5]
                        & qss[fs][(apieces >> 48) & 255][6]
                        & notme;
        break;
      case BC:
        t[ti].brdbb[fs] = (kgmoves[fs] & notme) | bcastle(ti);
        break;
      case BK:
        t[ti].brdbb[fs] = (kgmoves[fs] & notme);
        break;
    }
  } while (pieces);
}

void InitializeQSS() {
  u08 sq, sqr, i;
  u08 x, y, dx, dy;
  u64 bb, b;

  for (sq = 0; sq < 64; sq++) {
    y = sq >> 3;
    x = sq & 7;
    for (i = 0; i < 128; i++) {
      bb = 0;
      b = (u64)i << (sq & 56);
      for (dx = +1, dy = +1; x + dx < +8 && y + dy < +8; dx++, dy++) {
        sqr = (((y + dy) << 3) + x + dx);
        bb |= one << sqr;
        bob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      for (dx = -1, dy = +1; x + dx > -1 && y + dy < +8; dx--, dy++) {
        sqr = (((y + dy) << 3) + x + dx);
        bb |= one << sqr;
        bob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      for (dx = +1, dy = -1; x + dx < +8 && y + dy > -1; dx++, dy--) {
        sqr = (((y + dy) << 3) + x + dx);
        bb |= one << sqr;
        bob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      for (dx = -1, dy = -1; x + dx > -1 && y + dy > -1; dx--, dy--) {
        sqr = (((y + dy) << 3) + x + dx);
        bb |= one << sqr;
        bob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      for (dx = -1; x + dx > -1; dx--) {
        sqr = (y << 3) + x + dx;
        bb |= one << sqr;
        rob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      for (dx = +1; x + dx < +8; dx++) {
        sqr = (y << 3) + x + dx;
        bb |= one << sqr;
        rob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      for (dy = +1; y + dy < +8; dy++) {
        sqr = ((y + dy) << 3) + x;
        bb |= one << sqr;
        rob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      for (dy = -1; y + dy > -1; dy--) {
        sqr = ((y + dy) << 3) + x;
        bb |= one << sqr;
        rob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      qss[sq][i][0] = bb;
    }
  }
}

void Initialize() {
  InitializeQSS();
}

s32 main() {
  s32 ti = 0;
  Initialize();
  return 0;
}
EDIT: I'm not sure the init code is the exact correct code. When it is sorted out there will be an update.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through

Ras
Posts: 1393
Joined: Tue Aug 30, 2016 6:19 pm
Full name: Rasmus Althoff
Contact:

Re: Rewriting RomiChess from scratch for SMP

Post by Ras » Tue Apr 07, 2020 6:21 pm

Michael Sherwin wrote:
Tue Apr 07, 2020 5:37 pm
This seems efficient as this has to be done on all leaf nodes regardless.
This is horribly inefficient because if you get a fail-high on the first move of a leaf node, then you have done the computation for all the other moves in vain. I'd go with a pseudo legal move generator and do the legality check plus QS only when needed.
Rasmus Althoff
https://www.ct800.net

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 1:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Rewriting RomiChess from scratch for SMP

Post by Michael Sherwin » Tue Apr 07, 2020 6:57 pm

Ras wrote:
Tue Apr 07, 2020 6:21 pm
Michael Sherwin wrote:
Tue Apr 07, 2020 5:37 pm
This seems efficient as this has to be done on all leaf nodes regardless.
This is horribly inefficient because if you get a fail-high on the first move of a leaf node, then you have done the computation for all the other moves in vain. I'd go with a pseudo legal move generator and do the legality check plus QS only when needed.
Okay, I understand that. And looking back in my notes I see that I had considered that. The trouble is as soon as I look away from my notes I forget. In my notes my solution is once I have the BB for a piece is to iterate through the bits and call MakeMove() and then call Qsearch from MakeMove to get a score. Then I can bail out. Now I have to think on a redesign to treat interior nodes different than leaf nodes. Does this sound better or should I just abandon the idea? Thanks.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through

Ras
Posts: 1393
Joined: Tue Aug 30, 2016 6:19 pm
Full name: Rasmus Althoff
Contact:

Re: Rewriting RomiChess from scratch for SMP

Post by Ras » Tue Apr 07, 2020 7:05 pm

Michael Sherwin wrote:
Tue Apr 07, 2020 6:57 pm
Now I have to think on a redesign to treat interior nodes different than leaf nodes. Does this sound better or should I just abandon the idea? Thanks.
The inefficiency is the same for interior nodes, it's just that there are fewer of them so it won't do as much harm. I still think it's best to compute only what you need when you need it so that you save time in case you find out you don't need it. The only exception to this rule would be wasting a little time on average to improve the worst case behaviour - doing internal iterative deepening would be an example.
Rasmus Althoff
https://www.ct800.net

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 1:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Rewriting RomiChess from scratch for SMP

Post by Michael Sherwin » Tue Apr 07, 2020 7:52 pm

I had it in my head that the initialization had been combined into one function. No it is two functions. Added second function. Also redesigned to check for fail high at leaf nodes and return FAILHIGH upon detection. The immediate additional cost is only one compound if statement. That does not seem to be too high of a cost to keep the advantage of a scored list of legal moves for internal move ordering?

Code: Select all

// King Hunter
#include <intrin.h>

#define s08 signed char;
#define u08 unsigned char
#define s32 int
#define u32 unsigned long
#define u64 unsigned long long

#define one 1ull
#define OOO 0
#define FAILHIGH 0
#define OKAY 1

enum { BLACK, WHITE };

enum { OO, WP, WN, WB, WR, WQ, WC, WK, BP, BN, BB, BR, BQ, BC, BK };

enum { UNO, TWO, CEP };

struct threads {
  s32 wtm;
  u64 ebb;
  u64 pieces[2];
  s32 board[64];
};

struct moves {
  u08 fs;
  u08 ts;
  u08 ty;
  u08 id;
  s32 sc;
  s32 st;
};

threads t[32];

s32 wptyp[] = { OOO, OOO, OOO, OOO, OOO, OOO, OOO, OOO,
                TWO, TWO, TWO, TWO, TWO, TWO, TWO, TWO,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO,
                CEP, CEP, CEP, CEP, CEP, CEP, CEP, CEP,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO };

s32 bptyp[] = { OOO, OOO, OOO, OOO, OOO, OOO, OOO, OOO,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO,
                CEP, CEP, CEP, CEP, CEP, CEP, CEP, CEP,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO,
                UNO, UNO, UNO, UNO, UNO, UNO, UNO, UNO,
                TWO, TWO, TWO, TWO, TWO, TWO, TWO, TWO };

u64 wpmoves[64];
u64 wpcapts[64];
u64 bpmoves[64];
u64 bpcapts[64];
u64 ktmoves[64];
u64 kgmoves[64];
u64 above[64];
u64 below[64];

u64 qss[64][256][8];
u64 bob[64];
u64 rob[64];

u64 wcastle(s32 ti) {
  u64 bb;

  return bb;
};

u64 bcastle(s32 ti) {
  u64 bb;

  return bb;
};

s32 MakeMoveQ(s32 ti, s32 fs, s32 ts, s32 depth) {
  s32 score;

  return score;
}

s32 GenMoves(s32 ti, s32 depth) {
  u64 pieces, wpieces, bpieces, apieces, empty, notme, bb; 
  u32 typ, fs, sq;
  s32 score;
  bool fh;

  pieces = t[ti].pieces[t[ti].wtm];
  wpieces = t[ti].pieces[WHITE];
  bpieces = t[ti].pieces[BLACK];
  apieces = wpieces | bpieces;
  empty = 0xffffffffffffffff ^ (wpieces | bpieces);
  notme = 0xffffffffffffffff ^ pieces;

  do {
    _BitScanForward64(&fs, pieces);
    pieces ^= one << fs;
    typ = t[ti].board[fs];
    switch (typ) {
      case OO:
        break;
      case WP:
        typ = wptyp[fs];
        switch (typ) {
          case UNO:
            bb = (wpmoves[fs] & empty)
               | (wpcapts[fs] & bpieces);
            break;
          case TWO:
            _BitScanForward64(&sq, (wpmoves[fs] & apieces));
            bb = (wpmoves[fs] & below[sq])
               | (wpcapts[fs] & bpieces);
            break;
          case CEP:
            bb = (wpmoves[fs] & empty)
               | (wpcapts[fs] & bpieces | t[ti].ebb);
            break;
        } 
        break;
      case WN:
        bb = ktmoves[fs] & notme;
        break;
      case WB:
        bb = qss[fs][(apieces >> (fs & 56)) & 127][0]
           & qss[fs][(apieces >> 8)  & 255][1]
           & qss[fs][(apieces >> 16) & 255][2]
           & qss[fs][(apieces >> 24) & 255][3]
           & qss[fs][(apieces >> 32) & 255][4]
           & qss[fs][(apieces >> 40) & 255][5]
           & qss[fs][(apieces >> 48) & 255][6]
           & bob[fs]
           & notme;
        break;
      case WR:
        bb = qss[fs][(apieces >> (fs & 56)) & 127][0]
           & qss[fs][(apieces >> 8)  & 255][1]
           & qss[fs][(apieces >> 16) & 255][2]
           & qss[fs][(apieces >> 24) & 255][3]
           & qss[fs][(apieces >> 32) & 255][4]
           & qss[fs][(apieces >> 40) & 255][5]
           & qss[fs][(apieces >> 48) & 255][6]
           & rob[fs]
           & notme;
        break;
      case WQ:
        bb = qss[fs][(apieces >> (fs & 56)) & 127][0]
           & qss[fs][(apieces >>  8) & 255][1]
           & qss[fs][(apieces >> 16) & 255][2]
           & qss[fs][(apieces >> 24) & 255][3]
           & qss[fs][(apieces >> 32) & 255][4]
           & qss[fs][(apieces >> 40) & 255][5]
           & qss[fs][(apieces >> 48) & 255][6]
           & notme;
        break;
      case WC:
        bb = (kgmoves[fs] & notme) | wcastle(ti);
        break;
      case WK:
        bb = (kgmoves[fs] & notme);
        break;
      case BP:
        typ = bptyp[fs];
        switch (typ) {
          case UNO:
            bb = (bpmoves[fs] & empty)
               | (bpcapts[fs] & bpieces);
            break;
          case TWO:
            _BitScanForward64(&sq, (bpmoves[fs] & apieces));
            bb = (bpmoves[fs] & below[sq])
               | (bpcapts[fs] & bpieces);
            break;
          case CEP:
            bb = (bpmoves[fs] & empty)
               | (bpcapts[fs] & bpieces | t[ti].ebb);
            break;
        }
        break;
      case BN:
        bb = ktmoves[fs] & notme;
        break;
      case BB:
        bb = qss[fs][(apieces >> (fs & 56)) & 127][0]
           & qss[fs][(apieces >>  8) & 255][1]
           & qss[fs][(apieces >> 16) & 255][2]
           & qss[fs][(apieces >> 24) & 255][3]
           & qss[fs][(apieces >> 32) & 255][4]
           & qss[fs][(apieces >> 40) & 255][5]
           & qss[fs][(apieces >> 48) & 255][6]
           & bob[fs]
           & notme;
        break;
      case BR:
        bb = qss[fs][(apieces >> (fs & 56)) & 127][0]
           & qss[fs][(apieces >>  8) & 255][1]
           & qss[fs][(apieces >> 16) & 255][2]
           & qss[fs][(apieces >> 24) & 255][3]
           & qss[fs][(apieces >> 32) & 255][4]
           & qss[fs][(apieces >> 40) & 255][5]
           & qss[fs][(apieces >> 48) & 255][6]
           & rob[fs]
           & notme;
        break;
      case BQ:
        bb = qss[fs][(apieces >> (fs & 56)) & 127][0]
           & qss[fs][(apieces >>  8) & 255][1]
           & qss[fs][(apieces >> 16) & 255][2]
           & qss[fs][(apieces >> 24) & 255][3]
           & qss[fs][(apieces >> 32) & 255][4]
           & qss[fs][(apieces >> 40) & 255][5]
           & qss[fs][(apieces >> 48) & 255][6]
           & notme;
        break;
      case BC:
        bb = (kgmoves[fs] & notme) | bcastle(ti);
        break;
      case BK:
        bb = (kgmoves[fs] & notme);
        break;
    }
    while (bb) {
      _BitScanForward64(&sq, bb);
      bb ^= one << sq;
      fh = MakeMoveQ(ti, fs, sq, depth);
      if (fh && depth == 0) return FAILHIGH;
    }
  } while (pieces);
  return OKAY;
}

void InitializeQSS() {
  u08 sq, sqr, k, l;
  u08 x, y, dx, dy;
  s32 i;
  u64 b, bb;

  for (sq = 0; sq < 64; sq++) {
    y = sq >> 3;
    x = sq & 7;
    bob[sq] = 0;
    rob[sq] = 0;
    for (i = 0; i < 256; i++) {
      for (k = 0, l = 0; k <= 56; k += 8, l++) {
        bb = 0;
        b = (u64)i << k;
        for (dx = +1, dy = +1; x + dx < +8 && y + dy < +8; dx++, dy++) {
          sqr = (((y + dy) << 3) + x + dx);
          bb |= one << sqr;
          bob[sq] |= one << sqr;
          if ((one << sqr) & b) break;
        }
        for (dx = -1, dy = +1; x + dx > -1 && y + dy < +8; dx--, dy++) {
          sqr = (((y + dy) << 3) + x + dx);
          bb |= one << sqr;
          bob[sq] |= one << sqr;
          if ((one << sqr) & b) break;
        }
        for (dx = +1, dy = -1; x + dx < +8 && y + dy > -1; dx++, dy--) {
          sqr = (((y + dy) << 3) + x + dx);
          bb |= one << sqr;
          bob[sq] |= one << sqr;
          if ((one << sqr) & b) break;
        }
        for (dx = -1, dy = -1; x + dx > -1 && y + dy > -1; dx--, dy--) {
          sqr = (((y + dy) << 3) + x + dx);
          bb |= one << sqr;
          bob[sq] |= one << sqr;
          if ((one << sqr) & b) break;
        }
        for (dx = -1; x + dx > -1; dx--) {
          sqr = (y << 3) + x + dx;
          bb |= one << sqr;
          rob[sq] |= one << sqr;
          if ((one << sqr) & b) break;
        }
        for (dx = +1; x + dx < +8; dx++) {
          sqr = (y << 3) + x + dx;
          bb |= one << sqr;
          rob[sq] |= one << sqr;
          if ((one << sqr) & b) break;
        }
        for (dy = +1; y + dy < +8; dy++) {
          sqr = ((y + dy) << 3) + x;
          bb |= one << sqr;
          rob[sq] |= one << sqr;
          if ((one << sqr) & b) break;
        }
        for (dy = -1; y + dy > -1; dy--) {
          sqr = ((y + dy) << 3) + x;
          bb |= one << sqr;
          rob[sq] |= one << sqr;
          if ((one << sqr) & b) break;
        }
        qss[sq][i][l] = bb;
      }
    }
  }
}

void InitializeRNK() {
  u08 sq, sqr, i;
  u08 x, y, dx, dy;
  u64 bb, b;

  for (sq = 0; sq < 64; sq++) {
    y = sq >> 3;
    x = sq & 7;
    for (i = 0; i < 128; i++) {
      bb = 0;
      b = (u64)i << (sq & 56);
      for (dx = +1, dy = +1; x + dx < +8 && y + dy < +8; dx++, dy++) {
        sqr = (((y + dy) << 3) + x + dx);
        bb |= one << sqr;
        bob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      for (dx = -1, dy = +1; x + dx > -1 && y + dy < +8; dx--, dy++) {
        sqr = (((y + dy) << 3) + x + dx);
        bb |= one << sqr;
        bob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      for (dx = +1, dy = -1; x + dx < +8 && y + dy > -1; dx++, dy--) {
        sqr = (((y + dy) << 3) + x + dx);
        bb |= one << sqr;
        bob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      for (dx = -1, dy = -1; x + dx > -1 && y + dy > -1; dx--, dy--) {
        sqr = (((y + dy) << 3) + x + dx);
        bb |= one << sqr;
        bob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      for (dx = -1; x + dx > -1; dx--) {
        sqr = (y << 3) + x + dx;
        bb |= one << sqr;
        rob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      for (dx = +1; x + dx < +8; dx++) {
        sqr = (y << 3) + x + dx;
        bb |= one << sqr;
        rob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      for (dy = +1; y + dy < +8; dy++) {
        sqr = ((y + dy) << 3) + x;
        bb |= one << sqr;
        rob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      for (dy = -1; y + dy > -1; dy--) {
        sqr = ((y + dy) << 3) + x;
        bb |= one << sqr;
        rob[sq] |= one << sqr;
        if ((one << sqr) & b) break;
      }
      qss[sq][i][0] = bb;
    }
  }
}

void Initialize() {
  InitializeQSS();
  InitializeRNK();
}

s32 main() {
  s32 ti = 0;
  Initialize();
  return 0;
}
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through

Ras
Posts: 1393
Joined: Tue Aug 30, 2016 6:19 pm
Full name: Rasmus Althoff
Contact:

Re: Rewriting RomiChess from scratch for SMP

Post by Ras » Tue Apr 07, 2020 8:11 pm

Michael Sherwin wrote:
Tue Apr 07, 2020 7:52 pm
That does not seem to be too high of a cost to keep the advantage of a scored list of legal moves for internal move ordering?
Just measure and test. :-)

However, there are also some other things:
#define s08 signed char;
The semicolon is wrong. Actually, the whole define is wrong because you never use define for that, you use typedef. And even that is wrong because there are already standard types for that: uint8_t, int32_t, uint32_t, uint64_t.
#define s32 int
And that's why you use the standard types mentioned above because this isn't guaranteed to do what it's supposed to do. The C standard specifies only that ints must be able to at least store the value range from -32768 to 32767. Means, the compiler is free to implement int as anything with at least 16 bits. That can be 32 bits, or even 64 bits. By contrast, int32_t gives you a 32 bit integer, guaranteed.

These standard types are available via the include file <inttypes.h>.
Rasmus Althoff
https://www.ct800.net

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 1:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Rewriting RomiChess from scratch for SMP

Post by Michael Sherwin » Tue Apr 07, 2020 8:20 pm

Ras wrote:
Tue Apr 07, 2020 8:11 pm
Michael Sherwin wrote:
Tue Apr 07, 2020 7:52 pm
That does not seem to be too high of a cost to keep the advantage of a scored list of legal moves for internal move ordering?
Just measure and test. :-)

However, there are also some other things:
#define s08 signed char;
The semicolon is wrong. Actually, the whole define is wrong because you never use define for that, you use typedef. And even that is wrong because there are already standard types for that: uint8_t, int32_t, uint32_t, uint64_t.
#define s32 int
And that's why you use the standard types mentioned above because this isn't guaranteed to do what it's supposed to do. The C standard specifies only that ints must be able to at least store the value range from -32768 to 32767. Means, the compiler is free to implement int as anything with at least 16 bits. That can be 32 bits, or even 64 bits. By contrast, int32_t gives you a 32 bit integer, guaranteed.

These standard types are available via the include file <inttypes.h>.
Since it will be open source and someone might be using a different compiler with different integer size I will use the standard. Thanks!
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 1:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Rewriting RomiChess from scratch for SMP

Post by Michael Sherwin » Tue Apr 07, 2020 8:49 pm

The MSVC compiler has its inconsistencies. _BitScanForward64() requires an & unsigned long to store the index. When typefdef uint32_t u32; is used it flags it as an error and typedef unsigned long u32 it likes. Also typedef was not working in multi file source. Instead of #defines the compiler wanted it to be converted to constant expression and that also was flagged as an error. That is why I was just using #define. I'm not a youngster that needs to be taught the right way. I'm an old dog that learned it one way and as long as what I know still works I'm happy with it, lol. I can learn new tricks for about 5 minutes then I forget. For right now I have switched to using typedef but when I break the program into multi files if typedef stops working I'll be going back to using #define.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through

mar
Posts: 2122
Joined: Fri Nov 26, 2010 1:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Rewriting RomiChess from scratch for SMP

Post by mar » Thu Apr 09, 2020 5:40 am

Ras wrote:
Tue Apr 07, 2020 6:21 pm
Michael Sherwin wrote:
Tue Apr 07, 2020 5:37 pm
This seems efficient as this has to be done on all leaf nodes regardless.
This is horribly inefficient because if you get a fail-high on the first move of a leaf node, then you have done the computation for all the other moves in vain. I'd go with a pseudo legal move generator and do the legality check plus QS only when needed.
Why "horribly inefficient"? :) Perhaps all your nodes are cut nodes?
Most of your nodes are in QS where (except for evasions) you only care about captures.
Demolito is a very strong engine and uses full legal movegen. I assume that's because it's "horribly inefficient".
So some prefer legal over staged (this is what you meant) to simplify code. Not a big deal, as strength really comes from elsewhere.
Martin Sedlak

Ras
Posts: 1393
Joined: Tue Aug 30, 2016 6:19 pm
Full name: Rasmus Althoff
Contact:

Re: Rewriting RomiChess from scratch for SMP

Post by Ras » Thu Apr 09, 2020 9:31 am

mar wrote:
Thu Apr 09, 2020 5:40 am
Demolito is a very strong engine and uses full legal movegen.
Are you sure that Demolito really calls QS on every move right after generation?
Rasmus Althoff
https://www.ct800.net

Post Reply