Help me destroy the universe

Discussion of chess software programming and technical issues.

Moderator: Ras

Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Help me destroy the universe

Post by Mike Sherwin »

There is one thing standing in my way. Rather than try to explain and confuse everyone I just show what I need done.

This is my move generator for the bishop.

Code: Select all

    case WB:
    case BB:
      blockers = aPieces & bob[fs];
      bb = bss[fs][(blockers >> 9) &  63][0]
         & bss[fs][(blockers >> 17) & 63][1]
         & bss[fs][(blockers >> 25) & 63][2]
         & bss[fs][(blockers >> 33) & 63][3]
         & bss[fs][(blockers >> 41) & 63][4]
         & bss[fs][(blockers >> 49) & 63][5]
         & notme;
      break;
It works with occupancy instead of blockers but I'm showing blockers here.

This is the initialization code.

Code: Select all

void InitializeBSS() {
  u08 sq, sqr, i, j, k, l;
  s08 x, dx, y, dy;
  u64 b, bb;

  for (sq = 0; sq < 64; sq++) {
    y = sq >> 3;
    x = sq & 7;
    for (i = 0; i < 128; i++) {
      if (i ^ 1) {
        j = i >> 1;
        for (k = 8, l = 0; k <= 48; k += 8, l++) {
          bb = 0;
          b = (u64)i << k;
          for (dx = +1, dy = +1; x + dx < +8 && y + dy < +8; dx++, dy++) {
            sqr = (((y + dy) << 3) + x + dx);
            bb |= one << sqr;
            if ((one << sqr) & b) break;
          }
          for (dx = -1, dy = +1; x + dx > -1 && y + dy < +8; dx--, dy++) {
            sqr = (((y + dy) << 3) + x + dx);
            bb |= one << sqr;
            if ((one << sqr) & b) break;
          }
          for (dx = +1, dy = -1; x + dx < +8 && y + dy > -1; dx++, dy--) {
            sqr = (((y + dy) << 3) + x + dx);
            bb |= one << sqr;
            if ((one << sqr) & b) break;
          }
          for (dx = -1, dy = -1; x + dx > -1 && y + dy > -1; dx--, dy--) {
            sqr = (((y + dy) << 3) + x + dx);
            bb |= one << sqr;
            if ((one << sqr) & b) break;
          }
          bss[sq][j][l] = bb;
        }
      }
    }
  }
}
It allows occupancy or blockers. I want it to just work with blockers. The reason is when I fold the occupancy down to two ranks occupancy causes collisions. Just storing supersets based only on blockers does not cause collisions. a bishop on e4.
00000000
0x000000
00x000x0
000x0x00
00000000
000x0x00
00x000x0
00000000

After shifting upper 32 bits down by 24.
0x000000
00xx0xx0
00xx0xx0
00000000

After shifting rank 4 down.
00000000
0xxx0xx0
00xx0xx0
00000000
All the bits share two ranks without collisions and therefor the bishop code would look like this.

Code: Select all

    case WB:
    case BB:
      blockers = aPieces & bob[fs];
      // fold the blockers down to two ranks
      bb = bss[fs][(blockers >> 9) &  63][0]
         & bss[fs][(blockers >> 17) & 63][1]
         & notme;
      break;
So a couple cheap folds and two lookups anded together. Final table 64 sq x 64 index x 4 bytes x 2 ranks = 32,768 bytes.

Will that be faster than magic bishops? If so then help me end the universe and produce something faster than magics, lol.
mar
Posts: 2667
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Help me destroy the universe

Post by mar »

I assume the bob table is what magics call relevant occupancy mask? Unless I'm mistaken, couldn't notme be baked into bss?
I mean notme is ~(1ull << fs), simply clearing the fs bit.

I was also thinking why hybrid SISSY queen + bishop/rook magics was actually slower, so I wonder if doing combined SISSY bishop + rook might be actually better? I still don't see how it helps, because if a bishop captures queen then occupancy changes (I thought that maybe splitting queen attacks may be more cache friendly, but I don't see how) - the funny part is that splitting queen attacks in magics is mandatory while in SISSY this is optional.

fingers crossed to make the new SISSY bishops work, 32kb LUT is pretty good, if it works. if you can solve this and provide the init code, I'm eager to test
mar
Posts: 2667
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Help me destroy the universe

Post by mar »

Mike Sherwin wrote: Thu Feb 25, 2021 12:26 am

Code: Select all

    case WB:
    case BB:
      blockers = aPieces & bob[fs];
      // fold the blockers down to two ranks
      bb = bss[fs][(blockers >> 9) &  63][0]
         & bss[fs][(blockers >> 17) & 63][1]
         & notme;
      break;
hmm, I'm probably missing something but wouldn't generating the final blocker mask involve more work? like more folding you describe above?
I mean, masking is fine, but the lookups only consider lower n bits, does it mean that blockers would actually require to maintain extra pre-folded bitboards?
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Help me destroy the universe

Post by Mike Sherwin »

mar wrote: Thu Feb 25, 2021 10:52 am
Mike Sherwin wrote: Thu Feb 25, 2021 12:26 am

Code: Select all

    case WB:
    case BB:
      blockers = aPieces & bob[fs];
      // fold the blockers down to two ranks
      bb = bss[fs][(blockers >> 9) &  63][0]
         & bss[fs][(blockers >> 17) & 63][1]
         & notme;
      break;
hmm, I'm probably missing something but wouldn't generating the final blocker mask involve more work? like more folding you describe above?
I mean, masking is fine, but the lookups only consider lower n bits, does it mean that blockers would actually require to maintain extra pre-folded bitboards?
I don't know from where the index to pre store the folded blockers would come from. The blockers once loaded are in a register. It would be very fast to make a couple of folds on a register to compress them down to two ranks to save 4 lookups and 4 ands. I have the vision but not the skill to do the initialization. Maybe I was not clear enough in my explanation. Let's just imagine one fold and three ranks. The fold would be as easy as shifting the upper 32 bits up by 8 and oring them with the lower 32 bits. That would save 3 lookups and 3 ands and those extra 3 shifts and the 3 & 63's, That is a lot of savings for one simple fold. BUT, the initialization has to match. Rank 2 and rank 5 supersets would need to be combined in rank 2 and the same for 3 & 6 as well as 4 & 7. That is the part I am asking help with. A two rank lookup would be great but a three rank lookup would probably be okay as well. But it would be a 48k byte in tables. All that needs done is the initialization. The rest is easy peasey. Sorry I'm tired. Time for sleep.