I'm buggy. About a week ago I came down with a bad case of the shingles on my back and stomach. The pain is excruciating. Again I used & instead of | this time in the initialization.dangi12012 wrote: ↑Fri Dec 10, 2021 1:09 pmThe code is still buggy - i use this to print a bitboard and can compare reference vs output:But there seems to be a bug: (after initialize was called!)Code: Select all
static std::string _map(uint64_t value) { static std::string str(64 + 8, 'o'); for (uint64_t i = 0, c = 0; i < 64; i++) { uint64_t bitmask = (1ull) << i; if ((bitmask & value) != 0) str[c++] = 'X'; else str[c++] = '.'; if ((i + 1) % 8 == 0) str[c++] = '\n'; } return str; }
OCC = 6822289156618217916, pos = 2also think of using _BitScanForward64 -> std::countr_zero. So its compatible with gcc and clang.Code: Select all
Compare_Engines: ..XXXX.X X..X.XX. .X.XXX.. XXX....X XXX.X.XX XX...X.X X.XX.X.X .XXXX.X. Reference XX.X.... .XXX.... X.X..... ..X..... ........ ........ ........ ........ Bob: XXX..... XXXX...X X.X.X.X. X.XX.X.X X.X.X.X. X.XX.X.X X.X.X.X. XXXX...X
Greetings!
Combining two of Bob's classic bitboard attack getters
Moderator: Ras
-
Mike Sherwin
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Combining two of Bob's classic bitboard attack getters
-
Mike Sherwin
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Combining two of Bob's classic bitboard attack getters
I think that the initialization is working now. I have to rest awhile I'll try do do more latter.
Code: Select all
// Bob.Mike
// Example of Robert Hyatt's and Michael Sherwin's classical bitboard approach
// to generate moves for the sliding pieces.
#include <cstdint>
#include <intrin.h>
enum { FILE1, FILE2, FILE3, FILE4, FILE5, FILE6, FILE7, FILE8 };
enum { RANK1, RANK2, RANK3, RANK4, RANK5, RANK6, RANK7, RANK8 };
struct Ray {
uint64_t rayNW;
uint64_t rayNN;
uint64_t rayNE;
uint64_t rayEE;
uint64_t raySE;
uint64_t raySS;
uint64_t raySW;
uint64_t rayWW;
uint64_t rwsNW;
uint64_t rwsNN;
uint64_t rwsNE;
uint64_t rwsEE;
uint64_t rwsSE;
uint64_t rwsSS;
uint64_t rwsSW;
uint64_t rwsWW;
};
Ray ray[64];
uint64_t Queen(int sq, uint64_t occ) {
unsigned long iNW, iNN, iNE, iEE, iSE, iSS, iSW, iWW;
uint64_t r1, r2, r3, r4, r5, r6;
r1 = ray[sq].rwsNW & occ;
r2 = ray[sq].rwsNN & occ;
r3 = ray[sq].rwsNE & occ;
r4 = ray[sq].rwsEE & occ;
_BitScanForward64(&iNW, r1);
_BitScanForward64(&iNN, r2);
_BitScanForward64(&iNE, r3);
_BitScanForward64(&iEE, r4);
r1 = ray[sq].rayNW ^ ray[iNW].rayNW;
r2 = ray[sq].rayNN ^ ray[iNN].rayNN;
r3 = ray[sq].rayNE ^ ray[iNE].rayNE;
r4 = ray[sq].rayEE ^ ray[iEE].rayEE;
r5 = r1 | r2;
r6 = r3 | r4;
r1 = ray[sq].rwsSE & occ;
r2 = ray[sq].rwsSS & occ;
r3 = ray[sq].rwsSW & occ;
r4 = ray[sq].rwsWW & occ;
_BitScanReverse64(&iSE, r1);
_BitScanReverse64(&iSS, r2);
_BitScanReverse64(&iSW, r3);
_BitScanReverse64(&iWW, r4);
r1 = ray[sq].raySE ^ ray[iSE].raySE;
r2 = ray[sq].raySS ^ ray[iSS].raySS;
r3 = ray[sq].raySW ^ ray[iSW].raySW;
r4 = ray[sq].rayWW ^ ray[iWW].rayWW;
r1 = r1 | r3;
r2 = r2 | r4;
r3 = r5 | r6;
return r1 | r2 | r3;
}
void Initialize() {
int sq, ts, file, rank, c;
for (sq = 0; sq <= 63; sq++) {
file = sq & 7;
rank = sq >> 3;
// Northwest
ray[sq].rayNW = 0;
for (c = 1, ts = sq + 7; file - c >= FILE1 && rank + c <= RANK8; c++, ts += 7) ray[sq].rayNW |= 1ull << ts;
ray[sq].rwsNW = ray[sq].rayNW | 0x8000000000000000;
// Northeast
ray[sq].rayNE = 0;
for (c = 1, ts = sq + 9; file + c <= FILE8 && rank + c <= RANK8; c++, ts += 9) ray[sq].rayNE |= 1ull << ts;
ray[sq].rwsNE = ray[sq].rayNE | 0x8000000000000000;
// Southeast
ray[sq].raySE = 0;
for (c = 1, ts = sq - 7; file + c <= FILE8 && rank - c >= RANK1; c++, ts -= 7) ray[sq].raySE |= 1ull << ts;
ray[sq].rwsSE = ray[sq].raySE | 0x0000000000000001;
// Southwest
ray[sq].raySW = 0;
for (c = 1, ts = sq - 9; file - c >= FILE1 && rank - c >= RANK1; c++, ts -= 9) ray[sq].raySW |= 1ull << ts;
ray[sq].rwsSW = ray[sq].raySW | 0x0000000000000001;
// North
ray[sq].rayNN = 0;
for (c = 1, ts = sq + 8; rank + c <= RANK8; c++, ts += 8) ray[sq].rayNN |= 1ull << ts;
ray[sq].rwsNN = ray[sq].rayNN | 0x8000000000000000;
// East
ray[sq].rayEE = 0;
for (c = 1, ts = sq + 1; file + c <= FILE8; c++, ts += 1) ray[sq].rayEE |= 1ull << ts;
ray[sq].rwsEE = ray[sq].rayEE | 0x8000000000000000;
// South
ray[sq].raySS = 0;
for (c = 1, ts = sq - 8; rank - c >= RANK1; c++, ts -= 8) ray[sq].raySS |= 1ull << ts;
ray[sq].rwsSS = ray[sq].raySS | 0x0000000000000001;
// West
ray[sq].rayWW = 0;
for (c = 1, ts = sq - 1; file - c >= FILE1; c++, ts -= 1) ray[sq].rayWW |= 1ull << ts;
ray[sq].rwsWW = ray[sq].rayWW | 0x0000000000000001;
}
}
int main() {
Initialize();
}-
dangi12012
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Combining two of Bob's classic bitboard attack getters
Do the resting first. Its important for a clear mind. Still a bug somewhere:Mike Sherwin wrote: ↑Fri Dec 10, 2021 5:36 pm I think that the initialization is working now. I have to rest awhile I'll try do do more latter.
Code: Select all
BLOCKERS:
..XXXX.X
X..X.XX.
.X.XXX..
XXX....X
XXX.X.XX
XX...X.X
X.XX.X.X
.XXXX.X.
Reference queen:
.....XX.
......XX
.......X
.......X
........
........
........
........
Bob queen:
.XXXXXXX
.X....XX
..X....X
...X...X
....X...
.....X..
......X.
.......X
Code: Select all
r1 = ray[sq].raySE ^ ray[iSE].raySE;
r2 = ray[sq].raySS ^ ray[iSS].raySS;
r3 = ray[sq].raySW ^ ray[iSW].raySW;
r4 = ray[sq].rayWW ^ ray[iWW].rayWWWorlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
dangi12012
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Combining two of Bob's classic bitboard attack getters
The code can also be improved a little but by removing the RANKS from the loop and directly calculating "how many steps you can take in all 8 directions and looping for this many times"
which is dirc[x] in this snippet. idx0 = east and idx1 = northeast idx2 = ...
Directly calculate dircount:
Usage like that:
which is dirc[x] in this snippet. idx0 = east and idx1 = northeast idx2 = ...
Directly calculate dircount:
Code: Select all
void atk(int sq)
int x = sq & 7; int y = sq >> 3;
int dirc[8]; int o[8] = { 1, -7, -8, -9, -1, 7, 8, 9 };
int EA = dirc[0] = std::max(7 - x, 0); int WE = dirc[4] = x;
int SO = dirc[6] = std::max(7 - y, 0); int NO = dirc[2] = y;
int SE = dirc[7] = std::min(EA, SO); int NW = dirc[3] = std::min(WE, NO);
int SW = dirc[5] = std::min(WE, SO); int NE = dirc[1] = std::min(EA, NO); Code: Select all
for (int off = 0; off < dirc[dir]; off++) //dir 0...8
{
int index = sq + (off + 1) * o[dir]);
}Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
Mike Sherwin
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Combining two of Bob's classic bitboard attack getters
It will take me some time to understand that code. I slept in today and I have no idea when I got up. Anyway it seems to be working perfectly. The demo uses Raylib 4 (nuget package) to put a graphics board on the screen. The instructions are simple. Left click to add/remove the queen. Right click to add/remove blockers. Here is the MSVC 2022 code.dangi12012 wrote: ↑Fri Dec 10, 2021 9:52 pm The code can also be improved a little but by removing the RANKS from the loop and directly calculating "how many steps you can take in all 8 directions and looping for this many times"
which is dirc[x] in this snippet. idx0 = east and idx1 = northeast idx2 = ...
Directly calculate dircount:Usage like that:Code: Select all
void atk(int sq) int x = sq & 7; int y = sq >> 3; int dirc[8]; int o[8] = { 1, -7, -8, -9, -1, 7, 8, 9 }; int EA = dirc[0] = std::max(7 - x, 0); int WE = dirc[4] = x; int SO = dirc[6] = std::max(7 - y, 0); int NO = dirc[2] = y; int SE = dirc[7] = std::min(EA, SO); int NW = dirc[3] = std::min(WE, NO); int SW = dirc[5] = std::min(WE, SO); int NE = dirc[1] = std::min(EA, NO);Code: Select all
for (int off = 0; off < dirc[dir]; off++) //dir 0...8 { int index = sq + (off + 1) * o[dir]); }
Code: Select all
// Bob.Mike
// Example of Robert Hyatt's and Michael Sherwin's classical bitboard approach
// to generate moves for the sliding pieces.
#include <cstdint>
#include <intrin.h>
#include "raylib.h"
enum { FILE1, FILE2, FILE3, FILE4, FILE5, FILE6, FILE7, FILE8 };
enum { RANK1, RANK2, RANK3, RANK4, RANK5, RANK6, RANK7, RANK8 };
struct Rays {
uint64_t rayNW;
uint64_t rayNN;
uint64_t rayNE;
uint64_t rayEE;
uint64_t raySE;
uint64_t raySS;
uint64_t raySW;
uint64_t rayWW;
uint64_t rwsNW;
uint64_t rwsNN;
uint64_t rwsNE;
uint64_t rwsEE;
uint64_t rwsSE;
uint64_t rwsSS;
uint64_t rwsSW;
uint64_t rwsWW;
};
Rays ray[64];
Vector2 mousePoint = { 0.0f, 0.0f };
Rectangle chessBoard[64] = { 0 };
int queen = -1, blockers[64] = { 0 }, attacks[64] = { 0 };
Color color[2] = {
RAYWHITE,
DARKGREEN
};
uint64_t Queen(int sq, uint64_t occ) {
unsigned long iNW, iNN, iNE, iEE, iSE, iSS, iSW, iWW;
uint64_t r1, r2, r3, r4, r5, r6;
occ |= 0x8000000000000001;
r1 = ray[sq].rwsNW & occ;
r2 = ray[sq].rwsNN & occ;
r3 = ray[sq].rwsNE & occ;
r4 = ray[sq].rwsEE & occ;
_BitScanForward64(&iNW, r1);
_BitScanForward64(&iNN, r2);
_BitScanForward64(&iNE, r3);
_BitScanForward64(&iEE, r4);
r1 = ray[sq].rayNW ^ ray[iNW].rayNW;
r2 = ray[sq].rayNN ^ ray[iNN].rayNN;
r3 = ray[sq].rayNE ^ ray[iNE].rayNE;
r4 = ray[sq].rayEE ^ ray[iEE].rayEE;
r5 = r1 | r2;
r6 = r3 | r4;
r1 = ray[sq].rwsSE & occ;
r2 = ray[sq].rwsSS & occ;
r3 = ray[sq].rwsSW & occ;
r4 = ray[sq].rwsWW & occ;
_BitScanReverse64(&iSE, r1);
_BitScanReverse64(&iSS, r2);
_BitScanReverse64(&iSW, r3);
_BitScanReverse64(&iWW, r4);
r1 = ray[sq].raySE ^ ray[iSE].raySE;
r2 = ray[sq].raySS ^ ray[iSS].raySS;
r3 = ray[sq].raySW ^ ray[iSW].raySW;
r4 = ray[sq].rayWW ^ ray[iWW].rayWW;
r1 = r1 | r3;
r2 = r2 | r4;
r3 = r5 | r6;
return r1 | r2 | r3;
}
void Initialize() {
int sq, ts, file, rank, c;
for (sq = 0; sq <= 63; sq++) {
file = sq & 7;
rank = sq >> 3;
// Northwest
ray[sq].rayNW = 0;
for (c = 1, ts = sq + 7; file - c >= FILE1 && rank + c <= RANK8; c++, ts += 7) ray[sq].rayNW |= 1ull << ts;
ray[sq].rwsNW = ray[sq].rayNW | 0x8000000000000000;
// Northeast
ray[sq].rayNE = 0;
for (c = 1, ts = sq + 9; file + c <= FILE8 && rank + c <= RANK8; c++, ts += 9) ray[sq].rayNE |= 1ull << ts;
ray[sq].rwsNE = ray[sq].rayNE | 0x8000000000000000;
// Southeast
ray[sq].raySE = 0;
for (c = 1, ts = sq - 7; file + c <= FILE8 && rank - c >= RANK1; c++, ts -= 7) ray[sq].raySE |= 1ull << ts;
ray[sq].rwsSE = ray[sq].raySE | 0x0000000000000001;
// Southwest
ray[sq].raySW = 0;
for (c = 1, ts = sq - 9; file - c >= FILE1 && rank - c >= RANK1; c++, ts -= 9) ray[sq].raySW |= 1ull << ts;
ray[sq].rwsSW = ray[sq].raySW | 0x0000000000000001;
// North
ray[sq].rayNN = 0;
for (c = 1, ts = sq + 8; rank + c <= RANK8; c++, ts += 8) ray[sq].rayNN |= 1ull << ts;
ray[sq].rwsNN = ray[sq].rayNN | 0x8000000000000000;
// East
ray[sq].rayEE = 0;
for (c = 1, ts = sq + 1; file + c <= FILE8; c++, ts += 1) ray[sq].rayEE |= 1ull << ts;
ray[sq].rwsEE = ray[sq].rayEE | 0x8000000000000000;
// South
ray[sq].raySS = 0;
for (c = 1, ts = sq - 8; rank - c >= RANK1; c++, ts -= 8) ray[sq].raySS |= 1ull << ts;
ray[sq].rwsSS = ray[sq].raySS | 0x0000000000000001;
// West
ray[sq].rayWW = 0;
for (c = 1, ts = sq - 1; file - c >= FILE1; c++, ts -= 1) ray[sq].rayWW |= 1ull << ts;
ray[sq].rwsWW = ray[sq].rayWW | 0x0000000000000001;
}
}
void InitGraphics() {
InitWindow(600, 600, "Queen Move Generator Test");
SetTargetFPS(60);
for (int i = 0; i <= 63; i++) {
chessBoard[i].x = 60.0f + 60.0f * (i % 8);
chessBoard[i].y = 60.0f + 60.0f * (i / 8);
chessBoard[i].width = 60.0f;
chessBoard[i].height = 60.0f;
}
}
int main() {
int i, x, y, f, r, sq, c, get;
uint64_t b, occ;
unsigned long index;
Initialize();
InitGraphics();
while (!WindowShouldClose()) {
mousePoint = GetMousePosition();
sq = -1;
get = 0;
for (i = 0; i <= 63; i++) {
if (CheckCollisionPointRec(mousePoint, chessBoard[i])) sq = i;
}
if (sq >= 0) {
if (IsMouseButtonReleased(0) && !blockers[sq]) {
if (queen == -1) {
queen = sq;
get = 1;
}
else if (queen == sq) {
queen = -1;
for (i = 0; i <= 63; i++) {
attacks[i] = 0;
blockers[i] = 0;
}
}
}
if (IsMouseButtonReleased(1) && queen != -1 && sq != queen) {
if (!blockers[sq]) blockers[sq] = 1;
else blockers[sq] = 0;
get = 1;
}
}
if (get) {
occ = 0;
for (i = 0; i <= 63; i++) {
attacks[i] = 0;
if (blockers[i])
occ |= 1ull << i;
}
b = Queen(queen, occ);
while (b) {
_BitScanForward64(&index, b);
b ^= 1ull << index;
attacks[index] = 1;
}
}
BeginDrawing();
ClearBackground(BLACK);
for (i = 0; i <= 63; i++) {
f = i & 7;
r = i >> 3;
c = (r & 1) ^ !(f & 1);
DrawRectangleRec(chessBoard[i], color[c]);
x = (int)chessBoard[i].x;
y = (int)chessBoard[i].y;
if (queen == i) DrawRectangle(x + 20, y + 20, 20, 20, BEIGE);
if (blockers[i]) DrawRectangle(x + 20, y + 20, 20, 20, RED);
if (attacks[i]) DrawCircle(x + 30, y + 30, 6.0f, GREEN);
}
EndDrawing();
}
}Here is just the executable. https://www.mediafire.com/file/d91vn75q ... e.exe/file
-
dangi12012
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Combining two of Bob's classic bitboard attack getters
There we go! - Now its 100% correct and compatible with all sliding lookups:Mike Sherwin wrote: ↑Sat Dec 11, 2021 4:51 am }[/code]
Here is just the executable. https://www.mediafire.com/file/d91vn75q ... e.exe/file
Code: Select all
Megalookups/s:
Reference: 117.90MOps
KoggeStone: 112.60MOps
BobMike: 120.94MOps
FancyHash: 500.44MOps
Pext : 863.54MOps
HyperLookup: 911.47MOps
I have the templates ready - need to compile one executable per Algorithm.
Also I will add all the algos that Harald provided!
Also bob assembly is of interest to me.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
dangi12012
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Combining two of Bob's classic bitboard attack getters
Update on Bobs lookup method
the non portable _bitscanforward64 made it much slower than the algo really is. I made the initializsation consteval and now it compatible with gcc/clang.
Is 2x than before which is really good because it only takes 8Kib of lookup and thus fits inside L1. Also its 2x faster than Kogge-Stone Maybe we can optimize it more?
Produces 244.82MOps performance on queen lookups:
With the assembly here:
https://godbolt.org/z/qdbsYdnWY
the non portable _bitscanforward64 made it much slower than the algo really is. I made the initializsation consteval and now it compatible with gcc/clang.
Is 2x than before which is really good because it only takes 8Kib of lookup and thus fits inside L1. Also its 2x faster than Kogge-Stone Maybe we can optimize it more?
Code: Select all
#include <cstdint>
#include <bit>
#include <array>
struct Rays {
uint64_t rayNW;
uint64_t rayNN;
uint64_t rayNE;
uint64_t rayEE;
uint64_t raySE;
uint64_t raySS;
uint64_t raySW;
uint64_t rayWW;
uint64_t rwsNW;
uint64_t rwsNN;
uint64_t rwsNE;
uint64_t rwsEE;
uint64_t rwsSE;
uint64_t rwsSS;
uint64_t rwsSW;
uint64_t rwsWW;
};
consteval std::array<Rays,64> Initialize() {
enum { FILE1, FILE2, FILE3, FILE4, FILE5, FILE6, FILE7, FILE8 };
enum { RANK1, RANK2, RANK3, RANK4, RANK5, RANK6, RANK7, RANK8 };
int sq, ts, file, rank, c;
std::array<Rays, 64> ray{};
for (sq = 0; sq <= 63; sq++) {
file = sq & 7;
rank = sq >> 3;
// Northwest
ray[sq].rayNW = 0;
for (c = 1, ts = sq + 7; file - c >= FILE1 && rank + c <= RANK8; c++, ts += 7) ray[sq].rayNW |= 1ull << ts;
ray[sq].rwsNW = ray[sq].rayNW | 0x8000000000000000;
// Northeast
ray[sq].rayNE = 0;
for (c = 1, ts = sq + 9; file + c <= FILE8 && rank + c <= RANK8; c++, ts += 9) ray[sq].rayNE |= 1ull << ts;
ray[sq].rwsNE = ray[sq].rayNE | 0x8000000000000000;
// Southeast
ray[sq].raySE = 0;
for (c = 1, ts = sq - 7; file + c <= FILE8 && rank - c >= RANK1; c++, ts -= 7) ray[sq].raySE |= 1ull << ts;
ray[sq].rwsSE = ray[sq].raySE | 0x0000000000000001;
// Southwest
ray[sq].raySW = 0;
for (c = 1, ts = sq - 9; file - c >= FILE1 && rank - c >= RANK1; c++, ts -= 9) ray[sq].raySW |= 1ull << ts;
ray[sq].rwsSW = ray[sq].raySW | 0x0000000000000001;
// North
ray[sq].rayNN = 0;
for (c = 1, ts = sq + 8; rank + c <= RANK8; c++, ts += 8) ray[sq].rayNN |= 1ull << ts;
ray[sq].rwsNN = ray[sq].rayNN | 0x8000000000000000;
// East
ray[sq].rayEE = 0;
for (c = 1, ts = sq + 1; file + c <= FILE8; c++, ts += 1) ray[sq].rayEE |= 1ull << ts;
ray[sq].rwsEE = ray[sq].rayEE | 0x8000000000000000;
// South
ray[sq].raySS = 0;
for (c = 1, ts = sq - 8; rank - c >= RANK1; c++, ts -= 8) ray[sq].raySS |= 1ull << ts;
ray[sq].rwsSS = ray[sq].raySS | 0x0000000000000001;
// West
ray[sq].rayWW = 0;
for (c = 1, ts = sq - 1; file - c >= FILE1; c++, ts -= 1) ray[sq].rayWW |= 1ull << ts;
ray[sq].rwsWW = ray[sq].rayWW | 0x0000000000000001;
}
return ray;
}
constexpr std::array<Rays, 64> ray = Initialize();
constexpr auto size = sizeof(ray);
uint64_t Queen(int sq, uint64_t occ) {
unsigned long iNW, iNN, iNE, iEE, iSE, iSS, iSW, iWW = 0;
uint64_t r1, r2, r3, r4, r5, r6;
occ |= 0x8000000000000001;
r1 = ray[sq].rwsNW & occ;
r2 = ray[sq].rwsNN & occ;
r3 = ray[sq].rwsNE & occ;
r4 = ray[sq].rwsEE & occ;
iNW = std::countr_zero(r1);
iNN = std::countr_zero(r2);
iNE = std::countr_zero(r3);
iEE = std::countr_zero(r4);
r1 = ray[sq].rayNW ^ ray[iNW].rayNW;
r2 = ray[sq].rayNN ^ ray[iNN].rayNN;
r3 = ray[sq].rayNE ^ ray[iNE].rayNE;
r4 = ray[sq].rayEE ^ ray[iEE].rayEE;
r5 = r1 | r2;
r6 = r3 | r4;
r1 = ray[sq].rwsSE & occ;
r2 = ray[sq].rwsSS & occ;
r3 = ray[sq].rwsSW & occ;
r4 = ray[sq].rwsWW & occ;
iSE = 63 - std::countl_zero(r1);
iSS = 63 - std::countl_zero(r2);
iSW = 63 - std::countl_zero(r3);
iWW = 63 - std::countl_zero(r4);
r1 = ray[sq].raySE ^ ray[iSE].raySE;
r2 = ray[sq].raySS ^ ray[iSS].raySS;
r3 = ray[sq].raySW ^ ray[iSW].raySW;
r4 = ray[sq].rayWW ^ ray[iWW].rayWW;
r1 = r1 | r3;
r2 = r2 | r4;
r3 = r5 | r6;
return r1 | r2 | r3;
}Code: Select all
Megalookups/s:
Reference: 118.25MOps
KoggeStone: 112.38MOps
BobMike: 244.82MOps
FancyHash: 506.22MOps
Pext : 852.53MOps
HyperLookup: 926.42MOpsWith the assembly here:
https://godbolt.org/z/qdbsYdnWY
Code: Select all
Queen(int, unsigned long): # @Queen(int, unsigned long)
push r14
push rbx
movsxd rax, edi
movabs rcx, -9223372036854775807
shl rax, 7
or rcx, rsi
mov rsi, qword ptr [rax + ray+72]
mov rdi, qword ptr [rax + ray+80]
mov r8, qword ptr [rax + ray+64]
mov rdx, qword ptr [rax + ray+88]
mov rbx, qword ptr [rax + ray+96]
and rsi, rcx
and rdi, rcx
and r8, rcx
and rdx, rcx
and rbx, rcx
tzcnt r9, rsi
tzcnt r10, rdi
mov rsi, qword ptr [rax + ray+104]
mov rdi, qword ptr [rax + ray+112]
tzcnt r11, rdx
lzcnt r14, rbx
mov edx, 63
mov ebx, 63
tzcnt r8, r8
sub rdx, r14
shl rdx, 7
shl r10, 7
shl r11, 7
shl r8, 7
shl r9, 7
vmovq xmm2, qword ptr [rdx + ray+32] # xmm2 = mem[0],zero
vmovq xmm3, qword ptr [r8 + ray] # xmm3 = mem[0],zero
and rsi, rcx
and rdi, rcx
and rcx, qword ptr [rax + ray+120]
lzcnt r14, rsi
mov esi, 63
sub rsi, r14
lzcnt r14, rdi
mov edi, 63
sub rdi, r14
shl rsi, 7
shl rdi, 7
lzcnt rcx, rcx
vmovq xmm1, qword ptr [rdi + ray+48] # xmm1 = mem[0],zero
sub rbx, rcx
shl rbx, 7
vmovq xmm0, qword ptr [rbx + ray+56] # xmm0 = mem[0],zero
vpunpcklqdq xmm0, xmm1, xmm0 # xmm0 = xmm1[0],xmm0[0]
vmovq xmm1, qword ptr [rsi + ray+40] # xmm1 = mem[0],zero
vpunpcklqdq xmm1, xmm2, xmm1 # xmm1 = xmm2[0],xmm1[0]
vmovq xmm2, qword ptr [r10 + ray+16] # xmm2 = mem[0],zero
vinserti128 ymm0, ymm1, xmm0, 1
vmovq xmm1, qword ptr [r11 + ray+24] # xmm1 = mem[0],zero
vpxor ymm0, ymm0, ymmword ptr [rax + ray+32]
vpunpcklqdq xmm1, xmm2, xmm1 # xmm1 = xmm2[0],xmm1[0]
vmovq xmm2, qword ptr [r9 + ray+8] # xmm2 = mem[0],zero
vpunpcklqdq xmm2, xmm3, xmm2 # xmm2 = xmm3[0],xmm2[0]
vinserti128 ymm1, ymm2, xmm1, 1
vpxor ymm1, ymm1, ymmword ptr [rax + ray]
vpor ymm0, ymm1, ymm0
vextracti128 xmm1, ymm0, 1
vpor xmm0, xmm0, xmm1
vpshufd xmm1, xmm0, 238 # xmm1 = xmm0[2,3,2,3]
vpor xmm0, xmm0, xmm1
vmovq rax, xmm0
pop rbx
pop r14
vzeroupper
retWorlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
tcusr
- Posts: 325
- Joined: Tue Aug 31, 2021 10:32 pm
- Full name: tcusr
Re: Combining two of Bob's classic bitboard attack getters
why is it faster if the table available at compile? what kind of optimization does the compiler do?
thank you for your tests! would my mind adding hyperbola quintessence?
thank you for your tests! would my mind adding hyperbola quintessence?
-
dangi12012
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Combining two of Bob's classic bitboard attack getters
I dont mind adding it. Do you have a repo where it is implemented?tcusr wrote: ↑Sat Dec 11, 2021 6:08 pm why is it faster if the table available at compile? what kind of optimization does the compiler do?
thank you for your tests! would my mind adding hyperbola quintessence?
Ideally i would have 1 file with this interface:
Code: Select all
void Init()
uint64_t Queen(int sq, uint64_t occ)it was faster because I replaced old msvc intrinsic the std::countlz.
constexpr does not make it faster when its unknown. BUT: when you know square and submit 0 instead of x the compiler will remove almost all code. So in general case its same speed. But when you have an engine that loops over the squares from 0 to 64 it will be much much faster!
When i say sq = 2 the assembly is much smaller and the compiler knows what the content is!
Code: Select all
Queen(unsigned long): # @Queen(unsigned long)
movabs rax, -9223372036854775807
movabs rdx, -8644650654150162432
movabs rsi, -9223371486023118848
movabs r9, 578721382704613376
or rdi, rax
lea rcx, [rax + 16909311]
add rax, 239
and rcx, rdi
and rdx, rdi
and rax, rdi
and rsi, rdi
and edi, 7
tzcnt rcx, rcx
tzcnt rdx, rdx
tzcnt r8, rax
mov eax, 16909312
tzcnt rsi, rsi
shl rcx, 7
shl rdx, 7
shl rsi, 7
shl r8, 7
xor r9, qword ptr [rdx + ray+8]
xor rax, qword ptr [rcx + ray]
mov edx, 240
xor rdx, qword ptr [r8 + ray+24]
or r9, rax
movabs rax, 550831656960
xor rax, qword ptr [rsi + ray+16]
or rdx, rax
lzcnt rax, rdi
xor rax, 63
or rdx, r9
shl rax, 7
mov rax, qword ptr [rax + ray+56]
xor rax, 7
or rax, rdx
retWorlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
dangi12012
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Combining two of Bob's classic bitboard attack getters
Update:
Sometimes I wonder how verbose the code is from other people. It hides the simple lookup code which is really just that:
Robert Hyatt's and Michael Sherwin's classical bitboard approach is this:
I see why this was done. Because 15 years ago the compilers didnt resolve the dependency chain in a good way.
This lookup is nice and clean. Just how it should be done. This can be improved by defining the rwsSE in a different way so that the inner 63 - will go away.
But that should be done by the original authors! - are they still active here or on other boards?
Sometimes I wonder how verbose the code is from other people. It hides the simple lookup code which is really just that:
Robert Hyatt's and Michael Sherwin's classical bitboard approach is this:
Code: Select all
uint64_t Queen(int sq, uint64_t occ) {
uint64_t result = 0;
occ |= 0x8000000000000001;
result |= ray[sq].rayNW ^ ray[std::countr_zero(ray[sq].rwsNW & occ)].rayNW;
result |= ray[sq].rayNN ^ ray[std::countr_zero(ray[sq].rwsNN & occ)].rayNN;
result |= ray[sq].rayNE ^ ray[std::countr_zero(ray[sq].rwsNE & occ)].rayNE;
result |= ray[sq].rayEE ^ ray[std::countr_zero(ray[sq].rwsEE & occ)].rayEE;
result |= ray[sq].raySE ^ ray[63 - std::countl_zero(ray[sq].rwsSE & occ)].raySE;
result |= ray[sq].raySS ^ ray[63 - std::countl_zero(ray[sq].rwsSS & occ)].raySS;
result |= ray[sq].raySW ^ ray[63 - std::countl_zero(ray[sq].rwsSW & occ)].raySW;
result |= ray[sq].rayWW ^ ray[63 - std::countl_zero(ray[sq].rwsWW & occ)].rayWW;
return result;
}This lookup is nice and clean. Just how it should be done. This can be improved by defining the rwsSE in a different way so that the inner 63 - will go away.
But that should be done by the original authors! - are they still active here or on other boards?
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer