## Protozoa Dev Log

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Mike Sherwin
Posts: 874
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

### Protozoa Dev Log

Not being happy with my last few attempts I decided to try one more time. In Protozoa things have been done a little different. Instead of keeping a giant move list in the Thread struct I just make a move list in the function that calls the move generator and pass a pointer to it. It makes for much cleaner code. I have strived to make clean easy to understand code. So far I think that I am doing okay. I'm not saying my code is best. I'm saying it is best for me. So I'm happy with it. Some code I am quite proud of. For example my castling code.

Code: Select all

``````// In GenAllMoves(Thread* t, sMove* m);
case WKC:
mft = ((u64)((board[h1] == WRC) && !(occ & owcs) && !AtkByBlack(t, awcs))) * WCS;
m += (mft == WCS);
mft = ((u64)((board[a1] == WRC) && !(occ & owcl) && !AtkByBlack(t, awcl))) * WCL;
m += (mft == WCL);
[[fallthrough]];
case WK:
atk |= bb = kingMoves[fs] & notme;
break;

``````
WKC is a king that can castle. Once it moves it becomes a WK a king that can't castle. On takeback it becomes a WKC again. The rooks work the same way. Castling is done with all branchless code. Even the main body of AtkByBlack() is branchless.

Code: Select all

``````static bool AtkByBlack(Thread* t, u64 b) {
u64 atk = 0;
u64 occ = sides[WHITE] | sides[BLACK];
while (b) {
s32 sq = std::countr_zero(b);
b ^= 1ull << sq;
atk |= (wPawnCaptures[sq] & pawns[BLACK]);
atk |= (knightMoves[sq] & knights[BLACK]);
atk |= (kingMoves[sq] & kings[BLACK]);
atk |= (dSubset[sq][(((occ & dMask[sq]) * file_b2_b7) >> 58)] & (bishops[BLACK] | queens[BLACK]));
atk |= (aSubset[sq][(((occ & aMask[sq]) * file_b2_b7) >> 58)] & (bishops[BLACK] | queens[BLACK]));
atk |= (hSubset[sq][(occ >> hShift[sq]) & 63] & (rooks[BLACK] | queens[BLACK]));
atk |= (vSubset[sq][((((occ >> (sq & 7)) & file_a2_a7) * diag_c2_h7) >> 58)] & (rooks[BLACK] | queens[BLACK]));
}
return (atk != 0);
}``````
For those that are new to TalkChess the line attacks are of my own creation. They are very competitive with Black Magic. The rook code is faster but the bishop is a little slower. In total though it only uses about 1/6th the memory of Black Magic. If one is interested here is the initialization code.

Code: Select all

``````		// diagonals
for (ts = sq + 9, dx = x + 1, dy = y + 1; dx < FILEh && dy < RANK8; dMask[sq] |= 1ull << ts, ts += 9, dx++, dy++);
for (ts = sq - 9, dx = x - 1, dy = y - 1; dx > FILEa && dy > RANK1; dMask[sq] |= 1ull << ts, ts -= 9, dx--, dy--);

// anti diagonals
for (ts = sq + 7, dx = x - 1, dy = y + 1; dx > FILEa && dy < RANK8; aMask[sq] |= 1ull << ts, ts += 7, dx--, dy++);
for (ts = sq - 7, dx = x + 1, dy = y - 1; dx < FILEh && dy > RANK1; aMask[sq] |= 1ull << ts, ts -= 7, dx++, dy--);

// diagonal indexes
for (index = 0; index < 64; index++) {
dSubset[sq][index] = 0;
occ = index << 1;
if ((sq & 7) != FILEh && (sq >> 3) != RANK8) {
for (ts = sq + 9; ; ts += 9) {
dSubset[sq][index] |= (1ull << ts);
if (occ & (1ull << (ts & 7))) break;
if ((ts & 7) == FILEh || (ts >> 3) == RANK8) break;
}
}
if ((sq & 7) != FILEa && (sq >> 3) != RANK1) {
for (ts = sq - 9; ; ts -= 9) {
dSubset[sq][index] |= (1ull << ts);
if (occ & (1ull << (ts & 7))) break;
if ((ts & 7) == FILEa || (ts >> 3) == RANK1) break;
}
}
}

// anti diagonal indexes
for (index = 0; index < 64; index++) {
aSubset[sq][index] = 0;
occ = index << 1;
if ((sq & 7) != FILEa && (sq >> 3) != RANK8) {
for (ts = sq + 7; ; ts += 7) {
aSubset[sq][index] |= (1ull << ts);
if (occ & (1ull << (ts & 7))) break;
if ((ts & 7) == FILEa || (ts >> 3) == RANK8) break;
}
}
if ((sq & 7) != FILEh && (sq >> 3) != RANK1) {
for (ts = sq - 7; ; ts -= 7) {
aSubset[sq][index] |= (1ull << ts);
if (occ & (1ull << (ts & 7))) break;
if ((ts & 7) == FILEh || (ts >> 3) == RANK1) break;
}
}
}

// vertical indexes
for (index = 0; index < 64; index++) {
vSubset[sq][index] = 0;
uint64_t blockers = 0;
for (int i = 0; i <= 5; i++) {
if (index & (1ull << i)) {
blockers |= (1ull << (((5 - i) << 3) + 8));
}
}
if ((sq >> 3) != RANK8) {
for (ts = sq + 8; ; ts += 8) {
vSubset[sq][index] |= (1ull << ts);
if (blockers & (1ull << (ts - (ts & 7)))) break;
if ((ts >> 3) == RANK8) break;
}
}
if ((sq >> 3) != RANK1) {
for (ts = sq - 8; ; ts -= 8) {
vSubset[sq][index] |= (1ull << ts);
if (blockers & (1ull << (ts - (ts & 7)))) break;
if ((ts >> 3) == RANK1) break;
}
}
}

// horizontal indexes
for (index = 0; index < 64; index++) {
hSubset[sq][index] = 0;
occ = index << 1;
if ((sq & 7) != FILEh) {
for (ts = sq + 1; ; ts += 1) {
hSubset[sq][index] |= (1ull << ts);
if (occ & (1ull << (ts & 7))) break;
if ((ts & 7) == FILEh) break;
}
}
if ((sq & 7) != FILEa) {
for (ts = sq - 1; ; ts -= 1) {
hSubset[sq][index] |= (1ull << ts);
if (occ & (1ull << (ts & 7))) break;
if ((ts & 7) == FILEa) break;
}
}
}

// horizontal shift
hShift[sq] = (sq & 56) + 1;
}
``````
Here are my piece types.

Code: Select all

``````enum {
WP2, WP3, WP4, WP5, WP6, WP7, WN, WB, WRC, WR, WQ, WKC, WK,
ES,
BP7, BP6, BP5, BP4, BP3, BP2, BN, BB, BRC, BR, BQ, BKC, BK,
WPQ, WPN, WPR, WPB, WCS, WCL,
BPQ, BPN, BPR, BPB, BCS, BCL
};``````
As can be seen I use pseudo pieces. It really allows the code to be simple. The most complex code is WP5 (BP4) as it has to determine if the pawn is making an en-passant capture. Once again the code is branchless.

Code: Select all

``````    case WP5:
sq = mts - ((epbit[sly] == (1ull << mts)) << 3);
mtt = board[sq];
board[mfs] = ES;
board[sq] = ES;
board[mts] = WP6;
pawns[WHITE] ^= (1ull << mfs | 1ull << mts);
sides[WHITE] ^= (1ull << mfs | 1ull << mts);
*types[mtt] ^= (u64)(mtt != ES) << sq;
sides[BLACK] ^= (u64)(mtt != ES) << sq;
mat[BLACK] -= value[mtt];
pos[WHITE] += (wptbl[mts] - wptbl[mfs]);
pos[BLACK] -= tbls[mtt][sq];
break;``````
the sq var gets the appropriate value for an ep capture. If there is no ep capture it gets the square of a normal capture or just a move. I think the rest is understandable. In MakeMove() since most nodes will be capture nodes because of Qsearch() captures are assumed and just processed branchless. If it is not a captures it doesn't change anything.

These are just a few highlights of Protozoa. Right now I can only make moves verified by the move generator. My last working engine is Bricabrac. It searched approximately 11,000,000 positions a second using a single thread on a Ryzen 9 3950x at 4.2 GHz. This engine will be substantially faster!

One other design feature that is different from Bricabrac is that it will be able to play more than one game at a time. I'm also developing a GUI where one can setup as many positions as there are available threads and analyze all at the same time. Analysis will incorporate a combination of real time reinforcement learning and full width search.

When I get Protozoa to the point it can play a game in a chess gui I will provide a link to the engine.
mar
Posts: 2579
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

### Re: Protozoa Dev Log

hey Mike,
nice, I wonder how fast you can go

I don't see where you store to move list in your castling example, maybe I missed something?
I like your idea with special pieces for castling and pawns on various ranks, that's clever

Mike Sherwin
Posts: 874
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

### Re: Protozoa Dev Log

mar wrote: Mon Apr 22, 2024 7:27 am hey Mike,
nice, I wonder how fast you can go

I don't see where you store to move list in your castling example, maybe I missed something?
I like your idea with special pieces for castling and pawns on various ranks, that's clever

Hi Martin,

Long time no see. This is a happy day that you decided to drop in and say hello!

I pass a move list pointer to GenMoves(Thread* t, sMove* m) {}

struct sMove {
u08 fs;
u08 ts;
u08 ft;
u08 tt;
s32 sc;
};

#define mfs m->fs
#define mts m->ts
#define mft m->ft
#define mtt m->tt
#define msc m->sc

In the castling code:

mft = ((u64)((board[h8] == BRC) && !(occ & obcs) && !AtkByWhite(t, abcs))) * BCS;
m += (mft == BCS);

mft (move from type, m->ft) equates to either BCS (black castles short) if all conditions are true 1 * BCS or 0 * BCS.
then m is incremented if (mft == BCS) and the move type is stored in the move list. Otherwise no move is stored and m+= 0; does not progress the move counter.

My slider code is SISSY, evolved. Same basic idea but I use Kindergarten math to make it only two lookups instead of the eight or six lookups like before.

My sort of a joke name for my new move generator is, Kindergarten Super SISSY bitboards. So far I think I'm the only one that finds that funny, lol. In some test they have tested faster than Black Magic and in some they haven't although they were close. The rook code is for sure much faster than the Black Magic rook. It is the bishop code that lags behind, unfortunately.

Anyway, it is good to see you again. And if you ever feel like contributing in anyway you are always welcome!!! Thanks!
Mike Sherwin
Posts: 874
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

### Re: Protozoa Dev Log

I was having trouble getting GetCommand() to work properly. The C functions like fgets() and scanf() always confused me. So I just decided that I'd see if ChatGPT could help me. So I gave ChatGpt the entire function and asked it to rewrite it using modern C++ and also to correct my error. And it did. And it works. And it is very clean.

Code: Select all

``````static void GetCommand(Thread* t) {
std::string buf, cmd;
u08 fs, ts, i;
sMove m[120];

PrintBoard(t);

std::cout << std::flush;
std::getline(std::cin, buf);

if (buf.empty()) return;

if (buf[0] >= 'a' && buf[0] <= 'h' &&
buf[1] >= '1' && buf[1] <= '8' &&
buf[2] >= 'a' && buf[2] <= 'h' &&
buf[3] >= '1' && buf[3] <= '8') {
fs = ((buf[1] - 49) << 3) + buf[0] - 97;
ts = ((buf[3] - 49) << 3) + buf[2] - 97;
GenAllMoves(t, m);
for (i = 0; m[i].ft != ES; i++) {
if (m[i].ft == WCS && fs == e1 && ts == g1) break;
if (m[i].ft == WCL && fs == e1 && ts == c1) break;
if (m[i].ft == BCS && fs == e8 && ts == g8) break;
if (m[i].ft == BCL && fs == e8 && ts == c8) break;
if (fs == m[i].fs && ts == m[i].ts) {
if (board[fs] == WP7) {
if (buf[4] == 'q' && m[i].ft == WPQ) break;
if (buf[4] == 'n' && m[i].ft == WPN) break;
if (buf[4] == 'r' && m[i].ft == WPR) break;
if (buf[4] == 'b' && m[i].ft == WPB) break;
continue;
}
if (board[fs] == BP2) {
if (buf[4] == 'q' && m[i].ft == BPQ) break;
if (buf[4] == 'n' && m[i].ft == BPN) break;
if (buf[4] == 'r' && m[i].ft == BPR) break;
if (buf[4] == 'b' && m[i].ft == BPB) break;
continue;
}
break;
}
}
if (m[i].ft != ES) GameMove(t, &m[i]);
return;
}

std::istringstream iss(buf);
iss >> cmd;

if (cmd == "xboard") return;

if (cmd == "otim") return;

if (cmd == "hint") return;

if (cmd == "remove") return;

if (cmd == "new") {
NewGame(t);
return;
}

if (cmd == "quit") {
mode = EXIT;
return;
}

if (cmd == "st") {
int stValue;
if (iss >> stValue) {
maxTime = stValue;
maxDepth = 99;
}
return;
}

if (cmd == "sd") {
int sdValue;
if (iss >> sdValue) {
maxDepth = sdValue;
maxTime = 1000000;
}
return;
}

if (cmd == "time") {
int timeValue;
if (iss >> timeValue) {
maxTime = timeValue * 10 / 30;
maxDepth = 99;
}
return;
}

if (cmd == "go") {
mode = SEARCH;
return;
}

if (cmd == "undo") {
if (gly >= 0) TakeBack(t, &game[gly]);
gly--;
return;
}
}``````
mar
Posts: 2579
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

### Re: Protozoa Dev Log

Mike Sherwin wrote: Tue Apr 23, 2024 5:45 am Hi Martin,

Long time no see. This is a happy day that you decided to drop in and say hello!
sorry for the late reply - I've been taking a 2-year break, now I'm back for a short while I don't intend to overstay the welcome.

I understand the mul operation, it's just that I don't see the movelist being updated in the pseudo-code
My slider code is SISSY, evolved. Same basic idea but I use Kindergarten math to make it only two lookups instead of the eight or six lookups like before.

My sort of a joke name for my new move generator is, Kindergarten Super SISSY bitboards. So far I think I'm the only one that finds that funny, lol. In some test they have tested faster than Black Magic and in some they haven't although they were close. The rook code is for sure much faster than the Black Magic rook. It is the bishop code that lags behind, unfortunately.

out of curiosity: black magic is this Volker's idea? viewtopic.php?t=64790
Anyway, it is good to see you again. And if you ever feel like contributing in anyway you are always welcome!!! Thanks!
Mike Sherwin
Posts: 874
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

### Re: Protozoa Dev Log

mar wrote: Sat Apr 27, 2024 11:32 pm I understand the mul operation, it's just that I don't see the movelist being updated in the pseudo-code

out of curiosity: black magic is this Volker's idea? viewtopic.php?t=64790
Now I'm perplexed because I am not being clear enough.
It is real code copied directly from Protozoa. m is a pointer to a move list. I don't like writing writing m->ft so I use the preprocessor (#define mft m->ft) to alias m->ft to mft. So when 1 * BCS is stored into mft it is storing it into the move list. Then m += (mft == BCS) is the same as m += 1 incrementing the move pointer. It saves typing and makes the code look cleaner. But if it makes it less understandable perhaps I shouldn't do it that way.

Yes I believe it is Volker's idea. It test the fastest of all magic methods.

Here is some test data. It is not real engine test data though. So there is no benefit in this test because of less memory used.

Code: Select all

``````Kindergarten Super SISSY Bitboards 491.898030                    16640   [130kb]     imul64                   no        Michael Sherwin                              https://www.talkchess.com/forum3/viewtopic.php?f=7&t=81234&start=30
Fancy Magic BB - Variable shift    378.511164                    93376   [729kb]     imul64                   yes       Pradu Kannan                                 https://www.chessprogramming.org/Magic_Bitboards#Fancy
FoldingHash - 4x fancy magic       204.041865                    6468    [50kb]      none                     no        Daniel Inführ                                tbd
Plain Magic BB                     488.175971                    295168  [2306kb]    imul64                   no        Lasse Hansen                                 https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift       508.980537                    88891   [694kb]     imul64                   no        Onno Garms and Volker Annuss                 https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr                     722.584250                    107904  [843kb]     pext_u64                 yes       Zach Wegner                                  https://www.chessprogramming.org/BMI2#PEXTBitboards``````
mar
Posts: 2579
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

### Re: Protozoa Dev Log

Mike Sherwin wrote: Sun Apr 28, 2024 10:08 am Now I'm perplexed because I am not being clear enough.
It is real code copied directly from Protozoa. m is a pointer to a move list. I don't like writing writing m->ft so I use the preprocessor (#define mft m->ft) to alias m->ft to mft. So when 1 * BCS is stored into mft it is storing it into the move list. Then m += (mft == BCS) is the same as m += 1 incrementing the move pointer. It saves typing and makes the code look cleaner. But if it makes it less understandable perhaps I shouldn't do it that way.
my bad - I missed that mft is a macro, I thought it's a local variable
Here is some test data. It is not real engine test data though. So there is no benefit in this test because of less memory used.

Code: Select all

``````Kindergarten Super SISSY Bitboards 491.898030                    16640   [130kb]     imul64                   no        Michael Sherwin                              https://www.talkchess.com/forum3/viewtopic.php?f=7&t=81234&start=30
Fancy Magic BB - Variable shift    378.511164                    93376   [729kb]     imul64                   yes       Pradu Kannan                                 https://www.chessprogramming.org/Magic_Bitboards#Fancy
FoldingHash - 4x fancy magic       204.041865                    6468    [50kb]      none                     no        Daniel Inführ                                tbd
Plain Magic BB                     488.175971                    295168  [2306kb]    imul64                   no        Lasse Hansen                                 https://www.chessprogramming.org/Magic_Bitboards#Plain
Black Magic BB - Fixed shift       508.980537                    88891   [694kb]     imul64                   no        Onno Garms and Volker Annuss                 https://www.chessprogramming.org/Magic_Bitboards#Fixed_shift_Fancy
Pext constexpr                     722.584250                    107904  [843kb]     pext_u64                 yes       Zach Wegner                                  https://www.chessprogramming.org/BMI2#PEXTBitboards``````
hmm, that looks pretty good, thanks. I'll take a look
Mike Sherwin
Posts: 874
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

### Re: Protozoa Dev Log

It took a long time to find most of the bugs. I'm sure there are more!
But it was good enough for a self play game at depth 1 + Qsearch().
For some reason it does not like to make captures.
[pgn][Event "Computer chess game"]
[Site "DESKTOP-BTK08BG"]
[Date "2024.05.23"]
[Round "?"]
[White "Protozoa"]
[Black "Protozoa"]
[Result "1/2-1/2"]
[BlackElo "?"]
[ECO "D01"]
[Opening "Richter-Veresov Attack"]
[Time "20:32:20"]
[Variation "2.d3 d5"]
[WhiteElo "2400"]
[TimeControl "0+20"]
[Termination "normal"]
[PlyCount "41"]
[WhiteType "Computer"]
[BlackType "Computer"]

1. d4 d5 2. Nc3 Nc6 3. e3 e6 4. Nf3 Nf6 5. Ne5 Ne4 6. f4 f5 7. Bd3 Bd6 8.
O-O Rf8 9. Qf3 Qf6 10. a3 a6 11. g4 g6 12. Kg2 Rh8 13. g5 Qe7 14. Rd1 Ra7
15. Kf1 Ra8 16. Ke1 Ra7 17. h3 Ra8 18. Ra2 Ra7 19. Ra1 Ra8 20. Ra2 Ra7 21.
Ra1 {3-fold repetition} 1/2-1/2
[/pgn]
Mike Sherwin
Posts: 874
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

### Re: Protozoa Dev Log

Back in 2005 I was called the Maestro of Speed. By at least one person.
Well the Maestro of Speed is back!

I got the regular search working and in the original position on my Intel i7 3930k @3.8GHz (my Ryzen 3950x is not currently working) Protozoa is searching 9,638,109 nodes per second. After 1. e4 e5 it searches 10,642,109 nodes per second. My 3950x is roughly 70% faster. That would mean on a 3950x in the original position 16,385,824 nodes per second. And after 1. e4 e5 18,091,585 nodes per second.

The 7950x is about 50% faster than the 3950x and the 9950x is about 46% faster than the 7950x.

18,091,585 * 1.5 * 1.46 = 39,620,571 estimated nodes per second on an AMD 9950x. And more than that in the middle game. All nodes per second numbers are on a single thread.