0x88 engines

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: 0x88 engines

Post by Zach Wegner »

MattieShoes wrote:
Of course the color of square function should be inlined.
It's just one of my macros. I probably rely on macros too much...

Code: Select all

#define ROW(x)				((x) >> 4)
#define COL(x)				((x)&7)
#define LIGHT_SQUARE(x)		((((x)>>4)^(x))&1)
#define TO_BOARD64(x)		(((x)&7) | (((x)>>1)&56))
#define TO_BOARD128&#40;x&#41;		(((&#40;x&#41;<<1&#41;&0xF0&#41; | (&#40;x&#41;&7&#41;)
#define INVALID_SQUARE&#40;x&#41;	(&#40;x&#41;&0x88&#41;
(By the way, index 0 is a8 in my scheme, not a1, so it's backwards compared to cpw)

Nowadays, the compiler handles inlining functions automatically, yes? I haven't explicitly inlined any of my functions on the assumption that very simple 1 line functions would automatically get inlined anyway.
Ain't nothing wrong with macros. Change that to RANK_OF and FILE_OF or something though, whats a row and column? :) There are two advantages to inlines, but both are pretty trivial: type safety, and the ability to stick asserts in there. And compilers will be inlining any one-line functions, yes.

BTW:

Code: Select all

#define TO_BOARD128&#40;x&#41; &#40;x + &#40;x & ~7&#41;)
#define TO_BOARD64&#40;x&#41; &#40;x - (&#40;x & ~7&#41; >> 1&#41;)
There might actually be something even better out there, I just made these up from what I remember. The first one should be optimal though.
MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: 0x88 engines

Post by MattieShoes »

I picked row and col rather than rank and file on purpose. In chess, ranks are 1-8, and they may or may not be relative - rook on 7th rank can refer to rook on 2nd rank for black. Row within my program is absolute, and 0-7 instead of 1-8, and actually backwards - ROW(A8) is 0 and ROW(A1) is 7. File is absolute in chess but it's ambiguous in the programming sense. Plus chess files are characters a-h, not 0-7. :-)
User avatar
hgm
Posts: 27795
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: 0x88 engines

Post by hgm »

MattieShoes wrote:This works with 16x8 as well, the exact same way, though, so I don't see how it'd makes 16x16 faster. for 16x8...
array[119+sqOne-sqTwo]
The constant might change in 16x16 but the array would be identical in size as far as I can see...
When we say 16x8, we mean char board[16][8], for which the compiler translates board[rank][file] as *(board + 8*rank + file).

So for 16x8 the attack lookup array does not work, as sqOne - sqTwo is not unambiguous. You need at least Hx15 for that.
User avatar
hgm
Posts: 27795
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: 0x88 engines

Post by hgm »

diep wrote:In diep i'm using an incremental attacktable. It works as follows:

int attacktable[2][64]; For side times squares.

In each integer: sign bit is not used.
bits 15..30 are giving the index numbers, in bitboard style, 1 bit
for each n, into the piecelist array.

int piecelist[2][18]; // gives square of piece.

So in that manner you can see, if you would need it what squares the attackers are on.

Then least few significant 4 bits are the number of attackers onto 1 square.

For titled players, number of attackers to a square is an important consideration in evaluating a position simply. Also in SEE it is of use.

the bits 4..29 there is a 'gnuchess 4.0' type of sense of value of a piece.

That ranges from ctlP being most significant to ctlK being least significant

Note the manner in how i do this 'ctlX' stuff is different from gnuchess 4.0
How do you do this incremental updtae? I want to do something similar in my engine HaQiKi D, but the problem I am running into is efficient removal of attacks when a piece is moved, in combination with elongation of unblocked slider attacks.

It is easy enough to keep some sort of stack containing the square number of all newly attacked squares (revealed by move generation for the moved piece) and the old attack maps of those squares. That allows for easy UnMake, without invoking move generation again. But when the piece is a slider, and one of its targets moves away later, its target might shift to some new square further along the ray. Then, when I later make a move with that slider, I cannot rely on the information on the stack with attacked squares to clear the corresponding attack bits in its former targets, as this would not contain the elongated slider move. And I cannot easily update the target square in the stack to the shifted target, as I don't know where to find the stack.

This problem seemed so nasty, that I became convinced it would be more efficient to use linked list to contain the attack info. By linking each attack in a list for the fromSquare as well as the toSquare, it is very easy to remove every move of a moved or captured piece out of its attack map (the list hanging from the toSquare), and uncouple the fromList it is part of in its entirety (for easy UnMake). Then I can easily update a cell to a new target, when a slider is unblocked. (For easy UnMake I actually replace them, keeping the toList they are in, and just taking them out of the fromList.)

But working with the doubly linked lists puts a heavy burdon on the load/store unit.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: 0x88 engines

Post by diep »

hgm wrote:
diep wrote:In diep i'm using an incremental attacktable. It works as follows:

int attacktable[2][64]; For side times squares.

In each integer: sign bit is not used.
bits 15..30 are giving the index numbers, in bitboard style, 1 bit
for each n, into the piecelist array.

int piecelist[2][18]; // gives square of piece.

So in that manner you can see, if you would need it what squares the attackers are on.

Then least few significant 4 bits are the number of attackers onto 1 square.

For titled players, number of attackers to a square is an important consideration in evaluating a position simply. Also in SEE it is of use.

the bits 4..29 there is a 'gnuchess 4.0' type of sense of value of a piece.

That ranges from ctlP being most significant to ctlK being least significant

Note the manner in how i do this 'ctlX' stuff is different from gnuchess 4.0
How do you do this incremental updtae? I want to do something similar in my engine HaQiKi D, but the problem I am running into is efficient removal of attacks when a piece is moved, in combination with elongation of unblocked slider attacks.

It is easy enough to keep some sort of stack containing the square number of all newly attacked squares (revealed by move generation for the moved piece) and the old attack maps of those squares. That allows for easy UnMake, without invoking move generation again. But when the piece is a slider, and one of its targets moves away later, its target might shift to some new square further along the ray. Then, when I later make a move with that slider, I cannot rely on the information on the stack with attacked squares to clear the corresponding attack bits in its former targets, as this would not contain the elongated slider move. And I cannot easily update the target square in the stack to the shifted target, as I don't know where to find the stack.

This problem seemed so nasty, that I became convinced it would be more efficient to use linked list to contain the attack info. By linking each attack in a list for the fromSquare as well as the toSquare, it is very easy to remove every move of a moved or captured piece out of its attack map (the list hanging from the toSquare), and uncouple the fromList it is part of in its entirety (for easy UnMake). Then I can easily update a cell to a new target, when a slider is unblocked. (For easy UnMake I actually replace them, keeping the toList they are in, and just taking them out of the fromList.)

But working with the doubly linked lists puts a heavy burdon on the load/store unit.
It is not so easy to get attacktables bugfree to work in an incremental manner. Considering your great analytical skills i leave it to those to figure out how to solve it. Doing attacktables incremental is a lot faster than generating them new of course.

You will realize that the 'free' thing i get from my attacktables is whether the king is under attack. So i do save out *that* system time.

Note this datastructure is not so clever for a fast beancounter, as doing attacktables incremental already is too slow. Diep gets 1 million nps single core when i would remove its eval, some more toying i managed to get it a tad faster than that (k7 2.1Ghz).

The only manner to really get to 2.5 million nps at a k7, so under 1000 cycles, was to do things in ad hoc manner.

A big problem of combining 2 searches is that a 2.5 million nps engine is less efficient thanks to lacking all the information that Diep has to its avail during search. In diep it is crucial to not search hopeless moves in qsearch.

At 2.5 mln nps you prefer to search a few hopeless moves over doing too little, your nps allows it anyway.

In diep of course the qsearch is everywhere the same. That allows better hashing. At any given depth in qsearch, the search sees and concludes the same bounds/scores so to speak.

At 2.5 mln nps what i do is 1 ply of checks and then just recaptures or interesting captures. It's total material driven in that case. No hashing at all in qsearch of course at that speed, the memory controller wouldn't be able to handle it, only hashing during main search. So that 2.5 mln nps gets limited mainly by memory controller in main search and by me reusing functionality out of diep to sooner get the program done (whereas new special dedicated code would be faster).

My basic question would be why you would want to get to work incremental attacktables considering that AFAIK not a single engine yours has a lot of chessknowledge inside its evaluation function.

You want to try to go for that?

Thanks,
Vincent
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: 0x88 engines

Post by diep »

Zach Wegner wrote:
MattieShoes wrote:
diep wrote: 16x16 is faster for detecting checks and attacks. Doing checks in qsearch is real important and attacks are for other purposes interesting to discover.

It is easy to see that if you want to see whether a piece P at square sq attacks square x that you can already have a quick indication of this with a few + and minus signs:

attack1mask = attackarray[constant + sq- x];
This works with 16x8 as well, the exact same way, though, so I don't see how it'd makes 16x16 faster. for 16x8...
array[119+sqOne-sqTwo]
The constant might change in 16x16 but the array would be identical in size as far as I can see...
Simple explanation of superiority of 16x12+ versus 0x88:
16x12:

Code: Select all

int *gen_slider_moves&#40;int sq, int dir&#41;
&#123;
  int to = sq + dir;
  while &#40;board&#91;to&#93; == EMPTY&#41;
  &#123;
    add_move&#40;sq, to&#41;;
    to += dir;
  &#125;
  if &#40;board&#91;to&#93; == ENEMY&#41;
    add_move&#40;sq, to&#41;;
&#125;
0x88:

Code: Select all

int *gen_slider_moves&#40;int sq, int dir&#41;
&#123;
  int to = sq + dir;
  while (!&#40;to & 0x88&#41; && board&#91;to&#93; == EMPTY&#41;
  &#123;
    add_move&#40;sq, to&#41;;
    to += dir;
  &#125;
  if (!&#40;to & 0x88&#41; && board&#91;to&#93; == ENEMY&#41;
    add_move&#40;sq, to&#41;;
&#125;
Basically you save 1 branch per move generated.
Diep's move generator factors faster than all this when you would combine it with 16x16 board.

Here is how the same invariant looks like for xray pieces at a 8x8 board
and it includes move ordering and it works better at OoO processors as i'm not each time referencing the register you just modified. Note what i'm doing below works branchless if you're not doing move ordering implicitly:

Code: Select all

#define QuickLink&#40;FFTT&#41; &#123;genindex->zet = FFTT;genindex++;&#125;
while&#40; &#40;sq=*psq++) != 128 ) &#123; // loop of all xray pieces B R Q
    int SRsq  = &#40;sq<<6&#41;;
    int piece = board&#91;sq&#93;;
    int *s,*v,*w,u,*PSQvals,curval;

    PSQvals = QuickTables&#91;side&#93;&#91;piece&#93;;
    s = andscan&#91;0&#93;;
    v = ipiecepos&#91;piece&#93;&#91;sq&#93;;
    w = iskippos&#91;sq&#93;;
    curval = PSQvals&#91;sq&#93;;

    u = *v++;
    do &#123;
      int p1=snelbord&#91;u&#93;,sh=w&#91;u&#93;;
      v += &#40;s&#91;p1&#93;&sh&#41;;
      if&#40; (&#40;p1-1&#41;>>3&#41; != side ) &#123;
        genindex->score = PSQvals&#91;u&#93;+pvals&#91;p1&#93;-curval;
        QuickLink&#40;SRsq|u|t&#91;p1&#93;);
      &#125;
    &#125; while&#40; &#40;u=*v++) != 128 );
  &#125;
User avatar
hgm
Posts: 27795
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: 0x88 engines

Post by hgm »

The first application I had in mind, and that should pay back already more than the effort of updating the tables, is to speed up QS. About half of the QS nodes fail high by stand pat, but currently the other half that doesn't requires generation of captures (which is almost as expensive as generation of all moves, in mailbox). And in a large fraction of those just to conclude that I have no captures left.

With an attack table I can see very quickly that I have no non-futile captures left. Just look at the attacks on the non-futile victims. That saves tons of time. And even if there are captures, and they are non-futile, generating them in MVV/LVA order from the attack tables is very much faster than generating moves from scratch. Especially in cut nodes, where 80-90% of these nodes does not get beyond the first capture. I almost never have a hash move in QS.

A next thing is that I would like to forbid stand-pat (or adjust the eval value down so much that it is an almost certain fail low) in the face of two LxH or ANYxUNDEFENDED captures against me. (Or perhaps extend.)

Another important application would be to efficiently determine which side forfeits on a repetition, for perpetually attacking undefended pieces.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: 0x88 engines

Post by diep »

hgm wrote:
diep wrote:In diep i'm using an incremental attacktable. It works as follows:

int attacktable[2][64]; For side times squares.

In each integer: sign bit is not used.
bits 15..30 are giving the index numbers, in bitboard style, 1 bit
for each n, into the piecelist array.

int piecelist[2][18]; // gives square of piece.

So in that manner you can see, if you would need it what squares the attackers are on.

Then least few significant 4 bits are the number of attackers onto 1 square.

For titled players, number of attackers to a square is an important consideration in evaluating a position simply. Also in SEE it is of use.

the bits 4..29 there is a 'gnuchess 4.0' type of sense of value of a piece.

That ranges from ctlP being most significant to ctlK being least significant

Note the manner in how i do this 'ctlX' stuff is different from gnuchess 4.0
How do you do this incremental updtae? I want to do something similar in my engine HaQiKi D, but the problem I am running into is efficient removal of attacks when a piece is moved, in combination with elongation of unblocked slider attacks.

It is easy enough to keep some sort of stack containing the square number of all newly attacked squares (revealed by move generation for the moved piece) and the old attack maps of those squares. That allows for easy UnMake, without invoking move generation again. But when the piece is a slider, and one of its targets moves away later, its target might shift to some new square further along the ray. Then, when I later make a move with that slider, I cannot rely on the information on the stack with attacked squares to clear the corresponding attack bits in its former targets, as this would not contain the elongated slider move. And I cannot easily update the target square in the stack to the shifted target, as I don't know where to find the stack.

This problem seemed so nasty, that I became convinced it would be more efficient to use linked list to contain the attack info. By linking each attack in a list for the fromSquare as well as the toSquare, it is very easy to remove every move of a moved or captured piece out of its attack map (the list hanging from the toSquare), and uncouple the fromList it is part of in its entirety (for easy UnMake). Then I can easily update a cell to a new target, when a slider is unblocked. (For easy UnMake I actually replace them, keeping the toList they are in, and just taking them out of the fromList.)

But working with the doubly linked lists puts a heavy burdon on the load/store unit.
Harm-Geert,

Linked lists is something you teach first year students, using them is too slow during search for attacks at squares. Besides that, it wouldn't fit in the 2000 characters you have left in the margin.

Thanks,
Vincent
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: 0x88 engines

Post by diep »

hgm wrote:The first application I had in mind, and that should pay back already more than the effort of updating the tables, is to speed up QS. About half of the QS nodes fail high by stand pat, but currently the other half that doesn't requires generation of captures (which is almost as expensive as generation of all moves, in mailbox). And in a large fraction of those just to conclude that I have no captures left.

With an attack table I can see very quickly that I have no non-futile captures left. Just look at the attacks on the non-futile victims. That saves tons of time. And even if there are captures, and they are non-futile, generating them in MVV/LVA order from the attack tables is very much faster than generating moves from scratch. Especially in cut nodes, where 80-90% of these nodes does not get beyond the first capture. I almost never have a hash move in QS.

A next thing is that I would like to forbid stand-pat (or adjust the eval value down so much that it is an almost certain fail low) in the face of two LxH or ANYxUNDEFENDED captures against me. (Or perhaps extend.)

Another important application would be to efficiently determine which side forfeits on a repetition, for perpetually attacking undefended pieces.
Attacktables are far too slow to do this.

Wasting system time on a call to qsearch is silly if standpat would already fail high in qsearch. You can detect that faster of course.

The only problem to solve is fast generation of checks in qsearch.
It is there where bitboards are not extremely slow. However there is fast manners to do this in regular datastructures, which gives a better high level overview anyway on the program.

Also a problem that bitboarders have is that they first generate the captures and THEN the checks.

This is bad for mate threat extensions (so you use them), as it'll return a cutoff for a capture already instead of detecting the mate. It's better to order the best move first, that's better for your search tree.

Vincent
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: 0x88 engines

Post by diep »

Greg Strong wrote:
diep wrote:In fact biggest limit is c++ in most codes. The guys here don't realize how much of a slow down average c++ code is compared to C code, let alone assembler.
We don't realize it because it's not true.
You are?

Vincent