Incremental update

hgm · Post by **hgm** » Sat Dec 26, 2015 12:55 pm

bob wrote:I've not seen anything faster...

That doesn't mean it does not exist, or would be impossible to create. This thread discusses the merits of incrementally updated attack maps, which, when done properly, would lead to an entirely different design from a standard bitboard engine. So I am not talking about how bitboard engines might best do thing, as these would mostly suck for an attack-map design, and would thus never allow you to judge the performance of such a design when you keep doing them.

I am just bringing this up because on several occasions you now wrote "you lost me, because that is not what I am doing in my bitboard engine". Well, of course it is not. I am not describing a bitboard engine, but an attackers-set engine.

Sven · Post by **Sven** » Sat Dec 26, 2015 1:35 pm

hgm wrote:
bob wrote:I've not seen anything faster...
That doesn't mean it does not exist, or would be impossible to create. This thread discusses the merits of incrementally updated attack maps, which, when done properly, would lead to an entirely different design from a standard bitboard engine. So I am not talking about how bitboard engines might best do thing, as these would mostly suck for an attack-map design, and would thus never allow you to judge the performance of such a design when you keep doing them.

I am just bringing this up because on several occasions you now wrote "you lost me, because that is not what I am doing in my bitboard engine". Well, of course it is not. I am not describing a bitboard engine, but an attackers-set engine.

But even an "attackers-set engine" uses either of the two main board representations: mailbox or bitboards. So in fact there would be four designs:

A. classical mailbox w/o attackers-sets,
B. mailbox with attackers-sets (presumably incrementally updated),
C. classical bitboards w/o explicitly maintaining attackers-sets, and
D. bitboards with additionally maintaining attackers-sets (incrementally updated as well).

As of today, a common assumption is that C is superior to A speed-wise. Your claim, which I believe instantly, is that a proper implementation of B is superior to A speed-wise as well. Now which are exactly your other claims? As I understand you claim that B would also be superior to C. I would reply "it depends" since I have not seen a thorough comparison, but my gut feeling tells me that C is still leading today.

Now it seems that we disagree about D.

First thing is, your last remark sounds as if you would not see a difference between B and D, I hope I misunderstood that since board representations are still a big difference which I believe to be crucial for certain parts of the engine.

Now my claim, and I think also Bob's, is that it is hard even for an optimized implementation of D to be superior to C, mostly due to the inherent redundancy of maintaining all existing attackers-sets where you can always retrieve the few relevant ones very cheaply with common bitboard technology.

But maybe the C vs. D competition only becomes important if you actually come up with an instance of B that beats C

We should not forget that we would need clearly defined measurement rules. We could measure:
a) perft speed (same hardware, single-threaded, with bulk counting, no hash table);
b) NPS of a material+PST only tree search (same hardware, single-threaded, no hash table, no pruning etc.);
both for a variety of positions with given depths.

More realistic would be
c) time-to-depth for a standard tree search on N cores with all engine features enabled,
but that would require that one single engine implements all the different designs given above via #ifdef which is a huge project.

What do you think?

Aleks Peshkov · Post by **Aleks Peshkov** » Sat Dec 26, 2015 3:58 pm

For example my attack-map engine can cheaply generate a bitboard of squares attacked by all (or any subset of) one side pieces. Very usefull for safe mobility, board control evaluation and so on.

Bitboard programs just cannot think about such kind of terms because they would be prohibitively slow.

bob · Post by **bob** » Sat Dec 26, 2015 4:53 pm

Aleks Peshkov wrote:For example my attack-map engine can cheaply generate a bitboard of squares attacked by all (or any subset of) one side pieces. Very usefull for safe mobility, board control evaluation and so on.

Bitboard programs just cannot think about such kind of terms because they would be prohibitively slow.

Why, exactly? It takes very little time to generate an attack set with bitmaps. This assumption that "you can't do this efficiently" is based more on lack of experience with bit boards than with an inherent flaw in bit boards. "Safe mobility" is not as important as key questions like "is this pawn passed", "is this file open" or whatever, all of which are trivial to do with bit boards. And safe mobility can also be done if it is deemed important. Certainly it is trivial in bitmaps for minor pieces, since you only care about pawns.

The only reasonable way to compare the two approaches is to compare two programs, one based on each approach. I've done both myself. I have absolutely no doubts about which is faster. You can make either unnecessarily slow, easy enough. But I've not seen anything that even approaches bit board speeds.

bob · Post by **bob** » Sat Dec 26, 2015 4:53 pm

hgm wrote:
Sven Schüle wrote:
hgm wrote:
bob wrote: A "set" has no "order" in the context of bitmaps as I (or most, apparently) use them. Just a set of 1's that represent valid destination squares...
Indeed. So that is why I consider what you are doing (bitboards) not very competitive, speedwise.
Then exactly how many engines are actually faster than Crafty, speedwise (for tree search)? How many of these are mailbox programs?
How many programs use incrementally updated move sets?

Ferret did. And it was reasonably fast, just nowhere near bit board speeds.

bob · Post by **bob** » Sat Dec 26, 2015 4:55 pm

ZirconiumX wrote:Would I be correct in saying that unless a king, knight or pawn moves, there are no situations that would change its attack set?

This would allow incremental update of those pieces, with sliding attack sets done when needed.

Matthew:out

Any piece that moves changes the attack sets for both sides. For example, any bishop move unblocks a rank and file, and blocks another rank and file.

sje · Post by **sje** » Sat Dec 26, 2015 5:04 pm

Update-in-place vs copy-update

Symbolic, along with my other chess programs, does update-in-place vs copy-update. The motivation here is speed and symmetry.

Symbolic has a relatively large database of bitboards and tables which represent all the attributes of a position. Copying all of this data on each call to Execute() would be very hard on L1 cache and the resulting extra load from hitting the much slower L2 cache would eat a lot of time. How much time? More time than what is needed for a time-reversed update-in-place.

In earlier days, the story might have been different when a CPU's ALU was relatively slow compared to L1/L2 cache accessing. But the speed of a CPU's operand juggling has grown faster than has cache speed, and I'll guess that this trend will continue.

bob · Post by **bob** » Sat Dec 26, 2015 7:37 pm

sje wrote:Update-in-place vs copy-update

Symbolic, along with my other chess programs, does update-in-place vs copy-update. The motivation here is speed and symmetry.

Symbolic has a relatively large database of bitboards and tables which represent all the attributes of a position. Copying all of this data on each call to Execute() would be very hard on L1 cache and the resulting extra load from hitting the much slower L2 cache would eat a lot of time. How much time? More time than what is needed for a time-reversed update-in-place.

In earlier days, the story might have been different when a CPU's ALU was relatively slow compared to L1/L2 cache accessing. But the speed of a CPU's operand juggling has grown faster than has cache speed, and I'll guess that this trend will continue.

I started off in 1994 (Crafty) with copy/make. Seemed to be a simple way of doing things. But I later did some careful profiling and the "copy" was a killer. This in the days of minimal L1 and NO L2/L3. So I went to Make/Unmake and that produced a really noticeable speedup although it is definitely a change. But it turns out with Make/Unmake it is not that hard to return to Copy/Make and I tried this a couple of years back. And it was still slower. Today's trees are quite deep and narrow, and that seems to play better with Make/Unmake as the board stays in L1.

However, not having an Unmake procedure is cleaner, just (for me) quite a bit slower.

hgm · Post by **hgm** » Sat Dec 26, 2015 8:25 pm

Sven Schüle wrote:But even an "attackers-set engine" uses either of the two main board representations: mailbox or bitboards.

Well, not necessarily. There are other representations, like storing individual ranks, files and diagonals in memory cells. Very competitive for 16x16 boards, for instance.

First thing is, your last remark sounds as if you would not see a difference between B and D, I hope I misunderstood that since board representations are still a big difference which I believe to be crucial for certain parts of the engine.

Well, if the board representation is hardly ever used for anything, it should not be very important what it is. I hope you agree with that. If captures and mobility are directly done from the attackers set, Pawn structure from the Pawn hash, material eval from a material table indexed by the material key... Well, not much use for any board, so far...

Now my claim, and I think also Bob's, is that it is hard even for an optimized implementation of D to be superior to C, mostly due to the inherent redundancy of maintaining all existing attackers-sets where you can always retrieve the few relevant ones very cheaply with common bitboard technology.

As Bob pointed out, you are not nearly there, after that. After this retrieval you would have the moves, but you would not automatically have them in the correct order. So you would have to sort them in a different order. Which cannot be done with shifts and bitwise boolean operations. So you would have to extract them into a move list...

We should not forget that we would need clearly defined measurement rules. We could measure:
a) perft speed (same hardware, single-threaded, with bulk counting, no hash table);
b) NPS of a material+PST only tree search (same hardware, single-threaded, no hash table, no pruning etc.);
both for a variety of positions with given depths.

Well, perft is not a realistic measure of engine speed, as too much emphasis is on checking legality of the moves in the final ply, while engines usually do better with pseudo-legal moves. (b) is unrealistic because it leaves out mobility, which is one of the most expensive evaluation terms. So I would go for an engine with material + PST + mobility. Writing that based on incremental attackers sets (and whatever board representation is most convenient; I don't expect it to matter much) is on my to-do list.

But unfortunately that list is rather long...

bob · Post by **bob** » Sat Dec 26, 2015 10:12 pm

hgm wrote:
Sven Schüle wrote:But even an "attackers-set engine" uses either of the two main board representations: mailbox or bitboards.
Well, not necessarily. There are other representations, like storing individual ranks, files and diagonals in memory cells. Very competitive for 16x16 boards, for instance.
First thing is, your last remark sounds as if you would not see a difference between B and D, I hope I misunderstood that since board representations are still a big difference which I believe to be crucial for certain parts of the engine.
Well, if the board representation is hardly ever used for anything, it should not be very important what it is. I hope you agree with that. If captures and mobility are directly done from the attackers set, Pawn structure from the Pawn hash, material eval from a material table indexed by the material key... Well, not much use for any board, so far...

Now my claim, and I think also Bob's, is that it is hard even for an optimized implementation of D to be superior to C, mostly due to the inherent redundancy of maintaining all existing attackers-sets where you can always retrieve the few relevant ones very cheaply with common bitboard technology.
As Bob pointed out, you are not nearly there, after that. After this retrieval you would have the moves, but you would not automatically have them in the correct order. So you would have to sort them in a different order. Which cannot be done with shifts and bitwise boolean operations. So you would have to extract them into a move list...
We should not forget that we would need clearly defined measurement rules. We could measure:
a) perft speed (same hardware, single-threaded, with bulk counting, no hash table);
b) NPS of a material+PST only tree search (same hardware, single-threaded, no hash table, no pruning etc.);
both for a variety of positions with given depths.
Well, perft is not a realistic measure of engine speed, as too much emphasis is on checking legality of the moves in the final ply, while engines usually do better with pseudo-legal moves. (b) is unrealistic because it leaves out mobility, which is one of the most expensive evaluation terms. So I would go for an engine with material + PST + mobility. Writing that based on incremental attackers sets (and whatever board representation is most convenient; I don't expect it to matter much) is on my to-do list.

But unfortunately that list is rather long...

You say that mobility is one of the most expensive evaluation terms. First, here is my mobility scoring code:

mobility = RookMobility(square, OccupiedSquares);

(replace Rook by Bishop or whatever piece you are interested in). What does this give?

a mobility score that gives 4 points for each center square attacked, 3 points for the squares adjacent to those, 2 points for the squares adjacent to those, and finally 1 point for squares on the edge of the board. All with exactly one simple lookup. The RookMobility() macro above expands to this:

# define BishopMobility(square, occ) *(magic_bishop_mobility_indices[square]+ ((((occ) & magic_bishop_mask[square]) * magic_bishop[square]) >> magic_bishop_shift[square]))

Square is the square the bishop stands on, occ is the set of all occupied squares. So take occupied squares, and with a bishop mask (which simply has 1's on the diagonals that bishop attacks) times a magic multiplier number. Then shift it to get the unique index bits and index into the array. Total mobility for bishop done, including the above calculations, since those scores can be produced at start-up time and then simply used when appropriate. I can't even measure the amount of time that takes in my eval. If I comment out all the mobility, I don't see any measurable NPS improvement. That is NOT where I spend most of my evaluation cycles...

Incremental update

Re: Incremental update

Re: Incremental update

Re: Incremental update

Re: Incremental update

Re: Incremental update

Re: Incremental update

Update-in-place vs copy-update

Re: Update-in-place vs copy-update

Re: Incremental update

Re: Incremental update