Hashing based on move lists

Discussion of chess software programming and technical issues.

Moderator: Ras

Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Hashing based on move lists

Post by Gerd Isenberg »

hgm wrote:
Gerd Isenberg wrote:maybe the color-flipper trick:

Code: Select all

zob (blackpiece, sq) := byteswap (zob (whitepiece, sq^56) );
Not sure what exactly byteswap does, but the chances that it transforms one valid attack set in another (causing two identical keys in the set) seems pretty large. Imagine for instance the attack set of a Rook. Swapping bytes in pairs, or swapping board halves would all give valid Rook attack sets.
Oups, sorry. That was intended as a joke, since the attacks from a vertically flipped square are already the byte swapped set (swapping bytes 0,7; 1,6; 2,5; 3,4).

https://blogs.oracle.com/DanX/entry/opt ... ng_for_fun
https://chessprogramming.wikispaces.com ... Vertically
hgm wrote: Unlike randomly chosen numbers, non-random keys like attack sets are very likely to transform into each other by simple operations on them.

Seldomly I have seen an idea this flawed, to solve a non-existing problem. It actually creates the (imagined) problem it was supposed to solve, but more than a million times worse.
Sure, after 1. Nd3c5 Nd7e5 or 1. Nd3e5 Nd7c5 there is already a 2-ply collision.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Hashing based on move lists

Post by mcostalba »

Gusev wrote:
mcostalba wrote: To moderation: Please be so kind to consider taking action against the the troll: it is many days already that he is dirtying up the threads. Normally this kind of people disappears after few days, but this one seems persistent so please evaluate about blocking the account.
To moderators: Matthew is NOT a troll!
I was not referring to Matthew, of course.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Hashing based on move lists

Post by Sven »

I see a couple of other severe problems of the GusevHashing approach, although I am not sure whether some of these have already been mentioned in this thread:

1) For the 64 bits of a hash key that is built from zobrist keys created through this approach, the probability of a bit being set is not evenly distributed, unlike the standard zobrist hashing approach based on (pseudo-)random numbers. There are several reasons for that, here are some examples:
- pawns are not able to attack squares on rank 1+2 (white) resp. 7-8 (black);
- the probability of central squares being part of any attack set is higher than for squares towards the borders and corners of the board;
- if one bishop is captured but the other of the same player remains on board then half of all squares are more likely to be attacked than the other half.

This leads to bad usage of the hash table since some parts of it will be used less frequently while other parts will get overwritten frequently.

2) From the 64 bits hash key a fixed subset must be used for the table index while the remaining bits (or again a subset of them, or redundantly the whole 64 bit key) are stored as signature in the hash entry. Since the GusevHashing approach essentially associates each hash key bit with a square (an attacked square when looking at attacks on an otherwise empty board for each single piece), the index part of the key will always be associated with a fixed set of squares on the board, e.g. a 24 bit index would be associated with the first three ranks a1 .. h3. But during real games, moves and their corresponding changes of attack sets are not always "randomly" distributed over the board, so it may happen frequently that certain areas of the board are much more affected than others.

Think of pawn endgames as an extreme example, here I believe that there are many positions resulting from subsequent moves of a game that map to exactly the same table index, or to only few different indices, since these moves do not affect the first three ranks. Other endgame types will suffer from similar problems.

3) For the same reason described in 2) there will also be an asymmetric behaviour when using the hash table: moves mainly affecting the non-index squares since they happen on the "black" side of the board (and have mostly "local" effects, as in case of pawn/knight/king moves, or when a rook moves up or down a rank) will rarely lead to a different table index while moves mainly affecting the index squares (happening on the "white" side) do not suffer from that.

Randomness avoids all of that!

Sven
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Hashing based on move lists

Post by Gerd Isenberg »

For the byteswap per se. I like the signature of a position equals the one of the color flipped position xor side2moveKey. "White" zobrists as usual, but black's swapped from flipped "white" squares:

Code: Select all

zob[blackPiece][sq^56] = byteswap(zob[whitePiece][sq] = rnd() );
If the usual TT probe fails, on may probe for the other side to possibly get a hit from the color flipped position, which gains additional hits in symmetrical positions. One issue with this is that symmetrical subkeys i.e. wRe2 bRe7 have obviously only 32 significant bits due to two byte reversed 32-bit dwords:

Code: Select all

  01020304|50607080  // white pieces
^ 80706050|04030201  // black pieces of a mirror position
= 81726354|54637281
Gusev
Posts: 1476
Joined: Mon Jan 28, 2013 2:51 pm

Re: Hashing based on move lists

Post by Gusev »

Gerd Isenberg wrote:
ZirconiumX wrote:
Gerd Isenberg wrote:
Gusev wrote:
ZirconiumX wrote:This is a chess-specific, very low collision (if not zero) hashing method.

This is basically a "do one thing and do it well" method.

Matthew:out
I don't know about zero, because it is impossible to describe a chess position uniquely using only 64 bits. The main idea is that we should be able to do better by controlling piece signatures directly, than by relying upon a pseudo-random number generator.
Sorry if I missed it, how do you distinguish same type white and black pieces (no pawns) which have same attack sets in the final 64-bit key?
He takes the complement (~attack_set) of the piece's attack set for black.

Matthew:out
Thanks, just found the init code in position.cpp, ahh yes, flip is complement and not byteswap. Complement is xor with -1, so each even number of black pieces xored, cancels the complement, hmm..., sounds dangerous.
This is a real danger! Indeed, a position with two white rooks collides with a position with two black rooks in the same exact locations. Same happens with two white knights vs. two black knights. However, this won't happen with pawns, because the "footprints" of pawns are asymmetric. The white pawns are "looking up", and the black pawns are "looking down". This suggests a possible fix. Let's make "footprints" of the other pieces artificially asymmetric so as to distinguish the white ones from the black ones by means of this asymmetry. Now, the resulting table of modified "footprints" will no longer have the originally intended dual use (for hash key generation and move generation), but that's not a problem.
Gusev
Posts: 1476
Joined: Mon Jan 28, 2013 2:51 pm

Re: Hashing based on move lists

Post by Gusev »

hgm wrote:Indeed, this is what I also remarked above, except that it was not clear to me that they are just using Zobrist hashing with hand-made keys. (Or rather, I could not believe they would propose something so obviously flawed.)
One idea is, if your keys are "hand-made", as you put it, and you have found a flaw with how you had made them, then you can relatively quickly come up with a way to get rid of the flaw. (Admittedly, that fix may sometimes introduce another flaw, so some trial-and-error cycle is guaranteed to take place, until most issues are hammered out.) But if you have decided to rely upon a good pseudo-random generator, then, even though you are already guaranteed to have good luck most of the time, there is no easy way to deal with the remaining bad luck. It just has to be accepted as the cost of doing business this way.

Another underlying idea is, not all collisions are created equal. Some of the positions that could be generated, given enough time, by trying different options for each square (empty, white pawn, back pawn, ..., white king, black king, en passant square where available) are simply illegal and/or unreachable. They will never occur in our search trees, so it doesn't matter if their hash keys collide with those of legal positions. Ideally, we would like to generate the keys so that legal reachable positions wouldn't collide with each other. Pseudo-randomness interferes with this intent.

The initial observation is, a human player does not have a collision problem, because each different piece is distinctly marked right where it is located. The implication of that being that the player recognizes where each piece could potentially move to and when its turns are. While we cannot achieve that with 64 bits, we can certainly separate the occupied squares from the unoccupied ones. This seems important, in order to ensure that when a piece moves, we won't arrive at a collision right away.

Then we can observe that we start out with only 32 pieces, so the board is half-empty to begin with, and then the number of pieces can only decrease as the game progresses, so all collisions with positions that have more than 16 pieces of any given color are inconsequential. From that, given that we are not allowed to have any additional marking at the piece's exact location, the temptation naturally arises to introduce extra marking around the location and make that marking vary from one piece type to another.

It's natural to correlate this outside marking (the piece's "halo") with how the piece can move/capture. The remaining problem to address is how white pieces can be distinguished from black ones with the same movement ability. We should be able to solve this problem by introducing some asymmetry into the "halo" (which, together with the piece's location, forms its "footprint").

The other observation is, no individual move introduces a board wide pattern (or a half-board wide pattern, for that matter). In fact, no natural sequence of moves is likely to "inject" a regular global pattern. Such patterns can be used to take care of castling rights and SideToMove.
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Hashing based on move lists

Post by lucasart »

ZirconiumX wrote:
mcostalba wrote:
ZirconiumX wrote: Please demonstrate this, since with my stupid eyes I cannot see anything.
For instance Ra1, Ba8 and Ra1, Bh1


To moderation: Please be so kind to consider taking action against the the troll: it is many days already that he is dirtying up the threads. Normally this kind of people disappears after few days, but this one seems persistent so please evaluate about blocking the account.
OK, talking to Dmitri confirms it was XOR, so my mistake.

With XOR, not OR, this doesn't collide.

Matthew:out
XOR is better, but you will still have collisions. What you propose is exactly like Zobrist hashing, except that instead of generating zobrist keys at random, you use attack bitboards for the zobrist keys.

Whether this scheme collides less than Zobrist (assuming a proper random generator with good entropy and non zero keys at least) is not clear.

I can't think of a trivial example to make this collide, but I diden't give it much thought. I'm sure it's possible. At least I see no reason why it would collide less than random Zobrist.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: A brief comment on using 128 bit Zobrist hashing

Post by lucasart »

sje wrote:What is the expected false positive rate of 128 bit keys vs 64 bit keys? The answer depends on many factors, but my estimate is if a fast, multicore machine gets less than one false positive per day with 64 bit hashing, then moving to 128 bit hashing (reducing the error rate by about a factor of 2^32) means a false positive will occur at the rate of less than one per ten million years.

Cosmic rays causing random bit flips are much more of a threat.
Yes, 128 is useless, at least in an engine. 64 is easily enough. In fact some engines use 32, like Stockfish. I use 62 bits, which is reasonably conservative.

Next time DiscoCheck plays a bad move, I'll blame the Cosmic rays :lol:
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
Gusev
Posts: 1476
Joined: Mon Jan 28, 2013 2:51 pm

Re: Hashing based on move lists

Post by Gusev »

lucasart wrote:
ZirconiumX wrote:
mcostalba wrote:
ZirconiumX wrote: Please demonstrate this, since with my stupid eyes I cannot see anything.
For instance Ra1, Ba8 and Ra1, Bh1


To moderation: Please be so kind to consider taking action against the the troll: it is many days already that he is dirtying up the threads. Normally this kind of people disappears after few days, but this one seems persistent so please evaluate about blocking the account.
OK, talking to Dmitri confirms it was XOR, so my mistake.

With XOR, not OR, this doesn't collide.

Matthew:out
XOR is better, but you will still have collisions. What you propose is exactly like Zobrist hashing, except that instead of generating zobrist keys at random, you use attack bitboards for the zobrist keys.

Whether this scheme collides less than Zobrist (assuming a proper random generator with good entropy and non zero keys at least) is not clear.

I can't think of a trivial example to make this collide, but I diden't give it much thought. I'm sure it's possible. At least I see no reason why it would collide less than random Zobrist.
Zobrist merely tries to ensure that, for any pair of positions, we are as unlikely to experience a collision as possible. This is not a chess-specific approach. Intuitively, it is somewhat wasteful in its generality, as it is trying to take equal care of pairs of positions one or both of which is illegal/unreachable and pairs of legal reachable positions (the only ones of practical interest). I am trying to design chess-specific hashing "biased" so as to minimize collisions between legal reachable positions. In order to achieve that, pseudo-randomness has to be abandoned. Instead, "intelligent design" is needed, so to speak. I am not claiming that I have already achieved that. But each example that makes legal reachable positions collide can lead to improvement of the directly designed technique. In the meanwhile, bad luck shall remain bad luck, for it's inherent to the non-chess-specific Zobrist approach that employs pseudo-randomness.
syzygy
Posts: 5943
Joined: Tue Feb 28, 2012 11:56 pm

Re: Hashing based on move lists

Post by syzygy »

Gusev wrote:Zobrist merely tries to ensure that, for any pair of positions, we are as unlikely to experience a collision as possible. This is not a chess-specific approach. Intuitively, it is somewhat wasteful in its generality, as it is trying to take equal care of pairs of positions one or both of which is illegal/unreachable and pairs of legal reachable positions (the only ones of practical interest). I am trying to design chess-specific hashing "biased" so as to minimize collisions between legal reachable positions. In order to achieve that, pseudo-randomness has to be abandoned. Instead, "intelligent design" is needed, so to speak. I am not claiming that I have already achieved that. But each example that makes legal reachable positions collide can lead to improvement of the directly designed technique. In the meanwhile, bad luck shall remain bad luck, for it's inherent to the non-chess-specific Zobrist approach that employs pseudo-randomness.
I'm afraid you're trying to solve a problem that simply does not exist. Zobrist hashing works very well and certainly well enough that nothing can be gained by further minimising collisions. Any alternative scheme that is less efficient (in terms of cpu cycles and memory usage) is a sure loss, and it is difficult to see how a hashing scheme could be more efficient.