Zobrist alternative?

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Zobrist alternative?

Post by Don »

hgm wrote:Normally we construct a hash key by adding (or XORing) a number of Zobrist keys from a two-dimensional table key[coloredPieceType][square]. Now if the board gets large, and the number of piece types gets large too, this table would become intolerably large. For instance, in Taikyoku Shogi the board is 36x36 (1296 squares), and each side has 209 piece types (with somewhat fewer promoted types). That would make some 700,000 keys, which for a 64-bit key would require 5.6MB! That would not even fit in the L2 or L3 cache! (OK, perhaps it can be compressed to 1 byte per key, by overlapping them, but that still nearly fills the L2 cache of the smaller CPUs....)

Do there exist less memory-hungry techniques for producing a hash key? E.g., if I were to factorize the individual keys,

key[coloredPieceType][square] = key1[coloredPieceType]*key2[square]

so that I could do that multiplication during key update, would that result in any undesirable effects? I could not think of any obvious disadvantage, provided the key1 and key2 arrays are thoroughly random, with a range much larger than the number of pieces or squares (so that any pair of keys is extremely unlikely to have the same difference).

I'd much rather use a 10.4KB and a 4.8KB table, and do a multiplication in MakeMove, than having to access a 5.6MB table. Multiplications are pretty cheap, nowadays!
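For concreteness, a minimal sketch of this factorized update (the array names and sizes follow the Taikyoku Shogi numbers above, and XOR is assumed as the combiner; none of this is tested):

Code: Select all

#include <stdint.h>

/* Sketch of the factorized scheme: the per-(piece,square) key is the
   product of two random factors, computed during the key update in
   MakeMove instead of being stored in a 5.6MB table. */
extern uint64_t key1[2 * 209];    /* random, per colored piece type */
extern uint64_t key2[36 * 36];    /* random, per square */
extern uint64_t hashKey;          /* incrementally updated position key */

void updateKeyForMove(int piece, int from, int to)
{
    hashKey ^= key1[piece] * key2[from];  /* remove piece from old square */
    hashKey ^= key1[piece] * key2[to];    /* add it on the new square */
}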
Clearly you want something incremental, even more than you would for smaller boards. So I suggest you still use Zobrist hashing, but instead of a table of random numbers you use a simple hash function to produce the individual key elements - assuming you are not willing to eat the memory, which is what this thread is about.

So zobrist_table[ colorType ] [ square ]

is replaced by the function zobFunc( colorType, square ), which is really a hash function on its 2 arguments.

There are many fast hash functions that will work - and I don't think they have to be exceptionally good hash functions - you just need to pick very fast ones that are good enough. You could have a table of constants, one for each square, and multiply them by colorType to produce your key - I don't know how well that would work, or whether the constants would have to be carefully picked - but for any given move application you would have to call this routine only a couple of times, since it's still Zobrist hashing at the outer level.
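A minimal sketch of what such a zobFunc could look like, assuming a murmur-style finalizer as the mixer (the function name and the choice of mixer are illustrative; whether its quality suffices here would need empirical testing):

Code: Select all

#include <stdint.h>

/* Hypothetical sketch: compute the per-(piece,square) Zobrist key on
   the fly instead of storing a 5.6MB table. The multiply/xor-shift
   steps below are borrowed from the murmur3 fmix64 finalizer. */
uint64_t zobFunc(int colorType, int square)
{
    uint64_t x = ((uint64_t)colorType << 32) ^ (uint64_t)square;
    x ^= x >> 33;
    x *= 0xFF51AFD7ED558CCDull;   /* murmur3 fmix64 constant */
    x ^= x >> 33;
    x *= 0xC4CEB9FE1A85EC53ull;   /* murmur3 fmix64 constant */
    x ^= x >> 33;
    return x;
}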

Don
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Zobrist alternative?

Post by wgarvin »

hgm wrote:I measured the penalty for such a cache-line-straddling access, and it was very low. (On Intel Celeron M; I would not know about AMD.) Within the cache line there was no penalty whatsoever.

But with a 0x88 board, the key table can be positioned in such a way that the cache line is never exceeded. (By starting the board at a cache-line boundary, the 4th rank of the board would be at addresses 48-55, and even the access for h4 would span 55-62; a5 would map to the start of the next cache line.)

Of course on other architectures (ARM?) such tricks are likely to cause exceptions ('bus error'). i386 is really exceptional in allowing non-aligned access.
The other thing you can do (as long as you have fewer than about 56 piece types) is use one cache line per square (or per two squares), and add pieceNumber (or perhaps 2*pieceNumber) to the address. E.g. for standard chess with piece numbers in the 1-12 range or so, you could use something like (sq*32 + pieceNumber*2) and you'll never cross a cache-line boundary. The table is a little bigger than a fully overlapped one (maybe 1/3rd of the size of a table that uses no overlapping? can't remember off the top of my head) but it still helps.

If you really need to have more than 55 piece types, maybe it's worth skipping some of the piece numbers. I.e. use piece numbers 1-56 and 65-120, etc., but don't use the numbers 57..64. Then you can always index into your Zobrist table without crossing a cache-line boundary.
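A sketch of the overlapped layout described above, for the standard-chess numbers in the example (the array name and the memcpy-based unaligned load are illustrative):

Code: Select all

#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: random key bytes laid out so the 8-byte key
   for (sq, pieceNumber) starts at sq*32 + pieceNumber*2. With piece
   numbers 1..12, a read ends at most at offset sq*32 + 32, so it
   never crosses a 64-byte cache-line boundary. Total size is 2KB
   instead of the 6KB a non-overlapping 64x12 table would need. */
_Alignas(64) static uint8_t zobBytes[64 * 32];

uint64_t zobKey(int sq, int pieceNumber)
{
    uint64_t key;
    /* memcpy is the portable way to do the unaligned 8-byte load */
    memcpy(&key, &zobBytes[sq * 32 + pieceNumber * 2], sizeof key);
    return key;
}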
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Zobrist alternative?

Post by Don »

Don wrote:
hgm wrote:Normally we construct a hash key by adding (or XORing) a number of Zobrist keys from a two-dimensional table key[coloredPieceType][square]. Now if the board gets large, and the number of piece types gets large too, this table would become intolerably large. For instance, in Taikyoku Shogi the board is 36x36 (1296 squares), and each side has 209 piece types (with somewhat fewer promoted types). That would make some 700,000 keys, which for a 64-bit key would require 5.6MB! That would not even fit in the L2 or L3 cache! (OK, perhaps it can be compressed to 1 byte per key, by overlapping them, but that still nearly fills the L2 cache of the smaller CPUs....)

Do there exist less memory-hungry techniques for producing a hash key? E.g., if I were to factorize the individual keys,

key[coloredPieceType][square] = key1[coloredPieceType]*key2[square]

so that I could do that multiplication during key update, would that result in any undesirable effects? I could not think of any obvious disadvantage, provided the key1 and key2 arrays are thoroughly random, with a range much larger than the number of pieces or squares (so that any pair of keys is extremely unlikely to have the same difference).

I'd much rather use a 10.4KB and a 4.8KB table, and do a multiplication in MakeMove, than having to access a 5.6MB table. Multiplications are pretty cheap, nowadays!
Clearly you want something incremental, even more than you would for smaller boards. So I suggest you still use Zobrist hashing, but instead of a table of random numbers you use a simple hash function to produce the individual key elements - assuming you are not willing to eat the memory, which is what this thread is about.

So zobrist_table[ colorType ] [ square ]

is replaced by the function zobFunc( colorType, square ), which is really a hash function on its 2 arguments.

There are many fast hash functions that will work - and I don't think they have to be exceptionally good hash functions - you just need to pick very fast ones that are good enough. You could have a table of constants, one for each square, and multiply them by colorType to produce your key - I don't know how well that would work, or whether the constants would have to be carefully picked - but for any given move application you would have to call this routine only a couple of times, since it's still Zobrist hashing at the outer level.

Don
P.S. I see my suggestion is not so different from what you suggest in your initial post. So the only thing I contributed here was a framework for looking at it. I believe that factorizing it like you suggest and multiplying the 2 keys together is sound, but I don't know that for sure. Multiplication is used like a hash function in magic bitboards, but there the constants are chosen carefully. On the other hand, that is done to provide a perfect hash, so maybe it's not a requirement here.

Maybe this is a way of asking the question:

Given a constant source of truly random number pairs, can their product pass any test of randomness? It's assumed, of course, that we are ignoring the overflows (i.e. multiplying modulo 2^64).
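A minimal sketch of one such frequency test (illustrative only: rand() is a weak stand-in for a real random source, and per-bit frequency is just one of many randomness criteria):

Code: Select all

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Tally how often each bit of a*b (mod 2^64) is set for random pairs.
   A uniform source sets each bit about half the time; for products,
   the low bits are expected to be biased (bit 0 of a*b is set only
   when both a and b are odd, i.e. about a quarter of the time). */
int main(void)
{
    enum { N = 1000000 };
    long count[64] = {0};
    for (int i = 0; i < N; i++) {
        uint64_t a = ((uint64_t)rand() << 40) ^ ((uint64_t)rand() << 20) ^ (uint64_t)rand();
        uint64_t b = ((uint64_t)rand() << 40) ^ ((uint64_t)rand() << 20) ^ (uint64_t)rand();
        uint64_t p = a * b;                  /* wraps modulo 2^64 */
        for (int bit = 0; bit < 64; bit++)
            count[bit] += (p >> bit) & 1;
    }
    for (int bit = 0; bit < 64; bit++)
        printf("bit %2d set with frequency %.3f\n", bit, (double)count[bit] / N);
    return 0;
}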

Don
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Zobrist alternative?

Post by Daniel Shawul »

You are rehashing stuff that I suggested, did a frequency test with, and then suggested "addition" instead of "multiplication" as a solution, since that will keep the uniform distribution and is also faster. Glancing over other people's posts will save you from this trouble ...
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Zobrist alternative?

Post by Don »

Daniel Shawul wrote:You are rehashing stuff that I suggested, did a frequency test with, and then suggested "addition" instead of "multiplication" as a solution, since that will keep the uniform distribution and is also faster. Glancing over other people's posts will save you from this trouble ...
XOR is clearly wrong as a sub-hash for the piece/square, as everything would cancel, but addition, as you say, may suffice.

I asked a computer science professor at MIT about this, and he suggested that a mixing function should be tested if you want the most performance. He did not say it would be faster, but that it might be and should be checked out. The idea of using addition as a hash function requires two indirect memory accesses to tables, which could be avoided with some clever mixing function. Personally, I don't think any performance difference would be noticeable, but I prefer avoiding having even more data structures when it can be done without giving up performance.
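For reference, a sketch of the addition-based two-table scheme under discussion (table sizes are the illustrative Taikyoku Shogi numbers from the first post):

Code: Select all

#include <stdint.h>

/* Two small random tables replace the huge per-(piece,square) table.
   Unlike XOR, addition does not let the piece term cancel when two
   pieces of the same type are XORed into the position key:
   (k1[p]+k2[s1]) ^ (k1[p]+k2[s2]) still depends on p. */
extern uint64_t key1[2 * 209];    /* one key per colored piece type */
extern uint64_t key2[36 * 36];    /* one key per square */

static inline uint64_t zobKey(int coloredPieceType, int square)
{
    return key1[coloredPieceType] + key2[square];
}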
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
User avatar
hgm
Posts: 28390
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Zobrist alternative?

Post by hgm »

Well, this is more a space issue than a speed issue anyway. In these insanely large variants even running the move generator or making an evaluation is extremely expensive compared to updating the Zobrist key, and usually you do at least one or the other after a key update. So it doesn't matter so much if it takes 5 or 10 clock cycles to generate the key. As long as the key tables don't overflow L2/L3, which would force a costly RAM access, the gain that can be achieved is pretty marginal.

Of course incremental techniques become extremely competitive with such large boards, because they scale as the board circumference rather than the area. This is what made it interesting for me. So I plan to have incrementally updated mobility (total and for each piece separately) as well as an attack map for each occupied square. On 36x36, even scanning all 4 rays through the from- and to-squares edge-to-edge, to see which moves were blocked / unblocked, would take only a fraction of the time needed to scan the entire board or run through the piece list. (Let alone when you would then have to generate all moves for each piece you encounter...)
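A sketch of such an edge-to-edge ray scan on a mailbox board (the board representation, sentinel border, and step values are illustrative assumptions):

Code: Select all

/* Find the nearest occupied square in each of the 8 directions from
   'sq' on a 36x36 mailbox board with a one-square sentinel border
   (38x38 array). Only the pieces found this way can have sliding
   moves blocked or unblocked by a change on 'sq'. */
#define W 38                      /* 36 files + 2 border files */
extern int board[W * W];          /* 0 = empty, -1 = border sentinel */

void scanRays(int sq, int nearest[8])
{
    static const int step[8] = { 1, -1, W, -W, W + 1, -(W + 1), W - 1, -(W - 1) };
    for (int d = 0; d < 8; d++) {
        int s = sq;
        do { s += step[d]; } while (board[s] == 0);
        nearest[d] = s;           /* occupied square or border sentinel */
    }
}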

Of course even such incremental techniques cannot help that the branching ratio will be huge, so hefty LMR is probably called for to reach any depth at all. And search explosion in QS could be a concern in the presence of Lions or pieces with similar 'double-move' power, which is another fundamental challenge.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Zobrist alternative?

Post by Daniel Shawul »

Don wrote:
Daniel Shawul wrote:You are rehashing stuff that I suggested, did a frequency test with, and then suggested "addition" instead of "multiplication" as a solution, since that will keep the uniform distribution and is also faster. Glancing over other people's posts will save you from this trouble ...
XOR is clearly wrong as a sub-hash for the piece/square, as everything would cancel, but addition, as you say, may suffice.

I asked a computer science professor at MIT about this, and he suggested that a mixing function should be tested if you want the most performance. He did not say it would be faster, but that it might be and should be checked out. The idea of using addition as a hash function requires two indirect memory accesses to tables, which could be avoided with some clever mixing function. Personally, I don't think any performance difference would be noticeable, but I prefer avoiding having even more data structures when it can be done without giving up performance.
Well, the mix function has to generate two good hash keys for piece and square from scratch. We assumed that those keys are available; the question was how to combine them so that it won't mess up the Zobrist hashing. I looked at some integer hashing functions with mixing, and they are all complex because they have to "mix" the upper bits with the lower bits. The simple Knuth integer hash function, which multiplies by the golden-ratio constant, has the problem that the upper input bits don't affect the lower bits of the product.

Code: Select all

#include <stdint.h>

/* Bob Jenkins' 96-bit mix function, translated to C: with unsigned
   types, >> is already a logical shift (the original used Java's >>>). */
uint32_t mix(uint32_t a, uint32_t b, uint32_t c)
{
  a -= b;  a -= c;  a ^= (c >> 13);
  b -= c;  b -= a;  b ^= (a << 8);
  c -= a;  c -= b;  c ^= (b >> 13);
  a -= b;  a -= c;  a ^= (c >> 12);
  b -= c;  b -= a;  b ^= (a << 16);
  c -= a;  c -= b;  c ^= (b >> 5);
  a -= b;  a -= c;  a ^= (c >> 3);
  b -= c;  b -= a;  b ^= (a << 10);
  c -= a;  c -= b;  c ^= (b >> 15);
  return c;
}
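For contrast, a sketch of the Knuth multiplicative hash mentioned above (the 32-bit golden-ratio constant is the standard one; its weakness for this use is noted in the comment):

Code: Select all

#include <stdint.h>

/* Knuth's multiplicative hash: 2654435769 is floor(2^32 / phi).
   Because multiplication only carries upward, bit k of the result
   depends only on bits 0..k of x, so the upper input bits never
   influence the lower output bits - the flaw noted above. */
static inline uint32_t knuthHash(uint32_t x)
{
    return x * 2654435769u;
}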
So I think that, performance-wise, it is better to generate two small tables for piece and square than to try to generate the keys with a hash function. The Zobrist tables for standard chess are about as big as these would be, so that is a "proof" that tables are OK for this purpose. Note that the keys are also 64 bits, which requires more operations to generate.
Daniel Shawul
Posts: 4186
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Zobrist alternative?

Post by Daniel Shawul »

hgm wrote:Well, this is more a space issue than a speed issue anyway. In these insanely large variants even running the move generator or making an evaluation is extremely expensive compared to updating the Zobrist key, and usually you do at least one or the other after a key update. So it doesn't matter so much if it takes 5 or 10 clock cycles to generate the key. As long as the key tables don't overflow L2/L3, which would force a costly RAM access, the gain that can be achieved is pretty marginal.
Well, key quality (hash collisions) is another issue. The penalty for using a mixing hash function could be hefty there.
hgm wrote:Of course incremental techniques become extremely competitive with such large boards, because they scale as the board circumference rather than the area. This is what made it interesting for me. So I plan to have incrementally updated mobility (total and for each piece separately) as well as an attack map for each occupied square. On 36x36, even scanning all 4 rays through the from- and to-squares edge-to-edge, to see which moves were blocked / unblocked, would take only a fraction of the time needed to scan the entire board or run through the piece list. (Let alone when you would then have to generate all moves for each piece you encounter...)

Of course even such incremental techniques cannot help that the branching ratio will be huge, so hefty LMR is probably called for to reach any depth at all. And search explosion in QS could be a concern in the presence of Lions or pieces with similar 'double-move' power, which is another fundamental challenge.
I think for this game qsearch is too much. In Nebiyu I don't do qsearch in multi-player games and suicide chess variants. I have a small tactical-analysis layer for games like Go that only considers ladders and other typical tactical patterns. Other people use some kind of zonal analysis instead of full-board swaps, which is time consuming.
User avatar
hgm
Posts: 28390
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Zobrist alternative?

Post by hgm »

Well, the 12x12 game (Chu Shogi) is close enough to normal Chess that a good QS will probably pay off enormously. The concern is the Lion piece, however. Two opposing Lions have even higher potential than two Queens to cause a search explosion by going on a plunder raid, and I can't see how MVV/LVA ordering could put an effective stop to that (as it does with Queens), because the Lions can make 'hit-and-run' captures with their double moves, with many opportunities to 'run' to a safe square after the capture, from where they can aim at the next victim of the raid. In fact they can move back to the starting square if that was a safe one, and annihilate all their neighbors from there one by one, in many different orders. So probably something special would have to be programmed to put a stop to that.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Zobrist alternative?

Post by Don »

Daniel Shawul wrote:
Don wrote:
Daniel Shawul wrote:You are rehashing stuff that I suggested, did a frequency test with, and then suggested "addition" instead of "multiplication" as a solution, since that will keep the uniform distribution and is also faster. Glancing over other people's posts will save you from this trouble ...
XOR is clearly wrong as a sub-hash for the piece/square, as everything would cancel, but addition, as you say, may suffice.

I asked a computer science professor at MIT about this, and he suggested that a mixing function should be tested if you want the most performance. He did not say it would be faster, but that it might be and should be checked out. The idea of using addition as a hash function requires two indirect memory accesses to tables, which could be avoided with some clever mixing function. Personally, I don't think any performance difference would be noticeable, but I prefer avoiding having even more data structures when it can be done without giving up performance.
Well, the mix function has to generate two good hash keys for piece and square from scratch. We assumed that those keys are available ....
You are essentially creating a hash function that replaces the 5.6 megabyte table HG would need otherwise. So here is the prototype:

uint64_t hash( int cp, int square); // where cp is the color/piece

Presumably all the information is in the lower bits. Speed is very important if this is to beat the two table lookups, so one possibility (which has to be tested; it's just an example) is to use something similar to the FNV hash function:

hash = any_number;
hash = hash ^ cp;
hash = hash * FNV;
hash = hash ^ square;
hash = hash * FNV;

FNV is a 64-bit constant known to work very well for such things.

I'm not explicitly mixing here, but this will be pretty fast and avoid a lot of source-code clutter from tables of random numbers.

The only difference between this and the actual FNV hash function is that you are supposed to do this in a loop, 1 byte at a time. I think HG is doing this with 3 bytes of actual data, so he could unroll the loop and use the actual FNV hash function, which looks like this:

Code: Select all

#include <stdint.h>

#define OFB 14695981039346656037ull   /* FNV-1 64-bit offset basis */
#define FNV 1099511628211ull          /* FNV-1 64-bit prime */

uint64_t dataHash(const unsigned char *dta, int size)
{
  uint64_t hash = OFB;
  int      i;

  for (i = 0; i < size; i++) {
    hash = hash * FNV;     /* multiply first, then XOR: this is FNV-1 */
    hash = hash ^ dta[i];  /* unsigned char avoids sign-extending bytes >= 0x80 */
  }

  return hash;
}
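A minimal sketch of the unrolled version for the (colored piece, square) pair (the zobKey name and the 20-bit packing are illustrative assumptions; OFB and FNV are as defined above):

Code: Select all

/* Pack the ~9-bit colored-piece code and ~11-bit square index of the
   36x36 board into 20 bits, then hash its three bytes with the FNV-1
   loop unrolled - the "3 bytes of actual data" mentioned above. */
static inline uint64_t zobKey(int cp, int square)
{
  uint32_t v = ((uint32_t)cp << 11) | (uint32_t)square;
  uint64_t hash = OFB;
  hash = (hash * FNV) ^ (v & 0xFF);
  hash = (hash * FNV) ^ ((v >> 8) & 0xFF);
  hash = (hash * FNV) ^ (v >> 16);
  return hash;
}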
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.