Worst advice

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: After 14+ billion probes

Post by Zenmastur »

bob wrote:I recently played with a 64-bit hash entry, which left 24 bits for the signature. Since I was using 8 GB of hash, that turned into 32 + 24 bits, and it led me to discover a very tricky condition my legal-move check failed to recognize, because there were enough collisions that an oddball case turned up (it had to do with check, where moving a piece was legal in one position but not in the one that collided with it). It took a long time to happen, running on 20 cores, until I began to get suspicious about the condition; then I found a position that would crash within 10 minutes or so and tracked it down. My legality check doesn't deal with the case of moving a piece that is pinned on the king, since that can't happen, except on collisions... I have not fixed it, but gave up on 8-byte hash entries until I decide what to do. I have a quick "PinnedOnKing()" function for endgames, but did not want to make the legal-move test any slower, since it is a rare problem (no crashes in over 10M games, in fact, until the shorter hash signature became a problem).
Hmmm...

A 64-bit entry is 8 bytes, which means there were only 1B (2^30) entries in the table, so with 24 bits of stored signature it's 30 + 24 = 54 bits, not 56. I'm kind of curious, since I've played "what if" with 64-bit TT entries myself: what exactly is the format of each entry you used?

It seems like a simple solution would be to add 8 more bits of stored signature, making it 30 + 32 = 62 bits, with seven nine-byte entries per bucket, unless Crafty has problems with a 7-by-9 configuration. Or, if you're really pinching bits, you could use a 73-bit entry, still get 7 entries per bucket, and store 1 extra bit of signature for a total of 63 bits.

If you really want to pinch every last bit, you could use a 9-bit move and a 14-bit score (down to the centipawn level as a short integer). You might still get down to a 64-bit entry, depending on what else you are storing.
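To make the bit counting easy to check, here's a rough C sketch (purely illustrative, not from any engine) that prints the implied index bits and the total effective signature bits for a given table size and entry size:

Code:

#include <stdio.h>
#include <stdint.h>

/* Effective signature = bits implied by the table index + bits stored in
   the entry.  Purely illustrative; the entry layouts are hypothetical. */
static void report(const char *label, uint64_t table_bytes,
                   unsigned entry_bytes, unsigned stored_bits)
{
    uint64_t entries = table_bytes / entry_bytes;
    unsigned index_bits = 0;
    while ((1ULL << index_bits) < entries)
        index_bits++;                          /* ceil(log2(entries)) */
    printf("%-24s %10llu entries  %u + %u = %u bits\n", label,
           (unsigned long long)entries, index_bits, stored_bits,
           index_bits + stored_bits);
}

int main(void)
{
    report("8 GB, 8-byte entries", 8ULL << 30, 8, 24);  /* 30 + 24 = 54 */
    report("8 GB, 9-byte entries", 8ULL << 30, 9, 32);  /* 30 + 32 = 62 */
    return 0;
}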
bob wrote:The gain for doubling is not a constant. It certainly drops off once the table holds everything needed, assuming a decent replacement policy, or for the trivial case of going from 2 to 4 entries, which will give nothing measurable at all today. A key point is when you are at 1/2 the optimal size: doubling there can give a significant Elo gain. You can probably find some old threads on r.g.c.c dealing with this. It is a sort of normal distribution: you get little until you get into the key zone for the number of entries, then Elo jumps quite a bit, then it improves less with additional size. I found cases where too small a ttable could double the search time.

You could probably produce interesting information by matching on the lower 64 bits while also storing the whole signature, and reporting when they don't agree... I did a similar test back in the '90s, but I stored the actual position as 32 bytes to see how often a collision occurred. With 64 bits back then it was extremely rare, maybe once every day (24 hours) of searching or so. Not today, however.
bob wrote:Note that on common hardware I can hit 14 billion nodes in reasonable games, i.e., 1B nodes in 10 seconds, 10B in 100 seconds. And I have run on better hardware where speeds were beyond 150M nodes per second... And this is in real chess, not perft, which runs quicker.
I'm curious about the "better" hardware, mostly for scaling reasons. The biggest and baddest Intel-based system I can think of (a single machine, not a cluster) is 8 x E7-8890 v2 (15 cores each) running at 3.4 GHz. I'm wondering how well Crafty would scale on such a machine, or any other chess engine for that matter.

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: After 14+ billion probes

Post by bob »

Zenmastur wrote: [...] I'm kind of curious, since I've played "what if" with 64-bit TT entries myself: what exactly is the format of each entry you used? [...] I'm curious about the "better" hardware, mostly for scaling reasons. [...]
I used a 12-bit best move (from/to only; the rest can be discovered when needed), a 2-bit flag for EXACT, LOWER, UPPER, 16 bits for score, 7 bits for draft, 24 bits for the "lock", and 3 bits for the "age". I believe those are correct; I'll have to find the source to be sure. I didn't like the 12-bit move format, which had side effects in other places that added a little complexity to the code. I might experiment further at some point, but the gain is pretty minimal overall. The current version uses 12 bytes per entry, like Cray Blitz, where previous versions have been 16 bytes per entry due to laziness. :) I was alarmed at the number of bad moves from the hash table I was seeing, 99.999% of which I recognized and did not try to use. But there were enough that the check bug slipped through and would cause a crash.
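In case it helps to picture the layout, here is roughly how such a 64-bit entry could be packed with shifts and masks (field names, ordering, and helpers are mine, not Crafty's actual code):

Code:

#include <stdint.h>

/* Pack/unpack helpers for a 64-bit transposition-table entry using the
 * field widths described above: 12-bit move, 2-bit bound flag, 16-bit
 * score, 7-bit draft, 24-bit lock, 3-bit age (12+2+16+7+24+3 = 64).
 * The layout is illustrative only.                                     */
static inline uint64_t tt_pack(unsigned move, unsigned flag, int score,
                               unsigned draft, unsigned lock, unsigned age)
{
    return  ((uint64_t)(move  & 0xFFF))              /* bits  0..11 */
          | ((uint64_t)(flag  & 0x3)       << 12)    /* bits 12..13 */
          | ((uint64_t)(uint16_t)score     << 14)    /* bits 14..29 */
          | ((uint64_t)(draft & 0x7F)      << 30)    /* bits 30..36 */
          | ((uint64_t)(lock  & 0xFFFFFF)  << 37)    /* bits 37..60 */
          | ((uint64_t)(age   & 0x7)       << 61);   /* bits 61..63 */
}

static inline unsigned tt_lock(uint64_t e)  { return (unsigned)((e >> 37) & 0xFFFFFF); }
static inline int      tt_score(uint64_t e) { return (int16_t)((e >> 14) & 0xFFFF); }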

For the 1B stuff: at the time I was testing with 32-64 GB of hash, which was 4-8 billion entries, so 32 or 33 bits depending.

I thought it was something interesting to try, since Rybka apparently used that according to Vas and others, and apparently Stockfish has used it as well. I didn't see any particular gain, however, since I don't hash in q-search and don't overly stress the hash table. On the 20-core box I use regularly, the current version reports this:


Crafty v25.0 (20 cpus)

White(1): hash=64g
Warning-- xboard 'memory' option disabled
hash table memory = 64G bytes (5.4B entries).
White(1):

It seems to average 90-100M nodes per second, although it does drop to 70-80M in simpler endings, and I have seen 300M in forced mates. The "better hardware" I referenced was a 2x16 box we were evaluating for a new cluster we are buying this year. I only had a chance to run on it remotely and did not really pay attention to the specific CPU/clock speed.

As far as how Crafty scales, I don't have enough data. The 20-core box seems to hit 11-12x faster easily. I didn't have time to run any serious tests (those take days for 1-core versus 32-core runs), but a few positions produced 18-20x... It would be nice if we end up with those for cluster nodes.
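For what it's worth, one crude way to read such numbers is to back out the parallel fraction implied by Amdahl's law, S = 1/((1-p) + p/n). Parallel search overheads don't really follow that model, so the sketch below (mine, not a measurement) is only a rough illustration of what 11-12x and 18-20x on 20 cores would correspond to:

Code:

#include <stdio.h>

/* Back out the parallel fraction p implied by an observed speedup S on
 * n cores under Amdahl's law S = 1 / ((1 - p) + p/n), i.e.
 *     p = (1 - 1/S) / (1 - 1/n)
 * Rough illustration only; real parallel search does not obey the model. */
int main(void)
{
    double cases[][2] = { {20, 11}, {20, 12}, {20, 18} };  /* {cores, speedup} */
    for (unsigned i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        double n = cases[i][0], S = cases[i][1];
        double p = (1.0 - 1.0 / S) / (1.0 - 1.0 / n);
        printf("%2.0f cores at %4.1fx  ->  ~%.1f%% parallel work\n", n, S, 100.0 * p);
    }
    return 0;
}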
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

After 30+ billion probes

Post by sje »

After 30+ billion probes, the data show that even 61 matching low-order bits are not enough to reject a false positive match:

Code:

New LSB match high watermark: 61
PETBase: items: 536,870,912   probe: 30,040,061,804   match: 16,464,963,520   stash: 13,616,742,771   usage: 536,870,912   load: 1
Note how the match/probe ratio hovers near 55%.
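One way to compute that kind of "matching LSB" watermark (not necessarily how the program above does it) is to XOR the two full 64-bit signatures and count trailing zero bits; a minimal C sketch, assuming the GCC/Clang __builtin_ctzll builtin:

Code:

#include <stdint.h>

/* Number of least-significant bits in which two 64-bit signatures agree:
 * XOR them and count the trailing zeros of the difference.  Two keys
 * that collide in their 61 low bits but differ above return 61.        */
static inline int matching_lsb_count(uint64_t a, uint64_t b)
{
    uint64_t diff = a ^ b;
    return diff ? __builtin_ctzll(diff) : 64;   /* identical keys: 64 */
}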
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

After 50+ billion probes

Post by sje »

After 50+ billion probes, even 62 matching low-order bits are not enough to reject a false positive match:

Code:

New LSB match high watermark: 62
PETBase: items: 536,870,912   probe: 50,992,485,067   match: 28,091,597,700   stash: 22,970,564,361   usage: 536,870,912   load: 1
stegemma
Posts: 859
Joined: Mon Aug 10, 2009 10:05 pm
Location: Italy
Full name: Stefano Gemma

Re: Worst advice

Post by stegemma »

Henk wrote:Reading all these comments, there seem to be many ideas and recommendations for improving your chess engine.

For me the worst idea was to use bitboards for move generation. It was difficult to implement, gave horrible bugs, and produced no speedup.

So the question is: what was the worst idea you implemented?
My worst idea was to change the compression algorithm used to store audio-video lessons in my business software: I lost the audio of the old lessons, so now I have to work on patching that instead of improving my chess engine! :)
Author of Drago, Raffaela, Freccia, Satana, Sabrina.
http://www.linformatica.com