Gerd Isenberg wrote:
> Ideal to generate attack sets from scratch is while "waiting" for a hash probe, for instance the x64 prefetch instruction combined with SSE2 fill stuff to compute sliding attacks.

henkf wrote:
> Whoah... I've been a member since 1998 (give or take a few years), so maybe it's a bit late to introduce myself. I'm a professional business application programmer and have always worked with high-level programming languages (4GLs, as they used to be called). The only speed optimizations I ever had to worry about were adding indexes to (or refactoring) slow queries. I'm a C/C++ autodidact and am already happy I have mastered the syntax.
> Although I always find your posts interesting, I am not even close to the level where one would start to understand even a small part of them.
> Anyway, I'm still in the design phase and haven't even thought about hashing yet. But when I come to it, I will put the probe right before the generation of the attack tables.

Random reads from huge, gigabyte-sized hash tables burn a lot of cycles, since each is likely an uncached memory read. I am not sure about recent processors (Core 2 and Nehalem), but it may be hundreds of cycles, or two to three times the main-memory latency due to TLB misses. This might be nice for hyperthreading, but it is one reason why many programs don't probe huge tables in qsearch.
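A minimal sketch of that probe-then-prefetch idea, assuming a hypothetical transposition-table layout (the names, sizes, and `TTEntry` fields are illustrative, not taken from any particular engine; `_mm_prefetch` is the x86 intrinsic):

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>
#include <xmmintrin.h>   /* _mm_prefetch */

/* Hypothetical transposition-table entry; real engines pack more fields. */
typedef struct { uint64_t key; uint64_t data; } TTEntry;

enum { TT_BITS = 16 };               /* small table, just for the sketch */
static TTEntry tt[1u << TT_BITS];

static inline size_t tt_index(uint64_t key) {
    return (size_t)(key & ((1u << TT_BITS) - 1));
}

/* Issue the prefetch as soon as the hash key of the new position is known;
   the cache line travels while we do independent work (e.g. computing
   sliding attacks with fill algorithms), and only afterwards do we read. */
static inline void tt_prefetch(uint64_t key) {
    _mm_prefetch((const char *)&tt[tt_index(key)], _MM_HINT_T0);
}

static TTEntry *tt_probe(uint64_t key) {
    TTEntry *e = &tt[tt_index(key)];
    return (e->key == key) ? e : NULL;  /* hit only on a full-key match */
}
```

A hypothetical search loop would then call `tt_prefetch(key)` right after make-move, generate the attack sets, and only then call `tt_probe(key)`, so the probe latency overlaps with useful work.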
One may try Dieter Bürßner's dblat to get an idea of these latencies.
With my current 64-bit program on an AMD K8, hashing everywhere, using prefetch (as a C intrinsic) versus not is something like 1.7 versus 1.5 mnps. As Matthias said, this requires some fun and enthusiasm for low-level stuff: understanding generated assembly and SIMD instructions (though not necessarily snoob, "same number of one bits"). I use direction-wise fill algorithms to do a lot of pure 128-bit register computation, with only rare memory writes of the generated attacks (for both sides) and of 16 direction-wise legal-move target bitboards serving as an unsorted move list.
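As a concrete example of such a direction-wise fill, here is the well-known Kogge-Stone occluded fill for the south ray in plain 64-bit C (the post above describes an SSE2 variant that packs two such fills into one 128-bit register; this scalar sketch shows only the principle):

```c
#include <stdint.h>
#include <assert.h>

typedef uint64_t U64;   /* little-endian rank-file mapping: a1 = bit 0 */

/* Kogge-Stone south occluded fill: the slider set floods downward through
   empty squares in log2(8) = 3 parallel-prefix steps. */
static U64 southOccludedFill(U64 sliders, U64 empty) {
    sliders |= empty & (sliders >>  8);
    empty   &= empty >>  8;
    sliders |= empty & (sliders >> 16);
    empty   &= empty >> 16;
    sliders |= empty & (sliders >> 32);
    return sliders;          /* includes the slider squares themselves */
}

/* The attack set is the fill shifted once more in the ray direction,
   so blockers are included and the origin squares drop out. */
static U64 southAttacks(U64 sliders, U64 empty) {
    return southOccludedFill(sliders, empty) >> 8;
}
```

The other seven directions follow the same pattern with different shift amounts and wrap masks for the east/west components; eight such fills yield the complete sliding-attack sets without any lookup tables.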
As always, exponential improvements and implementing the "right" knowledge in search and eval are much more important (not to mention a good parallel search) than a few percent of linear speedup. In my old DOS program I did incremental updates of attack tables and related stuff during make/unmake. While that was quite nice most of the time, performance dropped significantly in queen endings.