diep wrote: bob wrote: diep wrote: bob wrote: diep wrote: zamar wrote:
Multithreading. Typical hash table entry size is 128 bits.
Thread1.write(64bits)
Thread2.write(64bits)
Thread2.write(64bits)
Thread1.write(64bits)
leads to corrupted entry
First of all this can only happen when you have a parallel searching engine.
Did Rebel already get SMP?
Secondly, on PC hardware this happens even more rarely than collisions, as it can only happen on stores.
Around 1 in 200 billion stores.
If Ed has his hash table aligned and a whole number of entries forms a cache line, you can prove it cannot happen on a PC.
Note that an i7 (not Sandy Bridge) presents itself as having a 64-byte cache line, but the memory controller stores 192 bytes at a time. Sandy Bridge stores 256 bytes at once.
It has 4 memory channels.
Note that when reading it can abort prematurely and read fewer bytes, which is why it has such fast latency to RAM for us.
It can abort after reading as little as 32 bytes.
So the only way it can go wrong is when an entry sits at the end of a cache line, in this case a 256-byte block, and by accident one cache line gets updated sooner than the other.
The odds of this happening are really tiny, especially if your hash table is some gigabytes in size.
So the scenario as you represented it cannot happen: the hardware doesn't write 64 bits at a time, it writes 256 bytes at once.
It can happen far more frequently than that. This has been discussed in the past. With 8 threads (cores) writing pairs of 64-bit words, the ordering gets mangled and you can store a position where the first 8 bytes are from position P1 and the second 8 are from position P2, leading to exactly this. I've seen dozens per minute in some positions. With the lockless hashing I use, however, it never happens.
Bob, you're making this story up.
You did not see 'dozens of write errors a minute'.
I've been testing this extensively and you need dozens of billions of stores to get one.
Of course I have ECC machines here. Maybe you had a bad DIMM during your tests, or your memory serves you badly again?
Also realize the Alpha is an HPC processor. Today we only have x64 CPUs, which is a different architecture.
Between the Alpha and the memory controller there are several steps to connect to other memory. Those steps can sometimes cause the AbCD pattern, which is not possible on x86/x64 hardware as they connect directly to the RAM.
Even then, this also happens VERY SELDOM on the Alpha.
At the time I was in contact with the technical design team for memory controllers to verify how often all this could occur on different processors, including the Alpha and the R14000, and I got the same answers from the engineers each time.
Even then, on x86 I tested this extensively, and if I got 1 write error or 1 or 2 collisions, that was a lot in every single test.
This testing ran for months in 2003.
So of everyone here it seems I'm the only one who has actually measured the number of write errors and the number of collisions.
P.S. Note that Alphas were even used for serving storage later on, up to 2007, at the petabyte level; that's where I worked at the time.
I didn't make ANYTHING up. Please feel free to download Crafty 22.9, which does NOT use lockless hashing at all. Compile it with -DCPUS=8 and then simply run it. Look in next.c for the "bad move from hash" test and remove the #if that disables those two lines. Run with 8 cores and you will see dozens of those errors per minute, up to tens of thousands if you use Fine 70...
I don't know why you're posting this nonsense Bob.
I've already tested this extensively with a lockless hash table. With a CRC you can measure this far more easily and far faster than by messing up the bus with nonstop locking.
In Diep there are more collisions than write errors with a 64-bit hash key!
So you're just making this story up.
Note that your normal Crafty is using a lockless hash table! Don't bring in other idiotic factors you're not using in practice! Changing the discussion just to 'save face' is what they do in Asia, Bob; over here you're busted.
Long before you wrote/posted about Tim Mann inventing the XOR manner of doing things lockless, others were already using an 8-bit CRC...
Making what up? I told you the version of Crafty to use. 22.9. I told you how to run it and see the bad move from hash table errors. You will see that in version 23.0, the lockless hashing was added back in, and the error completely goes away. The broken writes still happen, but now the signatures don't match and those positions become "hash sigs don't match, ignore" cases...
I had this problem on the first multiple-CPU Cray I used. It got progressively worse as we moved on to more and more CPUs. When we hit the XMP with 4 CPUs, we had to stop and find/fix the problem, because at the time we did not know whether it was a serious programming bug. Once we found the problem, we simply used an atomic semaphore on the Cray (very fast) to prevent two threads from accessing the hash table at exactly the same time.
When we hit 16 CPUs on the T90, we had to revisit that fix because the single semaphore was becoming a bottleneck. That's when Harry suggested the "xor trick" that is now called "lockless hashing".
It is not imaginary. The path through the search doesn't contain THAT many instructions; I'd guess 2500 or so per node based on old measurements. That means the probability of two cores executing the reads/writes at the same time is not vanishingly small, particularly since most hash-table accesses are cache misses, which tends to "stack" threads up at the instructions reading/writing the same entry, improving the odds quite a bit.
It happens. It is not imagination. It has been a known problem since 1984 for me when we first tested on the 4-cpu Cray XMP/4...
I agree that I DO use lockless hashing today. Do you know why? To address the SPECIFIC problem we are talking about. If the problem doesn't exist, then lockless hashing is completely irrelevant in the first place. I pointed you to an available version of Crafty that had the lockless hashing removed. It was removed as I rewrote lots of code in the 22.x versions. I forgot to add it back until I started using 22.9 extensively and while testing with the -DDEBUG flag set, noticed a lot of "bad move from hash table errors" that do not show up in non-DEBUG mode. You can compare the relevant parts of 22.9 and 23.0 (hash.c) to see that was the only change...
And then you can verify that 22.9 sees a ton of broken writes. Your claim of 1 in 10^20 is pure nonsense. Crafty uses about 2500 instructions to search a node. In those 2500 instructions there are two points of hash access: once to look up a position and once to store one. Two pairs of instructions in 2500. The probability of two threads executing the same pair of instructions at the same time is roughly (4 / 2500)^2; with 8 threads, multiply by 8. Not big, but not small either. Then factor in that when the first thread accesses the SAME hash bucket the second is about to access, the first thread blocks for several thousand clocks while the cache block is filled. If the second thread accesses the same block, it sits and waits, even if it gets there several hundred instructions AFTER the first one. The first is still waiting on memory, as will be the second. Now they are aligned, and we are set up for a broken write if ANYTHING happens on either CPU: an interrupt, a pipeline stall, you name it.
This is not THAT hard to understand. And it is quite easy to verify...
As far as your Alpha stuff goes, it appears to me you have not run on an Alpha. They are NOTORIOUS for reordering memory writes, something x86 guarantees will not happen. The Alpha added a memory fence instruction to address this, so that you can force the order when it is critical. For example, acquiring a lock, storing the tt entry, then clearing the lock is dangerous on the Alpha, because the write that zeros the lock can complete BEFORE the writes that actually store the tt entry... You put a memory fence after the tt write, then clear the lock, and all is well...