bob wrote:diep wrote:wgarvin wrote:Well, that was sure an entertaining thread to read!
All the sensible people on one side of the argument, and Vincent on the other.
...No prizes for guessing which side is smoking crack.
Storing 16 bytes with two or more instructions is never atomic, you have no control over when the writes reach the cache or main memory or even other cores. Cache misses, paging or thread switching can make the gap between two reads or two writes arbitrarily long.
Even if its a single 16-byte write done with an SSE instruction, even aligned on a 16-byte boundary, it might not be atomic. Reads of 16 bytes with two or more instructions are obviously also not atomic, and even a 16-byte read done with a single SSE instruction, aligned on a 16-byte boundary, might not be atomic. L1 cache and write-combining buffers do not magically make wide accesses atomic, as Ronald (syzygy) has repeatedly pointed out, some other core could RFO the cacheline at any time.
Unless you always read AND write the entire entry with one atomic operation, you are pretty much guaranteed to get screwed by race conditions. Probably sooner rather than later.
Fortunately, as hgm and others have noted, Bob's lockless hashing trick costs almost nothing, and completely solves the problem.
Don't BS over here. I already posted weeks ago here a reply adding chances.
You'll see a problem can only occur 1 in 10^20 occasions then.
A write error of this kind, the Abcd type - but you guys seem to ignore what i wrote there, without consequences, happens once in each 10^10.
So i guess most here don't even realize what i mean with AbCD as being a bad error that CAN have consequences, which however has odds 1 in 10^20.
Yet because most guys over here are not reading very well - they seem to ignore all this.
Just add chances to it. I posted this already with an example calculation.
http://www.talkchess.com/forum/viewtopi ... 906150f9f9
I guess however for you guys who didn't measure at all, if you smoke crack that you are not very good in explaining the statistical chance something can occur.
The way to measure this is very simple. Make a small 8 bits CRC of your entry prior to storing, then store all bytes (so don't put bunches of instructions between each store).
now you can simply check the CRC at each READ of each entry and write down all write errors you detect.
Is this so hard to perform that test?
You can measure all this easily - which is what i did do.
So who's talking BS here?
Would you like me to give you the names of a couple of engineers at Intel that can explain the problem to you so that you will understand it? The problem is real. your math is imaginary. The data I posted is real. The problem was real on the Cray. It was real on the alpha. It is real on AMD/Intel as well.
Why you don't grasp that is beyond me. Study how cache lines bounce from CPU to CPU with MOESI (AMD) and MESIF (Intel). It is not THAT hard to understand this problem. There are no 128 bit atomic stores.
BTW, if I comment out that line that just updates the age on a probe match, here's what the output now looks like:
starting thread 5
starting thread 6
starting thread 7
44 1.55 -3 1. Kb1
bad move from hash table, ply=41
44 1.82 -M 1. Kb1
bad move from hash table, ply=49
44 2.99 0.01 1. Kb1 Kb7 2. Kb2 Ka8 3. Kc2 Kb8 4.
Kd3 Kc7 5. Ke2 Kd7 6. Kf2 Ke8 7. Kg3
Kf7 8. Kh4 Kg6 9. Kh3 Kf6 10. Kh4 Kg6
44-> 3.00 0.01 1. Kb1 Kb7 2. Kb2 Ka8 3. Kc2 Kb8 4.
Kd3 Kc7 5. Ke2 Kd7 6. Kf2 Ke8 7. Kg3
Kf7 8. Kh4 Kg6 9. Kh3 Kf6 10. Kh4 Kg6
45 3.45 +1 1. Kb1!!
45 3.61 +3 1. Kb1!!
bad move from hash table, ply=41
45 3.97 +M 1. Kb1!!
45 4.35 6.74 1. Kb1 Kb7 2. Kc1 Kc7 3. Kd1 Kd8 4.
Kc2 Kc7 5. Kd3 <HT>
45-> 4.35 6.74 1. Kb1 Kb7 2. Kc1 Kc7 3. Kd1 Kd8 4.
Kc2 Kc7 5. Kd3 <HT>
46 4.68 +1 1. Kb1!!
46 5.91 +3 1. Kb1!!
bad move from hash table, ply=45
bad move from hash table, ply=47
bad move from hash table, ply=43
bad move from hash table, ply=45
bad move from hash table, ply=44
bad move from hash table, ply=42
46 7.67 +M 1. Kb1!!
bad move from hash table, ply=44
bad move from hash table, ply=42
bad move from hash table, ply=42
bad move from hash table, ply=44
bad move from hash table, ply=44
bad move from hash table, ply=42
bad move from hash table, ply=44
bad move from hash table, ply=44
bad move from hash table, ply=50
bad move from hash table, ply=48
bad move from hash table, ply=48
bad move from hash table, ply=46
bad move from hash table, ply=46
bad move from hash table, ply=46
bad move from hash table, ply=44
bad move from hash table, ply=42
bad move from hash table, ply=42
bad move from hash table, ply=46
bad move from hash table, ply=46
bad move from hash table, ply=46
bad move from hash table, ply=46
b
as I said, that is NOT the problem. Splitting the two stores is the problem, and there is absolutely nothing that prevents the two stores from being done separately by the CPU. And THAT leads to trouble, as this simple test STILL shows.
But keep waving your hands. When you want to talk to an engineer at Intel, I'll be happy to put you in touch with a couple that I happen to know there. Since you refuse to read their docs which explain this problem quite clearly.