Sync question

hgm · Post by **hgm** » Thu Jul 23, 2009 1:14 pm

Gerd Isenberg wrote:Let say you have some data, for instance a cacheline, which is frequently probed by several threads (or processes), but very rarely (let say less than 0.01% of the reads) written by the same threads (or processes).

Wouldn't the following work?

Keep a counter that holds the 'issue' of the variable, and is incremented each time the variable is written, by the process that writes it, before it writes it.

All processes check this counter against their idea of the most recent issue, after they acquired a copy of the shared variable. If the counter is still unchanged, they know the copy they acquired was good. (This is the common case.)

Only when it finds the counter value changed, you do a full mutex on code for updating the counter + variable or acquiring the new counter value. Writing counter and variable would always be done inside the critical section.

Zach Wegner · Post by **Zach Wegner** » Thu Jul 23, 2009 6:07 pm

hgm wrote:Wouldn't the following work?

Keep a counter that holds the 'issue' of the variable, and is incremented each time the variable is written, by the process that writes it, before it writes it.

All processes check this counter against their idea of the most recent issue, after they acquired a copy of the shared variable. If the counter is still unchanged, they know the copy they acquired was good. (This is the common case.)

Only when it finds the counter value changed, you do a full mutex on code for updating the counter + variable or acquiring the new counter value. Writing counter and variable would always be done inside the critical section.

That will get race conditions. The writer can change the counter, and the reader can get this value and start to read while the writer is still writing.

bob · Post by **bob** » Thu Jul 23, 2009 6:40 pm

hgm wrote:
Gerd Isenberg wrote:Let say you have some data, for instance a cacheline, which is frequently probed by several threads (or processes), but very rarely (let say less than 0.01% of the reads) written by the same threads (or processes).
Wouldn't the following work?

Keep a counter that holds the 'issue' of the variable, and is incremented each time the variable is written, by the process that writes it, before it writes it.

All processes check this counter against their idea of the most recent issue, after they acquired a copy of the shared variable. If the counter is still unchanged, they know the copy they acquired was good. (This is the common case.)

Only when it finds the counter value changed, you do a full mutex on code for updating the counter + variable or acquiring the new counter value. Writing counter and variable would always be done inside the critical section.

Has no chance of being correct. Writes are done out of order. Reads and writes are done out of order. You can read the counter, write it back, read the value, and get the "value" before you have actually written the counter back to memory. Unless you use the "lock" prefix. And then you incur the usual atomicity overhead you are trying to avoid by not using a real lock in the first place.

There are also race conditions.

You read at some point a value V1 from the counter.

A second thread prepares to modify the value so it increments the counter and modifies the value.

Unfortunately, you read the counter before it is modified, and then don'tread the new value thinking it is unmodified value because your counter read happened before the counter was stored.

This is the way to a lifelong debugging assignment.

hgm · Post by **hgm** » Thu Jul 23, 2009 7:01 pm

Zach Wegner wrote:That will get race conditions. The writer can change the counter, and the reader can get this value and start to read while the writer is still writing.

Of course not. If the reader gets the changed counter it cannot start reading before the writer gets out of its critical section. The mutex will see to that.

bob · Post by **bob** » Thu Jul 23, 2009 7:05 pm

hgm wrote:
Zach Wegner wrote:That will get race conditions. The writer can change the counter, and the reader can get this value and start to read while the writer is still writing.
Of course not. If the reader gets the changed counter it cannot start reading before the writer gets out of its critical section. The mutex will see to that.

The whole point of the discussion was to eliminate the mutex. If you have that, you don't have this specific problem in the first place. But you do have a significant performance problem since this id being done frequently.

Zach Wegner · Post by **Zach Wegner** » Thu Jul 23, 2009 7:21 pm

hgm wrote:Of course not. If the reader gets the changed counter it cannot start reading before the writer gets out of its critical section. The mutex will see to that.

Well it was ambiguous anyways, so I'm not sure when the reader updates its counter and when it locks. Is this what you mean?

Code: Select all

write&#40;)
&#123;
  lock&#40;);
  counter++;
  // do write here
  unlock&#40;);
&#125;
read&#40;)
&#123;
  static int x = -1;
  if &#40;counter != x&#41; // changed
  &#123;
    lock&#40;);
    // do read here
    unlock&#40;);
  &#125;
  else
  &#123;
    // do read here
    if &#40;counter != x&#41; // corrupted data, reread with mutex
    &#123;
      lock&#40;);
      // do read again here
      unlock&#40;);
    &#125;
  &#125;
  x = counter;
&#125;

hgm · Post by **hgm** » Thu Jul 23, 2009 7:32 pm

bob wrote:Has no chance of being correct. Writes are done out of order. Reads and writes are done out of order. You can read the counter, write it back, read the value, and get the "value" before you have actually written the counter back to memory. Unless you use the "lock" prefix. And then you incur the usual atomicity overhead you are trying to avoid by not using a real lock in the first place.

There are also race conditions.

You read at some point a value V1 from the counter.

A second thread prepares to modify the value so it increments the counter and modifies the value.

Unfortunately, you read the counter before it is modified, and then don'tread the new value thinking it is unmodified value because your counter read happened before the counter was stored.

This is the way to a lifelong debugging assignment.

If out-of-order execution woulld work as you describe it, it would never be possible for any program to use the memory: there is never a guarantee that you read back what you write, because the write might not have been executed yet when you read.

Fortunately CPU designers have made out-of-order execution completely transparent, and have a lot of synchronization hardware in the load/store pipes to prevent effects like you describe. Reads and writes do not overtake each other this way.

With SMP there is the MESI protocol implemented in hardware to guarantee cache coherency. After one thread starts writing, the cache line is immediately invalidated in all other CPU caches, and there is no way any of them could start reading from it before it has reloaded that cache line, which cannot happen before the other CPU is done writing it, and ready to flush it to memory. You have to make sure that the counter is in the same cache line as the shared variable, of course.

There might indeed be a problem with out-of-order reads, although it is unlikely that they would be scheduled for reverse ordering if neither of them has to be delayed because of a data dependency (in the addres calculation). To make sure the read of the variable is done before that of the counter even in an out-of-order machine you can give the address calculation of the counter access a (false) dependency on the result of the load, e.g.

Code: Select all

myVariable = sharedVariable;
myCounter = *(&sharedCounter + &#40;0 & myVariable&#41;);

(making sure the optimizer does not try to outsmart you, by handcoding this in assembler).

This guarantees the loading of the counter cannot evens start before the copy of the variable is in a CPU register. (But with a modest delay, and without stalling the CPU pipe.)

hgm · Post by **hgm** » Thu Jul 23, 2009 7:38 pm

Zach Wegner wrote: Is this what you mean?

Nearly so. What I meant was:

Code: Select all

write&#40;)
&#123;
  lock&#40;);
  counter++;
  // do write here
  unlock&#40;);
&#125;
read&#40;)
&#123;
  static int x = -1;

 // do read here
  if &#40;counter != x&#41; // changed; read might be corrupted
  &#123;
    lock&#40;);
    // do read here
    x = counter;
    unlock&#40;);
  &#125;
&#125;

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Thu Jul 23, 2009 9:12 pm

If out-of-order execution woulld work as you describe it, it would never be possible for any program to use the memory: there is never a guarantee that you read back what you write, because the write might not have been executed yet when you read.

Fortunately CPU designers have made out-of-order execution completely transparent, and have a lot of synchronization hardware in the load/store pipes to prevent effects like you describe. Reads and writes do not overtake each other this way.

Don't confuse out-of-order execution in a single CPU with memory ordering guarantees for multiple CPUs.

hgm wrote: With SMP there is the MESI protocol implemented in hardware to guarantee cache coherency. After one thread starts writing, the cache line is immediately invalidated in all other CPU caches, and there is no way any of them could start reading from it before it has reloaded that cache line, which cannot happen before the other CPU is done writing it, and ready to flush it to memory. You have to make sure that the counter is in the same cache line as the shared variable, of course.

Even then, I see absolutely no reason why another CPU wouldn't be able to obtain the cache line before the second store is started.

The only way you could know that for sure is when the store would be atomic for the platform. For integer variables even in the same cacheline I there is no such guarantee that I'm aware of.

Of course not. If the reader gets the changed counter it cannot start reading before the writer gets out of its critical section. The mutex will see to that.

I see no reason why this would be true. How would the lock protect another variable than the lock itself? It won't...

hgm · Post by **hgm** » Thu Jul 23, 2009 9:29 pm

Gian-Carlo Pascutto wrote:Even then, I see absolutely no reason why another CPU wouldn't be able to obtain the cache line before the second store is started.

Indeed it could, but that is no problem. If it gets the line between the store of the counter and the store of the variable it sees a modified counter, and thus cannot get at the variable before its writing is completed. Such a hybrid cache line will suimply be discareded.

see no reason why this would be true. How would the lock protect another variable than the lock itself? It won't...

The lock protects anything that is modified only in the critical section, not? In effect it makes the entire critical section atomic.

Sync question

Re: Sync question

Re: Sync question

Re: Sync question

Re: Sync question

Re: Sync question

Re: Sync question

Re: Sync question

Re: Sync question

Re: Sync question

Re: Sync question