Post
by bob » Mon Aug 20, 2007 9:04 pm
Here's the concept as I use it, and as it is used just about everywhere spinlocks are needed. It is called a "shadow lock", the name coming from the fact that you spin on a "shadow copy" of the lock kept in your local cache.
You spin on a simple read, which is satisfied from your cache and stays off the system bus. When someone sets the lock variable back to zero, your cache's bus snooping detects that, and your next fetch gets the new value. Unfortunately, other waiters' caches can pick up that zero at the same moment, so a plain read alone can't grant the lock without violating the critical section. So once a normal fetch returns a zero, you do an interlocked exchange to see whether you still get a zero, atomically setting the variable to 1 to exclude everyone else. Should you be too slow and get a one back from the interlocked exchange, you go back to the normal cached-fetch loop.
The issue is all about the bus. If you were to spin on the interlocked exchange itself, you would absolutely crush the bus, and while you are doing nothing but tying it up with exchange cycles, the thread trying to complete some real work and then exit the critical section can't make much progress because of all the bus conflicts.
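For contrast, that bad approach is just a bare exchange loop. This is only a sketch; exchange() here stands in for an atomic swap such as x86 XCHG or C11 atomic_exchange, same as in the snippets below:
while (exchange(lock, 1)); /* every iteration is a locked bus transaction -- bad */
Every pass through that loop locks the bus, so a handful of waiters can saturate it all by themselves.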
The idea of doing this has been around for at least 20 years, it isn't new.
The recent idea was to replace the simple form:
while (exchange(lock, 1)) while (lock);
{note that you first try an interlocked exchange; if that fails, you spin on the normal cache fetch until it returns a zero, at which point you go back and try to actually acquire the lock with the exchange. If the lock is frequently set, the interlocked exchange wastes a bit of time on the first cycle, so you could try the following}
while (lock);
while (exchange(lock,1)) while (lock);
Here you first do a normal read and hang there until the lock goes to zero, then you enter the exchange loop until you actually acquire the lock. For cases where the lock is normally zero, the first form is faster. For cases where the lock is normally set (a bad case for parallel code, of course) the second is slightly more efficient, but the difference is _not_ that significant...
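If you want to try this on a modern compiler, here is a minimal self-contained sketch of the second form using C11 atomics (the names spin_lock/spin_unlock are mine for illustration, not from any standard library):

#include <stdatomic.h>

static atomic_int lock = 0;

void spin_lock(void)
{
    /* while (lock); -- plain cached read, stays off the bus */
    while (atomic_load_explicit(&lock, memory_order_relaxed))
        ;
    /* while (exchange(lock,1)) while (lock); */
    while (atomic_exchange_explicit(&lock, 1, memory_order_acquire))
        while (atomic_load_explicit(&lock, memory_order_relaxed))
            ;   /* spin on the shadow copy until snooping delivers a zero */
}

void spin_unlock(void)
{
    atomic_store_explicit(&lock, 0, memory_order_release);  /* lock = 0 */
}

The relaxed loads are the "shadow" reads that stay in your local cache; the acquire/release orderings spell out explicitly what the hardware's locked exchange gave you implicitly back then.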
My spinlock has been used in the linux kernel since the beginning of its development. It was in the Sequent parallel library back in the mid-80s... It only applies to systems with cache and bus snooping, obviously, so it didn't fit the Crays until the T90, which, unlike its predecessors, did have a cache. It has nothing to do with distributed computing, but it is an issue on cache-based NUMA architectures.