multithreading questions

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: multithreading questions

Post by wgarvin »

bob wrote:Synchronizing multiple updates (also known as potential interleaved updates) is a tricky thing to manage. And the failure to do it properly leads to more parallel programming bugs than all other issues put together. Second most common problem is not understanding the difference between:

volatile int x;
volatile int *x;
int * volatile x;
volatile int * volatile x;

Until those are completely understood to the point of becoming trivial, bugs will abound.
The volatile qualifier can be used to do some amazing things. Check out this 2001 article in DDJ for example.
And Dr. Hyatt is right, of course: unless you use volatile correctly, you might as well not have used it at all.
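To make the four declarations concrete, here is a minimal compilable sketch (plain C; the variable names are purely illustrative):

#include <stdio.h>

int main(void) {
    int data = 0;

    volatile int a = 0;                  /* a itself is volatile: every access goes to memory */
    volatile int *b = &data;             /* pointer to volatile int: accesses through *b are volatile, b itself is not */
    int * volatile c = &data;            /* volatile pointer to int: c itself is volatile, accesses through *c are not */
    volatile int * volatile d = &data;   /* both the pointer and the pointed-to int are volatile */

    a = 1;        /* volatile store to a */
    *b = 2;       /* volatile store through b */
    c = &data;    /* volatile store of the pointer itself */
    *d = 3;       /* volatile pointer, volatile access */

    printf("%d %d\n", a, data);
    return 0;
}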
Guetti

Re: multithreading questions

Post by Guetti »

wgarvin wrote:
bob wrote:Synchronizing multiple updates (also known as potential interleaved updates) is a tricky thing to manage. And the failure to do it properly leads to more parallel programming bugs than all other issues put together. Second most common problem is not understanding the difference between:

volatile int x;
volatile int *x;
int * volatile x;
volatile int * volatile x;

Until those are completely understood to the point of becoming trivial, bugs will abound.
The volatile qualifier can be used to do some amazing things. Check out this 2001 article in DDJ for example.
And Dr. Hyatt is right, of course: unless you use volatile correctly, you might as well not have used it at all.
Not using volatile at all might be the best solution.
Here is what David Butenhof, the author of "Programming with POSIX Threads", says about volatile:

Don't ever use the C/C++ language volatile in threaded code. It'll kill your performance, and the language definition has nothing to do with what you want when writing threaded code that shares data.

http://osequal.blogspot.com/2006/06/vol ... yword.html

As I understand it, if you have organized the critical sections right in your code, you don't need volatile at all.
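For what it's worth, a minimal sketch of that approach with POSIX threads (the names here are illustrative, not taken from any engine): the mutex lock/unlock calls provide the required memory ordering, so the shared counter needs no volatile at all.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;                                   /* shared, but not volatile */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&mtx);        /* entering the critical section acts as a barrier */
        counter++;                       /* only ever touched while holding the mutex */
        pthread_mutex_unlock(&mtx);
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("%ld\n", counter);            /* prints 400000 */
    return 0;
}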
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: multithreading questions

Post by bob »

Guetti wrote:
wgarvin wrote:
bob wrote:Synchronizing multiple updates (also known as potential interleaved updates) is a tricky thing to manage. And the failure to do it properly leads to more parallel programming bugs than all other issues put together. Second most common problem is not understanding the difference between:

volatile int x;
volatile int *x;
int * volatile x;
volatile int * volatile x;

Until those are completely understood to the point of becoming trivial, bugs will abound.
The volatile qualifier can be used to do some amazing things. Check out this 2001 article in DDJ for example.
And Dr. Hyatt is right, of course: unless you use volatile correctly, you might as well not have used it at all.
Not using volatile at all might be the best solution.
Here is what David Butenhof, the author of "Programming with POSIX Threads", says about volatile:

Don't ever use the C/C++ language volatile in threaded code. It'll kill your performance, and the language definition has nothing to do with what you want when writing threaded code that shares data.

http://osequal.blogspot.com/2006/06/vol ... yword.html

As I understand it, if you have organized the critical sections right in your code, you don't need volatile at all.
Ignore that completely. Yes, one can do concurrency without using volatile. No, one can't use shared memory without using volatile. So without volatile, you have to resort to message passing approaches, which is simply way too much overhead in a chess engine.

Volatile is a critical tool, regardless of the statements made on the blog you referenced. Ask someone with lots of actual _experience_ with this topic, and you will get a completely different answer.
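For context, the kind of shared-memory use being argued for here looks roughly like this (an illustrative sketch, not Crafty's actual code): one thread sets a shared flag, other threads poll it, and without volatile the compiler may hoist the load out of the polling loop so the change is never seen.

#include <pthread.h>
#include <unistd.h>
#include <stdio.h>

static volatile int stop = 0;       /* shared flag; volatile forces a fresh load on every read */

static void *search(void *arg) {
    long nodes = 0;
    while (!stop)                   /* re-read from memory each time around the loop */
        nodes++;                    /* stand-in for real search work */
    printf("stopped after %ld iterations\n", nodes);
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, search, NULL);
    sleep(1);                       /* let the "search" run briefly */
    stop = 1;                       /* tell the searcher to quit */
    pthread_join(t, NULL);
    return 0;
}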
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: multithreading questions

Post by bob »

Pradu wrote:
Gerd Isenberg wrote:But isn't it possible that one thread never gets the resource because some other spinning thread always takes the chance to write a "one" between the dirty read of a "zero" and the lock xchg?
Yes, it is possible, but maybe that amounts to the same thing as never getting the lock because other threads always get the locked exchange before you. Maybe there's some way to come up with a spin-lock that ensures fairness among threads. Perhaps a wait-queue? Maybe implementing that would be too expensive.
This is unimportant. This just means that multiple threads are trying to get into a critical section, how would it matter which one gets the lock if the code is written correctly? If the contention for the lock is that serious, performance is already in the crapper since at least one thread would have to always be waiting there to live-lock another thread out of the critical section...
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: multithreading questions

Post by Gerd Isenberg »

bob wrote:
Pradu wrote:
Gerd Isenberg wrote:But isn't it possible that one thread never gets the resource because some other spinning thread always takes the chance to write a "one" between the dirty read of a "zero" and the lock xchg?
Yes, it is possible, but maybe that amounts to the same thing as never getting the lock because other threads always get the locked exchange before you. Maybe there's some way to come up with a spin-lock that ensures fairness among threads. Perhaps a wait-queue? Maybe implementing that would be too expensive.
This is unimportant. This just means that multiple threads are trying to get into a critical section, how would it matter which one gets the lock if the code is written correctly? If the contention for the lock is that serious, performance is already in the crapper since at least one thread would have to always be waiting there to live-lock another thread out of the critical section...
Ok - meant as a pathological race between one thread using a "dirty" read plus conditional lock exchange versus other threads polling with the lock exchange only.

Is the "dirty" read before the lock exchange a hyper-threading relic - or is it beneficial with multiple cores as well? The point being to save the write cycles when entering frequently barred sections - assuming many threads on different processors are requesting this semaphore.
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: multithreading questions

Post by Pradu »

Gerd Isenberg wrote:Ok - meant as a pathological race between one thread using a "dirty" read plus conditional lock exchange versus other threads polling with the lock exchange only.

Is the "dirty" read before the lock exchange a hyper-threading relic - or is it beneficial with multiple cores as well? The point being to save the write cycles when entering frequently barred sections - assuming many threads on different processors are requesting this semaphore.
It's beneficial with multiple cores. Sune Fischer kindly tested the posted code on his Core2 Quad and it ran twice as fast with the dirty read when compiled 32-bit with the 3.x MinGW (GCC) compiler with -O3.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: multithreading questions

Post by bob »

Here's the concept as I use it and as it is used nearly everywhere spinlocks are wanted. It is called a "shadow lock", where the name comes from keeping a "shadow copy" of the lock in your local cache.

You spin on a simple read, which comes from your cache and stays off the system bus. When someone sets the lock variable back to zero, your cache's bus snooping will detect that and your next fetch gets the new value. Unfortunately other caches could also get that zero value, which would violate the critical section concept. So once you do a normal fetch and get a zero, now you do an interlocked exchange to see if you get a zero this time around, while atomically setting it to 1 to exclude others. Should you be too slow and get a one from the interlocked exchange, you again go back to the normal cache fetch loop.

The issue is all about the bus. If you were to spin on the interlocked exchange, you would absolutely crush the bus, and while you are doing nothing but tying up the bus with the exchanges, the thread trying to complete some real work and then exit the critical section can't make much progress because of all the bus conflicts.

The idea of doing this has been around for at least 20 years, it isn't new.

The recent idea was to replace the simple version:

while (exchange(lock, 1)) while (lock);

{note that you first try an interlocked exchange; if it fails, you spin on the normal cache fetch until it returns a zero, at which point you go back and try to really acquire the lock with the exchange. If the lock is frequently set, the interlocked exchange wastes a bit of time on the first cycle, so you could try the following}

while (lock);
while (exchange(lock,1)) while (lock);

There you first do a normal read and hang there until the lock goes to zero, then you enter the normal exchange-lock loop until you actually acquire the lock. For cases where the lock is normally zero, the first version is faster. For cases where the lock is normally set (a bad case for parallel code, of course) the second is slightly more efficient, but the difference is _not_ that significant...
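Put together as compilable C, the idea looks roughly like this (a sketch using GCC's __sync builtins in place of the exchange() pseudocode above; the builtin names are an assumption about the toolchain, not the actual Crafty source):

static volatile int lock = 0;

/* first variant: try the atomic exchange, then spin on plain cached reads */
static void acquire(volatile int *l) {
    while (__sync_lock_test_and_set(l, 1))   /* atomic exchange; returns the old value */
        while (*l)                           /* spin on a read served from cache, off the bus */
            ;
}

/* second variant: wait until the lock looks free before the first exchange */
static void acquire2(volatile int *l) {
    while (*l)
        ;
    while (__sync_lock_test_and_set(l, 1))
        while (*l)
            ;
}

static void release(volatile int *l) {
    __sync_lock_release(l);                  /* store 0, releasing the lock */
}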

My spinlock has been used in the Linux kernel since the beginning of its development. It was in the Sequent parallel library back in the mid-80s... It only applies to systems with caches and bus snooping, obviously, so it didn't fit with the Crays until the T90, which did have cache unlike its predecessors. It has nothing to do with distributed computing, but it is an issue in cache-based NUMA architectures.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: multithreading questions

Post by bob »

Pradu wrote:
Gerd Isenberg wrote:Ok - meant as a pathological race between one thread using a "dirty" read plus conditional lock exchange versus other threads polling with the lock exchange only.

Is the "dirty" read before the lock exchange a hyper-threading relic - or is it beneficial with multiple cores as well? The point being to save the write cycles when entering frequently barred sections - assuming many threads on different processors are requesting this semaphore.
It's beneficial with multiple cores. Sune Fischer kindly tested the posted code on his Core2 Quad and it ran twice as fast with the dirty read when compiled 32-bit with the 3.x MinGW (GCC) compiler with -O3.
There's something wrong with that test. The only way you could get that kind of difference is with a one-instruction critical section and enough threads that the lock is _always_ set to 1 by one thread or another. In chess you won't be able to measure the difference.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: multithreading questions

Post by bob »

POSIX threads already provides that facility. You can run multiple threads on one physical CPU if you want to. Of course there would be no speedup, and the overhead would actually slow you down some to boot.
Oliver

Re: multithreading questions

Post by Oliver »

bob wrote: ...
Any time a shared value (a value visible to more than one processor) is updated (and can potentially be updated by more than one thread simultaneously) some form of atomic locking is required. The classic "shadow lock" used in Crafty (borrowed from the Linux kernel source) is the best known way to do this when raw efficiency is important.

...
What is the classic "shadow lock"? I couldn't find anything about that on the net.

TYVMIA
oliver