Questions on SMP search

Discussion of chess software programming and technical issues.

Moderator: Ras

Roger Brown
Posts: 782
Joined: Wed Mar 08, 2006 9:22 pm

Re: Questions on SMP search

Post by Roger Brown »

bhlangonijr wrote:
SNIP

Moreover, I have a special feeling for GCC. I don't want to get rid of it. It is like asking me to use Chessbase instead of Winboard. I won't do this.
:lol:

Oh, the cold cockles of my heart have been warmed this day....

:-)

Later.
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: Questions on SMP search

Post by marcelk »

marcelk wrote:
Houdini wrote: While I very much respect your person and your past work with Cray Blitz and Crafty, as far as Houdini is concerned you just seem to be following the "it's a clone" song invented by people much less talented than yourself.
Are you claiming with this statement that Houdini is your original work?
Mr Houdart, would you mind giving a direct yes/no answer to the question you have dodged so many times before?
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Questions on SMP search

Post by Houdini »

Mr van Kervinck, please stop polluting Ben-Hur Carlos' topic about SMP search.
It's very annoying that a random person pops up in the middle of a technical discussion to ask an off-topic question to which he/she already knows the answer, with the sole purpose of launching another futile discussion. That is called trolling.

Robert
Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: Questions on SMP search

Post by Gian-Carlo Pascutto »

Houdini wrote: Houdini's SMP code uses a special technique in which each thread runs its version of the code prewired to its own internal board representation. In other words, each thread uses its own code operating on its own memory segment.
I could see the point in splitting off R/W data to avoid contention. But cacheable read-only things like code? What is the design rationale behind that?

And what does "code prewired to its own internal board representation" mean?
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Questions on SMP search

Post by Houdini »

Gian-Carlo Pascutto wrote:
Houdini wrote: Houdini's SMP code uses a special technique in which each thread runs its version of the code prewired to its own internal board representation. In other words, each thread uses its own code operating on its own memory segment.
I could see the point in splitting off R/W data to avoid contention. But cacheable read-only things like code? What is the design rationale behind that?

And what does "code prewired to its own internal board representation" mean?
The goal is to avoid passing the board representation structure as a parameter to each call. Each thread has its version of the code that uses a fixed board representation structure.
For example, instead of calling

  Evaluate(Position, A, B, ...)
I use

  Evaluate<ThreadID>(A, B, ...)
This generates slightly more efficient code.
(Obviously in the source code all this is done with C++ templates)
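
A minimal sketch of the idea, not Houdini's actual source: the names Position, g_positions and kMaxThreads are hypothetical, and the evaluation itself is a placeholder. It only illustrates how a compile-time thread ID can replace the board parameter, so that each instantiation addresses its own fixed board.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; nothing here is taken from a real engine.
constexpr std::size_t kMaxThreads = 4;

struct Position {
    int material = 0;  // stand-in for a real board representation
};

// One fixed board per thread, each at an address known at link time.
std::array<Position, kMaxThreads> g_positions;

// Conventional form: the board is passed explicitly on every call.
int Evaluate(const Position& pos, int a, int b) {
    return pos.material + a - b;
}

// Templated form: the board is selected by a compile-time thread ID,
// so no board argument has to be passed down the call chain.
template <std::size_t ThreadID>
int Evaluate(int a, int b) {
    static_assert(ThreadID < kMaxThreads, "thread ID out of range");
    const Position& pos = g_positions[ThreadID];
    return pos.material + a - b;
}
```

With this layout, a call such as Evaluate<2>(a, b) reads g_positions[2] directly, while Evaluate(g_positions[2], a, b) computes the same value with the board passed as an argument.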

Robert
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Questions on SMP search

Post by mcostalba »

Houdini wrote: For example, instead of calling

  Evaluate(Position, A, B, ...)
I use

  Evaluate<ThreadID>(A, B, ...)
This generates slightly more efficient code.
(Obviously in the source code all this is done with C++ templates)

Robert
This means instantiating the code at compile time for the maximum number of threads allowed, say 32, even if you are going to run in single-CPU mode...
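
To make the cost concrete, here is a hedged sketch of how a runtime thread number could be mapped onto the compile-time instantiations. All names (kMaxThreads, g_positions, MakeTable, EvaluateOnThread) are hypothetical and the evaluation is a placeholder. Every one of the 32 instantiations ends up in the binary, whether or not the corresponding thread is ever started.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

constexpr std::size_t kMaxThreads = 32;  // assumed compile-time cap

struct Position {
    int material = 0;  // stand-in for a real board representation
};

std::array<Position, kMaxThreads> g_positions;

// One copy of this function is generated per thread ID.
template <std::size_t ThreadID>
int Evaluate(int a, int b) {
    return g_positions[ThreadID].material + a - b;
}

using EvalFn = int (*)(int, int);

// Expands to { &Evaluate<0>, &Evaluate<1>, ..., &Evaluate<N-1> },
// which forces all N instantiations to be compiled.
template <std::size_t... Ids>
std::array<EvalFn, sizeof...(Ids)> MakeTable(std::index_sequence<Ids...>) {
    return {{ &Evaluate<Ids>... }};
}

const std::array<EvalFn, kMaxThreads> kEvalTable =
    MakeTable(std::make_index_sequence<kMaxThreads>{});

// Runtime dispatch: pick the instantiation that belongs to this thread.
int EvaluateOnThread(std::size_t thread_id, int a, int b) {
    assert(thread_id < kMaxThreads);
    return kEvalTable[thread_id](a, b);
}
```

In an engine, such a table would more plausibly be consulted once, at the entry point of each search thread, so that everything below it calls the templated functions directly. That placement is an assumption, not something stated in this thread.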
bhlangonijr
Posts: 482
Joined: Thu Oct 16, 2008 4:23 am
Location: Milky Way

Re: Questions on SMP search

Post by bhlangonijr »

Houdini wrote:
Gian-Carlo Pascutto wrote:
Houdini wrote: Houdini's SMP code uses a special technique in which each thread runs its version of the code prewired to its own internal board representation. In other words, each thread uses its own code operating on its own memory segment.
I could see the point in splitting off R/W data to avoid contention. But cacheable read-only things like code? What is the design rationale behind that?

And what does "code prewired to its own internal board representation" mean?
The goal is to avoid passing the board representation structure as a parameter to each call. Each thread has its version of the code that uses a fixed board representation structure.
For example, instead of calling

  Evaluate(Position, A, B, ...)
I use

  Evaluate<ThreadID>(A, B, ...)
This generates slightly more efficient code.
(Obviously in the source code all this is done with C++ templates)

Robert

I am not sure if it helps (substantially), as we usually pass a pointer to the board data structure (or, in my case, a C++ object reference). The cost of having one more pointer in the call stack might be almost zero.
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Questions on SMP search

Post by Houdini »

mcostalba wrote: This means instantiating the code at compile time for the maximum number of threads allowed, say 32, even if you are going to run in single-CPU mode...
Correct, but that's hardly a problem with current hardware.
If needed I can disable this behaviour with a simple compilation switch, but I very rarely do.

Robert
Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: Questions on SMP search

Post by Gian-Carlo Pascutto »

Houdini wrote: This generates slightly more efficient code.
I'm not clear what exactly you save here. There is one push per function call, but in your case each called function still has to load the address from somewhere (even if it's the code bytes). The pop has to happen at function exit anyway. So the only advantage would be that you turn an indirect load into a direct load.

I'd expect your method to cause severe code cache pollution on CPUs where threads share the same L1 instruction cache (hyperthreading, AMD Bulldozer, GPU).

So this doesn't look like a clear-cut gain to me.

Note that if you really want to avoid passing the pointer that much, you can get the same effect via TLS.
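
As a rough illustration of the TLS route, assuming C++11 thread_local and hypothetical names (Position, t_position, EvaluateOnWorker): there is a single copy of the code, no board parameter, and each thread still sees its own board.

```cpp
#include <cassert>
#include <thread>

struct Position {
    int material = 0;  // stand-in for a real board representation
};

// Each thread gets its own instance of this variable.
thread_local Position t_position;

// Single copy of the code; the board is found through thread-local storage.
int Evaluate(int a, int b) {
    return t_position.material + a - b;
}

// Helper for demonstration: runs Evaluate on a separate thread
// after setting up that thread's private board.
int EvaluateOnWorker(int material, int a, int b) {
    int result = 0;
    std::thread worker([&] {
        t_position.material = material;  // touches only the worker's copy
        result = Evaluate(a, b);
    });
    worker.join();
    return result;
}
```

How the compiler reaches t_position (typically via a segment register or a helper call) depends on the platform and the TLS model, so the cost relative to passing a pointer would have to be measured.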
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Questions on SMP search

Post by Houdini »

bhlangonijr wrote: I am not sure if it helps (substantially), as we usually pass a pointer to the board data structure (or, in my case, a C++ object reference). The cost of having one more pointer in the call stack might be almost zero.
Correct, it's a relatively small gain (less than 5%). In fact it's more efficient for 32-bit (where the compiler has fewer registers) than for 64-bit (where more registers are available to the compiler).
Certainly not something you should worry about when first implementing an SMP algorithm.

Robert