Questions on SMP search

Roger Brown · Post by **Roger Brown** » Tue Apr 26, 2011 2:59 am

bhlangonijr wrote:
SNIP

Moreover, I have a special feeling for GCC. I don't want to get rid of it. It is like asking me to use Chessbase instead of Winboard. I won't do this.

Oh, the cold cockles of my heart have been warmed this day....

Later.

marcelk · Post by **marcelk** » Tue Apr 26, 2011 7:59 am

marcelk wrote:
Houdini wrote: While I very much respect your person and your past work with Cray Blitz and Crafty, as far as Houdini is concerned you just seem to be following the "it's a clone" song invented by people much less talented than yourself.
Are you claiming with this statement that Houdini is your original work?

Mr Houdart, would you mind giving a direct yes/no answer to the question you have dodged so many times before?

Houdini · Post by **Houdini** » Tue Apr 26, 2011 11:53 am

Mr van Kervinck, please stop polluting Ben-Hur Carlos' topic about SMP search.
It's very annoying that a random person pops up in the middle of a technical discussion to ask an off-topic question to which he/she already knows the answer, with the sole purpose of launching another futile discussion. That is called trolling.

Robert

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Tue Apr 26, 2011 2:40 pm

Houdini wrote: Houdini's SMP code uses a special technique in which each thread runs its version of the code prewired to its own internal board representation. In other words, each thread uses its own code operating on its own memory segment.

I could see the point in splitting off R/W data to avoid contention. But cacheable read-only things like code? What is the design rationale behind that?

And what does "code prewired to its own internal board representation" mean?

Houdini · Post by **Houdini** » Tue Apr 26, 2011 3:53 pm

Gian-Carlo Pascutto wrote:
Houdini wrote: Houdini's SMP code uses a special technique in which each thread runs its version of the code prewired to its own internal board representation. In other words, each thread uses its own code operating on its own memory segment.
I could see the point in splitting off R/W data to avoid contention. But cacheable read-only things like code? What is the design rationale behind that?

And what does "code prewired to its own internal board representation" mean?

The goal is to avoid passing the board representation structure as a parameter to each call. Each thread has its version of the code that uses a fixed board representation structure.
For example, instead of calling

Code:

  Evaluate(Position, A, B, ...)

I use

Code:

  Evaluate<ThreadID>(A, B, ...)

This generates slightly more efficient code.
(Obviously in the source code all this is done with C++ templates)

Robert
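
A minimal sketch of the two call styles being compared, assuming C++17. Houdini's source is not shown in the thread, so `Position`, `g_boards`, `kMaxThreads`, `EvaluateWithPosition` and the toy scoring are hypothetical stand-ins; only the `Evaluate<ThreadID>(A, B, ...)` shape comes from the post.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical board structure; the real layout is not given in the thread.
struct Position {
    std::array<int, 64> squares{};
};

constexpr std::size_t kMaxThreads = 4;  // assumed limit for this sketch

// One statically allocated board per thread slot (assumption).
inline std::array<Position, kMaxThreads> g_boards{};

// Toy scoring shared by both call styles: a difference of two squares,
// clamped to the [alpha, beta] window.
inline int ClampedScore(const Position& pos, int alpha, int beta) {
    assert(alpha <= beta);
    const int score = pos.squares[0] - pos.squares[63];
    return score < alpha ? alpha : (score > beta ? beta : score);
}

// Conventional style: the board travels with every call.
inline int EvaluateWithPosition(const Position& pos, int alpha, int beta) {
    return ClampedScore(pos, alpha, beta);
}

// Templated style: the thread slot is a compile-time constant, so the
// board is selected by a constant index and is not a call parameter.
template <std::size_t ThreadID>
int Evaluate(int alpha, int beta) {
    static_assert(ThreadID < kMaxThreads, "thread slot out of range");
    return ClampedScore(g_boards[ThreadID], alpha, beta);
}
```

With this layout, `Evaluate<1>(alpha, beta)` reads `g_boards[1]` without receiving it as an argument, whereas `EvaluateWithPosition(pos, alpha, beta)` has to carry `pos` in a register or on the stack. The compiler emits one copy of `Evaluate` per `ThreadID` that is used.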

mcostalba · Post by **mcostalba** » Tue Apr 26, 2011 4:24 pm

Houdini wrote: For example, instead of calling
Code:
  Evaluate(Position, A, B, ...)
I use
Code:
  Evaluate<ThreadID>(A, B, ...)
This generates slightly more efficient code.
(Obviously in the source code all this is done with C++ templates)

Robert

This means you have to instantiate the code at compile time for the maximum number of threads allowed, say 32, even if you are going to run in single-CPU mode...
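
To make the cost concrete, here is one way the per-thread instantiations could be generated and selected at run time, assuming C++17. This is a sketch, not Houdini's code: `SearchRoot`, `kSearchTable`, `MakeTable` and `RunOnSlot` are hypothetical names, and the function body is a stand-in that only makes the chosen slot visible.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

constexpr std::size_t kMaxThreads = 32;  // the "say 32" from the post

// Stand-in for a per-thread routine; encodes its slot in the result.
template <std::size_t ThreadID>
int SearchRoot(int depth) {
    return static_cast<int>(ThreadID) * 100 + depth;
}

using SearchFn = int (*)(int);

// Expands to {&SearchRoot<0>, &SearchRoot<1>, ..., &SearchRoot<N-1>}.
template <std::size_t... Ids>
constexpr std::array<SearchFn, sizeof...(Ids)> MakeTable(std::index_sequence<Ids...>) {
    return {{&SearchRoot<Ids>...}};
}

// Taking every address forces all kMaxThreads copies into the binary,
// whether or not the engine later runs on a single CPU.
inline constexpr auto kSearchTable =
    MakeTable(std::make_index_sequence<kMaxThreads>{});

// Run-time choice of the copy that belongs to a given thread slot.
inline int RunOnSlot(std::size_t slot, int depth) {
    assert(slot < kMaxThreads);
    return kSearchTable[slot](depth);
}
```

Each worker would call `RunOnSlot(its_slot, depth)` once at its root; everything reachable from `SearchRoot<ThreadID>` is then duplicated per slot, which is the code-size cost being pointed out.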

bhlangonijr · Post by **bhlangonijr** » Tue Apr 26, 2011 4:40 pm

Houdini wrote:
Gian-Carlo Pascutto wrote:
Houdini wrote: Houdini's SMP code uses a special technique in which each thread runs its version of the code prewired to its own internal board representation. In other words, each thread uses its own code operating on its own memory segment.
I could see the point in splitting off R/W data to avoid contention. But cacheable read-only things like code? What is the design rationale behind that?

And what does "code prewired to its own internal board representation" mean?
The goal is to avoid passing the board representation structure as a parameter to each call. Each thread has its version of the code that uses a fixed board representation structure.
For example, instead of calling
Code:
  Evaluate(Position, A, B, ...)
I use
Code:
  Evaluate<ThreadID>(A, B, ...)
This generates slightly more efficient code.
(Obviously in the source code all this is done with C++ templates)

Robert

I am not sure if it helps (substantially), as we usually pass a pointer to the board data structure (or, in my case, a C++ object reference). The cost of having one more pointer in the call stack might be almost zero.

Houdini · Post by **Houdini** » Tue Apr 26, 2011 5:02 pm

mcostalba wrote: This means you have to instantiate the code at compile time for the maximum number of threads allowed, say 32, even if you are going to run in single-CPU mode...

Correct, but that's hardly a problem with current hardware.
If needed I can disable this behaviour with a simple compilation switch, but I very rarely do.

Robert

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Tue Apr 26, 2011 5:10 pm

Houdini wrote: This generates slightly more efficient code.

I'm not clear what exactly you save here. There is one push per function call, but in your case each called function still has to load the address from somewhere (even if it's the code bytes). The pop has to happen at function exit anyway. So the only advantage would be that you turn an indirect load into a direct load.

I'd expect your method to cause severe code cache pollution on CPUs where threads share the same L1 instruction cache (hyperthreading, AMD Bulldozer, GPU).

So this doesn't look like a clear-cut gain to me.

Note that if you really want to avoid passing the pointer that much, you can get the same effect via TLS.
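
The TLS alternative could look roughly like this, assuming C++11 `thread_local` (with a C++17 `inline` variable). `Position`, `t_pos`, `Evaluate` and `RunWorker` are hypothetical names for illustration, and the scoring is a toy stand-in.

```cpp
#include <cassert>
#include <thread>

// Hypothetical board structure for this sketch.
struct Position {
    int material = 0;
};

// Each thread gets its own copy of this pointer, so a single shared copy
// of the code can find "its" board without taking it as a parameter.
inline thread_local Position* t_pos = nullptr;

// No Position argument: the board is reached through the TLS pointer.
inline int Evaluate(int alpha, int beta) {
    assert(t_pos != nullptr);
    const int score = t_pos->material;
    return score < alpha ? alpha : (score > beta ? beta : score);
}

// Spawns a worker that binds the board to its own TLS slot, evaluates,
// and hands the result back after join().
inline int RunWorker(Position& board, int alpha, int beta) {
    int result = 0;
    std::thread worker([&] {
        t_pos = &board;  // visible only inside this worker thread
        result = Evaluate(alpha, beta);
    });
    worker.join();
    return result;
}
```

Unlike the templated approach, only one copy of `Evaluate` exists here, so threads that share an instruction cache run the same code bytes; the price is a TLS lookup wherever the pointer is needed.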

Houdini · Post by **Houdini** » Tue Apr 26, 2011 5:10 pm

bhlangonijr wrote: I am not sure if it helps (substantially), as we usually pass a pointer to the board data structure (or, in my case, a C++ object reference). The cost of having one more pointer in the call stack might be almost zero.

Correct, it's a relatively small gain (less than 5%). In fact it's more efficient for 32-bit (where the compiler has fewer registers) than for 64-bit (where more registers are available to the compiler).
Certainly not something you should worry about when first implementing an SMP algorithm.

Robert
