bhlangonijr wrote:
SNIP
Moreover, I have a special feeling for GCC. I don't want to get rid of it. It is like asking me to use Chessbase instead of Winboard. I won't do this.
Oh, the cold cockles of my heart have been warmed this day....
Later.
Houdini wrote:
While I very much respect your person and your past work with Cray Blitz and Crafty, as far as Houdini concerns you just seem to be following the "it's a clone" song invented by people much less talented than yourself.
marcelk wrote:
Are you claiming with this statement that Houdini is your original work?
Mr Houdart, would you mind giving a direct yes/no answer to the question you have dodged so many times before?

Houdini wrote:
Houdini's SMP code uses a special technique in which each thread runs its version of the code prewired to its own internal board representation. In other words, each thread uses its own code operating on its own memory segment.
I could see the point in splitting off R/W data to avoid contention. But cacheable read-only things like code? What is the design rationale behind that?
Gian-Carlo Pascutto wrote:
I could see the point in splitting off R/W data to avoid contention. But cacheable read-only things like code? What is the design rationale behind that?
The goal is to avoid passing the board representation structure as a parameter to each call. Each thread has its version of the code that uses a fixed board representation structure.
Gian-Carlo Pascutto wrote:
And what does "code prewired to its own internal board representation" mean?
For example, instead of calling
    Evaluate(Position, A, B, ...)
I use
    Evaluate<ThreadID>(A, B, ...)
This generates slightly more efficient code.
(Obviously in the source code all this is done with C++ templates)
Robert

Houdini wrote:
For example, instead of calling Evaluate(Position, A, B, ...) I use Evaluate<ThreadID>(A, B, ...)
This means to instantiate at compile time for the max number of threads allowed, say 32, even if you are going to run in single CPU mode.....
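For readers who want to see the two calling styles side by side, here is a minimal sketch of the idea as described in the post above. It is not taken from Houdini's source: the names `Position`, `ThreadState`, `evaluate`, `evaluate_fixed`, `kMaxThreads` and `kEvalTable` are all hypothetical, the evaluation formula is a placeholder, and the function-pointer table is only one assumed way of getting from a runtime thread number to a compile-time `ThreadID`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Assumed compile-time upper bound on threads (hypothetical value).
constexpr std::size_t kMaxThreads = 4;

// Placeholder board representation; a real engine would hold far more.
struct Position { int material; };

// Conventional style: the position is passed on every call.
int evaluate(const Position& pos, int a, int b) {
    return pos.material + a - b;
}

// Templated style: each ThreadID gets its own static Position,
// so its address is a link-time constant inside the instantiation.
template <int ThreadID>
struct ThreadState { static Position pos; };

template <int ThreadID>
Position ThreadState<ThreadID>::pos{};

// No position parameter: the code is "prewired" to ThreadState<ThreadID>::pos.
template <int ThreadID>
int evaluate_fixed(int a, int b) {
    return ThreadState<ThreadID>::pos.material + a - b;
}

using EvalFn = int (*)(int, int);

// Instantiate evaluate_fixed for every ID in [0, kMaxThreads) at compile
// time, whether or not that many threads are ever started.
template <std::size_t... Ids>
constexpr std::array<EvalFn, sizeof...(Ids)>
make_eval_table(std::index_sequence<Ids...>) {
    return {{ &evaluate_fixed<static_cast<int>(Ids)>... }};
}

const auto kEvalTable =
    make_eval_table(std::make_index_sequence<kMaxThreads>{});

// Small demo: both styles compute the same value for the same board.
int demo() {
    assert(kEvalTable.size() == kMaxThreads);
    ThreadState<1>::pos.material = 20;
    Position p{20};
    int by_parameter = evaluate(p, 3, 1);
    int by_template  = kEvalTable[1](3, 1);  // runtime ID -> instantiation
    return by_parameter - by_template;       // equal results give 0
}
```

Going through `kEvalTable` on every call would add an indirect call and defeat the purpose, so in a design like this the lookup would presumably happen once, at the top-level entry point of each thread, with everything below it templated on the same `ThreadID`. The cost is that every templated function is compiled `kMaxThreads` times.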
mcostalba wrote:
This means to instantiate at compile time for the max number of threads allowed, say 32, even if you are going to run in single CPU mode.....
Correct, but that's hardly a problem with current hardware.

Houdini wrote:
This generates slightly more efficient code.
I'm not clear what exactly you save here. There is one push per function call, but in your case each called function still has to load the address from somewhere (even if it's the code bytes). The pop has to happen at function exit anyway. So the only advantage would be that you turn an indirect load into a direct load.

bhlangonijr wrote:
I am not sure if it helps (substantially) as we usually pass a pointer to the board data structure (or in my case using C++ object reference). The cost of having one more pointer in the call stack might be almost zero.
Correct, it's a relatively small gain (less than 5%). In fact it's more efficient for 32-bit (where the compiler has fewer registers) than for 64-bit (where more registers are available to the compiler).