Carey wrote: Big quote snipped. I know you like to quote everything, but to me it gets in the way of reading what is currently being said.

bob wrote: You are not going to see a 10% penalty. Probably 1% at worst. I was originally afraid that passing the TREE pointer around in Crafty would be a 10% penalty, but when I finished, there was no significant speed difference at all. So you don't have to worry about speed.

Really?
I think I understand that part. I've read your (and others') posts in here about that.

bob wrote: The way you handle memory is almost independent of whether you decide to use threads or processes. In modern unix systems, both will work the same way: all instruction pages are shared, and data that is initialized prior to creating the new processes or threads and then never modified (magic data, rotated bitboard data, etc.) is also shared and not duplicated.
So what you are left with is the data that you either want to share with other threads/processes, or data you want to keep private from other threads or processes.
The simplest solution is to create a large block of memory that contains your private data, but create N of them. Give each thread/process a pointer to its private data and they will not interact with each other at all. For data that is shared and modified, you create one large block of memory and give all threads/processes a pointer to that data, and then you take proper care by using atomic locks. It is likely that some of your private data will also be shared with processes helping you search that sub-tree, so you will need locks to limit that access as well... Other types of "private" data will work themselves out simply, but how depends on whether you use threads or processes.
Right. I'm pretty sure I understand that.
That's just basically creating a big struct to hold the private data and passing that pointer to every routine. (Or some similar way.) Just like creating a C++ class for each thread, except that C++ saves you some notational effort in passing the data pointer around, etc.
Or with processes, you do the global stuff first, fork, and then use shared-memory calls so the new processes can see into the parent's memory space.
I think I understand the basics. I haven't done it, but I think I understand the idea.
The problem I have with that is that it seems 'clunky' and hurts performance.
It just seems like a kludge doing it that way. It just doesn't feel 'right' manually handling all the data like that.
I know going multicore will be a win overall, but it still seems fundamentally wrong to choose a method that carries a 10% or so performance penalty when you aren't multi-core.
That's part of my problem. It's not so much that I can't just pick a method and make it work. I believe I could do threads or processes or even separate external programs and 'make it work'.
It's that none of these seem to feel right. They all feel like they are a 'work around'. Like a kludge. Like I'm forcing a solution onto a problem and the fit isn't quite right.
And when something doesn't feel right, then there is something wrong somewhere. Either with the program or the tools or your own mental process.
(I guess I should have written my original message a little clearer about that part. But my only excuse is that it was late at night and I was about to go to bed.)
I had just been experimenting to see what kind of code changes I could / should do and picked the old move generator as being tolerably representative while being easiest to convert for my tests.
I would expect the evaluator & (Un)MakeMove to have similar results.
Between them, that covers most of the cost of the program. (At least for my program it would.)
Of course, that is with a simple mailbox program. Eventually that'll get switched to bitboard. But I would think that would carry at least as much penalty, because bitboard operations are more 'atomic' than mailbox ones, and hence the pointer cost would be a higher percentage.
Maybe I ought to hack in some of the other parts of my old program and compare what performance differences they have when I use a pointer or a class for local data.
Maybe the mailbox move generator isn't the most representative.
I guess that's my project for the next few days...
I'm not too concerned about speed, about getting the last 0.001% of performance. I'm way too old for that kind of cycle counting. I stopped being able to do that with the 8-bit micros. The last time I tried was with some numerical programs, and I ran into the extreme variability among processors and memory architectures.

bob wrote: ...and actually should not be at this point anyway.
But I don't want to pick a method that is inherently going to be slow, either.
I'm all for simple & clean code... (grin)

bob wrote: You simply want to find the simplest way to share what is needed, because this kind of programming is non-trivial and anything you can do to simplify the design will reduce the number of bugs later...
That's why I actually took some code and started playing around. Just thinking about what I could / should do wasn't worth the time I had spent on it.
I didn't want to do anything more complicated than I had to.
Maybe it is a change in thinking that's part of my problem.

bob wrote: Here the approach that I use, which works with either threads or processes by the way, seems effective enough, and bearable in terms of effort required for results obtained... You are going to have independent tasks running since you will be searching different parts of the tree in parallel anyway, and that will begin to feel "natural" after a while...
It just seemed like most of the ideas I came up with were more of a kludge than a solution.
Oh, one question. How do you handle the hash table with multiple search threads?
Do you share them or does each thread (or process) get its own table?
I would assume you share it. Threads just access it directly and processes would use some system call to map it into its own address space?
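For what it's worth, here is a sketch of one scheme I've seen described for sharing a single table without locking every probe (not necessarily what Crafty does): store key XOR data next to the data, so a torn (half-written) entry from a concurrent writer simply fails the validation check. Illustrative only; a real entry packs depth, score, best move, and so on into the data word.

```c
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE (1 << 16)          /* entries; a power of two */

typedef struct {
    uint64_t check;                   /* key ^ data, validates the pair */
    uint64_t data;                    /* packed score/depth/move */
} HashEntry;

static HashEntry *table;

void hash_init(void)
{
    table = calloc(TABLE_SIZE, sizeof *table);
}

void hash_store(uint64_t key, uint64_t data)
{
    HashEntry *e = &table[key & (TABLE_SIZE - 1)];
    e->data  = data;
    e->check = key ^ data;
}

/* returns 1 and fills *data on a hit, 0 on a miss or a torn entry */
int hash_probe(uint64_t key, uint64_t *data)
{
    HashEntry *e = &table[key & (TABLE_SIZE - 1)];
    uint64_t d = e->data, c = e->check;
    if ((c ^ d) != key)   /* wrong position, empty slot, or torn write */
        return 0;         /* (toy caveat: key 0 would match an empty slot) */
    *data = d;
    return 1;
}
```

With threads the table is just one global allocation that everyone probes directly; with processes you would place it in a shared mapping created before the fork instead of using calloc, and the probing code stays the same.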