Stockfish and optimizing options for SMP search

Milos · Post by **Milos** » Tue May 10, 2011 12:59 am

bob wrote:All of that is simply 100% nonsense. But don't let that stop you. If I thought you had even a tiny bit of 'understanding of scientific approach and method' our conversations might take a different path. But you butt in with nonsense, then after getting kicked around, you leave with your tail tucked. repeatedly. This will be another one of those cases, probably sooner rather than later...

You are the only one here writing nonsense.
Things like:

in reality, once you have searched one move, you almost _always_ want to split there. Except for the occasional case where you think another move might be best, and know that it is better to search such moves one at a time, using all available threads.

just prove that you have no clue what a scientific method is.
Unfortunately (for the education) there are university professors like that. And that is exactly what makes a difference between excellent and mediocre university programs, not the students that are attending those...

Daniel Shawul · Post by **Daniel Shawul** » Tue May 10, 2011 1:37 am

And how do you think engines have increased hundreds of ELO in the last 20 years ?

Hmm..by not making random stuff up ?!

Hint: not thanks to neural networks

Here is a link just in case you want to stay on topic http://www.valavan.net/mthesis.pdf

Excuse me if I don't buy into your "This is a great idea, that is a great idea, let me throw in my fantasic idea"

bob · Post by **bob** » Tue May 10, 2011 4:18 am

Milos wrote:
bob wrote:All of that is simply 100% nonsense. But don't let that stop you. If I thought you had even a tiny bit of 'understanding of scientific approach and method' our conversations might take a different path. But you butt in with nonsense, then after getting kicked around, you leave with your tail tucked. repeatedly. This will be another one of those cases, probably sooner rather than later...
You are the only one here writing nonsense.
Things like:
in reality, once you have searched one move, you almost _always_ want to split there. Except for the occasional case where you think another move might be best, and know that it is better to search such moves one at a time, using all available threads.
just prove that you have no clue what a scientific method is.
Unfortunately (for the education) there are university professors like that. And that is exactly what makes a difference between excellent and mediocre university programs, not the students that are attending those...

So you _really_ don't understand what is going on? If so, you would realize that (a) the above is the basis for YBW. The difference at the root is that you _do_ get a pretty good hint that a second-best, or third-best move might become the best move, by remembering the node count for each root move from the previous iteration. In a one-best-move root position, you might see the first move with a node count of 25x to 100x or even more compared to the second-best move. But in a position where a couple of moves are "close" those two moves will have a node count way higher than the other moves. Of course, had you been doing this stuff, rather than trying to become a legend in your own mind, you would have either (a) discovered this yourself or (b) find that this is the way Crafty splits at the root and is one of the reasons its parallel search is highly effective.

In your weak mind, "scientific method" is apparently "whatever you say is fact, whatever anyone says is junk." Not exactly the "accepted definition." I actually run these kinds of tests, by the hundreds of thousands. You might try testing rather than posting. You will learn more.

BTW, for two examples of my "nonsense:"

example 1, one clear best move at the root, but a couple of other moves are threatening to replace that move:

Code: Select all

       move       nodes     R/P         
       Bxe4    15396401     0 0
         d5     9510064     0 0
        Nh5     5469105     0 0
       Nxe4     1912204     0 1
        Bd5      952451     0 1
        Ne5      866838     0 1
       Rfd8      519152     0 1
         e5      393799     0 1
        Nc5      140114     1 1
        Qb8      132144     1 1
        Bc6      126426     1 1
        Ng4      104296     1 1
        Nd5      103276     1 1
        Qc5      100265     1 1
        Kh7       86073     1 1
         g6       82893     1 1
        Ra8       77554     1 1
       Rfe8       74845     1 1
         g5       69785     1 1
       Qxc4       63719     1 1
         h5       54724     1 1
        Nb8       52359     1 1
        Ba8       48347     1 1
        Ba6       45300     1 1
        Ne8       43563     1 1
        Qd8       42437     1 1
        Nh7       41668     1 1
       Rcd8       39294     1 1
       Rce8       38811     1 1
        Rb8       38811     1 1
        Bd8       33900     1 1
        Kh8       29350     1 1
        Qc6       15577     1 1
      total    36705545

First column is a move at the root, second column is the nodes in that sub-tree only. The next two are flags the first says reduce if 1, else not, the second says "this move can be searched in parallel with others if 1, else it has to be searched by itself by all processors."

One clear best move. Two potential replacements. I reduce neither at the root, although I reduce others unless something else causes some to not be reduced, and I sic all cpus on each of those 3 moves, one at a time, to avoid giving one of them to one cpu where the search could take an abnormally long time.

(the above is kopec #22, Bxe4 is the proposed best move...)

Here's another one from an old Belle vs Cray Blitz game. One clear best move, Bxh6. Check the node counts and flags.

Code: Select all

       move       nodes     R/P
       Bxh6   184605726     0 0
        Bf4    24200673     0 1
        Qd6    22641716     0 1
         a3    13915861     0 1
         b4     3124291     0 1
         b3     2711741     0 1
         h4     2112311     0 1
         a4     1003096     1 1
       Qxb6      433634     1 1
        Bd2      346845     1 1
        Rb1      312422     1 1
        Qg6      299091     1 1
        Be3      286610     1 1
        Qc6      215689     1 1
         g4      196193     1 1
        Qg4      154844     1 1
        Qe7      138909     1 1
        Bg5      125354     1 1
        Qd5       71011     1 1
        Qb3       44163     1 1
        Qf7       43624     1 1
      Qxh6+       25994     1 1
         g3       24814     1 1
        Qc8       24591     1 1
        Qc4       23722     1 1
       Qg8+       19774     1 1
       Qxe5       17483     1 1
        Qd7       15264     1 1
        Qf5       11680     1 1
        Qe8       10041     1 1
        Qf6        8902     1 1
      total   257166069

It can search all (but the first move) in parallel, which is classic YBW. A few of the first group have node counts high enough to mark them "interesting" so they do not get reduced, but they do get searched in parallel to improve search efficiency.

Too bad you don't understand the basic mechanics. But that is just one of many things you don't understand. now feel free to once again, tuck your tail and run away. This is not nonsense. It is a _very_ sound approach, with supporting mathematics proving that it works...

BTW, I suppose that the "you think" that you highlighted was a bit beyond your ability to grasp. The program is making these decisions. Based on the ratio between two node counts, the first move and any one of the others. If the ratio is above a carefully tested threshold, that move can be searched in parallel with others that meet that same test. If the ratio is below the threshold, that move is classified as a "potential new best move" and it is searched by itself, using all processors.

Again, just like the roof at your house or wherever you live, that is also over your head. But others might get it and want to give it a try. It _does_ work. It was developed scientifically and with careful thought and accurate testing...

Milos · Post by **Milos** » Tue May 10, 2011 5:06 am

bob wrote:So you _really_ don't understand what is going on? If so, you would realize that (a) the above is the basis for YBW. The difference at the root is that you _do_ get a pretty good hint that a second-best, or third-best move might become the best move, by remembering the node count for each root move from the previous iteration. In a one-best-move root position, you might see the first move with a node count of 25x to 100x or even more compared to the second-best move. But in a position where a couple of moves are "close" those two moves will have a node count way higher than the other moves. Of course, had you been doing this stuff, rather than trying to become a legend in your own mind, you would have either (a) discovered this yourself or (b) find that this is the way Crafty splits at the root and is one of the reasons its parallel search is highly effective.

In your weak mind, "scientific method" is apparently "whatever you say is fact, whatever anyone says is junk." Not exactly the "accepted definition." I actually run these kinds of tests, by the hundreds of thousands. You might try testing rather than posting. You will learn more.

BTW, for two examples of my "nonsense:"

example 1, one clear best move at the root, but a couple of other moves are threatening to replace that move:
Code: Select all
       move       nodes     R/P         
       Bxe4    15396401     0 0
         d5     9510064     0 0
        Nh5     5469105     0 0
       Nxe4     1912204     0 1
        Bd5      952451     0 1
        Ne5      866838     0 1
       Rfd8      519152     0 1
         e5      393799     0 1
        Nc5      140114     1 1
        Qb8      132144     1 1
        Bc6      126426     1 1
        Ng4      104296     1 1
        Nd5      103276     1 1
        Qc5      100265     1 1
        Kh7       86073     1 1
         g6       82893     1 1
        Ra8       77554     1 1
       Rfe8       74845     1 1
         g5       69785     1 1
       Qxc4       63719     1 1
         h5       54724     1 1
        Nb8       52359     1 1
        Ba8       48347     1 1
        Ba6       45300     1 1
        Ne8       43563     1 1
        Qd8       42437     1 1
        Nh7       41668     1 1
       Rcd8       39294     1 1
       Rce8       38811     1 1
        Rb8       38811     1 1
        Bd8       33900     1 1
        Kh8       29350     1 1
        Qc6       15577     1 1
      total    36705545
First column is a move at the root, second column is the nodes in that sub-tree only. The next two are flags the first says reduce if 1, else not, the second says "this move can be searched in parallel with others if 1, else it has to be searched by itself by all processors."

One clear best move. Two potential replacements. I reduce neither at the root, although I reduce others unless something else causes some to not be reduced, and I sic all cpus on each of those 3 moves, one at a time, to avoid giving one of them to one cpu where the search could take an abnormally long time.

(the above is kopec #22, Bxe4 is the proposed best move...)

Here's another one from an old Belle vs Cray Blitz game. One clear best move, Bxh6. Check the node counts and flags.
Code: Select all
       move       nodes     R/P
       Bxh6   184605726     0 0
        Bf4    24200673     0 1
        Qd6    22641716     0 1
         a3    13915861     0 1
         b4     3124291     0 1
         b3     2711741     0 1
         h4     2112311     0 1
         a4     1003096     1 1
       Qxb6      433634     1 1
        Bd2      346845     1 1
        Rb1      312422     1 1
        Qg6      299091     1 1
        Be3      286610     1 1
        Qc6      215689     1 1
         g4      196193     1 1
        Qg4      154844     1 1
        Qe7      138909     1 1
        Bg5      125354     1 1
        Qd5       71011     1 1
        Qb3       44163     1 1
        Qf7       43624     1 1
      Qxh6+       25994     1 1
         g3       24814     1 1
        Qc8       24591     1 1
        Qc4       23722     1 1
       Qg8+       19774     1 1
       Qxe5       17483     1 1
        Qd7       15264     1 1
        Qf5       11680     1 1
        Qe8       10041     1 1
        Qf6        8902     1 1
      total   257166069
It can search all (but the first move) in parallel, which is classic YBW. A few of the first group have node counts high enough to mark them "interesting" so they do not get reduced, but they do get searched in parallel to improve search efficiency.

Too bad you don't understand the basic mechanics. But that is just one of many things you don't understand. now feel free to once again, tuck your tail and run away. This is not nonsense. It is a _very_ sound approach, with supporting mathematics proving that it works...

BTW, I suppose that the "you think" that you highlighted was a bit beyond your ability to grasp. The program is making these decisions. Based on the ratio between two node counts, the first move and any one of the others. If the ratio is above a carefully tested threshold, that move can be searched in parallel with others that meet that same test. If the ratio is below the threshold, that move is classified as a "potential new best move" and it is searched by itself, using all processors.

Again, just like the roof at your house or wherever you live, that is also over your head. But others might get it and want to give it a try. It _does_ work. It was developed scientifically and with careful thought and accurate testing...

Again you've just proved what I wrote in the first post to Marco. You are not able (or you pretend that you are not able) to understand the written content unless it's simplified way below the level of average Joe undergrad kid you teach.
And then what you do, you present completely different topic (usually totally trivial) as some kind of rocket science but in a patronizing way as if the person you are talking to is retarded.

So let me bold it that you can understand it this time:
YBW is trivial, DTS is also trivial to understand (implementation is hard, but that's a different story) these are very simple engineering techniques. Yes, you made your PhD out of it, but that was 25 years ago, by today's standards that would be a master thesis in CS department on a strong university. And you are not von Neumann, and that's not a seminal contribution to the computer science, it's just a small original contribution in the filed of algorithm parallelization. Nothing more. Thinking of it as some rocket science Newtonian type of contribution is nothing but pure ego-trip and your enormous patronizations are just ridiculous.

Now to come back to original idea by Marco that you were not able to understand, so let me draw it for you one last time, and after that I give up. I simply value my time more than to feed that fetish troll you obviously have in you.

The idea was instead of having a fixed minimum depth where you don't allow to split (in order not to have too much overhead), you make it depend on order of the move you are currently searching. So if the move is lower on the list, minimum split depth will be higher and vice versa.

bob · Post by **bob** » Tue May 10, 2011 4:47 pm

Milos wrote:
bob wrote:So you _really_ don't understand what is going on? If so, you would realize that (a) the above is the basis for YBW. The difference at the root is that you _do_ get a pretty good hint that a second-best, or third-best move might become the best move, by remembering the node count for each root move from the previous iteration. In a one-best-move root position, you might see the first move with a node count of 25x to 100x or even more compared to the second-best move. But in a position where a couple of moves are "close" those two moves will have a node count way higher than the other moves. Of course, had you been doing this stuff, rather than trying to become a legend in your own mind, you would have either (a) discovered this yourself or (b) find that this is the way Crafty splits at the root and is one of the reasons its parallel search is highly effective.

In your weak mind, "scientific method" is apparently "whatever you say is fact, whatever anyone says is junk." Not exactly the "accepted definition." I actually run these kinds of tests, by the hundreds of thousands. You might try testing rather than posting. You will learn more.

BTW, for two examples of my "nonsense:"

example 1, one clear best move at the root, but a couple of other moves are threatening to replace that move:
Code: Select all
       move       nodes     R/P         
       Bxe4    15396401     0 0
         d5     9510064     0 0
        Nh5     5469105     0 0
       Nxe4     1912204     0 1
        Bd5      952451     0 1
        Ne5      866838     0 1
       Rfd8      519152     0 1
         e5      393799     0 1
        Nc5      140114     1 1
        Qb8      132144     1 1
        Bc6      126426     1 1
        Ng4      104296     1 1
        Nd5      103276     1 1
        Qc5      100265     1 1
        Kh7       86073     1 1
         g6       82893     1 1
        Ra8       77554     1 1
       Rfe8       74845     1 1
         g5       69785     1 1
       Qxc4       63719     1 1
         h5       54724     1 1
        Nb8       52359     1 1
        Ba8       48347     1 1
        Ba6       45300     1 1
        Ne8       43563     1 1
        Qd8       42437     1 1
        Nh7       41668     1 1
       Rcd8       39294     1 1
       Rce8       38811     1 1
        Rb8       38811     1 1
        Bd8       33900     1 1
        Kh8       29350     1 1
        Qc6       15577     1 1
      total    36705545
First column is a move at the root, second column is the nodes in that sub-tree only. The next two are flags the first says reduce if 1, else not, the second says "this move can be searched in parallel with others if 1, else it has to be searched by itself by all processors."

One clear best move. Two potential replacements. I reduce neither at the root, although I reduce others unless something else causes some to not be reduced, and I sic all cpus on each of those 3 moves, one at a time, to avoid giving one of them to one cpu where the search could take an abnormally long time.

(the above is kopec #22, Bxe4 is the proposed best move...)

Here's another one from an old Belle vs Cray Blitz game. One clear best move, Bxh6. Check the node counts and flags.
Code: Select all
       move       nodes     R/P
       Bxh6   184605726     0 0
        Bf4    24200673     0 1
        Qd6    22641716     0 1
         a3    13915861     0 1
         b4     3124291     0 1
         b3     2711741     0 1
         h4     2112311     0 1
         a4     1003096     1 1
       Qxb6      433634     1 1
        Bd2      346845     1 1
        Rb1      312422     1 1
        Qg6      299091     1 1
        Be3      286610     1 1
        Qc6      215689     1 1
         g4      196193     1 1
        Qg4      154844     1 1
        Qe7      138909     1 1
        Bg5      125354     1 1
        Qd5       71011     1 1
        Qb3       44163     1 1
        Qf7       43624     1 1
      Qxh6+       25994     1 1
         g3       24814     1 1
        Qc8       24591     1 1
        Qc4       23722     1 1
       Qg8+       19774     1 1
       Qxe5       17483     1 1
        Qd7       15264     1 1
        Qf5       11680     1 1
        Qe8       10041     1 1
        Qf6        8902     1 1
      total   257166069
It can search all (but the first move) in parallel, which is classic YBW. A few of the first group have node counts high enough to mark them "interesting" so they do not get reduced, but they do get searched in parallel to improve search efficiency.

Too bad you don't understand the basic mechanics. But that is just one of many things you don't understand. now feel free to once again, tuck your tail and run away. This is not nonsense. It is a _very_ sound approach, with supporting mathematics proving that it works...

BTW, I suppose that the "you think" that you highlighted was a bit beyond your ability to grasp. The program is making these decisions. Based on the ratio between two node counts, the first move and any one of the others. If the ratio is above a carefully tested threshold, that move can be searched in parallel with others that meet that same test. If the ratio is below the threshold, that move is classified as a "potential new best move" and it is searched by itself, using all processors.

Again, just like the roof at your house or wherever you live, that is also over your head. But others might get it and want to give it a try. It _does_ work. It was developed scientifically and with careful thought and accurate testing...
Again you've just proved what I wrote in the first post to Marco. You are not able (or you pretend that you are not able) to understand the written content unless it's simplified way below the level of average Joe undergrad kid you teach.
And then what you do, you present completely different topic (usually totally trivial) as some kind of rocket science but in a patronizing way as if the person you are talking to is retarded.

So let me bold it that you can understand it this time:
YBW is trivial, DTS is also trivial to understand (implementation is hard, but that's a different story) these are very simple engineering techniques. Yes, you made your PhD out of it, but that was 25 years ago, by today's standards that would be a master thesis in CS department on a strong university. And you are not von Neumann, and that's not a seminal contribution to the computer science, it's just a small original contribution in the filed of algorithm parallelization. Nothing more. Thinking of it as some rocket science Newtonian type of contribution is nothing but pure ego-trip and your enormous patronizations are just ridiculous.

Now to come back to original idea by Marco that you were not able to understand, so let me draw it for you one last time, and after that I give up. I simply value my time more than to feed that fetish troll you obviously have in you.

The idea was instead of having a fixed minimum depth where you don't allow to split (in order not to have too much overhead), you make it depend on order of the move you are currently searching. So if the move is lower on the list, minimum split depth will be higher and vice versa.

And, as I said, that idea is flawed. And anyone that has studied parallel search issues would see why. In fact, it is the inverse of best practice. Which would you rather split at, a node where only 1 move has been searched previously, or one where 10 have been searched? Simple answer. The latter. because there is a much higher probability that all of the remaining moves have to be searched. And that avoids search overhead.

Now, as I used to have to read in first grade, "See Milos. See Milos run. Run Milos. Run fast. Run fast Milos."

You have no clue. Additionally, just to try to help stamp out ignorance, the idea behind "minimum split depth" (which came _directly_ from Crafty, btw) is to try to balance work to be done in parallel against the cost of doing the split operation. If you split too near the tips, the work is small, the overhead is high, performance suffers. This is no longer the way I make this decision in Crafty. I have been using something different for a couple of years. And I will likely be using something different again in the next version. There are well-known issues.

Now, go back to whatever it is you do, and butt out of a conversation you _clearly_ know nothing about. Your "master's thesis" comment shows just exactly how deep your lack of knowledge really is... No surprise there.

Milos · Post by **Milos** » Tue May 10, 2011 5:30 pm

bob wrote:And, as I said, that idea is flawed. And anyone that has studied parallel search issues would see why. In fact, it is the inverse of best practice. Which would you rather split at, a node where only 1 move has been searched previously, or one where 10 have been searched? Simple answer. The latter. because there is a much higher probability that all of the remaining moves have to be searched. And that avoids search overhead.

You are hopeless, this is really childish. I say "the sky is blue", then you come and claim "you are completely wrong, you have no clue, the sky is not yellow, it is blue." You don't even try to understand what ppl claim, or you just pretend that you don't understand. In the end it doesn't matter, the effect is the same.

You have no clue. Additionally, just to try to help stamp out ignorance, the idea behind "minimum split depth" (which came _directly_ from Crafty, btw) is to try to balance work to be done in parallel against the cost of doing the split operation. If you split too near the tips, the work is small, the overhead is high, performance suffers. This is no longer the way I make this decision in Crafty. I have been using something different for a couple of years. And I will likely be using something different again in the next version. There are well-known issues.

Again patronizing about trivial stuff. Minimum split depth is so obvious concept that any average Joe you pick up on the street could figure it out. Bragging about how you "invented" that in Crafty just demonstrates boundless narcissism.

Your "master's thesis" comment shows just exactly how deep your lack of knowledge really is... No surprise there.

Sorry Bob, but your "authority" attitude might only work with laymen and your students. Others (many on this forum) however understand the true nature behind. However, most ppl chose not to confront. I'm not one of them. If the emperor is naked I will always say it.
You do have an iconic status in chess programming community, but in the sense of pioneering and education, not in the sense of scientific contribution and bringing new (groundbreaking) ideas. I know this hurts your ego a lot, but that's the fact...

rbarreira · Post by **rbarreira** » Tue May 10, 2011 6:26 pm

I have a problem with the assumption that people can't come up with the same idea independently.

I agree with Milos that the "minimum split depth" idea seems quite obvious to anyone who is used to make parallel algorithms. One of the basic goals in any parallel algorithm is to only split work between threads when that doesn't cause too much overhead, and "minimum split depth" follows directly from that.

Sven · Post by **Sven** » Tue May 10, 2011 6:48 pm

rbarreira wrote:I have a problem with the assumption that people can't come up with the same idea independently.

I agree with Milos that the "minimum split depth" idea seems quite obvious to anyone who is used to make parallel algorithms. One of the basic goals in any parallel algorithm is to only split work between threads when that doesn't cause too much overhead, and "minimum split depth" follows directly from that.

Did Bob claim people can't come up with the same idea independently?

Or did he claim that this idea actually came from Crafty development?

Do you claim that this idea was brought up independently by several people?

rbarreira wrote:that the "minimum split depth" idea seems quite obvious to anyone who is used to make parallel algorithms

It's easy to be wise after the event

Sven

Milos · Post by **Milos** » Tue May 10, 2011 6:56 pm

Sven Schüle wrote:
rbarreira wrote:that the "minimum split depth" idea seems quite obvious to anyone who is used to make parallel algorithms
It's easy to be wise after the event

It's even easier to be an apple-polisher

.

rbarreira · Post by **rbarreira** » Tue May 10, 2011 7:05 pm

Sven Schüle wrote: Did Bob claim people can't come up with the same idea independently?

He certainly implied it when he said this:

This was a Crafty idea, from the beginning, and everyone seems to have copied the idea without giving any more thought to the problem.

Not only does it seem everyone copied it, but they were even incapable of improving on it, poor bastards.

Sven Schüle wrote:Or did he claim that this idea actually came from Crafty development?

If something "comes from" something else, that implies a derived idea, i.e. not an independently discovered idea.

Sven Schüle wrote: Do you claim that this idea was brought up independently by several people?

I don't know either way, but I do claim it is obvious enough that it probably did. Again, it follows directly the basic principles of parallel programming - only parallelize work when that speeds up the algorithm, in other cases do it serially.

Sven Schüle wrote:It's easy to be wise after the event

It's easy to come up with clichés like that instead of critically evaluating the specific case we're discussing.

Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search

Re: Stockfish and optimizing options for SMP search