Clustering etc. thread

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: An idea for a new WCCC format - what do you think?

Post by bob »

Zach Wegner wrote:
Tony wrote:I think I gave this algorithm already about a year ago. It seems only one person understood :(

Without optimizations:

1. Generate all moves at the root.
2. Send every move to a client for a fixed-depth search (full width). Keep track of which node searches which move.
3. Calculate the score for this node based on the results returned.
4. Calculate the depth for this node from the reported depth of every move plus a depth offset that depends on the difference in score between each move and the best move.
5. Select and make the move that limits the current depth.
6. Go to 1.

- The algorithm works well for "deep analysis" and "cluster searching".

- You can replace 2) with a Monte Carlo search; only 3) needs to be able to convert the score. Therefore an engine that is able to do deep analysis and cluster searching is most likely also able to do Monte Carlo searching.

- One still splits at the root; it's just a variable root. So one can claim to be splitting at the root without lying.

- What can happen is that the best move scores 1 pawn above move 2 and is therefore searched to, for example, 21 ply, then fails low; move 2 becomes best but has only been searched to 19 ply so far.

- If needed, one can raise the search depth (rather than growing the in-memory tree) by just searching the same positions on the same client.

- One could even use a second instance of the engine to walk through the tree and have it work at a selected position.

Tony
Certainly an interesting algorithm, and quite similar to UCT at one node, as GCP points out. Have you tested it for pure strength against a normal root search on a single processor? It seems like it has the potential to be better, if only as a more optimal use of time (mate scores can have confidence of infinity, so they aren't searched further, etc.).
UCT is an interesting idea, applied mainly where one can't hope to do a deep normal search (as we can in chess). Given Go's excessive branching factor, it may well be the holy grail of improving things there. Extending it to chess, however, is a different issue entirely, and whether it would work here is not a given.

However, in the current context, it is irrelevant to the current discussion anyway.
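For readers trying to pin down the mechanics of the algorithm quoted above, one pass of Tony's loop can be sketched in a few lines. This is a sketch only: client_search() is a stub standing in for the cluster layer, and the 1-ply-per-25-centipawn offset rule is an assumption, not something Tony specified.

Code: Select all

#include <stdio.h>
#include <limits.h>

/* One pass of Tony's loop as a runnable sketch.  client_search() is a stub
   standing in for "ship this root move to a cluster node and collect back
   (score, depth)".  The 1-ply-per-25-centipawn offset() rule is an assumed
   placeholder for Tony's unspecified score-to-depth offset. */
#define NMOVES 4

static int client_search(int move, int *depth) { *depth = 12; return 10 * move; }
static int offset(int deficit_cp) { return deficit_cp / 25; }

int main(void) {
    int score[NMOVES], depth[NMOVES], i, best = 0, limiting = 0, node_depth = INT_MAX;
    for (i = 0; i < NMOVES; i++)                 /* steps 1-2: one move per client */
        score[i] = client_search(i, &depth[i]);
    for (i = 1; i < NMOVES; i++)                 /* step 3: node score = best move */
        if (score[i] > score[best]) best = i;
    for (i = 0; i < NMOVES; i++) {               /* step 4: a score deficit adds   */
        int eff = depth[i] + offset(score[best] - score[i]);   /* virtual depth    */
        if (eff < node_depth) { node_depth = eff; limiting = i; }
    }
    /* step 5: the move holding the node's depth down is deepened next */
    printf("node depth %d; deepen move %d next\n", node_depth, limiting);
    return 0;                                    /* step 6: loop in the real thing */
}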
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: An idea for a new WCCC format - what do you think?

Post by Zach Wegner »

bob wrote:UCT is an interesting idea, applied mainly where one can't hope to do a deep normal search (as we can in chess). Given Go's excessive branching factor, it may well be the holy grail of improving things there. Extending it to chess, however, is a different issue entirely, and whether it would work here is not a given.

However, in the current context, it is irrelevant to the current discussion anyway.
Well, I think it is interesting as an easy way to do cluster searching (which seems to be what Rybka is doing), as well as a way to compare scores from different depths. It's hard to say what speedup you could get, since you no longer have a one-dimensional value (depth) to compare, but rather a set of values, one for each move. I personally don't think there's that much potential in it, since you would always want to be searching the move with the best UCB value, not the "best of the set of moves that we chose for this node at iteration 1 so we can have some hash-table consistency". And in a lot of cases you want multiple nodes searching the same root move. I'm going the YBW-like route, though with DTS it gets _extremely_ complicated.

You also reminded me that I should have said UCB rather than UCT; UCT is just UCB applied recursively, and we're only talking about one node here.
Tony

Re: An idea for a new WCCC format - what do you think?

Post by Tony »

bob wrote:
Tony wrote:
bob wrote:
Uri Blass wrote:
bob wrote:
Uri Blass wrote:
Dirt wrote:
Vasik Rajlich wrote:
bob wrote:I don't buy the "this hurts Rybka" idea, because the cluster Rybka is a joke. And a poor joke at that. There have been some decent cluster-based programs. But Rybka is simply not one of them.
Where did that come from ??

Vas
There is something of an explanation here.
I read this post and I can say two things.

1) I think that it is impossible to know the algorithm Rybka is using based on output from a single position.

It is possible that something similar, though not exactly the same, is used when some illogical moves that lose the queen are analyzed, but this is not the whole story, and the algorithm is based partly on "split only at the root" and partly on another idea.

2) I remember that Vas said 100 Elo based on testing at a fast time control, and I suspect that at a fast time control you get clearly more than 50 Elo per doubling, so practically 5 nodes do not give a 4:1 speed improvement but clearly less than that (maybe 2.5:1).

Uri
The problem is, I am not basing my opinion on a single position. I am basing it on a whole game I watched in the last ACCA online event, where Rybka was kibitzing PVs that showed _exactly_ how the search was being split and where...
I think that you cannot draw conclusions based on the PVs that Rybka showed.

It is possible that Rybka gave only part of the information in the PVs that you saw, and that the algorithm is based on a combination of splitting at the root and something else.

Uri
It is possible Rybka had a GM playing the moves and supplying nonsensical kibitzes as well. But it is not very likely. This is going nowhere. I _did_ an algorithm like that in 1983. I put it together in under 2 weeks and won the 1983 WCCC championship with it. And I _know_ how it works and how the PVs look. There's no doubt in this case as to what is going on. There is no other possible explanation. Feel free to offer _any_ reason how one could get a PV for depth=23, then a PV for depth=19, then a PV for depth=21, then a PV for depth=18, where, if you take the best moves and collect them individually, you can see one cluster node searching a group of moves in the normal way, kibitzing a PV after each iteration, another node doing the same for a different group of moves, etc. Or offer any explanation of why a program would kibitz (intermingled with the kibitzes of other nodes at different depths) a steady depth progression with a score of -9.0, because there is only one possible recapture and that move is not being searched on that node.

This is _not_ guesswork. It is a simple statement of what it was doing. And how...

As for "why this was done": it is because it is an easy way to get a cluster search to work. But not effective. Not even +20 Elo effective...
I think I gave this algorithm already about a year ago. It seems only one person understood :(

[...]

Tony
I don't see how that addresses all of the following:

move A is searched to depth N. Move B is searched to depth N+2.

(1) A is much better than B (score). Which to play?

(2) B is much better than A (score). Which to play?

(3) A and B are very close. Which to play?


.....
It doesn't. The B-Star algorithm, however, has some interesting points. Its main point of failure (at that time) seems to be its very low search depth. (IIRC it fired off 4-ply searches.)

Tony
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: An idea for a new WCCC format - what do you think?

Post by Zach Wegner »

Tony wrote:
I don't see how that addresses all of the following:

move A is searched to depth N. Move B is searched to depth N+2.

(1) A is much better than B (score). Which to play?

(2) B is much better than A (score). Which to play?

(3) A and B are very close. Which to play?


.....
It doesn't. The B-Star algorithm, however, has some interesting points. Its main point of failure (at that time) seems to be its very low search depth. (IIRC it fired off 4-ply searches.)

Tony
Well it actually can if you use what GCP suggested.

Code: Select all

nodes = 1 << depth;                          // rough approximation based on ebf=2
prob = 1 / (1 + pow(10, -score/4));          // win probability from a floating-point pawn score; see http://chessprogramming.wikispaces.com/Pawn+Advantage%2C+Win+Percentage%2C+and+ELO
ucb = prob + k*sqrt(log(total_nodes)/nodes); // UCB1: exploitation plus exploration
best_score = max(ucb);                       // taken over all root moves
The last line (choosing) can use some modification, but that's just a basic formula. GCP is the real expert there...
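To make that concrete, here is a self-contained version of the same selection rule applied to two root moves reported at different depths. The constant k = 0.5 and the 2^depth node estimate are assumptions; the score-to-probability mapping is the one from the page linked above.

Code: Select all

#include <stdio.h>
#include <math.h>

/* Self-contained sketch of the UCB selection above, applied to root moves
   reported at different depths.  k = 0.5 and the 2^depth node estimate are
   assumptions; scores are in pawns and mapped to win probability with the
   formula from the chessprogramming page linked above. */
struct result { double score; int depth; };

static int pick_move(const struct result *r, int n) {
    double nodes[64], total = 0.0, best_ucb = -1.0;   /* up to 64 root moves */
    int i, best = 0;
    for (i = 0; i < n; i++) {
        nodes[i] = pow(2.0, r[i].depth);              /* effort estimate, ebf = 2 */
        total += nodes[i];
    }
    for (i = 0; i < n; i++) {
        double prob = 1.0 / (1.0 + pow(10.0, -r[i].score / 4.0));
        double ucb = prob + 0.5 * sqrt(log(total) / nodes[i]);
        if (ucb > best_ucb) { best_ucb = ucb; best = i; }
    }
    return best;
}

int main(void) {
    /* the disputed case: move A at depth 19 scoring +2.5 vs. move B at
       depth 21 scoring +1.3 */
    struct result r[2] = { { 2.5, 19 }, { 1.3, 21 } };
    printf("deepen move %d next\n", pick_move(r, 2));
    return 0;
}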
Last edited by Zach Wegner on Wed Mar 11, 2009 8:30 pm, edited 1 time in total.
Uri Blass
Posts: 10281
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: An idea for a new WCCC format - what do you think?

Post by Uri Blass »

bob wrote:
Uri Blass wrote:
bob wrote:
Uri Blass wrote:
bob wrote:
Vasik Rajlich wrote:
[...]
The effective speedup is probably somewhere between 2.5:1 and 3:1 for 5 nodes, which is what Lukas had when he tested all of this.

Now he's up to 9 nodes BTW :)

Vas
Can we stay in the real world? Splitting at the root can not produce a 2.5x speedup, when the best move at the root takes _way_ over 50% of the total search time. There is theory. There is practice. And there is nonsense. For the event I am talking about, this claim is "nonsense". You might get the uninformed to buy this stuff, but not someone that has been doing it for 30+ years now (my first parallel search played its first ACM event in 1978....)
By effective speedup I mean the time handicap you could give to the original entity and score 50%. So, even if you do nothing other than split at the root and if the first move typically takes 50% of your search time, you could still get an effective speedup of >2. Not that that's what Rybka is doing :)

Vas
Here's what you need to make that happen.

(1) you need to change your mind at least once at the root during the last couple of iterations. More changes is better.
Sure. If you're not changing your mind, it doesn't matter what kind of speedup you have.

(2) you have to hope that the hash information from the first move does not affect any other move. Fine 70 is a good example of where this can be a problem.
That's just life without shared memory. Any cluster implementation is going to have problems in a position like that.

I think you'd be very lucky to get a speedup of 1.5x with any number of processors, which is not zero of course, but it is not something that will make me quake in my boots either. :)
With infinite # of processors and splitting only at the root, you will get a lot more than 1.5x.

Vas
No program today has a search that is good enough, but in theory, if you have a good search, the speedup from splitting at the root may be smaller, for the simple reason that the value of a 1.5x speed improvement is bigger than the value of one ply with no pruning.

I think that some types of bad evaluation can also cause the speedup to be smaller, or even negative.

Imagine that you search with no pruning, no extensions, and no qsearch, and imagine that you have an evaluation that gives no bonus for the side to move, so you often evaluate even depths as better than odd depths (maybe it should be the opposite, and I did not think about it, but the idea is the same).

If you split at the root you may get depth 7 for move A and depth 8 for move B and prefer move B not because it is better but only because you searched it to an even depth.

This problem does not happen without splitting at the root, because without splitting at the root you always get the same depth for all moves.

Uri
You are badly misusing terminology.

1. "Splitting at the root" does not mean each root move gets searched to a different depth. I split at the root in current Crafty; that only means that I do a parallel search on the root moves as well as deeper moves, because this is a far more efficient way to search.

2. You are describing what is called an "unsynchronized search", where each node searches a different move (or set of moves) at the root, and when iteration N is done, N+1 is started without regard for how the other moves are progressing on other nodes.

This is useless.

How can you choose between depth 21, eval=+1.3, and depth 19, eval=+2.5?? You can't. This has been tried in the past, by Newborn, by Schaeffer, and by others. There is no way to compute any sort of equivalence function that lets you decide which of the above is better. The depth-19 move might be even higher at depth 21. Or it might be way lower. The only way to discover this is to search both moves to the same depth. Anything else is beyond hopeless and is a coin flip. You can't even choose between depth 21, +1.5, and depth 19, -2.0, because by depth 21 the -2.0 score might be +5.0...
I can certainly choose, and the simplest choice is that a higher score is better.
Depth 19, eval=+2.5 is better than depth 21, eval=+1.3 for the simple reason that 2.5 > 1.3.

I believe that with a relatively good evaluation it is not such a bad choice (not that splitting at the root is good, but it may give an effective speed improvement of 2 or more, assuming that all root moves are searched).

I may be wrong, and the only way to know is by testing.

Uri
There is absolutely no doubt you are wrong. How many times have you seen a program search a move to depth N with a + score, then at depth N+1 the score drops off the map and it switches to a new move? If you believe what you wrote, why not stop your search when the eval gets to a good point? Why continue to search deeper?

As far as a speedup of > 2 goes, it simply will not/can not happen. Amdahl's law says that the overall speedup for an algorithm is:

speedup = T(1) / T(N)

T(N) = sequential_processing_time + parallel_processing_time / N

where N = number of processors.

If the first move takes 50% of the time, and that is the usual case, then you absolutely can not get a speedup > 2.0. It is impossible. Except for those cases where the first move is not best. Now you search the second move. And it takes the same amount of time. So you do get a bigger boost there. But that only happens about 15% of the time. 85% of the time the speedup is going to be way below 2.0 no matter how many CPUs you use. And there is simply no way to escape the basic sequential property of the alpha/beta algorithm.
Of course the score may change, but by the same logic you also cannot compare scores at the same depth, because the program may change its mind at a later depth.
I don't see where this nonsense comes from. Two moves, different scores, equal depths: I most certainly can compare them and choose the larger. But if the depths are different and the scores are different, I can only flip a coin to choose one. Choosing the best score is wrong. Choosing the deeper depth is wrong. Choosing one of the two moves with different depths is just a purely random choice.


You do not know the best move and you always guess based on information and hope to be right.
I trust the search to at least search the two moves in an equal way, so that the resulting backed-up scores can be compared. That's what minimax and alpha/beta are all about.


If that is not the case, then you also cannot compare scores at the same depth, because practically one move is searched to normal depth and another move is searched to a smaller depth because of late move reductions.

It may be bad to compare scores at different depths with a bad evaluation and no reductions, but that is not the practical case (maybe it was 20 or 30 years ago).

Uri
It is still the case today. And I am amazed you don't understand why...
1) With late move reductions, even searching to the same depth is practically searching to different depths, so if you cannot compare different depths then you also cannot compare the same depth.
All I can say to that is <sigh>

If you don't think we can compare scores from the same depth, then that would mean you do not believe alpha/beta or minimax work, because that is _exactly_ what they are based on. I'm not going to continue with the apples-to-oranges nonsense. If that is your belief, then that is your belief. It is wrong, however.


2) I think an interesting experiment that everybody can do is to test A against B, where A searches a fixed number of nodes without splitting at the root (let's say 100,000 nodes per move) and B searches 50,000 nodes for every root move.

B chooses a move simply by comparing the scores, without caring about the fact that the depths are different.

I do not know if A wins or B wins, and it may depend on the program, but I see no reason to assume that it is obvious that B cannot win.
The reason B loses is intuitively obvious to the casual observer (who understands alpha/beta minimax). But apparently the point is lost here so just carry that belief with you...


Assuming nodes are nearly proportional to time, if B wins then it suggests that splitting at the root can give a speedup of more than 2:1.

Uri
And again, I gave you the math. The math is absolutely infallible. And yet you keep saying 2+2 = 5, in spite of any mathematical analysis you are given...

If we can't communicate via accurate math, I can't communicate with "assuming..." and "maybe..." and "it might..." This is an exact science.
1) I did not claim that you cannot compare scores of different moves at the same depth.

My words:
"if you cannot compare different depths then you also cannot compare the same depth."

My opinion is that you can compare scores from different depths and also scores from the same depth.

A deeper search is always better, so I always prefer comparing scores from the deeper depth when I can, but if I have only a score for move A at depth 19 and a score for move B at depth 21 and need to decide which is best, then I compare the scores.

Practically, you also do something equivalent to comparing scores at different depths in a modified Crafty that has no check extensions:
imagine that you extend move A because it is check and do not extend move B.

Comparing scores at the same depth for move A and move B is then the same as comparing scores at different depths for a modified Crafty that does not have check extensions.


2) All your math is simply irrelevant for an "unsynchronized search".
The only way to show whether my opinion is wrong or right is by testing.

Note that testing of programs from 1980, in case people did it, is irrelevant, because today's programs are clearly different.

The interesting test is with the best free source programs (Stockfish, Toga, Crafty).

Uri
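Uri's proposed experiment is easy to state in code. A hypothetical harness for side B might look like the following; engine_search() is a stub, and in a real test it would restrict a real engine to one root move with the given node budget.

Code: Select all

#include <stdio.h>

/* A hypothetical harness for side B of Uri's experiment.  engine_search()
   is a stub; in a real test it would restrict a real engine to the given
   root move and node budget and return the (score, depth) it reports. */
struct pv { double score; int depth; };

static struct pv engine_search(int root_move, long node_budget) {
    struct pv r;
    r.score = 0.1 * root_move;                  /* dummy values for the sketch */
    r.depth = node_budget > 75000 ? 21 : 19;
    return r;
}

int main(void) {
    int n_moves = 4, i, best = 0;
    double best_score = -1e9;
    for (i = 0; i < n_moves; i++) {             /* B: one search per root move  */
        struct pv r = engine_search(i, 50000);  /* half of A's 100,000 budget   */
        if (r.score > best_score) {             /* compare raw scores, ignoring */
            best_score = r.score;               /* r.depth -- the disputed step */
            best = i;
        }
    }
    printf("B plays root move %d (score %.2f)\n", best, best_score);
    return 0;
}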
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: An idea for a new WCCC format - what do you think?

Post by bob »

Tony wrote:
[...]
It doesn't. The B-Star algorithm, however, has some interesting points. Its main point of failure (at that time) seems to be its very low search depth. (IIRC it fired off 4-ply searches.)

Tony
Berliner claimed the issue was "how to set the confidence bounds", which turned into expensive searches...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: An idea for a new WCCC format - what do you think?

Post by bob »

Zach Wegner wrote:
bob wrote:UCT is an interesting idea, applied mainly where one can't hope to do a deep normal search (as we can in chess). Given Go's excessive branching factor, it may well be the holy grail of improving things there. Extending it to chess, however, is a different issue entirely, and whether it would work here is not a given.

However, in the current context, it is irrelevant to the current discussion anyway.
Well, I think it is interesting as an easy way to do cluster searching (which seems to be what Rybka is doing), as well as a way to compare scores from different depths. It's hard to say what speedup you could get, since you no longer have a one-dimensional value (depth) to compare, but rather a set of values, one for each move. I personally don't think there's that much potential in it, since you would always want to be searching the move with the best UCB value, not the "best of the set of moves that we chose for this node at iteration 1 so we can have some hash-table consistency". And in a lot of cases you want multiple nodes searching the same root move. I'm going the YBW-like route, though with DTS it gets _extremely_ complicated.

You also reminded me that I should have said UCB rather than UCT; UCT is just UCB applied recursively, and we're only talking about one node here.
Rybka is not doing UCT so far as I can tell. It is simply splitting the moves into groups at the root and handing them off to different cluster nodes, which are then told "search these as deeply as you can until N seconds have been used." At that point the results are collected and a move is chosen...
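As an illustration of that scheme (a sketch under assumptions, not Rybka's actual code), the master's side reduces to: split the root moves into one group per node, let each node run for N seconds, then compare the best results across unequal depths, which is exactly the comparison disputed in this thread.

Code: Select all

#include <stdio.h>

/* Sketch of the group-split scheme described above (assumptions throughout):
   root moves are split into one group per cluster node, each node searches
   its group until the time is used, and the master compares raw scores.
   search_group() is a stub standing in for real cluster messaging. */
#define NODES 5

struct best { int move; double score; int depth; };

static struct best search_group(int node, int seconds) {
    /* stub: a real implementation would message a remote engine here */
    struct best b = { node * 8, 0.1 * node, 18 + (node + seconds) % 3 };
    return b;
}

int main(void) {
    struct best overall = { -1, -1e9, 0 };
    int node;
    for (node = 0; node < NODES; node++) {
        struct best b = search_group(node, 60); /* unsynchronized: depths differ */
        if (b.score > overall.score)            /* highest score wins, across    */
            overall = b;                        /* unequal depths                */
    }
    printf("play move %d (score %.2f, depth %d)\n",
           overall.move, overall.score, overall.depth);
    return 0;
}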
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: An idea for a new WCCC format - what do you think?

Post by bob »

Rolf wrote:
bob wrote:
Uri Blass wrote:
bob wrote:
Uri Blass wrote:I can add that the same algorithm is not used, so we do not have a theoretical value for the possible effective speedup, and the value may be different for different programs.

Uri
Eh? All that matters is how long the first move takes to search compared to the rest of the moves. I've not seen a program where this is not approximately 1/2 of the total time. That means the maximum speedup is 2, and the actual speedup is going to be more like 1.5x as an optimal number.

Math is simple.

First move takes 50% of the time. In 85% of cases the first move is best. So the parallel search time is T/2 for the first move, plus T/2N for the rest. For large N, T/2N is near zero, but you still have that T/2 for the first move.

speedup = serial_time / parallel_time

optimal speedup = T / (T/2) which looks very much like 2.0 to me. And that's all you can get. Best case.

With 2 processors, you get T / (T/2 + T/4), which is T / (0.75T), or 1 / 0.75 = 1.33x.

It really is that simple.

And anyone working on a parallel search understands that.

BTW, for 5 nodes: optimal speedup = T / (T/2 + T/10) = T / (0.6T) = 1.66x for the Rybka cluster box used for the ACCA event. Close to my 1.5x estimate. Been there. Done that. Got the T-shirt.
I do not know much about parallel search, but I see no reason to assume that the same algorithm is used.

A cluster is a different environment (worse relative to having the same nodes in a normal parallel search), so it is logical to hope that a different algorithm may be better.
"You can wish in one hand, and crap in the other, and see which one fills up first.." - Burgess Meridith, Grumpy Old Men.

We are using alpha/beta minimax at the present. And that is what I based my analysis on, and I clearly stated that at the beginning of this discussion. I am not "assuming" that some better search algorithm might be developed for use on a cluster. I am not assuming anything other than that we are all using minimax search with alpha/beta pruning. And that fixes the math quite nicely.

If something better than alpha/beta can be done, that would be great. But to date, it has not happened. It has not happened in almost 60 years since Shannon wrote "Programming a Computer for Playing Chess," followed by the Newell, Simon and Shaw paper on alpha-beta pruning. So I am not arguing that a possible future algorithm might produce a >2x speedup by splitting at the root. I am discussing _today's_ algorithms as used in all chess-playing programs.

I'd prefer science fact, rather than science fiction, for the present.
That is in no way something against you personally, or against the now-modern tricks. If one followed your progress, other paradigms were absolutely unrealistic, something you cannot say IMO. Treating people with respect, who think ahead or just ask questions, is IMO self-understood.
I have no idea what you are talking about. This is simple theory that has existed for at least 20 years now (parallel alpha/beta search), and practical examples have existed for 30+ years. There is nothing to "improve" in this discussion, any more than you can improve on integral calculus to produce some new value when integrating x^2 dx. This is just basic math at the moment, and the discussion has gone off into the twilight zone again.
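The speedup numbers above are easy to reproduce. Here is a quick tabulation of speedup(N) = 1 / (0.5 + 0.5/N), under the stated assumption that the first root move takes half the serial time: N = 5 gives 1.67x, matching the 1.66x figure up to rounding, and the limit for large N is 2.0.

Code: Select all

#include <stdio.h>

/* Tabulate speedup(N) = T / (T/2 + T/(2N)) = 1 / (0.5 + 0.5/N): the first
   root move takes half the serial time T, the rest parallelize perfectly. */
int main(void) {
    int n_values[] = { 1, 2, 5, 100 }, i;
    for (i = 0; i < 4; i++) {
        int n = n_values[i];
        printf("N = %3d  speedup = %.2fx\n", n, 1.0 / (0.5 + 0.5 / n));
    }
    /* N = 2 -> 1.33x, N = 5 -> 1.67x; the T/2 serial part caps it at 2.0x */
    return 0;
}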
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: An idea for a new WCCC format - what do you think?

Post by bob »

Uri Blass wrote:
bob wrote:
[...]
1) I did not claim that you cannot compare scores of different moves at the same depth.

My words:
"if you cannot compare different depths then you also cannot compare the same depth."
Ack. So you can't compare moves from the same depth (quote from the previous post), but you can compare moves from the same depth (quote from this post)???
With late move reductions, even searching to the same depth is practically searching to different depths, so if you cannot compare different depths then you also cannot compare the same depth
That seems clear enough to me. I have already stated that you can not compare scores from different depths. So you then say you can not compare the same depth.

My opinion is that you can compare scores from different depths and also scores from the same depth.
Again: depth 19, move A, score = -1.2; depth 21, move B, score = +0.7. Which do you play? I will then post both of those in the same search, along with the depth-21 _full search_ which shows which one is actually best. Given better depth _and_ better score, the decision is pretty safe (but still not absolutely correct). But given different scores and different depths, comparing them greatly increases the error.


A deeper search is always better, so I always prefer comparing scores from the deeper depth when I can, but if I have only a score for move A at depth 19 and a score for move B at depth 21 and need to decide which is best, then I compare the scores.

Practically, you also do something equivalent to comparing scores at different depths in a modified Crafty that has no check extensions:
imagine that you extend move A because it is check and do not extend move B.

Comparing scores at the same depth for move A and move B is then the same as comparing scores at different depths for a modified Crafty that does not have check extensions.
Not the same at all. Now you have two different searches, and I again would _not_ compare the scores.


2) All your math is simply irrelevant for an "unsynchronized search".
The only way to show whether my opinion is wrong or right is by testing.

Note that testing of programs from 1980, in case people did it, is irrelevant, because today's programs are clearly different.

The interesting test is with the best free source programs (Stockfish, Toga, Crafty).

Uri
It is not so interesting to me. I don't have to stick my hand on a hot pan to discover I am going to get burned. I already know the result from past mistakes. I have split at the root, with Cray Blitz. I have _tried_ unsynchronized search at the root, with Cray Blitz. I saw the problems then, and they have not changed one bit to this day...

If it were a good idea, everybody would be doing it rather than writing _far_ more complicated search code for the various ways of doing YBW. But it just doesn't work well. And by "well" I mean results vs. effort. I don't consider 1.5x faster better unless I am just using 2 nodes, where it is indeed better. It is _never_ going to be 2x faster for the typical cases across a game. It _might_ exceed 2x on rare positions. But it is going to _average_ around 1.5x so long as we are talking alpha/beta minimax.
Uri Blass
Posts: 10281
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: An idea for a new WCCC format - what do you think?

Post by Uri Blass »

bob wrote:
[...]
1) Maybe there is a misunderstanding.
I claim that, by the same logic that says you cannot compare moves at different depths, you also cannot compare moves at the same depth.
I did not agree with the logic that you cannot compare moves at different depths.

2) Depth 19, move A, score = -1.2; depth 21, move B, score = +0.7.

I prefer move B because the score is better.

Of course, searching move A to depth 21 would give better information, but I assume that I have no time for it.

There are cases when the score for move A is going to be 0.8 at depth 21, but I cannot know it.

I can also ask you what you prefer between depth 19 move A score = -1.2 and depth 19 move B score = +0.7, and if you say move B, I can show you cases where the picture is different if you search both to depth 21.

It is going to prove nothing.

3) I did not claim that unsynchronized splitting at the root is good relative to other parallel search techniques.

If you get an effective speed improvement of 2.5 by splitting at the root with 100 processors (so you can use one processor for every root move), and an effective speed improvement of 50 from today's parallel search with 100 processors, then of course today's parallel search is better.

My only claim is that it is not obvious that you cannot get an effective speed improvement of more than 2 by unsynchronized splitting at the root, and you cannot trust results from Cray Blitz when modern programs use a different search and clearly search deeper.

Maybe comparing odd and even plies with Cray Blitz caused problems, but today, when programs reach high depths with all the reductions and extensions, you are often effectively comparing odd and even plies even when you search to the same depth, and I do not understand your comment about having two different searches for a Crafty without check extensions.

My claim was that comparing depth A with depth A for default Crafty is equivalent to comparing depth A and depth A+1 for a modified Crafty with no check extensions.

Saying that comparing depth A and depth A+1 does not make sense for default Crafty, while the same idea is logical for modified Crafty, does not seem logical to me.

Uri