If You have cluster/PC with 100 CPUs...

Discussion of chess software programming and technical issues.

Moderator: Ras

FlavusSnow
Posts: 89
Joined: Thu Apr 01, 2010 5:28 am
Location: Omaha, NE

Re: If You have cluster/PC with 100 CPUs...

Post by FlavusSnow »

I'd classify a multi-pv search as getting better information (a secondary PV) without searching deeper, at the cost of visiting nodes that today's algorithms tell us don't need to be visited. I'm just suggesting that maybe a different style of search could use this additional information to extend depth where it counts, to decide between the two PVs. Maybe this wouldn't be as good as going an extra ply deep across the whole root node, or maybe it wouldn't find any better move at all... I'm just theorizing here.

Every algorithm that I know of approaches a depth where going one more ply deeper requires too much time to be practical in a tournament time control. Some smart person will figure out a better use of this time.
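As a rough sketch of that idea in code: run the normal multi-PV search, then spend the otherwise-idle CPUs deciding between the top two moves only, a few plies deeper. Everything below (Move, searchMove, multiPV) is a made-up placeholder, not any real engine's API.

Code:

#include <iostream>
#include <string>
#include <vector>

struct Move { std::string uci; };
struct SearchResult { Move move; int score; };

// Hypothetical: search a single root move to the given depth.
SearchResult searchMove(const Move& m, int depth) {
    return { m, 0 };  // stub
}

// Hypothetical: multi-PV search returning the k best root moves.
std::vector<SearchResult> multiPV(int depth, int k) {
    return { { {"e2e4"}, 25 }, { {"d2d4"}, 21 } };  // stub
}

int main() {
    const int baseDepth = 20;
    // Step 1: a normal multi-PV search yields the two candidate PVs.
    auto top2 = multiPV(baseDepth, 2);
    // Step 2: spend the spare CPUs on a deeper head-to-head search
    // restricted to just those two moves.
    auto a = searchMove(top2[0].move, baseDepth + 3);
    auto b = searchMove(top2[1].move, baseDepth + 3);
    const auto& best = (a.score >= b.score) ? a : b;
    std::cout << "play " << best.move.uci << "\n";
}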
FlavusSnow
Posts: 89
Joined: Thu Apr 01, 2010 5:28 am
Location: Omaha, NE

Re: If You have cluster/PC with 100 CPUs...

Post by FlavusSnow »

To continue the analogy... if the house you're trying to build doesn't have enough work for 200 people to do at the same time, I'd build a bigger house. Otherwise you'll run into the fact that 20 people could get the house built in 99% of the time it would take 200 people.

So to bring it back to chess, maybe those less efficient CPUs could be gathering more information, running a second eval function depending on the type of position, or running a different style of search altogether. Maybe a few of them could run a very risky search whose result would then be verified if it thinks it found something. At the end of the day, taking 4 cores away from 32 would not noticeably affect the main search.
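A sketch of that core split; search() and its 'risky' flag are invented for illustration, not any engine's real interface.

Code:

#include <future>
#include <iostream>
#include <string>

struct Result { std::string move; int score; };

// Hypothetical: 'risky' enables very aggressive pruning/reductions.
Result search(int cores, int depth, bool risky) {
    return { "e2e4", 0 };  // stub
}

int main() {
    // The main search keeps 28 of 32 cores; 4 run a risky probe deeper.
    auto mainFut  = std::async(std::launch::async, search, 28, 22, false);
    auto riskyFut = std::async(std::launch::async, search,  4, 26, true);

    Result mainRes  = mainFut.get();
    Result riskyRes = riskyFut.get();

    Result chosen = mainRes;
    if (riskyRes.score > mainRes.score) {
        // The speculative find is only trusted after a normal,
        // non-risky search confirms the same move.
        Result verified = search(28, 22, false);
        if (verified.move == riskyRes.move) chosen = riskyRes;
    }
    std::cout << "play " << chosen.move << "\n";
}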
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: If You have cluster/PC with 100 CPUs...

Post by bob »

FlavusSnow wrote:To continue the analogy... if the house you're trying to build doesn't have enough work for 200 people to do at the same time, I'd build a bigger house. Otherwise you'll run into the fact that 20 people could get the house built in 99% of the time it would take 200 people.

So to bring it back to chess, maybe those less efficient CPUs could be gathering more information, running a second eval function depending on the type of position, or running a different style of search altogether. Maybe a few of them could run a very risky search whose result would then be verified if it thinks it found something. At the end of the day, taking 4 cores away from 32 would not noticeably affect the main search.
But you broke the analogy. You would build a _single_ bigger house, so that you use the result when it is finished. Searching different root moves with different nodes _guarantees_ that in the typical case, 97% of the effort will be thrown away (assuming the usual branching factor of 35, where 34 of those 35 moves will not be the final choice)...

In the current approach, most effort is spent on the "best move". In the proposed case, almost all the effort is spent on moves that will never be played.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: If You have cluster/PC with 100 CPUs...

Post by bob »

FlavusSnow wrote:I'd classify a multi-pv search as getting better information (a secondary PV) without searching deeper, at the cost of visiting nodes that today's algorithms tell us don't need to be visited. I'm just suggesting that maybe a different style of search could use this additional information to extend depth where it counts, to decide between the two PVs. Maybe this wouldn't be as good as going an extra ply deep across the whole root node, or maybe it wouldn't find any better move at all... I'm just theorizing here.

Every algorithm that I know of approaches a depth where going one more ply deeper requires too much time to be practical in a tournament time control. Some smart person will figure out a better use of this time.
Getting better information for _you_, maybe. Not for the program. We don't need to know the score of the 2nd best move. Just knowing it is 2nd best is enough to prove it is not going to be played...

The current branching factor is 2.0 or less. You _never_ reach a point where you can't go deeper, until you have used at least 1/2 of your total allotted time.
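That is exactly the usual iterative-deepening time check: with a branching factor near 2, the next iteration costs about as much as everything searched so far, so there is no point starting it with less than half the budget remaining. A minimal sketch, where searchToDepth is a stand-in for a full search:

Code:

#include <chrono>
#include <iostream>

// Hypothetical stand-in for a complete search to the given depth.
void searchToDepth(int depth) { /* ... */ }

int main() {
    using clock = std::chrono::steady_clock;
    const auto allotted = std::chrono::milliseconds(10000);
    const auto start = clock::now();

    for (int depth = 1; depth <= 64; ++depth) {
        const auto elapsed = clock::now() - start;
        // With an EBF of ~2, the next iteration takes roughly
        // 'elapsed' again; past the halfway point it cannot finish.
        if (elapsed > allotted / 2) break;
        searchToDepth(depth);
        std::cout << "completed depth " << depth << "\n";
    }
}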
hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: If You have cluster/PC with 100 CPUs...

Post by hgm »

One interesting piece of information, relevant to this debate:

Last week, for the first time in history, a professional Shogi player was beaten by a machine at a standard match time control. The machine was actually a _collective_ of 4 different strong Shogi programs, each running on a cluster. They were each thinking on the position, and then voting for the move to play. In case of a voting tie, the move of Gekisashi (the strongest program) was picked.
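The voting rule is simple to state in code. A minimal sketch, with the proposals invented for illustration; the first engine in the list plays the role of Gekisashi:

Code:

#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Majority vote over proposed moves; ties go to the earliest
// (strongest) program's move that reached the top vote count.
std::string vote(const std::vector<std::string>& proposals) {
    std::map<std::string, int> tally;
    for (const auto& m : proposals) ++tally[m];

    int bestCount = 0;
    for (const auto& [move, count] : tally)
        bestCount = std::max(bestCount, count);

    for (const auto& m : proposals)
        if (tally[m] == bestCount) return m;
    return proposals.front();  // unreachable for non-empty input
}

int main() {
    // Hypothetical proposals; index 0 plays the role of Gekisashi.
    std::vector<std::string> proposals = { "P-7f", "P-2f", "P-7f", "S-6h" };
    std::cout << "collective plays " << vote(proposals) << "\n";  // P-7f
}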
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: If You have cluster/PC with 100 CPUs...

Post by bob »

hgm wrote:One interesting piece of information, relevant to this debate:

Last week, for the first time in history, a professional Shogi player was beaten by a machine at a standard match time control. The machine was actually a _collective_ of 4 different strong Shogi programs, each running on a cluster. They were each thinking on the position, and then voting for the move to play. In case of a voting tie, the move of Gekisashi (the strongest program) was picked.
I think we are so far away from solving go, Shogi, etc., compared to chess, that this is not an unreasonable approach. But with chess, the search has become so refined that we are seeing "the truth" in many positions today, and going deeper to reach the truth seems more worthwhile than any sort of Monte-Carlo approximation. Obviously that is not yet true for games other than chess/checkers...
FlavusSnow
Posts: 89
Joined: Thu Apr 01, 2010 5:28 am
Location: Omaha, NE

Re: If You have cluster/PC with 100 CPUs...

Post by FlavusSnow »

I'm not saying you unroll the search completely at the root. I know I don't have most of the answers here, but bear with me. Current algorithms don't even identify the 2nd best move (as you pointed out, they don't need to... they just need to know it's not best).

I'll have to pull some random positions out and do a little study in order to have some backup, but the time it takes to go from depth x to depth x+1 grows exponentially. And I can do many depth x searches in the same time as it takes to do a search at depth x+1. If an engine such as stockfish identifies the top 3 moves, I can be fairly confident that the true best move is one of those three. If I were analysing one of my own games, I would move forward a move or two along each of the top three PVs and see how the score changes. It's not perfect, but it adds a couple of plies of depth to the PV and second-best move by ignoring all the other moves. I guess it'd be like a very risky form of LMR at the root node.

I understand what you're saying Dr. Hyatt, but I'd rather move up the search tree along the PV with multiple depth x searches than sit and wait for depth x+1. Humans manually manipulate engines to help us analyse positions using this method. I don't know whether, if an engine were to do it itself, it would yield a higher level of play.
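A sketch of that procedure; every type and search call below is a made-up placeholder:

Code:

#include <iostream>
#include <string>
#include <vector>

struct Position {};
struct PV { std::vector<std::string> moves; int score; };

// Hypothetical stand-ins for engine calls.
std::vector<PV> multiPV(const Position& p, int depth, int k) { return {}; }
Position playMoves(Position p, const std::vector<std::string>& moves, int n) { return p; }
int searchScore(const Position& p, int depth) { return 0; }

int main() {
    Position root;
    const int depth = 20;
    auto pvs = multiPV(root, depth, 3);  // the top three candidate lines

    std::string bestMove;
    int bestScore = -1000000;
    for (const auto& pv : pvs) {
        if (pv.moves.empty()) continue;
        // Advance two plies along this PV, then re-search at the same
        // depth: the root move is effectively examined ~2 plies deeper.
        Position p = playMoves(root, pv.moves, 2);
        int score = searchScore(p, depth);
        if (score > bestScore) {
            bestScore = score;
            bestMove = pv.moves.front();
        }
    }
    std::cout << "deep-checked choice: " << bestMove << "\n";
}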
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: If You have cluster/PC with 100 CPUs...

Post by bob »

FlavusSnow wrote:I'm not saying you unroll the search completely at the root. I know I don't have most of the answers here, but bear with me. Current algorithms don't even identify the 2nd best move (as you pointed out, they don't need to... they just need to know it's not best).

I'll have to pull some random positions out and do a little study in order to have some backup, but the time it takes to go from depth x to depth x+1 grows exponentially.

Today, it does not. Most good programs have an effective branching factor of 2.0 (or less). That means that for any search to some depth D, going to D+1 takes 2x longer than the search to depth D. For searching minimax trees, even using pure alpha/beta, you are correct, since we are talking about N = W^D nodes searched, where W is somewhere in the 35-38 move range. But with today's pruning, reductions, and null-move search, that W value is down to 2 or less. So while it is still exponential, an effective branching factor of 2.0 or less makes it quite tractable.
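To put toy numbers on that difference (a back-of-envelope calculation, not engine output):

Code:

#include <cmath>
#include <cstdio>

int main() {
    const double W = 36.0;   // raw moves per position
    const double EBF = 2.0;  // modern effective branching factor
    for (int D : {10, 15, 20}) {
        std::printf("D=%2d  W^D = %.3g   EBF^D = %.3g\n",
                    D, std::pow(W, D), std::pow(EBF, D));
    }
    // At D = 20: ~1.3e31 nodes in the raw model versus ~1e6
    // "effective" nodes -- which is why such depths are reachable.
}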

And I can do many depth x searches in the same time as it takes to do a search at depth x+1.
Actually, you can do two. :)

If an engine such as stockfish identifies the top 3 moves, I can be fairly confident that the true best move is one of those three. If I were analysing one of my own games, I would move forward a move or two along each of the top three PVs and see how the score changes. It's not perfect, but it adds a couple of plies of depth to the PV and second-best move by ignoring all the other moves. I guess it'd be like a very risky form of LMR at the root node.

I understand what you're saying Dr. Hyatt, but I'd rather move up the search tree along the PV with multiple depth x searches than sit and wait for depth x+1. Humans manually manipulate engines to help us analyse positions using this method. I don't know whether, if an engine were to do it itself, it would yield a higher level of play.
Do the math. You are at depth D, searching each root move at the same time, one per processor. It will take as long to search them all as it will to search just one, so you burn 38x the horsepower just to reach depth D. Since that processor count is a bit over 2^5 (38 compared to 32), we could instead put all the processors on one tree and go 5 plies _deeper_ in the time you take to reach the nominal depth D: 5 doublings of speed, at an effective branching factor of 2.0, buys 5 extra plies. Which sounds better?
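The arithmetic spelled out: with an effective branching factor of 2, each doubling of speed buys one extra ply, so a 32x speedup buys log2(32) = 5 plies.

Code:

#include <cmath>
#include <iostream>

int main() {
    const double speedup = 32.0;  // all processors on one tree
    const double ebf = 2.0;       // effective branching factor
    const double extraPlies = std::log2(speedup) / std::log2(ebf);
    std::cout << "extra plies: " << extraPlies << "\n";  // prints 5
}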
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: If You have cluster/PC with 100 CPUs...

Post by Milos »

hgm wrote:One interesting piece of information, relevant to this debate:

Last week, for the first time in history, a professional Shogi player was beaten in by a machine at a standard match time control. The machine was actually a _collective_ of 4 different strong Shogi programs, each running on a cluster. They were each thinking on the position, and then voting for the move to play. In case of a voting tie, the move of Gekisashi (the strongest program) was picked.
This would be a great thing to try with chess.
For example, pick the 4 strongest programs (e.g. R4, Houdini 1.03, SF1.9 and Naum 4.2), give each 2 cores, choose the move by voting, and let that collective play against, for example, an 8-core R4 (or Houdini).
On 8 cores R4 would probably win, but if the experiment were repeated with a 16-core R4 against 4 quads, I would not be so sure about the result.
hgm
Posts: 28387
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: If You have cluster/PC with 100 CPUs...

Post by hgm »

It would be especially attractive on clusters, where the extra CPU power cannot be used as efficiently as when you run on the same machine with a shared hash table. With a shared hash, using a single conventional SMP engine is probably best. But with a few separate quads or octals, it would make sense to try this.
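On what sharing the hash buys: one well-known way to share a transposition table between threads without locks is the XOR validation trick described by Hyatt and Mann. The sketch below is a bare-bones illustration of that idea, not code from any actual engine:

Code:

#include <atomic>
#include <cstdint>
#include <vector>

struct TTEntry {
    std::atomic<uint64_t> keyXorData{0};
    std::atomic<uint64_t> data{0};
};

std::vector<TTEntry> table(1 << 20);  // power-of-two entry count

void store(uint64_t key, uint64_t value) {
    TTEntry& e = table[key & (table.size() - 1)];
    e.data.store(value, std::memory_order_relaxed);
    e.keyXorData.store(key ^ value, std::memory_order_relaxed);
}

bool probe(uint64_t key, uint64_t& out) {
    const TTEntry& e = table[key & (table.size() - 1)];
    const uint64_t v = e.data.load(std::memory_order_relaxed);
    const uint64_t k = e.keyXorData.load(std::memory_order_relaxed);
    // If another thread half-updated the entry, the XOR check fails
    // and the probe is simply treated as a miss.
    if ((k ^ v) != key) return false;
    out = v;
    return true;
}

int main() {
    uint64_t v = 0;
    store(0x123456789abcdefULL, 42);
    return probe(0x123456789abcdefULL, v) && v == 42 ? 0 : 1;
}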