Intell 48-core CPU scales to 1k cores

rbarreira · Post by **rbarreira** » Tue Nov 23, 2010 3:32 pm

Gian-Carlo Pascutto wrote:
rbarreira wrote: But the average branching factor is higher in Go, correct? Around 200 instead of 35 or so I've heard. So it should be easier to parallelize a Go tree.
This is only true if you would not prune anything (minimax). See previous post.

Why is it only true without pruning? Even with pruning, having more moves to distribute among CPUs can't hurt, and probably helps. In chess, if you're at an ALL node and you have a 100-core CPU, you don't have enough moves for all the CPUs. In Go you would.

In any case, the Monte Carlo part can be parallelized very easily, which makes me wonder how you were so quick to dismiss the value of more cores for Go.

Dann Corbit · Post by **Dann Corbit** » Tue Nov 23, 2010 3:49 pm

rbarreira wrote:
Gian-Carlo Pascutto wrote:
rbarreira wrote: But the average branching factor is higher in Go, correct? Around 200 instead of 35 or so I've heard. So it should be easier to parallelize a Go tree.
This is only true if you would not prune anything (minimax). See previous post.
Why is it only true without pruning? Even with pruning, having more moves to distribute among CPUs can't hurt, and probably helps. In chess, if you're at an ALL node and you have a 100-core CPU, you don't have enough moves for all the CPUs. In Go you would.

In any case, the Monte Carlo part can be parallelized very easily, which makes me wonder how you were so quick to dismiss the value of more cores for Go.

I suspect that the Monte Carlo approach for Go is a desperation move because the branching factor is so bad. Since the tree is too big to search effectively, lots of samples are taken to make an educated guess {I am extrapolating here, because I have not spent much time studying Go programs}.

On the other hand, if the Monte Carlo approach scales better with massive thread/process count, where are the ultra-strong (compared to single threaded) Go engines?

I guess that if it worked, we would see Beowulf cluster Go engines pounding the stuffings out of their few-threaded brethren.

rbarreira · Post by **rbarreira** » Tue Nov 23, 2010 3:51 pm

Dann Corbit wrote:On the other hand, if the Monte Carlo approach scales better with massive thread/process count, where are the ultra-strong (compared to single threaded) Go engines?

I guess that if it worked, we would see Beowulf cluster Go engines pounding the stuffings out of their few-threaded brethren.

http://www.springerlink.com/content/vln834835x52686m/

With a single slave and five seconds per move our algorithm scores 40.5% against GNU Go, with sixteen slaves and five seconds per move it scores 70.5%.

Seems to me like a great advantage from multi-threading.

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Tue Nov 23, 2010 4:04 pm

rbarreira wrote: Why is it only true without pruning?

Because the quoted numbers are only equal to the effective branching factor for pure minimax. Even just alpha-beta makes them (approximately) the square root. For a strong chessprogram, the effective factor is 2, not 35. For most strong go programs, it's not going to be very different, although the best-first formulation of the search might make a definition harder.

Even with pruning, having more moves to distribute among CPUs can't hurt, and probably helps. In chess, if you're at an ALL node and you have a 100-core CPU, you don't have enough moves for all the CPUs. In Go you would.

Again, only if you don't prune anything and want to search all those moves at the same moment in time. Which is simply not a good idea for building a strong program, and not how any strong Go program works.

(And no decent parallel chess or go problems is affected by the situation you describe anyway)

In any case, the Monte Carlo part can be parallelized very easily,

Easily, yes. But is it any good? Let's check a recent paper: Chaslot et al. (2008), using leaf parallelization (parallelizing the Monte Carlo part):

- raw speedup of 16 for 16 threads (sounds good)
- strength speedup of 2.4 for 16 threads (oops!!)

which makes me wonder how you were so quick to dismiss the value of more cores for Go.

I'm not saying many cores are not valuable for Go. I'm saying they're exactly as valuable as for a chess searcher, and that it's a completely wrong illusion to think that Go is somehow fundamentally easier than chess for that.

Dann Corbit · Post by **Dann Corbit** » Tue Nov 23, 2010 4:08 pm

rbarreira wrote:
Dann Corbit wrote:On the other hand, if the Monte Carlo approach scales better with massive thread/process count, where are the ultra-strong (compared to single threaded) Go engines?

I guess that if it worked, we would see Beowulf cluster Go engines pounding the stuffings out of their few-threaded brethren.
http://www.springerlink.com/content/vln834835x52686m/

With a single slave and five seconds per move our algorithm scores 40.5% against GNU Go, with sixteen slaves and five seconds per move it scores 70.5%.
Seems to me like a great advantage from multi-threading.

With 1 thread it is about -67 Elo compared to Gnu Go.
With 16 threads it is about +150 Elo
4 doublings of compute power gives about 217 Elo.
At 54 Elo per doubling, that is about the same as ordinary Chess increases through threading.

Consider from CEGT:

Code: Select all

Stockfish 1.8 x64 1CPU 3090 17 17 950 62.4% 3003 44.7% 
Stockfish 1.8 x64 2CPU 3129 17 17 886 57.1% 3080 46.4% +51
Stockfish 1.8 x64 4CPU 3181 12 12 2187 72.2% 3015 37.9% +52
Stockfish 1.8 x64 6CPU 3213 22 22 500 65.6% 3101 48.8% +32 &#40;only 50% bump in strength&#41;

On the other hand, it is certainly not worse, and to hold that ratio up through 16 threads is very good (some chess programs drop off in benefit with increasing thread count).

So there may be some added spunk to additional threads for Go, but it is nothing dramatic.

rbarreira · Post by **rbarreira** » Tue Nov 23, 2010 4:16 pm

Dann Corbit wrote: So there may be some added spunk to additional threads for Go, but it is nothing dramatic.

Obviously it couldn't be dramatic, unless the elo-per-doubling was much higher for Go than for Chess (in which case computers would be improving really fast at Go).

to hold that ratio up through 16 threads is very good

Yes, I think that was the original point of the poster who mentioned Go in this thread. With 32 threads it doesn't seem to get any worse:

http://www.mail-archive.com/computer-go ... 08145.html

95% winning rate with 32 threads vs 1 thread, which means 94 elo per doubling. (different program I think)

rbarreira · Post by **rbarreira** » Tue Nov 23, 2010 4:26 pm

Gian-Carlo Pascutto wrote: I'm not saying many cores are not valuable for Go. I'm saying they're exactly as valuable as for a chess searcher, and that it's a completely wrong illusion to think that Go is somehow fundamentally easier than chess for that.

Show me a chess searcher which gets near 100 elo per doubling (or even 50?) with 32 threads.

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Tue Nov 23, 2010 4:27 pm

Dann Corbit wrote: On the other hand, it is certainly not worse, and to hold that ratio up through 16 threads is very good (some chess programs drop off in benefit with increasing thread count).

So there may be some added spunk to additional threads for Go, but it is nothing dramatic.

You have to take into account the efficiency of the single-cpu search as well. Chess has had much more development and finetuning there compared to Go. Take out all innovations of the last 15 years in a chessprogram and the scaling will improve a lot.

As a further example: 9x9 Go has a (theorethical) branching factor averaging about 40, yet you will find many papers where people have problems getting as much as a strength speedup of 2 out of an arbitrary amount of cores, i.e. this is even worse than chess despite a higher (theorethical) branching factor.

Dann Corbit · Post by **Dann Corbit** » Tue Nov 23, 2010 4:28 pm

rbarreira wrote:
Dann Corbit wrote: So there may be some added spunk to additional threads for Go, but it is nothing dramatic.
Obviously it couldn't be dramatic, unless the elo-per-doubling was much higher for Go than for Chess (in which case computers would be improving really fast at Go).

to hold that ratio up through 16 threads is very good
Yes, I think that was the original point of the poster who mentioned Go in this thread. With 32 threads it doesn't seem to get any worse:

http://www.mail-archive.com/computer-go ... 08145.html

95% winning rate with 32 threads vs 1 thread, which means 94 elo per doubling. (different program I think)

That is quite a significant difference.

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Tue Nov 23, 2010 4:39 pm

rbarreira wrote: Obviously it couldn't be dramatic, unless the elo-per-doubling was much higher for Go than for Chess (in which case computers would be improving really fast at Go).

IIRC about 120 ELO per doubling, about twice as much as for chess. (And yes, computers are improving rapidly, but the difference between human ranks is also big)

Show me a chess searcher which gets near 100 elo per doubling (or even 50?) with 32 threads.

Impossible regardless of scaling because of the argument you already identified above.

Intell 48-core CPU scales to 1k cores

Re: Intell 48-core CPU scales to 1k cores

Re: Intell 48-core CPU scales to 1k cores

Re: Intell 48-core CPU scales to 1k cores

Re: Intell 48-core CPU scales to 1k cores

Re: Intell 48-core CPU scales to 1k cores

Re: Intell 48-core CPU scales to 1k cores

Re: Intell 48-core CPU scales to 1k cores

Re: Intell 48-core CPU scales to 1k cores

Re: Intell 48-core CPU scales to 1k cores

Re: Intell 48-core CPU scales to 1k cores