AlphaGo and Stockfish played on similar hardware

vvarkey · Post by **vvarkey** » Mon Dec 18, 2017 6:19 am

Michael Sherwin wrote:
vvarkey wrote:Ignore all the training that went into AlphaZero for a second.
You really cannot do that, not even for a millisecond. The training (did I read 44 million games?) is worth more than 1,000 elo and probably much more. The learning was guided by NN to learn on the most promising lines thus narrowing the field. A0 could have gotten winning positions against SF without ever leaving its learn file. The rest of the positions were so good that the chess playing algorithm of A0 could then get a win or at least draw. Believe me I know as I've seen RomiChess play entire games from its learn file. Even if the learn file does not produce a move to play immediately the fact that the whole subtree of the current position with its learned values are loaded into the hash causes the search to return much stronger moves on average. You can't ignore the training, it is 90% of the strength of A0.

In the 10 published games, it seems like Stockfish got out of the openings OK, though those are only 10 selected games out of 100, so it's a limited sample.

Also, DeepMind ran 100-game matches from each of the 12 "most popular human openings" from 365Chess. AlphaZero dominated these, except the B40 Sicilian as Black, where Stockfish got 7 wins to AlphaZero's 3.

Milos · Post by **Milos** » Mon Dec 18, 2017 6:39 am

Dirt wrote:How well AlphaZero handles FRC positions without specific training for them is the question I was getting at. We disagree on how well it would do, and without a way to do the actual test I see no way to know for sure which of us is correct.

We can't test it but still there are some pretty good indications.
When playing from the chess opening position, A0 was 100 Elo stronger than SF.
When playing from the Sicilian B40 (the opening that A0 trained on very little), which is only 2 moves deep, A0's advantage dropped to 38.5 Elo. A huge drop in performance for just 2 moves further from the root, don't you think?
And surprise, surprise, only one ply further, i.e. playing the same position as black, SF managed 7 victories to only 3 for A0, i.e. SF was already +28 Elo (only 5 plies from the root position)!!!
I think Michael has a pretty good guess that A0 wouldn't have a shining performance in FRC.

Dirt · Post by **Dirt** » Mon Dec 18, 2017 7:06 am

Milos wrote:We can't test it but still there are some pretty good indications.
When playing from the chess opening position, A0 was 100 Elo stronger than SF.
When playing from the Sicilian B40 (the opening that A0 trained on very little), which is only 2 moves deep, A0's advantage dropped to 38.5 Elo. A huge drop in performance for just 2 moves further from the root, don't you think?
And surprise, surprise, only one ply further, i.e. playing the same position as black, SF managed 7 victories to only 3 for A0, i.e. SF was already +28 Elo (only 5 plies from the root position)!!!
I think Michael has a pretty good guess that A0 wouldn't have a shining performance in FRC.

Stockfish won seven times with white.

And no, some position had to be the worst for AlphaZero. There's no huge meaning in which one it was.

vvarkey · Post by **vvarkey** » Mon Dec 18, 2017 7:32 am

Dirt wrote:
Milos wrote:We can't test it but still there are some pretty good indications.
When playing from the chess opening position, A0 was 100 Elo stronger than SF.
When playing from the Sicilian B40 (the opening that A0 trained on very little), which is only 2 moves deep, A0's advantage dropped to 38.5 Elo. A huge drop in performance for just 2 moves further from the root, don't you think?
And surprise, surprise, only one ply further, i.e. playing the same position as black, SF managed 7 victories to only 3 for A0, i.e. SF was already +28 Elo (only 5 plies from the root position)!!!
I think Michael has a pretty good guess that A0 wouldn't have a shining performance in FRC.
Stockfish won seven times with white.

And no, some position had to be the worst for AlphaZero. There's no huge meaning in which one it was.

You're right, my quote was wrong. From the paper, these are the scores from AlphaZero's perspective:

B40: Sicilian Defence
w 17/31/2, b 3/40/7

So, playing B40 as black, AlphaZero won 3, drew 40 and lost 7
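For anyone who wants to check the Elo figures being thrown around in this thread, here is a minimal sketch that turns a win/draw/loss record into an Elo difference. It assumes the standard logistic Elo model (expected score = 1 / (1 + 10^(-d/400))); the function name `elo_diff` is just an illustrative choice, and the records are the B40 numbers quoted above.

```python
import math

def elo_diff(wins, draws, losses):
    """Elo difference implied by a W/D/L record under the logistic Elo model.

    Undefined for a 0% or 100% score (the logarithm diverges).
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))

# B40 records from AlphaZero's perspective, as quoted from the paper above
as_white = (17, 31, 2)
as_black = (3, 40, 7)
combined = tuple(w + b for w, b in zip(as_white, as_black))

for label, record in [("white", as_white), ("black", as_black), ("combined", combined)]:
    print(f"{label:9s} {record}: {elo_diff(*record):+.1f} Elo")
```

Under that model the combined B40 record comes out at roughly +38 Elo for AlphaZero and the black-only record at roughly -28, which matches the 38.5 and +28 (for SF) figures cited earlier. With only 50 games per colour the error bars on these numbers are wide.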

hgm · Post by **hgm** » Mon Dec 18, 2017 9:18 am

mjlef wrote:... the quotes about power per TFLOP does not really tell us, but the above confirms TPUs are much faster at neural nets than a standard Intel chip.

I don't think anyone contests that. But TPUs are completely different from CPUs, and neural-network simulations have completely different needs from decision-taking software like Stockfish. It is always difficult to compare totally different hardware, but focussing on the metric that happens to be important for the neural networks (TFLOPS) definitely biases the comparison. You could just as well compare on performance traits important for Stockfish, e.g. number of compare & branch instructions per second. Then the CPU is suddenly infinitely more powerful than the TPU, making it sound like Stockfish had a massive hardware advantage.

Fair metrics are the number of transistors, chip area, or power consumption. (These heavily correlate with each other.) According to these metrics, TPUs are significantly smaller devices than CPUs. So the idea that AlphaZero was running on some massive super-computer is completely wrong. It was just that it performed the same task of playing Chess in a totally different way, which did not only use completely different software, but also completely different hardware. It didn't need more transistors to do that; they were just wired differently.

Milos · Post by **Milos** » Mon Dec 18, 2017 12:19 pm

Dirt wrote:Stockfish won seven times with white.

Correct. Still SF won 2 games with black while A0 won 3 games with black. That is only +7Elo for A0. Down from +100 from a chess starting position.

Dirt wrote:And no, some position had to be the worst for AlphaZero. There's no huge meaning in which one it was.

This is not just some position. This is the position where A0 had the smallest number of self-play games.
There is obviously a link between the number of training games and the performance of A0. And again, this is only 2 moves into the game, where things haven't changed that much on the board from the starting position and you often have many transpositions. We are not talking about FRC-type positions, which are very different.
With very few or no training games A0 would very probably perform very poorly.

Milos · Post by **Milos** » Mon Dec 18, 2017 12:31 pm

hgm wrote:Fair metrics are the number of transistors, chip area, or power consumption. (These heavily correlate with each other.) According to these metrics, TPUs are significantly smaller devices than CPUs. So the idea that AlphaZero was running on some massive super-computer is completely wrong. It was just that it performed the same task of playing Chess in a totally different way, which did not only use completely different software, but also completely different hardware. It didn't need more transistors to do that; they were just wired differently.

Using TDP and number of transistors is a pointless and faulty metric coz these chips are not something you buy on the street for pennies, but are basically unattainable except at a high price as part of cloud services.
The only fair comparison would be to compare what it would cost to run such a system. Since we don't have TPUs available and 1 TPU is roughly 2 GV100 or 10 1080Ti, a machine capable of running A0 would cost roughly 9k$ (you also have to include the high-end x86 CPU necessary to run MCTS, which they didn't even bother to mention in the paper) compared to 3k$ (for a single-CPU 32-core machine). So the price is at least 3 times larger.
If you run SF on a 32-core machine using 4 FPGA cards as dedicated hardware for the move generator, qsearch and evaluation, you'd easily get 20-40x the number of nodes SF makes on a 32-core machine alone. Do you really think that running SF at 2 billion nodes per second would not help performance drastically?

Michael Sherwin · Post by **Michael Sherwin** » Mon Dec 18, 2017 12:38 pm

Dirt wrote:
Michael Sherwin wrote:If I understand A0's learning approach and I think that I do then all the pretraining at classic chess would be useless against Fischer Random unless it transposes somehow and the A0 learned tree can handle transpositions. However, A0 could train 44 million games on all FR positions with the same effect.
How well AlphaZero handles FRC positions without specific training for them is the question I was getting at. We disagree on how well it would do, and without a way to do the actual test I see no way to know for sure which of us is correct.

What some people are not getting yet is that the learning system in A0 is at its most basic level identical to the learning I invented 11+ years ago for RomiChess. It shows the same strengths and has the same weaknesses. Even the few descriptive clues are not incompatible with the learning in RomiChess. That is why I need very little information to surmise things about A0. I invented the basic learning system of A0. So if anyone can see through what they did it is me because I understand the evidence better. But it is okay to disagree if you want!

hgm · Post by **hgm** » Mon Dec 18, 2017 12:47 pm

More hardware always helps, this has never been an issue. But I don't think AlphaZero was running on 32 CPU cores + 4 TPUs. It seems to me that 1 CPU core would be more than enough to handle 80knps, and that the total task has the NN simulation by the TPUs as a severe bottleneck. Using one 80486 CPU + 8 TPUs would probably give more nps than 32 Haswells + 4 TPUs. So having Stockfish run on 32 CPU cores, where perhaps 4 cores have the complexity of a TPU, and giving it 4 FPGAs in addition, doesn't seem a fair comparison. I would say 1 low-performance CPU plus 4 FPGAs would probably be the closest equivalent. How fast Stockfish would be with this hardware involves a large amount of speculation.

Whether the hardware can be bought on the free market or not should not be relevant for the question of whether it is similar. What is not for sale now could be for sale tomorrow, and then what was 20 times as powerful would suddenly be only half as powerful when it is offered cheaply? I don't think so...

Milos · Post by **Milos** » Mon Dec 18, 2017 2:57 pm

hgm wrote:It seems to me that 1 CPU core would be more than enough to handle 80knps, and that the total task has the NN simulation by the TPUs as a severe bottleneck. Using one 80486 CPU + 8 TPUs would probably give more nps than 32 Haswells + 4 TPUs. So having Stockfish run on 32 CPU cores, where perhaps 4 cores have the complexity of a TPU, and giving it 4 FPGAs in addition, doesn't seem a fair comparison. I would say 1 low-performance CPU plus 4 FPGAs would probably be the closest equivalent. How fast Stockfish would be with this hardware involves a large amount of speculation.

This is so wrong on so many levels.
Do you understand that a TPU has basically only 5 instructions: load weights, load inputs, multiply, activate and write results to the CPU?
Every higher-level DCNN operation has to be handled by the CPU, including all data manipulation for inference through the layers. And this is around 1 GB of weight manipulations 80,000 times per second!!!
Plus, on top of that, there is traversing the whole tree: you start from the root, perform the action calculation for each move at each node, and once you reach a leaf node you expand it, prepare all the data for the TPU and, once the priors are calculated, backpropagate the action value to the root.
And you think that could be done on an 80486 CPU????
Gee, how can you be so clueless?
It really makes no sense discussing this stuff with you when you don't have even the basic knowledge.
Again, they didn't even bother to write in the paper that they used a heavy CPU for that, most probably the same CPU they used to run SF on. So they had 4 TPUs extra. Just a tiny bit of difference.
The way they wrote that paper is shameless. If that is really the version they submitted for peer review, it would be just outrageous if it got accepted.
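The per-simulation work described above (descend from the root scoring each move, expand the leaf, get priors and a value for it, back the value up to the root) can be sketched roughly as below. This is a generic PUCT-style loop on a toy take-1-or-2-stones game, not DeepMind's code: `evaluate` is a hypothetical stand-in for the network call (uniform priors, neutral value), and `c_puct=1.5` and the exact exploration formula are assumptions chosen for illustration.

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a) supplied by the evaluator
        self.visits = 0           # N(s, a)
        self.value_sum = 0.0      # W(s, a), from the view of the player who made the move
        self.children = {}        # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def evaluate(pile):
    """Hypothetical stand-in for the network: uniform priors, neutral value.

    Toy game: take 1 or 2 stones; whoever takes the last stone wins.
    """
    if pile == 0:
        return {}, -1.0           # side to move has already lost
    moves = [m for m in (1, 2) if m <= pile]
    return {m: 1.0 / len(moves) for m in moves}, 0.0

def select_child(node, c_puct=1.5):
    total = sum(child.visits for child in node.children.values())

    def puct(item):
        child = item[1]
        explore = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + explore

    return max(node.children.items(), key=puct)

def simulate(root, pile):
    node, path = root, [root]
    while node.children:                  # 1. descend from the root
        move, node = select_child(node)
        pile -= move
        path.append(node)
    priors, value = evaluate(pile)        # 2. evaluate the leaf (the accelerator's job)
    for move, p in priors.items():        # 3. expand the leaf
        node.children[move] = Node(p)
    value = -value                        # leaf value is for the side to move there
    for n in reversed(path):              # 4. back up to the root, flipping sides
        n.visits += 1
        n.value_sum += value
        value = -value

def search(pile, simulations=400):
    root = Node(prior=1.0)
    for _ in range(simulations):
        simulate(root, pile)
    return root

root = search(pile=4)
visits = {move: child.visits for move, child in root.children.items()}
print(visits)
```

Everything except step 2 is plain pointer-chasing and bookkeeping on the host; how expensive that is relative to the network evaluation in the real system is exactly what is being argued about here, and this sketch does not settle it.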

hgm wrote:Whether the hardware can be bought on the free market or not should not be relevant for the question of whether it is similar. What is not for sale now could be for sale tomorrow, and then what was 20 times as powerful would suddenly be only half as powerful when it is offered cheaply? I don't think so...

This is pretty irrelevant and pointless.
What you write is just wishful thinking. Tomorrow we could have 256-core x86 CPUs for 1000$, so what?
What is relevant is that if you can't buy it, you can only make an estimate based on what you can buy. And in that respect 1 TPU is like 2 GV100s that cost 3k$ apiece. So 4 TPUs = 8 x 3k$ = 24k$. Or if you like, 1 TPU = 10 x 1080Ti. A 1080Ti is 600$. 4 TPUs = 40 x 600$ = 24k$, again the same math.
For 24k$ you can easily buy 4 top-of-the-range FPGA boards + 2x 32-core x86 CPUs.
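The cost arithmetic above is easy to check. A minimal sketch, taking the equivalence ratios (1 TPU ≈ 2 GV100 ≈ 10 1080Ti) and the per-card prices as the poster's estimates rather than measured figures; `equivalent_cost` is just an illustrative helper name.

```python
def equivalent_cost(tpus, cards_per_tpu, price_per_card):
    """Cost in dollars of replacing `tpus` TPUs with purchasable cards."""
    return tpus * cards_per_tpu * price_per_card

TPUS = 4  # number of TPUs AlphaZero used during play, per the thread

# Ratios and prices below are the estimates stated in the post, not verified data.
via_gv100 = equivalent_cost(TPUS, cards_per_tpu=2, price_per_card=3000)
via_1080ti = equivalent_cost(TPUS, cards_per_tpu=10, price_per_card=600)

print(via_gv100, via_1080ti)  # 24000 24000
```

Both routes give the same 24k$ only because the assumed ratios and prices happen to line up; change either estimate and the two figures diverge.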
