AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Henk · Post by **Henk** » Wed Dec 06, 2017 5:24 pm

From now on we can start cloning AlphaZero instead of Stockfish. The problem is I still don't understand how this AlphaZero works, because the article is very brief.

You can put your alpha-beta search and your current evaluation code in the dustbin.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 5:26 pm

Lyudmil Tsvetkov wrote:
- Alpha had considerable hardware advantage
- SF played with version 8
- what was the code/software/evaluation base used for the first Alpha chess version, an advanced engine evaluation and search software or otherwise?

kranium wrote:
As Daniel explains: no hard-coded evaluation (software)... its game play is based on learning (experience) from previous self-play games applied to a neural network

5,000 first-generation TPUs to generate self-play games
and 64 second-generation TPUs to train the neural networks

The hardware advantage is not such an important factor during gameplay as one would imagine.

Milos wrote:
It actually is. Instead of the 4 TPUs required to run Alpha0 so far, on x64 hardware one would need around 2000 Haswell cores to achieve the same NN speed (80k patterns evaluated per second). Since NNs are huge, with smaller resources matrix multiplication would have to be broken into smaller sub-matrices which would exponentially slow down the calculation.

kranium wrote:
AlphaZero is very selectively evaluating 80k positions/sec vs Stockfish's 70,000k positions/sec, probably achieving tremendous depths at such speeds,
but I'd guess it's the deep (learned) positional eval which is primarily adding strength...

Lyudmil Tsvetkov wrote:
There is no deep eval, eval is static.
How could an entity have 1000 times more terms than SF?

MikeGL wrote:
Come on Lyudmil, read the paper. AlphaZero doesn't use eval functions.
It probably doesn't know about a Rook being 5.0 or a pawn being 1.0, as most engines do.
It is using MCTS, and doing a lot of probability checks with 1-0, 0.5, 0-1 at the end of the
search, then choosing the path/route which has more wins or draws and the fewest losses.

It uses eval, of course.
How would it pick the best move then?
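
For readers following this exchange, here is a minimal sketch of how a neural network's value output can sit inside MCTS, which is roughly how the paper describes it: the network returns move priors and a value for a position, and the search averages those values along each path. The names `Node`, `select_child`, `backup` and the constant `c_puct=1.5` are illustrative assumptions, not taken from the paper or from any released code, and the `+ 1` under the square root is a simplification added here so that priors matter on the first visit.

```python
import math

class Node:
    """One edge/position in the search tree (hypothetical structure)."""
    def __init__(self, prior):
        self.prior = prior      # P(s, a): move probability from the policy output
        self.visits = 0         # N(s, a): how often the search went through here
        self.value_sum = 0.0    # sum of backed-up network values
        self.children = {}      # move -> Node

    def q(self):
        # Mean value, seen from the player who made the move into this node.
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """Pick the (move, child) pair maximizing Q + U, a PUCT-style rule."""
    total = sum(child.visits for child in node.children.values())
    def score(child):
        # Exploration bonus: large for high-prior, rarely visited moves.
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=lambda kv: score(kv[1]))

def backup(path, value):
    """Propagate a value up the path, flipping its sign at every ply.

    `value` is from the point of view of the player who moved into path[-1].
    """
    for node in reversed(path):
        node.visits += 1
        node.value_sum += value
        value = -value

# Usage: two candidate moves with made-up priors.
root = Node(prior=1.0)
root.children = {"e4": Node(0.6), "d4": Node(0.4)}
move, child = select_child(root)   # highest prior goes first
backup([root, child], -1.0)        # pretend the network disliked the result
next_move, _ = select_child(root)  # the search now prefers the other move
```

So both posters have a point in this sketch: there is no hand-written evaluation function, yet every search step still relies on a learned value estimate.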

clumma · Post by **clumma** » Wed Dec 06, 2017 5:53 pm

Daniel Shawul wrote:What is different is that alphazero's evaluation selects features of eval by itself (via a neural network), while in the standard approach the programmer selects features (e.g. passed pawns, king safety, rook on open file etc) and just tunes the weights.

The other difference is that Alpha uses MCTS not alpha-beta. Paper says nps used in these games is only 80,000!

Also worth noting that Alpha trained for 4 hours, compared to many years of painstakingly tuning Stockfish!

-Carl
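
To illustrate the distinction Daniel draws, here is a toy sketch contrasting the two styles of evaluation. Everything in it is an assumption for illustration: the feature names and weights in `HAND_WEIGHTS`, the function names, and the tiny one-hidden-layer network, which is far smaller and simpler than anything a real engine or AlphaZero uses.

```python
import math

# Standard approach: the programmer chooses the features; tuning only
# adjusts the weights. Feature names and numbers here are made up.
HAND_WEIGHTS = {
    "material": 1.0,
    "passed_pawns": 0.3,
    "king_safety": 0.5,
    "rook_open_file": 0.2,
}

def handcrafted_eval(features):
    """Weighted sum over programmer-selected features."""
    return sum(HAND_WEIGHTS[name] * value for name, value in features.items())

def tiny_network_eval(board_inputs, w_hidden, w_out):
    """Learned approach in miniature: raw inputs -> hidden units -> value.

    The hidden units play the role of features, but nobody names them;
    training decides what each one responds to.
    """
    hidden = [max(0.0, sum(w * x for w, x in zip(row, board_inputs)))
              for row in w_hidden]                      # ReLU hidden layer
    return math.tanh(sum(w * h for w, h in zip(w_out, hidden)))  # value in [-1, 1]

# Usage with made-up numbers.
score = handcrafted_eval({"material": 1.0, "passed_pawns": 1.0,
                          "king_safety": 0.0, "rook_open_file": 0.0})
value = tiny_network_eval([1.0, 0.0, -1.0],
                          w_hidden=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
                          w_out=[0.5, 0.5])
```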

clumma · Post by **clumma** » Wed Dec 06, 2017 5:56 pm

Lyudmil Tsvetkov wrote:Alpha had considerable hardware advantage

That comparison is not straightforward, but this claim does not seem to be true. SF had 64 threads. I'm not up on the latest scaling behavior of the engine but that has got to be near saturation.

-Carl

Milos · Post by **Milos** » Wed Dec 06, 2017 6:02 pm

clumma wrote:
Daniel Shawul wrote:What is different is that alphazero's evaluation selects features of eval by itself (via a neural network), while in the standard approach the programmer selects features (e.g. passed pawns, king safety, rook on open file etc) and just tunes the weights.
The other difference is that Alpha uses MCTS not alpha-beta. Paper says nps used in these games is only 80,000!

Also worth noting that Alpha trained for 4 hours, compared to many years of painstakingly tuning Stockfish!

4 hours my ass (pardon my French). Try training it on a state-of-the-art 1080.
A fully trained network requires 12h on 5000 gen1 TPUs for self-play games and 64 gen2 TPUs for the training itself.
A gen1 TPU is like 30x K80, and a K80 is like 5x 1080 in performance.
So you'd need like 375k training days with a 1080, which is like 1000 years!!!
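
The arithmetic behind this estimate can be spelled out as follows. This is only a sketch that takes the conversion ratios in the post at face value (1 gen1 TPU = 30 K80, 1 K80 = 5 GTX 1080); those ratios are the poster's claims, not verified figures, and the function name `gtx1080_days` is made up for the example. It counts only the self-play TPUs, as the post does.

```python
def gtx1080_days(num_tpus, hours, k80_per_tpu, gtx1080_per_k80):
    """Convert a TPU self-play budget into days on a single GTX 1080."""
    tpu_days = num_tpus * hours / 24
    return tpu_days * k80_per_tpu * gtx1080_per_k80

# Ratios below are the post's assumptions, not measured numbers.
days = gtx1080_days(num_tpus=5000, hours=12, k80_per_tpu=30, gtx1080_per_k80=5)
years = days / 365
print(f"{days:,.0f} GPU-days, about {years:,.0f} years on one card")
# prints "375,000 GPU-days, about 1,027 years on one card"
```

The result scales linearly with each ratio, so the conclusion depends entirely on how many 1080s one believes a gen1 TPU is worth.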

MikeGL · Post by **MikeGL** » Wed Dec 06, 2017 6:03 pm

clumma wrote:
Lyudmil Tsvetkov wrote:Alpha had considerable hardware advantage
... but that has got to be near saturation.

-Carl

Excellent point.

I mean, on that hardware, there won't be any fluctuation in SF8's best candidate move anyway.

Rémi Coulom · Post by **Rémi Coulom** » Wed Dec 06, 2017 6:11 pm

Milos wrote:4 hours my ass (pardon my French). Try training it on a state-of-the-art 1080.
A fully trained network requires 12h on 5000 gen1 TPUs for self-play games and 64 gen2 TPUs for the training itself.
A gen1 TPU is like 30x K80, and a K80 is like 5x 1080 in performance.
So you'd need like 375k training days with a 1080, which is like 1000 years!!!

Your math is wrong. I think it is doable with a distributed effort smaller than what was used for Stockfish.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 6:11 pm

clumma wrote:
Lyudmil Tsvetkov wrote:Alpha had considerable hardware advantage
That comparison is not straightforward, but this claim does not seem to be true. SF had 64 threads. I'm not up on the latest scaling behavior of the engine but that has got to be near saturation.

-Carl

From what I gleaned from hardware comparisons, the advantage is 16/1.
Why would one want to run such a very unfair match?
Only one thing comes to mind: that the company will want to advertise its colossal breakthrough with TPUs and artificial intelligence and then sell its products.

But then, the achievement is not there.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 6:16 pm

MikeGL wrote:
clumma wrote:
Lyudmil Tsvetkov wrote:Alpha had considerable hardware advantage
... but that has got to be near saturation.

-Carl
Excellent point.

I mean, on that hardware, there won't be any fluctuation in SF8's best candidate move anyway.

Alpha's hardware was equivalent to somewhere around 1024 standard cores.
How do 1024 cores compare with 64 cores?
How scientific is that?

I don't know what saturation you are talking about, from what I read, without fully understanding it, the TPUs are a very different architecture and quite differently affected by general computer chess concepts.

fern · Post by **fern** » Wed Dec 06, 2017 6:17 pm

so chess is doomed
and in 25 years, us.
