AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

MikeGL · Post by **MikeGL** » Wed Dec 06, 2017 4:36 pm

Maybe this thread should be merged to:
http://www.talkchess.com/forum/viewtopi ... w=&start=0

As the topic is the same.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 5:06 pm

kranium wrote:
Lyudmil Tsvetkov wrote:
- Alpha had considerable hardware advantage
- SF played with version 8
- what was the code/software/evaluation base used for the first Alpha chess version, an advanced engine evaluation and search software or otherwise?
As Daniel explains: no hard-coded evaluation (software)... its gameplay is based on learning (experience) from previous self-play games applied to a neural network

5,000 first-generation TPUs to generate self-play games
and 64 second-generation TPUs to train the neural networks

The hardware advantage is not such an important factor during gameplay as one would imagine.

The hardware advantage is 16/1, so simply ridiculous.
I could not accept such a test in any way.
But then, what have Google done right?

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 5:08 pm

kranium wrote:
Program | Chess | Shogi | Go
AlphaZero | 80k | 40k | 16k
Stockfish | 70,000k | |
Elmo | | 35,000k |
Table S4: Evaluation speed (positions/second) of AlphaZero, Stockfish, and Elmo in chess, shogi and Go.

This makes no sense at all.
Alpha 1000 times slower?
This would mean 1000 times more evaluation terms in its code!
What kind of terms?
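
For reference, the speed gap implied by Table S4 can be checked directly. This is only arithmetic on the numbers quoted above; the variable names are my own, and "k" is read as thousands of positions per second.

```python
# Evaluation speeds from Table S4, in positions per second ("k" = thousand).
alphazero_chess = 80_000
stockfish_chess = 70_000_000
alphazero_shogi = 40_000
elmo_shogi = 35_000_000

# How many positions the conventional engine evaluates per AlphaZero position.
chess_ratio = stockfish_chess / alphazero_chess
shogi_ratio = elmo_shogi / alphazero_shogi

print(f"Stockfish / AlphaZero (chess): {chess_ratio:.0f}x")
print(f"Elmo / AlphaZero (shogi): {shogi_ratio:.0f}x")
```

So the gap is a bit under 1000 times in both games. The ratio says how much more work goes into each position; it does not by itself say what that work consists of.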

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 5:11 pm

kranium wrote:
Milos wrote:
kranium wrote:
Lyudmil Tsvetkov wrote:
- Alpha had considerable hardware advantage
- SF played with version 8
- what was the code/software/evaluation base used for the first Alpha chess version, an advanced engine evaluation and search software or otherwise?
As Daniel explains: no hard-coded evaluation (software)... its gameplay is based on learning (experience) from previous self-play games applied to a neural network

5,000 first-generation TPUs to generate self-play games
and 64 second-generation TPUs to train the neural networks

The hardware advantage is not such an important factor during gameplay as one would imagine.
It actually is. Instead of the 4 TPUs required to run Alpha0 so far, on x64 hardware one would need around 2000 Haswell cores to achieve the same NN speed (80k patterns evaluated per second). Since the NNs are huge, with smaller resources the matrix multiplication would have to be broken into smaller sub-matrices, which would exponentially slow down the calculation.

AlphaZero very selectively evaluating 80k vs Stockfish's 70,000k positions/sec, probably achieving tremendous depths at such speeds,
but I'd guess it's the deep (learned) positional eval which is primarily adding strength...

There is no deep eval, eval is static.
How could an entity have 1000 times more terms than SF?

Milos · Post by **Milos** » Wed Dec 06, 2017 5:15 pm

Lyudmil Tsvetkov wrote:
kranium wrote:
Program | Chess | Shogi | Go
AlphaZero | 80k | 40k | 16k
Stockfish | 70,000k | |
Elmo | | 35,000k |
Table S4: Evaluation speed (positions/second) of AlphaZero, Stockfish, and Elmo in chess, shogi and Go.
This makes no sense at all.
Alpha 1000 times slower?
This would mean 1000 times more evaluation terms in its code!
What kind of terms?

Just read the paper, not only 1000 times more, much more than that, there are 4,672 planes just for possible pieces/move to/side to move.
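
For what it's worth, my reading of the paper is that 4,672 is the size of the move encoding (an 8x8x73 stack of planes, one set of 73 per from-square) rather than a count of evaluation terms. A quick check of that arithmetic, with variable names of my own choosing:

```python
# Move encoding as I understand it from the paper: for each of the 64
# from-squares there are 73 planes describing where the piece could go.
BOARD_SQUARES = 8 * 8

queen_moves = 8 * 7        # 8 compass directions x 1..7 squares
knight_moves = 8           # the 8 possible knight jumps
underpromotions = 3 * 3    # 3 pawn move directions x (knight, bishop, rook)

planes_per_square = queen_moves + knight_moves + underpromotions
policy_size = BOARD_SQUARES * planes_per_square

print(planes_per_square, policy_size)
```

Most of those entries are illegal in any given position; the number describes the shape of the network's move output, not a list of hand-written terms.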

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 5:17 pm

Daniel Shawul wrote: What is different is that AlphaZero's evaluation selects the features of eval by itself (via a neural network), while in the standard approach the programmer selects the features (e.g. passed pawns, king safety, rook on open file etc.) and just tunes the weights. The downside of the neural-network approach is that you may not understand why it does what it does.

Daniel

Tell me the precise code, that says nothing to me.
How does it select features, based on what?
Playing 100 000 games, many wins with a pawn on d4 or e5, so this is good, or interpreting 100 000 games from some large human database: an e5 pawn is more common in winning games than a d5 pawn, so increase its value.

But that has its limits.
What about mobility, how do they figure out mobility from human games?
More importantly, what would an evaluation pattern consist of?
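
To make the contrast concrete, the "standard approach" Daniel describes can be sketched as a weighted sum of programmer-chosen features. This is a toy illustration, not Stockfish's code: the `Features` fields, the `WEIGHTS` values and the `evaluate` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Hand-picked features, each as White's value minus Black's (hypothetical)."""
    material: float            # material balance in pawns
    passed_pawns: int          # difference in passed pawns
    rooks_on_open_files: int   # difference in rooks on open files
    king_safety: float         # difference in some king-safety score


# Illustrative weights only; in a real engine these are what gets tuned.
WEIGHTS = {
    "material": 1.0,
    "passed_pawns": 0.3,
    "rooks_on_open_files": 0.2,
    "king_safety": 0.5,
}


def evaluate(f: Features) -> float:
    """Static evaluation: weighted sum of the programmer-chosen features."""
    return sum(weight * getattr(f, name) for name, weight in WEIGHTS.items())


position = Features(material=1.0, passed_pawns=2, rooks_on_open_files=1, king_safety=-0.5)
print(evaluate(position))
```

In the neural-network approach there is no such list of named features: the network takes the raw board as input and the features are whatever its learned weights end up computing, which is why it is hard to point at a single line of code and say "this is the passed pawn term".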

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 5:21 pm

Milos wrote:
Lyudmil Tsvetkov wrote:
kranium wrote:
Program | Chess | Shogi | Go
AlphaZero | 80k | 40k | 16k
Stockfish | 70,000k | |
Elmo | | 35,000k |
Table S4: Evaluation speed (positions/second) of AlphaZero, Stockfish, and Elmo in chess, shogi and Go.
This makes no sense at all.
Alpha 1000 times slower?
This would mean 1000 times more evaluation terms in its code!
What kind of terms?
Just read the paper, not only 1000 times more, much more than that, there are 4,672 planes just for possible pieces/move to/side to move.

I browsed over it, but did not pay attention to it.
So, evaluation will depend not only where the piece lands, but also where it comes from.
That is ridiculous, that is already not static evaluation, but indeed, working like some kind of a very sophisticated book.

MikeGL · Post by **MikeGL** » Wed Dec 06, 2017 5:22 pm

Lyudmil Tsvetkov wrote:
kranium wrote:
Milos wrote:
kranium wrote:
Lyudmil Tsvetkov wrote:
- Alpha had considerable hardware advantage
- SF played with version 8
- what was the code/software/evaluation base used for the first Alpha chess version, an advanced engine evaluation and search software or otherwise?
As Daniel explains: no hard-coded evaluation (software)... its gameplay is based on learning (experience) from previous self-play games applied to a neural network

5,000 first-generation TPUs to generate self-play games
and 64 second-generation TPUs to train the neural networks

The hardware advantage is not such an important factor during gameplay as one would imagine.
It actually is. Instead of the 4 TPUs required to run Alpha0 so far, on x64 hardware one would need around 2000 Haswell cores to achieve the same NN speed (80k patterns evaluated per second). Since the NNs are huge, with smaller resources the matrix multiplication would have to be broken into smaller sub-matrices, which would exponentially slow down the calculation.

AlphaZero very selectively evaluating 80k vs Stockfish's 70,000k positions/sec, probably achieving tremendous depths at such speeds,
but I'd guess it's the deep (learned) positional eval which is primarily adding strength...
There is no deep eval, eval is static.
How could an entity have 1000 times more terms than SF?

Come on Lyudmil, read the paper. AlphaZero doesn't use eval functions.
It probably doesn't know about a Rook being 5.0 or a pawn being 1.0 the way most engines do.
It is using MCTS, and doing a lot of probability checks with 1-0, 0.5, 0-1 at the end of the
search, then choosing the path/route which has more wins or draws and the fewest losses.

edit-
least=fewest
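
A minimal sketch of the selection idea as described above: score each candidate move by the average of the outcomes (1, 0.5, 0) seen below it and pick the best. The `stats` data and the function names are made up for illustration. As far as I understand the paper, the real AlphaZero search does not play games out to the end; it backs up the network's value estimate and chooses by visit counts, so treat this as the plain-statistics version only.

```python
def expected_score(results):
    """Average outcome from the mover's point of view (win=1, draw=0.5, loss=0)."""
    return sum(results) / len(results)


def pick_move(stats):
    """Return the move whose recorded outcomes have the highest average."""
    return max(stats, key=lambda move: expected_score(stats[move]))


# Hypothetical outcomes collected under each root move during the search.
stats = {
    "e4": [1, 0.5, 0.5, 0],
    "d4": [1, 1, 0.5, 0.5],
    "Nf3": [0.5, 0.5, 0.5, 0.5],
}

best = pick_move(stats)
print(best, expected_score(stats[best]))
```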

jhellis3 · Post by **jhellis3** » Wed Dec 06, 2017 5:22 pm

Tell me the precise code

I lol'd.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 5:23 pm

MikeGL wrote:Maybe this thread should be merged to:
http://www.talkchess.com/forum/viewtopi ... w=&start=0

As the topic is the same.

We might as well compete over which one will get longer.
