AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Milos · Post by **Milos** » Wed Dec 06, 2017 4:33 pm

kranium wrote:AlphaZero very selectively evaluating 80k vs Stockfish's 70,000k positions/sec, probably achieving tremendous depths at such speeds,
but I'd guess it's the deep (learned) positional eval which is primarily adding strength...

Alpha0 is basically behaving like a huge, highly selective opening book.
However, besides the hardware, other things in this work are highly questionable.
I guess people are a bit intimidated to ask questions because it is Google, but many things are fishy and unfavourable to SF.
One big disadvantage was the TC: 1 min/move means SF spent only 1 minute on each of the opening moves, while at a normal TC like 40/40 it would easily spend 5-10 minutes on each opening move. That made it much weaker, by 20 or maybe even 30 Elo, since most of SF's losses already happen in the opening.
Second is no-book play, where Alpha0 mainly forces the openings and lines that it spent most of its training time on, while SF had no help from a book whatsoever. To make it at least a bit more fair, one should use a strong book such as Cerebellum as support for SF.
Starting from 12 typical human openings (only 4 moves deep at most), the gap Alpha0 had over SF shrank from 100 to 77 Elo, which can be seen from the paper.
Third, even though they used last year's TCEC winner, SF8 has untested behaviour on 64 cores, and on that hardware it is at least 30 Elo, if not more, weaker than the current SFdev.
So, taking all of this into consideration, it is pretty safe to assume that the latest Brainfish at a normal TC like 40/40 would be at least on par with, if not stronger than, Alpha0. And all that on much weaker hardware.
If they really wanted to make a fair comparison, then instead of running Alpha0 on regular x64 one could also run SF on custom hardware where all the evaluation is handled by fully custom implemented FPGAs (like Deep Blue did), and then one would see how much weaker Alpha0 really is when the comparison is not apples and oranges.
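
To put the Elo figures in this post (100, 77, 30) in terms of game results, here is a minimal sketch using the standard logistic Elo model. The function names are illustrative, and the model itself is an assumption: the post does not say how its Elo estimates were derived.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) of the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Inverse mapping: Elo gap implied by an average score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # Gaps mentioned in the post: the headline gap, the gap from set openings,
    # and the claimed cost of the time control or of using SF8.
    for gap in (100, 77, 30):
        print(f"{gap:>3} Elo -> expected score {expected_score(gap):.3f}")
```

Under this model a 100 Elo gap corresponds to scoring about 64% of the points and a 77 Elo gap to about 61%, so each of the 20-30 Elo adjustments argued for above is worth a few percentage points of score.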

MikeGL · Post by **MikeGL** » Wed Dec 06, 2017 4:36 pm

Maybe this thread should be merged to:
http://www.talkchess.com/forum/viewtopi ... w=&start=0

As the topic is the same.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 5:06 pm

kranium wrote:
Lyudmil Tsvetkov wrote:- Alpha had considerable hardware advantage
- SF played with version 8
- what was the code/software/evaluation base used for the first Alpha chess version, an advanced engine evaluation and search software or otherwise?
As Daniel explains: no hard-coded evaluation (software)... its game play is based on learning (experience) from previous self-play games applied to a neural network

5,000 first-generation TPUs to generate self-play games
and 64 second-generation TPUs to train the neural networks

The hardware advantage is not such an important factor during gameplay as one would imagine.

The hardware advantage is 16/1, so simply ridiculous.
I could not accept such a test in any way.
But then, what have Google done right?

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 5:08 pm

kranium wrote:
Program      Chess      Shogi      Go
AlphaZero    80k        40k        16k
Stockfish    70,000k
Elmo                    35,000k
Table S4: Evaluation speed (positions/second) of AlphaZero, Stockfish, and Elmo in chess, shogi and Go.

This makes no sense at all.
Alpha 1000 times slower?
This would mean 1000 times more evaluation terms in its code!
What kind of terms?
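
For reference, the ratio behind the "1000 times" figure can be checked directly from the chess column of Table S4. This is a rough sketch: the per-move totals assume the full 1 minute per move mentioned earlier in the thread is spent searching, and the names are illustrative.

```python
# Evaluation speeds quoted from Table S4 (chess), in positions per second.
EVALS_PER_SECOND = {"AlphaZero": 80_000, "Stockfish": 70_000_000}


def speed_ratio(fast: str, slow: str) -> float:
    """How many times more positions per second `fast` evaluates than `slow`."""
    return EVALS_PER_SECOND[fast] / EVALS_PER_SECOND[slow]


def positions_per_move(program: str, seconds_per_move: int = 60) -> int:
    """Positions evaluated per move, assuming the whole time budget is used."""
    return EVALS_PER_SECOND[program] * seconds_per_move


if __name__ == "__main__":
    print(f"ratio: {speed_ratio('Stockfish', 'AlphaZero'):.0f}x")
    for name in EVALS_PER_SECOND:
        print(f"{name}: {positions_per_move(name):,} positions per move")
```

The ratio comes out at 875, so "about 1000 times" is a fair rounding of the table.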

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 5:11 pm

kranium wrote:
Milos wrote:
kranium wrote:
Lyudmil Tsvetkov wrote:- Alpha had considerable hardware advantage
- SF played with version 8
- what was the code/software/evaluation base used for the first Alpha chess version, an advanced engine evaluation and search software or otherwise?
As Daniel explains: no hard-coded evaluation (software)... its game play is based on learning (experience) from previous self-play games applied to a neural network

5,000 first-generation TPUs to generate self-play games
and 64 second-generation TPUs to train the neural networks

The hardware advantage is not such an important factor during gameplay as one would imagine.
It actually is. Instead of the 4 TPUs required to run Alpha0 so far, on x64 hardware one would need around 2000 Haswell cores to achieve the same NN speed (80k patterns evaluated per second). Since the NNs are huge, with smaller resources the matrix multiplication would have to be broken into smaller sub-matrices, which would exponentially slow down the calculation.

AlphaZero very selectively evaluating 80k vs Stockfish's 70,000k positions/sec, probably achieving tremendous depths at such speeds,
but I'd guess it's the deep (learned) positional eval which is primarily adding strength...

There is no deep eval, eval is static.
How could an entity have 1000 times more terms than SF?

Milos · Post by **Milos** » Wed Dec 06, 2017 5:15 pm

Lyudmil Tsvetkov wrote:
kranium wrote:
Program      Chess      Shogi      Go
AlphaZero    80k        40k        16k
Stockfish    70,000k
Elmo                    35,000k
Table S4: Evaluation speed (positions/second) of AlphaZero, Stockfish, and Elmo in chess, shogi and Go.
This makes no sense at all.
Alpha 1000 times slower?
This would mean 1000 times more evaluation terms in its code!
What kind of terms?

Just read the paper: not only 1000 times more, but much more than that. There are 4,672 planes just for possible pieces/move to/side to move.
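
A quick arithmetic check of where 4,672 comes from, assuming the move encoding described in the paper: a stack of 73 move-type planes over the 8x8 board, one entry per (from-square, move type) pair. On that reading the number counts possible move encodings rather than evaluation terms. The constant names below are illustrative.

```python
# Breakdown of the 4,672 figure, assuming the paper's 8 x 8 x 73 move encoding.
BOARD_SQUARES = 8 * 8           # square the piece is picked up from

QUEEN_MOVE_PLANES = 7 * 8       # 1..7 squares in each of 8 compass directions
KNIGHT_MOVE_PLANES = 8          # the 8 possible knight jumps
UNDERPROMOTION_PLANES = 3 * 3   # 3 pawn directions x promotion to N, B or R

MOVE_PLANES = QUEEN_MOVE_PLANES + KNIGHT_MOVE_PLANES + UNDERPROMOTION_PLANES
POLICY_SIZE = BOARD_SQUARES * MOVE_PLANES

if __name__ == "__main__":
    print(f"{MOVE_PLANES} planes x {BOARD_SQUARES} squares = {POLICY_SIZE} move encodings")
```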

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 5:17 pm

Daniel Shawul wrote:What is different is that AlphaZero's evaluation selects the features of eval by itself (via a neural network), while in the standard approach the programmer selects the features (e.g. passed pawns, king safety, rook on open file etc.) and just tunes the weights. The downside of the neural-network approach is that you may not understand why it does what it does.

Daniel

Tell me the precise code, that says nothing to me.
How does it select features, based on what?
Playing 100,000 games and seeing many wins with a pawn on d4 or e5, so this is good? Or interpreting 100,000 games from some large human database: an e5 pawn is more common in winning games than a d5 pawn, so increase its value?

But that has its limits.
What about mobility, how do they figure out mobility from human games?
More importantly, what would an evaluation pattern consist of?

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 5:21 pm

Milos wrote:
Lyudmil Tsvetkov wrote:
kranium wrote:
Program      Chess      Shogi      Go
AlphaZero    80k        40k        16k
Stockfish    70,000k
Elmo                    35,000k
Table S4: Evaluation speed (positions/second) of AlphaZero, Stockfish, and Elmo in chess, shogi and Go.
This makes no sense at all.
Alpha 1000 times slower?
This would mean 1000 times more evaluation terms in its code!
What kind of terms?
Just read the paper: not only 1000 times more, but much more than that. There are 4,672 planes just for possible pieces/move to/side to move.

I browsed over it, but did not pay attention to it.
So, evaluation will depend not only on where the piece lands, but also on where it comes from.
That is ridiculous; that is already not static evaluation but is indeed working like some kind of very sophisticated book.

MikeGL · Post by **MikeGL** » Wed Dec 06, 2017 5:22 pm

Lyudmil Tsvetkov wrote:
kranium wrote:
Milos wrote:
kranium wrote:
Lyudmil Tsvetkov wrote:- Alpha had considerable hardware advantage
- SF played with version 8
- what was the code/software/evaluation base used for the first Alpha chess version, an advanced engine evaluation and search software or otherwise?
As Daniel explains: no hard-coded evaluation (software)... its game play is based on learning (experience) from previous self-play games applied to a neural network

5,000 first-generation TPUs to generate self-play games
and 64 second-generation TPUs to train the neural networks

The hardware advantage is not such an important factor during gameplay as one would imagine.
It actually is. Instead of the 4 TPUs required to run Alpha0 so far, on x64 hardware one would need around 2000 Haswell cores to achieve the same NN speed (80k patterns evaluated per second). Since the NNs are huge, with smaller resources the matrix multiplication would have to be broken into smaller sub-matrices, which would exponentially slow down the calculation.

AlphaZero very selectively evaluating 80k vs Stockfish's 70,000k positions/sec, probably achieving tremendous depths at such speeds,
but I'd guess it's the deep (learned) positional eval which is primarily adding strength...
There is no deep eval, eval is static.
How could an entity have 1000 times more terms than SF?

Come on Lyudmil, read the paper. AlphaZero doesn't use eval functions.
It probably doesn't know about a Rook being 5.0 or a pawn being 1.0, the way most engines do.
It is using MCTS, and doing a lot of probability checks with 1-0, 0.5, 0-1 at the end of the
search, then choosing the path/route which has more wins or draws and fewest losses.

edit-
least=fewest
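
The selection rule described in this post (prefer the line with more wins or draws and fewest losses) can be sketched as below. This is only a simplified illustration of averaging game outcomes per candidate move, not AlphaZero's actual search: the paper's MCTS is guided by the neural network's move probabilities and value estimate. The class, function names and sample counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MoveStats:
    """Outcome counts collected for one candidate move (hypothetical data)."""
    wins: int = 0
    draws: int = 0
    losses: int = 0

    @property
    def visits(self) -> int:
        return self.wins + self.draws + self.losses

    @property
    def mean_score(self) -> float:
        """Average result, scoring a win as 1, a draw as 0.5 and a loss as 0."""
        if self.visits == 0:
            return 0.0
        return (self.wins + 0.5 * self.draws) / self.visits


def best_move(stats: Dict[str, MoveStats]) -> str:
    """Pick the move whose sampled outcomes have the highest average score."""
    return max(stats, key=lambda move: stats[move].mean_score)


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    stats = {
        "e4": MoveStats(wins=40, draws=50, losses=10),
        "d4": MoveStats(wins=45, draws=50, losses=5),
        "a3": MoveStats(wins=20, draws=40, losses=40),
    }
    print(best_move(stats))
```

With these made-up counts the averages are 0.65, 0.70 and 0.40, so the sketch picks d4. No piece values appear anywhere in the decision, which is the point the post is making.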

jhellis3 · Post by **jhellis3** » Wed Dec 06, 2017 5:22 pm

Tell me the precise code

I lol'd.
