AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by Lyudmil Tsvetkov »

IanO wrote:
Lyudmil Tsvetkov wrote:
BeyondCritics wrote:
cdani wrote:
It will be very interesting to know the typical depth achieved by AlphaZero. I bet it was much less than Stockfish's.
You lost that bet ;-)
AlphaZero uses MCTS https://en.wikipedia.org/wiki/Monte_Carlo_tree_search. From this source https://www.arxiv-vanity.com/papers/1712.01815v1/:
Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general-purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of simulated games of self-play that traverse a tree from root to leaf.
...
At the end of the game, the terminal position is scored according to the rules of the game to compute the game outcome -1 for a loss, 0 for a draw, and +1 for a win.
This is done for training, not for play, so it is not actually a search, but a tuning method.
Why does everyone confuse tuning with search?
They are still doing alpha-beta.
Wrong. Unless stated otherwise, the player is identical to the trainer. That is one of the reasons this is such an interesting advance, both a new eval and tuning method and a previously discarded search method.
Your statement is just based on their claims.
Could you explain this MCTS in a bit more detail to me, so I can tell you why it is simply alpha-beta.
Alpha-beta is synonymous with picking the best move; an approach that does not pick the best move is simply impossible theoretically.
Again, why don't you think a bit: 2800 Elo on a single core, what kind of a breakthrough is that? :D
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by Uri Blass »

Ras wrote:
Lyudmil Tsvetkov wrote: Would not SF on the very same hardware, if adapted, still be 400 Elo stronger?
No, it wouldn't, for the same reason that Stockfish doesn't harness the power of available GPUs. These TPUs are quite a different design from CPUs. TPUs are a bit like GPUs modified in hardware to be more efficient at neural networks instead of graphics.

The big thing here is that they have a truckload of simple modules that can perform identical operations on different data at the same time. Perfect for neural networks - and for graphics, which is why graphics cards have long been used for neural network purposes.

By contrast, conventional CPUs are good at performing different and complex operations on input data at a high rate. That is good for if/then/else branching, which is essentially what chess engines do.

Software that was designed for CPUs cannot take advantage of TPUs. The TPU hardware would be useless for Stockfish. The other way round is also not promising: trying to run a neural network on a CPU doesn't yield performant results.
Is there a problem with emulating AlphaZero on a CPU (same algorithm on a CPU)?

I do not care if it is going to be 10 times slower, and most engines cannot beat Stockfish even with a 10:1 time handicap.
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by Uri Blass »

Lyudmil Tsvetkov wrote:
Uri Blass wrote:
Lyudmil Tsvetkov wrote:
tmokonen wrote:
Lyudmil Tsvetkov wrote: Eelco, how can you buy into the SCAM too?
The hardware advantage was 50:1.
It plays 1850-Elo chess on a single core.

The diagram above is already way, way won for Black; Stockfish already blundered in the opening with Nce5, this is already lost.

Alpha beating me? Gosh, I will shred it to pieces. :D
It understands absolutely nothing of closed positions; no such positions were encountered in the sample.

It is all about the hardware: 2 or 3 beautiful games, with the d4-e5-f6 chain outperforming a whole black minor piece, one great attack on the bare SF king and one more; all the rest is just excessive computation.
Nothing special about its eval.
Just another crappy, misinformed, false bravado post from a guy who quit his job to pursue the quixotic dream of finding the perfect old-school end point evaluation for an alpha beta searcher. He can't accept the fact that his years of painstaking effort have been rendered moot by a project that was just a "meh, let's spend a few hours and see what happens" lark by the team that already conquered Go, a much more complex game than chess.
http://davidsmerdon.com/?p=1970

Go is much simpler than chess; Go's evaluation patterns are exponentially fewer (1000:1) than those in chess.
So you are really very bad at basic knowledge and etiquette, lad.

Alpha is 1850 currently, and will stay like that, a weak engine running on tremendous hardware.

My project will still conquer the world.
An 1850 engine cannot beat Stockfish regardless of hardware.

Even if you assume 100 Elo per doubling, you need to be 1000 times faster just to get to 2850, which is clearly weaker than the top programs, and Alpha certainly did not have a 1000:1 hardware advantage.
Was not Deep Blue a 2600 engine?
Deep Blue was special hardware designed for chess, so it was not an engine.

I believe that Deep Blue, searching 200,000,000 nodes per second, would lose today against Stockfish even if Stockfish ran on very slow hardware and searched only 200,000 nodes per second.
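
For reference, the doubling arithmetic above works out roughly as follows; this is only a sketch of the posters' own assumptions (the 100 Elo per doubling figure and the 1850 baseline are their numbers, not established facts):

[code]
import math

# 1850 baseline plus 100 Elo per doubling of speed, for a 1000:1 speedup
print(1850 + 100 * math.log2(1000))   # ~2846, still below today's top engines
[/code]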
Jesse Gersenson
Posts: 593
Joined: Sat Aug 20, 2011 9:43 am

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by Jesse Gersenson »

Uri Blass wrote: [snip] Alpha certainly did not have a 1000:1 hardware advantage. [/snip]
What are you basing that on?

A dual-socket 16-core Xeon E5-26xx v4 is about 1-1.5 teraflops.
One TPU is 180 teraflops. AlphaZero used 4 TPUs and a CPU.

Regardless, a stunning accomplishment - a sharp turn towards the future...there will be many similar turns in the coming years.
Last edited by Jesse Gersenson on Sat Dec 09, 2017 6:16 pm, edited 1 time in total.
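
Taking those figures at face value, the naive throughput ratio is easy to work out. This is only a rough sketch using the numbers quoted in the post above, and the rest of the thread explains why raw FLOPS comparisons between such different architectures do not translate directly into engine speed:

[code]
cpu_tflops = 1.25        # dual-socket Xeon E5-26xx v4, figure as quoted above
tpu_tflops = 4 * 180.0   # four TPUs at the per-TPU figure quoted above

print(tpu_tflops / cpu_tflops)   # ~576:1 in raw floating-point throughput
[/code]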
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by Ras »

Lyudmil Tsvetkov wrote: That is why I said 'adapted'.
Except that "adapted" would mean a complete rewrite, and then it wouldn't be Stockfish anymore. The revolution in computing is both having the self-learning software framework and the hardware to meaningfully run it.
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by Ras »

Uri Blass wrote: Is there a problem with emulating AlphaZero on a CPU (same algorithm on a CPU)?
Theoretically no; in practice, there is. The sequential nature of conventional CPUs means that they have to process neuron by neuron more or less sequentially. That won't be 10:1, but much more.

And no, this does not mean that AlphaZero had a hardware advantage of much more than 10:1, because the requirement to run on x86 is completely arbitrary.
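
To make the difference in workload concrete, here is a minimal sketch (the layer sizes are illustrative, not AlphaZero's actual network): a single dense layer of a neural network is one big matrix-vector product, millions of identical multiply-accumulates with no branching, which is exactly what TPUs and GPUs parallelise and what a CPU has to grind through far more slowly.

[code]
import numpy as np

rng = np.random.default_rng(0)
inputs  = rng.standard_normal(4096)          # activations from the previous layer
weights = rng.standard_normal((4096, 4096))  # weight matrix of one dense layer
bias    = rng.standard_normal(4096)

# One layer = matrix-vector product + bias + activation:
# roughly 16.8 million multiply-adds, all identical, no branching.
outputs = np.maximum(weights @ inputs + bias, 0.0)
[/code]

An alpha-beta engine, by contrast, spends its time on branchy integer logic, which is the workload CPUs are built for.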
pilgrimdan
Posts: 405
Joined: Sat Jul 02, 2011 10:49 pm

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by pilgrimdan »

Lyudmil Tsvetkov wrote:
Ras wrote:
Guenther wrote: Rasmus, do you really think he will understand that after all he said before?
I've never seen Lyudmil posting derailing stuff. The main point isn't even technical understanding, it's the perspective that equates computing with doing stuff on x86-like CPUs (or ARMs or whatever). It takes a while to let it sink in that this is a completely different way of computing.

I'm also somewhat astonished about the whole discussion whether the hardware was fair or not. As if Stockfish's defeat had been the point here. We're looking at nothing less than a revolution in computing, and people discuss whether some change here and there might have given a handful of Elo more to Stockfish. :shock:
Those are two completely different things: a revolution in computing and a revolution in artificial intelligence.
You rightly state this is a revolution in computing, but why do they only stress the AI part then?
It may be because of this, Lyudmil ...

QML 35 days ago [-]
What makes this different from a minimax algorithm with alpha-beta pruning?

gwern 35 days ago [-]

Aside from MCTS being a different tree search method, there is no 'closing of the loop'. In regular MCTS, it is far from unheard of to do the random playouts with instead some 'heavier' heuristic to make the playouts a little better estimators of the node value, but the heavy playouts do not do any kind of learning, the heuristic you start with is what you end with; what makes this analogous to policy iteration (hence the names for the Zero algorithm of 'tree iteration' or 'expert iteration') is that the refined estimates from the multiple heavy playouts are then used to improve the heavy playout heuristic (ie. a NN which can be optimized via backpropagation in a supervised learning of board position -> value). Then in a self-play setting, the MCTS continually refines its heavy heuristic (the NN) until it's so good that the NN+MCTS is superhuman. Then at play time you can drop the MCTS entirely and just use the heavy heuristic to do a very simple tree search to choose a move (which I think might actually be a minimax with a fixed depth but I forget).

https://news.ycombinator.com/item?id=15627340
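
A compressed sketch of the loop that quote describes may help. This is not DeepMind's code: the network below is a stub returning uniform priors and a neutral value, and ToyState is a made-up toy game, purely to make the select / expand-and-evaluate / back-up control flow concrete.

[code]
import math

class Node:
    """Per-edge statistics: prior P, visit count N, value sum W, children."""
    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}            # move -> Node

    def q(self):                      # mean action value Q = W / N
        return self.value_sum / self.visits if self.visits else 0.0

class ToyState:
    """Made-up stand-in game: take 1 or 2 stones; taking the last stone wins."""
    def __init__(self, stones=7):
        self.stones = stones
    def legal_moves(self):
        return [m for m in (1, 2) if m <= self.stones]
    def play(self, move):
        return ToyState(self.stones - move)

def stub_network(state):
    """Stand-in for the policy/value network: uniform priors, neutral value."""
    moves = state.legal_moves()
    if not moves:                     # terminal: the side to move has lost
        return {}, -1.0
    return {m: 1.0 / len(moves) for m in moves}, 0.0

def simulate(root_state, root, c_puct=1.5):
    """One simulation: select by PUCT, expand the leaf with the network, back up."""
    state, node, path = root_state, root, []
    while node.children:              # selection: prior-weighted UCB over children
        sqrt_n = math.sqrt(sum(ch.visits for ch in node.children.values()))
        move, node = max(
            node.children.items(),
            key=lambda kv: kv[1].q()
            + c_puct * kv[1].prior * sqrt_n / (1 + kv[1].visits))
        path.append(node)
        state = state.play(move)
    priors, value = stub_network(state)   # expand + evaluate; no random rollout
    for m, p in priors.items():
        node.children[m] = Node(p)
    for n in reversed(path):          # back up, flipping perspective each ply
        value = -value
        n.visits += 1
        n.value_sum += value

root_state, root = ToyState(7), Node(prior=1.0)
for _ in range(400):
    simulate(root_state, root)
print({move: child.visits for move, child in root.children.items()})
[/code]

In the AGZ/AlphaZero papers the same MCTS is also used at play time, with the trained network in place of the stub.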
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by Milos »

pilgrimdan wrote:
gwern 35 days ago [-]

Aside from MCTS being a different tree search method, there is no 'closing of the loop'. [snip] Then at play time you can drop the MCTS entirely and just use the heavy heuristic to do a very simple tree search to choose a move (which I think might actually be a minimax with a fixed depth but I forget).

https://news.ycombinator.com/item?id=15627340
I don't know who wrote this, but it is just a load of bollocks. It has absolutely nothing to do with how Alpha0 is trained or run.
I really wonder why it is so hard to read the AGZ Nature paper??? Or are some simply incapable of comprehending it?
jack512
Posts: 19
Joined: Sat Nov 14, 2015 4:29 pm

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by jack512 »

Mr. Kaufman, in the chess.com article on AlphaZero, you were quoted as saying "...AlphaZero had effectively built its own opening book...". Could you elaborate on that statement? Did it build its opening book in the sense that it stored moves in a lookup table for opening positions? Or did its training session compute values for its neural network parameters that were used in its search of the opening positions, in the same way that they would be applied to any other position?