hgm wrote:Lyudmil Tsvetkov wrote:Did it not? What about all those tuning games on the Framework?
Is this not a kind of reinforcement learning?
No, it is not.
* Reinforcement learning is where you let the system do its thing, and then change it to encourage or discourage what it did, depending on whether it did what you want.
* Supervised learning is where you give the system examples of what it should do, and encourage it to do the same.
* Tuning is where you first change the system, and then see if it now does better what you want, and keep the change if it does. (A toy sketch contrasting the three follows below.)
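To make the distinction concrete, here is a toy sketch in Python (names and numbers are purely illustrative, nothing from AlphaZero), for a 'system' that is just one adjustable number that we want to end up near some target:

```python
import random

LEARNING_RATE = 0.1
TARGET = 3.0                         # what we want the system to produce

def act(param):                      # the "system doing its thing"
    return param

def reinforcement_step(param):
    action = act(param) + random.gauss(0, 0.5)   # system tries something
    reward = -(action - TARGET) ** 2             # score what it actually did
    baseline = -(act(param) - TARGET) ** 2       # score of its usual behaviour
    if reward > baseline:                        # encourage what scored better
        param += LEARNING_RATE * (action - param)
    return param

def supervised_step(param):
    example = TARGET                             # we are *told* the right answer
    error = example - act(param)
    return param + LEARNING_RATE * error         # nudge directly toward the example

def tuning_step(param):
    candidate = param + random.uniform(-0.5, 0.5)        # change the system first
    if abs(candidate - TARGET) < abs(param - TARGET):    # does it now do better?
        return candidate                                 # keep the change
    return param                                         # otherwise discard it
```

Note where the information comes from in each case: from a score of what the system did itself, from an example it was given, or from a before/after comparison of the changed system.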
Lyudmil Tsvetkov wrote:All by itself? So you still insist there was no code involved in Alpha apart from the game rules?
Of course. That is what they say, and there is no reason at all to doubt them.
Lyudmil Tsvetkov wrote:You got me totally confused, what the hell is the NN, is it code, a machine, a combination of patterns, or a self-learning oddity?
That seems to be your natural state...
In principle, a NN is a machine, very similar to the human brain. It consists of 'cells' that can be stimulated to get active or stay passive by other cells connected to them. The strength of the connections ('weights') is adaptable, and by changing them the NN can be programmed ('trained') to generate a response (like a move) from a certain input (like an image or a Chess diagram).
In practice the NN is simulated as a virtual machine inside another computer. This means that there is a program that specifies how the cells are connected, keeps track of the current weight of all the connections, and shuttles the activation signals around.
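A minimal sketch of such a simulation (my own toy version, orders of magnitude smaller than any real NN) could look like this in Python:

```python
import random

def make_layer(n_inputs, n_cells):
    # one weight per connection, initially random (no useful knowledge yet)
    return [[random.uniform(-1.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_cells)]

def activate(layer, inputs):
    outputs = []
    for weights in layer:                        # for each cell in the layer...
        stimulus = sum(w * x for w, x in zip(weights, inputs))
        outputs.append(max(0.0, stimulus))       # ...it fires if stimulated enough
    return outputs

# shuttle a (toy) input signal through a two-layer net:
net = [make_layer(4, 8), make_layer(8, 2)]
signal = [1.0, 0.0, 0.5, 0.25]
for layer in net:
    signal = activate(layer, signal)
```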
CPUs like those found in PCs are not very good at doing the things needed to simulate a NN; most of their transistors are doing other things that are not useful for the NN at all. So Google developed TPUs, which are chips that only do what is needed to simulate a NN. A given NN therefore isn't slowed down as much when running on a TPU as it would be when running on an ordinary CPU. The TPU is programmed to simulate a NN of general capabilities (but adapted in size to the board of the game it will be used for), and to calculate what the output of the NN would be when a given Chess position is presented to the net as input. Initially the NN contains no useful knowledge (all connections have random weights).
Lyudmil Tsvetkov wrote:So, according to you, there is no code at all involved in Alpha?
There is code to simulate the NN, which in principle can be configured to handle an NN of any size, and in practice is programmed to simulate an NN with a design good for learning board games. This code must know the board size and the number of participating piece types.
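For illustration, the way board size and piece types parameterize the input could look something like the sketch below (my own toy encoding; AlphaZero's actual input representation differs in detail, but is likewise built from one plane per piece type per side):

```python
BOARD_SIZE = 8                                  # configurable per game
PIECE_TYPES = ['P', 'N', 'B', 'R', 'Q', 'K']    # configurable per game

def encode_position(piece_at):
    # piece_at((rank, file)) -> (piece_type, side) or None
    planes = []
    for side in ('white', 'black'):
        for ptype in PIECE_TYPES:
            plane = [[0] * BOARD_SIZE for _ in range(BOARD_SIZE)]
            for r in range(BOARD_SIZE):
                for f in range(BOARD_SIZE):
                    if piece_at((r, f)) == (ptype, side):
                        plane[r][f] = 1
            planes.append(plane)
    return planes   # 2 * len(PIECE_TYPES) planes, each BOARD_SIZE x BOARD_SIZE
```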
There is code to perform a Monte-Carlo Tree Search, presenting the positions that the search encounters to the NN, forcing legality upon the moves suggested by the NN in response (i.e. ignoring any illegal suggestions), and then searching the legal moves that it suggests. This code knows the rules for moving pieces, checking, and game end.
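The move-selection step of such a search could look roughly like the following sketch (names like net.suggest and the statistics tables are placeholders of mine; the formula AlphaZero actually uses is similar in spirit but differs in detail):

```python
import math

def select_move(position, net, legal_moves, visits, values, c_puct=1.5):
    priors = net.suggest(position)      # NN's suggested probability per move
    total = sum(visits.get(m, 0) for m in legal_moves)
    best, best_score = None, -math.inf
    for m in legal_moves:               # illegal suggestions never get here,
        n = visits.get(m, 0)            # so they are ignored automatically
        q = values.get(m, 0.0) / n if n else 0.0         # average result so far
        u = c_puct * priors.get(m, 0.0) * math.sqrt(total + 1) / (1 + n)
        if q + u > best_score:
            best, best_score = m, q + u
    return best                         # most promising legal move to search next
```

Note that 'forcing legality' is nothing more than only ever iterating over the legal moves; whatever probability the NN assigned to illegal moves simply never enters the loop.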
Eventually all knowledge about strategy and tactics is in the weights of the NN, which were tuned during self-play according to the programmed rules, on the basis of the programmed recognition of the game result. Without any human touch.
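Schematically, that self-play tuning amounts to something like this (all names are placeholders of mine; the real thing updates the weights by gradient descent over large batches of games):

```python
def self_play_update(net, play_game, learning_rate=0.01):
    # play_game(net) lets the net play itself under the programmed rules and
    # returns the visited positions plus the recognized result (+1, -1 or 0)
    positions, result = play_game(net)
    for pos in positions:
        prediction = net.evaluate(pos)          # what the net expected here
        error = result - prediction             # how wrong it turned out to be
        net.adjust_weights(pos, learning_rate * error)   # pull toward the outcome
    return net
```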