In chess, AlphaZero outperformed Stockfish after just 4 hours

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

From the document - In chess, AlphaZero outperformed Stockfish after just 4 hours. How believable is that?

I believe it as written: 37 (54%)
I am sceptic: 21 (30%)
I don't (can't) believe it: 8 (12%)
I am undecided: 3 (4%)

Total votes: 69

vvarkey
Posts: 88
Joined: Fri Mar 10, 2006 11:20 am
Location: Bangalore India

Re: In chess, AlphaZero outperformed Stockfish after just 4 h

Post by vvarkey »

Michael Sherwin wrote:And yet they trained against the most common positions in a human database and they did it 100 games per position and they did it against SF.
The 100 games with the most common positions from the human database were matches just like the main match. This was after all the training was complete; the training itself was only self-play and nothing else.
Rebel
Posts: 6997
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: In chess, AlphaZero outperformed Stockfish after just 4 h

Post by Rebel »

kranium wrote:
Rebel wrote:I don't care that SF lost; it's totally irrelevant in light of the huge claim by the DeepMind company, the alleged 4 hours of self-play, quoting the document again: without any additional domain knowledge except the rules of the game.

Have you already let it sink in what is stated here?

No mobility, no king safety, no passed pawn evaluation, no castling knowledge, not even piece values?

What would that first self-play game look like? Something like 1.a3 a6 2.a4 a5 3.b3 b6 etc., and how would that lead to anything for the second self-play game?

And so I voted for option 3.
Monte Carlo search does not use a traditional eval as we know it, so mobility, king safety etc. are irrelevant.

It uses a struct to hold info like wins, losses, draws, win %, etc.,
then simply references accumulated data for the current position to select the move with the highest probability of winning.
And how are wins, losses, draws defined? With or without domain knowledge?
hgm
Posts: 27811
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: In chess, AlphaZero outperformed Stockfish after just 4 h

Post by hgm »

yurikvelo wrote:Please clarify on A0.

Can it analyze an arbitrary FEN position, or is its learn-tree based on games of strong engines?
A0 = Google's Alpha Zero. See the many recent threads about this in the various forum sections.
kranium
Posts: 2129
Joined: Thu May 29, 2008 10:43 am

Re: In chess, AlphaZero outperformed Stockfish after just 4 h

Post by kranium »

Rebel wrote:
kranium wrote:
Rebel wrote:I don't care that SF lost; it's totally irrelevant in light of the huge claim by the DeepMind company, the alleged 4 hours of self-play, quoting the document again: without any additional domain knowledge except the rules of the game.

Have you already let it sink in what is stated here?

No mobility, no king safety, no passed pawn evaluation, no castling knowledge, not even piece values?

What would that first self-play game look like? Something like 1.a3 a6 2.a4 a5 3.b3 b6 etc., and how would that lead to anything for the second self-play game?

And so I voted for option 3.
Monte Carlo search does not use a traditional eval as we know it, so mobility, king safety etc. are irrelevant.

It uses a struct to hold info like wins, losses, draws, win %, etc.,
then simply references accumulated data for the current position to select the move with the highest probability of winning.
And how are wins, losses, draws defined? With or without domain knowledge?
The w/d/l record for each position is simply the accumulated result of the self-play games.
The training begins with no stored knowledge...a blank slate.
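
As a rough illustration of the bookkeeping kranium is describing (a naive win-rate lookup, not AlphaZero's actual PUCT search; all names here are hypothetical), it might look something like this in Python:

```python
from dataclasses import dataclass

@dataclass
class PositionStats:
    # accumulated self-play results for one position
    wins: int = 0
    losses: int = 0
    draws: int = 0

    @property
    def games(self) -> int:
        return self.wins + self.losses + self.draws

    @property
    def win_rate(self) -> float:
        return self.wins / self.games if self.games else 0.0

# keyed by position (e.g. a FEN string or a hash); starts empty, the "blank slate"
stats = {}

def pick_move(legal_moves, key_after_move):
    # choose the move leading to the position with the best observed win rate
    return max(legal_moves,
               key=lambda m: stats.get(key_after_move(m), PositionStats()).win_rate)
```
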
Rebel
Posts: 6997
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: In chess, AlphaZero outperformed Stockfish after just 4 h

Post by Rebel »

Michael Sherwin wrote:There are so many contradictory claims and contradictions in the pre-release paper. No domain-specific knowledge, no human games database, only self-play. And yet they trained against the most common positions in a human database and they did it 100 games per position and they did it against SF. And we are supposed to believe it was after the main 100-game match. For this press release, why mention the training games supposedly played after the main match? What purpose does it serve except to cloud the issue and cast doubt? In RomiChess over the years it was demonstrated many times that Romi could win 100-game matches against the top engines when each game started from the same position. And here they are doing the exact same thing. But of course that was reportedly after the match, when it served no purpose as far as the reported match is concerned. It just looks fishy! :lol:
Very Stockfishy :wink:

About the part in red: if that's true (and with their computer power I suppose it is possible), they more or less played these games from a big (learned) book, and that would make their statement "no domain knowledge" not even a lie :lol:

Glad to read someone with experience in this area.
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: In chess, AlphaZero outperformed Stockfish after just 4 h

Post by Vinvin »

Rebel wrote:...
the alleged 4 hours self-play
...
"4 hours" can be misleading because Google team can use 1 machine or 10 machines or 100 machines or ...
Rebel
Posts: 6997
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: In chess, AlphaZero outperformed Stockfish after just 4 h

Post by Rebel »

Vinvin wrote:
Rebel wrote:...
the alleged 4 hours self-play
...
"4 hours" can be misleading because Google team can use 1 machine or 10 machines or 100 machines or ...
You got it.
hgm
Posts: 27811
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: In chess, AlphaZero outperformed Stockfish after just 4 h

Post by hgm »

Rebel wrote:And how are wins, losses, draws defined? With or without domain knowledge?
Just according to the FIDE rules: wins = checkmate; draws = stalemate, the 50-move rule, 3-fold repetition, or insufficient material. These count as game rules, which of course are domain-specific. You obviously cannot learn a game without knowing its rules.
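
Scoring a finished self-play game then needs nothing beyond those rules. A minimal sketch, assuming the python-chess library for the rule checks (the function name and the +1/0/-1 convention are mine, not from the paper):

```python
import chess

def game_result(board: chess.Board):
    """Result from the rules alone: +1/-1 for checkmate (from White's point of view),
    0 for the drawn terminations, None if the game is not over yet."""
    if board.is_checkmate():
        return -1 if board.turn == chess.WHITE else +1   # the side to move is the one mated
    if (board.is_stalemate()
            or board.is_insufficient_material()
            or board.can_claim_fifty_moves()
            or board.is_repetition(3)):
        return 0
    return None
```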

I think you overestimate the difficulty of learning something that pervades Chess as universally as piece values. Initially the games will be fully random, but 15% of those will already end in checkmate. Usually it will be the side with the superior material that inflicts the checkmate, so there will be a correlation between winning a game and capturing more material value (and promoting) during it. So patterns that happen to detect an LxH capture (say, an NxQ possibility in the 3x3 board area they are examining) will quickly be enhanced to recommend that move, and to recognize it more accurately (with fewer false positives).

I guess it would be pretty easy to experiment with this: just write a random mover that can adapt the probability with which it selects the various kinds of captures. (Distinguished by attacker type and victim type, so basically a 6x6 table. With legal moves the King will not be a possible victim, but empty squares will be.) Initialize these probabilities randomly, and let it play games against itself. After every game that ended in checkmate, increase the probability of all the move types that the winner played by a tiny amount, and decrease the probability of all the move types the loser played by a similar amount (but not to below zero). Then renormalize the probabilities so that they again sum up to 1. This should fairly quickly teach it which captures are always profitable, after which it would play far better than purely random. Of course there is a limit to how well you can play by judging moves only by attacker and victim, but it should be enough to crush a purely random mover, which doesn't judge the moves at all.
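
A rough sketch of that experiment, leaning on the python-chess library for move generation; the 6x6 table layout follows the post, while the learning step, game cap and other constants are my own guesses:

```python
import random
import chess  # python-chess, used here only for move generation and rule checks

ATT = {chess.PAWN: 0, chess.KNIGHT: 1, chess.BISHOP: 2,
       chess.ROOK: 3, chess.QUEEN: 4, chess.KING: 5}
VIC = {None: 0, chess.PAWN: 1, chess.KNIGHT: 2, chess.BISHOP: 3,
       chess.ROOK: 4, chess.QUEEN: 5}        # index 0 = empty destination square

# 6x6 table of move-type weights, randomly initialized
weights = [[random.uniform(0.5, 1.5) for _ in range(6)] for _ in range(6)]

def move_type(board, move):
    a = ATT[board.piece_type_at(move.from_square)]
    v = VIC.get(board.piece_type_at(move.to_square), 0)  # e.p. counts as a non-capture here
    return a, v

def choose_move(board):
    moves = list(board.legal_moves)
    w = [weights[a][v] for a, v in (move_type(board, m) for m in moves)]
    if sum(w) <= 0:
        return random.choice(moves)
    return random.choices(moves, weights=w, k=1)[0]

def self_play_game(max_plies=400):
    board = chess.Board()
    used = {chess.WHITE: [], chess.BLACK: []}
    while not board.is_game_over() and board.ply() < max_plies:
        move = choose_move(board)
        used[board.turn].append(move_type(board, move))
        board.push(move)
    return board, used

STEP = 0.001

def train(games=20000):
    for _ in range(games):
        board, used = self_play_game()
        if not board.is_checkmate():
            continue                      # learn only from decisive games
        winner = not board.turn           # the side to move is the one that got mated
        for a, v in used[winner]:
            weights[a][v] += STEP
        for a, v in used[not winner]:
            weights[a][v] = max(0.0, weights[a][v] - STEP)
        total = sum(sum(row) for row in weights)
        for row in weights:               # renormalize so the weights keep a constant sum
            for j in range(6):
                row[j] *= 36.0 / total
```
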

If you want to make somewhat more effort, you could assign each colored piece type a random material key (fixed during the experiment), add the keys of all attackers (of both colors) of a square, plus 666 (or whatever) times the keys of the occupant and the capturing piece, and use that modulo 4096 as an index into a 4K-entry table of probabilities to play the move. Initialize the table randomly. So now you have a program that plays with 4096 parameters instead of 36. Then apply the same training procedure: keep track during the game of how often each table entry provided a move for player A and player B, and, after the game, increase those used by the winner and decrease those used by the loser (proportional to the number of times they were used). This should teach it to approximate SEE with the table.

It would probably work even better if you consider 7th-rank Pawns as a different piece type from other Pawns (i.e. give them their own key), so that the program can factor in promotions.
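
The index computation for that 4K-entry table could look roughly like this (my reading of the "keys of all attackers plus 666 times the keys of the occupant and the capturing piece" recipe); the training loop stays the same as in the sketch above, just over 4096 weights instead of 36:

```python
import random
import chess

# one fixed random key per colored piece type (7th-rank pawns could get a key of their own)
KEYS = {(color, pt): random.randrange(1 << 20)
        for color in (chess.WHITE, chess.BLACK)
        for pt in range(chess.PAWN, chess.KING + 1)}

def table_index(board, move, mod=4096):
    to_sq = move.to_square
    h = 0
    for color in (chess.WHITE, chess.BLACK):    # keys of all attackers of the target square
        for sq in board.attackers(color, to_sq):
            p = board.piece_at(sq)
            h += KEYS[(p.color, p.piece_type)]
    occupant = board.piece_at(to_sq)
    if occupant is not None:                    # 666 times the occupant's key
        h += 666 * KEYS[(occupant.color, occupant.piece_type)]
    mover = board.piece_at(move.from_square)    # and 666 times the mover's key
    h += 666 * KEYS[(mover.color, mover.piece_type)]
    return h % mod
```
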
Rebel wrote:About the part in red: if that's true (and with their computer power I suppose it is possible), they more or less played these games from a big (learned) book, and that would make their statement "no domain knowledge" not even a lie :lol:
Playing full games from a prepared book is completely impossible. The game tree of Chess is just waaaaay too big for that. You really must have an algorithm that can select a winning move in positions you have never seen before, because you will be out of book before one quarter of the game is over, most likely in a nearly equal position if you have a serious opponent.

If you can train against a (nearly) deterministic opponent, you can of course 'book him up'. But that was not done here. The NN in the AlphaZero that was playing Stockfish in the match games had never seen Stockfish before. Seeing what moves Stockfish prefers in given positions would count as domain-specific knowledge.
shrapnel
Posts: 1339
Joined: Fri Nov 02, 2012 9:43 am
Location: New Delhi, India

Re: In chess, AlphaZero outperformed Stockfish after just 4 h

Post by shrapnel »

Ovyron wrote: Yup, I think true Artificial Intelligence has finally arrived, and it can do things like this and other things I would never have imagined to be possible.

Some examples of similar AIs:

AI can extract the style of a photo and turn another photo into that style
AI can learn how to make paintings in the style of any artist in history and use any image to show how that artist would have painted it.
AI takes text as input and creates new photorealistic images indistinguishable from actual photos.
AI learns how human lips move when talking, so it can sync a video of anybody to any spoken audio.
AI learns what celebrities look like and can invent new faces for fake ones that look real.
AI learns what art looks like, so it can turn your doodles into works of art.
AI learns how video works, so it can predict the future and create videos from still images.
AI learns how images become pixelated when you scale them down and manages to reverse the process, turning pixelated messes into high-resolution images.
AI learns how visual expressions work and can swap the expressions of two people.
AI can turn your sketches into photorealistic images.
AI learns how to play non-deterministic video games just like humans.

And a lot more things...

Frankly, I find some of these much more impressive than a 3500 Elo chess engine, which we knew was eventually coming.

Coming from the AI field, I can say I find nothing strange about AI learning things from scratch: you just teach it what it can do (say, the rules of chess) and the output you want (say, winning the game), and the AI learns a way to do it.

I expect that soon you will be able to invent new games, teach your AI the rules, and get the 3500-Elo equivalent for that game. I haven't seen A0 lose a single chess game, so who knows, maybe it already plays perfectly.

We're still early on this, though. In the future we might have AIs able to learn to write books with useful info, or to write the next part of a book in a trilogy written by a human. What about an AI that can generate movies? You could feed it all the Disney Classics before Toy Story, and it could output a brand new classic that some other AI can't tell apart from the originals.

Down the road, this chess-playing AI will look like peanuts.
Good luck trying to convince the closed minds (of whom there are many) here about anything new.
It's OK. I suppose such people have always been around since time began.
Galileo and Copernicus were ridiculed, taunted and persecuted by people just like those here in the Forum for stating that the Earth revolved around the Sun.
Closed minds have existed in every century, and in significant numbers; why should we expect things to be any different in this one?
i7 5960X @ 4.1 Ghz, 64 GB G.Skill RipJaws RAM, Twin Asus ROG Strix OC 11 GB Geforce 2080 Tis
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: In chess, AlphaZero outperformed Stockfish after just 4 h

Post by Lyudmil Tsvetkov »

Rebel wrote:I don't care that SF lost; it's totally irrelevant in light of the huge claim by the DeepMind company, the alleged 4 hours of self-play, quoting the document again: without any additional domain knowledge except the rules of the game.

Have you already let it sink in what is stated here?

No mobility, no king safety, no passed pawn evaluation, no castling knowledge, not even piece values?

What would that first self-play game look like? Something like 1.a3 a6 2.a4 a5 3.b3 b6 etc., and how would that lead to anything for the second self-play game?

And so I voted for option 3.
Where is your 4th option: BS? :)