AlphaGo Zero And AlphaZero, RomiChess done better

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Steve Maughan
Posts: 1221
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Steve Maughan »

I remember the experiments at the time. Could you briefly explain what you did? From memory, I recall you did the following:

At the end of the game you parsed the list of moves and adjusted the score up or down by a certain number of centipawns based on the outcome. You then hashed each position and stored it in a learning file. I assume this is then loaded into the hash table at the start of each game. Is this broadly correct?
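If I had to sketch it, it would be something like this (just my rough guess at the mechanics, not Romi's actual code; the key format, the centipawn bonus and the file handling are all made up for illustration):

Code:

import pickle

LEARN_FILE = "learn.dat"   # hypothetical learning file name
BONUS_CP = 2               # assumed per-position adjustment in centipawns

def load_learning():
    # Loaded once at the start of each game and used to seed the hash table.
    try:
        with open(LEARN_FILE, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {}

def save_learning(table):
    with open(LEARN_FILE, "wb") as f:
        pickle.dump(table, f)

def update_after_game(position_keys, result_for_white):
    # position_keys: hash key of every position reached in the game, in order.
    # result_for_white: +1 win, 0 draw, -1 loss.
    # Each stored score is nudged towards the game result from the
    # side-to-move's point of view, so later games prefer lines that scored well.
    table = load_learning()
    for ply, key in enumerate(position_keys):
        side_sign = 1 if ply % 2 == 0 else -1      # White to move on even plies
        table[key] = table.get(key, 0) + BONUS_CP * result_for_white * side_sign
    save_learning(table)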

Thanks,

Steve
http://www.chessprogramming.net - Maverick Chess Engine
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Milos »

kranium wrote:No human games were loaded. Learning was accomplished through millions of self-play games.
The Monte Carlo search algorithm simply chose the move in each position with the highest win probability.
How do you explain these paragraphs from the paper:
"Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters"

"We represent the policy π(a|s) by a 8 × 8 × 73 stack of planes encoding a probability distribution over 4,672 possible moves. Each of the 8×8 positions identifies the square from which to “pick up” a piece."

"The number of games, positions, and thinking time varied per game due largely to different board sizes and game lengths, and are shown in Table S3."
So when playing the self-play games, the positions used for training are taken from the games randomly (since the position is part of the set of training parameters). So what about the starting positions of those 44 million training games? Do you think they were all random, or all the initial starting position, with no chess knowledge in them?
Give me a break, thinking those people at Google are so stupid as to train their network in such a lousy way, instead of sorting those 100'000 openings from the same ChessBase database they quote in the paper by probability of occurrence and using those statistics as starting positions for the self-play games.
Of course, in Table 2 they nicely show just percentages, not actual numbers, so you can't judge how many training games in total started from the initial position, because someone could be smart, sum up all the games from Table 2 and figure out that the number doesn't match 44 million...

Btw, 700'000 training iterations times 800 MCTS simulations is already 56 million, not 44 million, so where did 12 million games disappear to?
kranium
Posts: 2129
Joined: Thu May 29, 2008 10:43 am

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by kranium »

Milos wrote:
kranium wrote:No human games were loaded. Learning was accomplished through millions of self-play games.
The Monte Carlo search algorithm simply chose the move in each position with the highest win probability.
How do you explain these paragraphs from the paper:
"Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters"

"We represent the policy π(a|s) by a 8 × 8 × 73 stack of planes encoding a probability distribution over 4,672 possible moves. Each of the 8×8 positions identifies the square from which to “pick up” a piece."

"The number of games, positions, and thinking time varied per game due largely to different board sizes and game lengths, and are shown in Table S3."
So when playing the self-play games, the positions used for training are taken from the games randomly (since the position is part of the set of training parameters). So what about the starting positions of those 44 million training games? Do you think they were all random, or all the initial starting position, with no chess knowledge in them?
Give me a break, thinking those people at Google are so stupid as to train their network in such a lousy way, instead of sorting those 100'000 openings from the same ChessBase database they quote in the paper by probability of occurrence and using those statistics as starting positions for the self-play games.
Of course, in Table 2 they nicely show just percentages, not actual numbers, so you can't judge how many training games in total started from the initial position, because someone could be smart, sum up all the games from Table 2 and figure out that the number doesn't match 44 million...

Btw, 700'000 training iterations times 800 MCTS simulations is already 56 million, not 44 million, so where did 12 million games disappear to?
My understanding is that "randomly initialised parameters" is not the same as loading human games.

Yes, I assume (because it has not been made clear by Google) that the self-play games all started from the traditional start position.
AlphaZero would quickly realize that it was winning more often after 1. d4 than after 1. f3, for example.
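As a toy illustration of that idea (nothing like the real system: this just tallies win rates per first move over random playouts using the python-chess package, and random playouts barely separate the moves, whereas the real search plus network sharpens the signal):

Code:

import random
from collections import defaultdict
import chess

def random_playout(first_move, max_plies=120):
    # One random game after the given first move; returns the score for White.
    board = chess.Board()
    board.push(first_move)
    for _ in range(max_plies):
        if board.is_game_over():
            break
        board.push(random.choice(list(board.legal_moves)))
    return {"1-0": 1.0, "0-1": 0.0}.get(board.result(claim_draw=True), 0.5)

scores = defaultdict(float)
games = defaultdict(int)
root = chess.Board()
for _ in range(500):                        # a few hundred "self-play" games
    move = random.choice(list(root.legal_moves))
    scores[move.uci()] += random_playout(move)
    games[move.uci()] += 1

# first moves ranked by observed win rate for White
for uci in sorted(games, key=lambda m: -scores[m] / games[m]):
    print(f"{uci}: {scores[uci] / games[uci]:.2f} over {games[uci]} games")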
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Milos »

kranium wrote:Yes, I assume (because it has not been made clear by Google) that the self-play games all started from the traditional start position.
AlphaZero would quickly realize that it was winning more often after 1. d4 than after 1. f3, for example.
They quote 100'000 games from ChessBase, and their batches are per 100'000 iterations of 800 MCTS simulations; another "coincidence"?

Again, let me quote myself (btw, how many f3 openings do you think are among those 100'000 openings from ChessBase?):
thinking those people at Google are so stupid as to train their network in such a lousy way, instead of sorting those 100'000 openings from the same ChessBase database they quote in the paper by probability of occurrence and using those statistics as starting positions for the self-play games.
kranium
Posts: 2129
Joined: Thu May 29, 2008 10:43 am

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by kranium »

Milos wrote:
kranium wrote:Yes, I assume (because it has not been made clear by Google) that the self-play games all started from the traditional start position.
AlphaZero would quickly realize that it was winning more often after 1. d4 than after 1. f3, for example.
They quote 100'000 games from ChessBase, and their batches are per 100'000 iterations of 800 MCTS simulations; another "coincidence"?

Again, let me quote myself (btw, how many f3 openings do you think are among those 100'000 openings from ChessBase?):
thinking those people at Google are so stupid as to train their network in such a lousy way, instead of sorting those 100'000 openings from the same ChessBase database they quote in the paper by probability of occurrence and using those statistics as starting positions for the self-play games.
Ah I see what you're saying...

From the PDF:

"Finally, we analysed the chess knowledge discovered by AlphaZero. Table 2 analyses the
most common human openings (those played more than 100,000 times in an online database
of human chess games (1)). Each of these openings is independently discovered and played
frequently by AlphaZero during self-play training.
When starting from each human opening,
AlphaZero convincingly defeated Stockfish, suggesting that it has indeed mastered a wide spectrum
of chess play.

I guess it can be interpreted in a couple of ways; I understood it to mean that they analyzed the finished games to see how often common human openings were followed.

(How else to explain "Each of these openings is independently discovered and played frequently by AlphaZero during self-play training"?)
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Milos »

kranium wrote:I guess it can be interpreted in a couple of ways; I understood it to mean that they analyzed the finished games to see how often common human openings were followed. (It does say that these openings were independently discovered by Alpha0.)
Of course; that is what I would do, and it is a no-brainer. I don't doubt they came up with a more elaborate and efficient training scheme.
Out of 100'000 iterations with 800 sims per iteration, for 50'000 I would take the root position and for the rest those 100'000 opening positions: limit them to 10 moves or so (removing transpositions), sort them by frequency and hand them out as starting positions for those 50'000 iterations in proportion to their frequency.
Those 50k root iterations are more than enough to derive the statistics in Table 2 and to further bias the network towards the openings it considers advantageous.
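Something along these lines (just a sketch of what I mean; the opening lines, the counts and the 50/50 split below are placeholders, nothing from the paper):

Code:

import random

# (opening line truncated to 10 moves, occurrence count in the human database)
OPENINGS = [
    ("e2e4 c7c5 g1f3 d7d6", 250_000),   # placeholder lines and counts
    ("d2d4 g8f6 c2c4 e7e6", 230_000),
    ("e2e4 e7e5 g1f3 b8c6", 210_000),
    # ... rest of the frequency-sorted opening list
]

def pick_start_position(iteration, total_iterations=100_000):
    # First half of the iterations: plain root position.
    # Second half: an opening drawn with probability proportional to its
    # frequency in the database, so popular lines get trained the most.
    if iteration < total_iterations // 2:
        return ""                                   # empty line = root position
    lines = [line for line, _ in OPENINGS]
    weights = [count for _, count in OPENINGS]
    return random.choices(lines, weights=weights, k=1)[0]

for i in (0, 49_999, 50_000, 99_999):               # which start each iteration would get
    print(i, repr(pick_start_position(i)))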
Even what they have now is kind of embarrassing, because for the B40 Sicilian they get only +38 Elo (20 wins to 9 losses), a huge difference from the +100 Elo from the root position (a much bigger spread than standard engines show), so constructing an anti-Alpha0 book that would completely neutralize it would be a piece of cake once one had access to those training games!
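For reference, those Elo figures follow from the match scores with the usual logistic formula (the 71 draws are inferred from 100 - 20 - 9; 28 wins and 72 draws is the overall 100-game result reported in the paper):

Code:

import math

def elo_from_results(wins, draws, losses):
    # Convert a match score into an Elo difference.
    score = (wins + 0.5 * draws) / (wins + draws + losses)
    return -400.0 * math.log10(1.0 / score - 1.0)

print(round(elo_from_results(20, 71, 9)))   # B40 games: about +38
print(round(elo_from_results(28, 72, 0)))   # overall 100-game match: about +100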
MonteCarlo
Posts: 188
Joined: Sun Dec 25, 2016 4:59 pm

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by MonteCarlo »

Well, it doesn't follow that an anti-book would be a piece of cake.

The example opening you picked where it performed relatively (heavy emphasis on "relatively") poorly is one you couldn't force no matter what book you gave SF.

No amount of book magic will let SF force an opponent that always meets 1.e4 with 1...e5 to get into the 2...e6 Sicilian :)
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Milos »

MonteCarlo wrote:Well, it doesn't follow that an anti-book would be a piece of cake.

The example opening you picked where it performed relatively (heavy emphasis on "relatively") poorly is one you couldn't force no matter what book you gave SF.

No amount of book magic will let SF force an opponent that always meets 1.e4 with 1...e5 to get into the 2...e6 Sicilian :)
SF lost almost all games as Black. Don't you think that if SF had played 1...c5 and eventually 2...e6 it would have cut down that number of losses significantly (by 30% according to the Table 2 data)?
And that is just with a 2-move book (of size 2 bytes).
MonteCarlo
Posts: 188
Joined: Sun Dec 25, 2016 4:59 pm

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by MonteCarlo »

Same problem, though.

Hard for SF to get into the 2...e6 Sicilian as black against an opponent that opens almost exclusively 1.d4 :)

Also, SF's score as Black in that opening was still not particularly good; the main reason AlphaZero's overall score in the B40 games was so low was the negative score it had with Black, not its score when SF was Black.

Overall AlphaZero won 40% of games with white; in B40, it won 34% of games with white.

That counts as something, for sure (although 100 games is a relatively small sample), but again, kind of moot if your opponent almost invariably plays 1.d4 :)
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Milos »

MonteCarlo wrote:Same problem, though.

Hard for SF to get into the 2...e6 Sicilian as black against an opponent that opens almost exclusively 1.d4 :)
How did you conclude that?
From Table 2 it is not obvious. Yes, most Alpha0 wins came from d4, but that says more about SF's weakness in that particular opening than about d4 being the most-played move.

Again, constructing the opening book would be quite easy if one had access to the training games from the last 100'000 iterations, so the last 8 million games or so would be enough.
You'd know the statistics exactly, i.e. the openings that Alpha0 played the most; you just need to avoid those as much as possible and steer the game into positions that were never trained on. Since you'd have the full games, you could actually compile statistics on the similarity of positions reached after move 10 or 15 and try to steer your book in the opposite direction.
Steering is easy because you'd always know what Alpha0 would play and with what probability.
It is the same way people construct anti-books for Cerebellum, for example.
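The counting step for such an anti-book could look roughly like this, purely hypothetically, since nobody outside DeepMind has those training games (the PGN file name and the 10-move cut-off are assumptions; it uses the python-chess package):

Code:

from collections import Counter
import chess.pgn

def trained_line_frequencies(pgn_path, max_plies=20):
    # Count how often each opening prefix (up to 10 full moves) occurs in the
    # self-play games; a book builder would then steer towards the rare lines.
    freq = Counter()
    with open(pgn_path) as f:
        while True:
            game = chess.pgn.read_game(f)
            if game is None:
                break
            prefix = []
            for i, move in enumerate(game.mainline_moves()):
                if i >= max_plies:
                    break
                prefix.append(move.uci())
                freq[" ".join(prefix)] += 1          # every prefix is counted
    return freq

# hypothetical usage once such a PGN existed:
# freq = trained_line_frequencies("alphazero_selfplay.pgn")
# rarest = sorted(freq.items(), key=lambda kv: kv[1])[:20]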