AlphaGo Zero And AlphaZero, RomiChess done better

Milos · Post by **Milos** » Thu Dec 07, 2017 7:07 pm

kranium wrote:I guess it can be interpreted in a couple of ways. I understood that they analyzed the finished games to see how often common human openings were followed. (It does say that these openings were independently discovered by Alpha0).

Ofc, this is what I would do and it is a no-brainer. I don't doubt they came up with a more elaborate and efficient training scheme.
Out of 100'000 iterations with 800 sims per iteration, I would start 50'000 from the root position and the rest from opening positions collected over those 100'000 iterations. I would limit the openings to 10 moves or so (removing transpositions), sort them by frequency, and use them as starting positions for the other 50'000 iterations in proportion to their frequency.
Those 50k root iterations are more than enough to derive the statistics in Table 2 and to further bias the network towards the openings it considers advantageous.
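A minimal sketch of the schedule described above, not anything taken from the paper. It assumes each observed opening is already reduced to a hashable key (truncated to about 10 moves, transpositions merged); `ROOT`, `build_schedule` and `root_share` are hypothetical names used only for illustration.

```python
import random
from collections import Counter

ROOT = "startpos"  # placeholder label for the initial position (assumption)

def build_schedule(opening_keys, n_iterations, root_share=0.5, seed=0):
    """Return one starting position per training iteration.

    opening_keys: one key per observed opening, so a key that appears
    more often is proportionally more likely to be picked.
    """
    rng = random.Random(seed)
    n_root = int(n_iterations * root_share)   # iterations started from the root
    n_rest = n_iterations - n_root            # iterations started from openings

    counts = Counter(opening_keys)
    positions = list(counts)
    weights = [counts[p] for p in positions]

    if positions:
        # Weighted sampling with replacement, proportional to frequency.
        sampled = rng.choices(positions, weights=weights, k=n_rest)
    else:
        sampled = [ROOT] * n_rest             # nothing observed yet: fall back to root

    schedule = [ROOT] * n_root + sampled
    rng.shuffle(schedule)                     # interleave root and opening starts
    return schedule
```

With `n_iterations=100000` and the default `root_share`, half of the schedule is the root position and the other half follows the observed opening frequencies.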
Even what they have now is kind of embarrassing, coz for the B40 Sicilian they get only +38 Elo (20 wins to 9 losses), a huge difference from the +100 Elo from the root position (much more than what standard engines have), so constructing an anti-Alpha0 book that would completely neutralize it would be a piece of cake once one had access to those training games!
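For reference, the +38 Elo figure can be checked with the usual logistic Elo formula. This is only a sketch: it assumes the B40 match had 100 games and that every game not counted as a win or a loss was a draw; `elo_diff` is a hypothetical helper name.

```python
import math

def elo_diff(wins, losses, games):
    """Elo difference implied by a match score (logistic model).

    Assumes every game that is not a win or a loss was drawn.
    Undefined for a 0% or 100% score.
    """
    draws = games - wins - losses
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))

# 20 wins, 9 losses, 71 draws in 100 games gives a 55.5% score,
# which is about +38 Elo.
print(round(elo_diff(20, 9, 100)))
```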

MonteCarlo · Post by **MonteCarlo** » Thu Dec 07, 2017 8:42 pm

Well, it doesn't follow that an anti book would be a piece of cake.

The example opening you picked where it performed relatively (heavy emphasis on "relatively") poorly is one you couldn't force no matter what book you gave SF.

No amount of book magic will let SF force an opponent that always meets 1.e4 with 1...e5 to get into the 2...e6 Sicilian.

Milos · Post by **Milos** » Thu Dec 07, 2017 8:47 pm

MonteCarlo wrote:Well, it doesn't follow that an anti book would be a piece of cake.

The example opening you picked where it performed relatively (heavy emphasis on "relatively") poorly is one you couldn't force no matter what book you gave SF.

No amount of book magic will let SF force an opponent that always meets 1.e4 with 1...e5 to get into the 2...e6 Sicilian.

SF lost almost all games as black. Don't you think that SF playing 1...c5 and eventually 2...e6 would cut that number of losses significantly (30% according to Table 2 data)?
And that is just from using a 2-move book (2 bytes in size).

MonteCarlo · Post by **MonteCarlo** » Thu Dec 07, 2017 8:52 pm

Same problem, though.

Hard for SF to get into the 2...e6 Sicilian as black against an opponent that opens almost exclusively 1.d4

Also, SF's score as black in that opening was still not particularly good; most of the reason AlphaZero's overall score in the B40 games was so low was because of the negative score it had with black, not because of its score when SF was black.

Overall AlphaZero won 40% of games with white; in B40, it won 34% of games with white.

That counts as something, for sure (although 100 games is a relatively small sample), but again, kind of moot if your opponent almost invariably plays 1.d4

Milos · Post by **Milos** » Thu Dec 07, 2017 9:03 pm

MonteCarlo wrote:Same problem, though.

Hard for SF to get into the 2...e6 Sicilian as black against an opponent that opens almost exclusively 1.d4

How did you conclude that?
From Table 2 it is not obvious. Yes, most Alpha0 wins came from d4, but that says more about SF's weakness in particular openings than about d4 being the most-played move.

Again, constructing the opening book would be quite easy if one had access to the training games from the last 100'000 iterations. So the last 8 million games or so would be enough.
You'd know the statistics exactly, i.e. the openings that Alpha0 played the most. You would just need to avoid those as much as possible and steer the game into positions that were never trained. Since you'd have the full games, you could actually compute statistics on the similarity of positions reached after move 10 or 15 and try to steer your book in the opposite direction.
Steering is easy because you'd always know what Alpha0 would play and with what probability.
It is the same way ppl construct anti-books for Cerebellum, for example.
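A rough sketch of that steering idea, under heavy assumptions: the training games have already been reduced to two hypothetical tables, `visits` (how often each position occurred) and `reply_probs` (how likely the opponent is to reach each follow-up position). None of these names or data formats come from the paper.

```python
def pick_book_move(our_moves, reply_probs, visits):
    """Pick the move whose follow-up positions were seen least in training.

    our_moves:   dict move -> key of the position after our move
    reply_probs: dict position key -> {next position key: probability}
    visits:      dict position key -> number of occurrences in training games
    """
    def expected_familiarity(pos_after):
        replies = reply_probs.get(pos_after)
        if not replies:
            # No data on the opponent's reply: fall back to the
            # visit count of the position itself.
            return visits.get(pos_after, 0)
        # Probability-weighted visit count of the positions the
        # opponent is expected to steer into.
        return sum(p * visits.get(nxt, 0) for nxt, p in replies.items())

    return min(our_moves, key=lambda m: expected_familiarity(our_moves[m]))
```

A real book builder would look several plies deep and weigh how well the engine itself handles the resulting positions; this only shows the one-ply selection rule.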

MonteCarlo · Post by **MonteCarlo** » Thu Dec 07, 2017 9:10 pm

Because the games for Table 2 were not just organically played games.

They trained the neural nets with self-play, and once training was done they took the 12 most popular openings in human play according to their measure and had SF and AlphaZero play matches from the resulting position of each of the 12 (this is different from the main 100-game match from the start position).

They did that to test whether its strength extended past the openings it chose to play or not, and they conclude it does.

For each of those 12 openings, they include a graph that shows how often the opening occurred in self play during training.

1.e4 openings just stop occurring by the end of training, while 1.d4 openings start climbing, indicating a shift from 1.e4 to 1.d4.

1...c5 was never all that popular in self-play, even when 1.e4 was being played, and before 1.e4 openings stopped showing up altogether, Ruy Lopez spiked and other e4 defenses declined (and no Sicilian was ever popular in self-play).

Combine that with the 10 reported games from their main match, in which AlphaZero answered 1.e4 with 1...e5 in both of its black games and started nearly every white game with 1.d4, and the other inferences are supported.

That was my reasoning.

jdart · Post by **jdart** » Thu Dec 07, 2017 9:12 pm

It is a neural network based system, and quite a bit has been written about the Go program that preceded it. I do not think it is a big mystery what they did.

Re reinforcement learning, Andrew Tridgell applied this to chess in the late 90's:

https://chessprogramming.wikispaces.com/KnightCap

https://www.cs.princeton.edu/courses/ar ... ess-RL.pdf

He got good learning progress but not great results in terms of final program strength.

--Jon

Michael Sherwin · Post by **Michael Sherwin** » Thu Dec 07, 2017 11:22 pm

Steve Maughan wrote:I remember the experiments at the time. Could you briefly explain what you did? From memory I recall you did the following:

At the end of the game you parsed the list of moves and adjusted the score up or down a certain number of centipawns based on the outcome. You then hashed each position and stored it in a learning file. I assume this is then loaded into the hash table at the start of each game. Is this broadly correct?

Thanks,

Steve

Yes, that is very accurate except the moves were not hashed in the learn file. The learn file was just a giant tree data structure connected with sibling and descendant pointers. Before the search the entire subtree (if there was one) was loaded into the game hash.
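The structure described above could be sketched roughly as follows. This is an illustration, not RomiChess's actual code: the node layout, the 5-centipawn `step`, the neutral treatment of draws, and the use of a move tuple in place of a real position hash key are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    move: str
    score: int = 0                      # learned bonus for the side that played `move`
    child: Optional["Node"] = None      # descendant pointer: first known reply
    sibling: Optional["Node"] = None    # sibling pointer: next alternative move

def find_child(parent, move):
    node = parent.child
    while node is not None and node.move != move:
        node = node.sibling
    return node

def learn_game(root, moves, result, step=5):
    """Adjust scores along a finished game. result: +1 white won, 0 draw, -1 black won."""
    node = root
    for ply, move in enumerate(moves):
        nxt = find_child(node, move)
        if nxt is None:                 # unseen move: link it in as a new sibling
            nxt = Node(move, sibling=node.child)
            node.child = nxt
        sign = 1 if ply % 2 == 0 else -1    # white moves on even plies
        nxt.score += step * result * sign   # reward the winner's moves, punish the loser's
        node = nxt

def load_subtree(root, moves_so_far):
    """Collect every stored line below the current position into a dict
    (a stand-in for loading the subtree into the game hash)."""
    node = root
    for move in moves_so_far:
        node = find_child(node, move)
        if node is None:
            return {}                   # position not in the learn file
    table = {}
    stack = [(node.child, tuple(moves_so_far))]
    while stack:
        cur, path = stack.pop()
        while cur is not None:
            line = path + (cur.move,)
            table[line] = cur.score
            if cur.child is not None:
                stack.append((cur.child, line))
            cur = cur.sibling
    return table
```

Calling `learn_game` after each game and `load_subtree` before each search mirrors the two steps described: score adjustment by outcome, then preloading whatever the tree knows about the current position.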

Michael Sherwin · Post by **Michael Sherwin** » Thu Dec 07, 2017 11:58 pm

jdart wrote:It is a neural network based system, and quite a bit has been written about the Go program that preceded it. I do not think it is a big mystery what they did.

Re reinforcement learning, Andrew Tridgell applied this to chess in the late 90's:

https://chessprogramming.wikispaces.com/KnightCap

https://www.cs.princeton.edu/courses/ar ... ess-RL.pdf

He got good learning progress but not great results in terms of final program strength.

--Jon

So I was not the first. Like Bob said, there is nothing new under the Sun. However, Romi did achieve superior results in Leo Dijksman's class tournaments, gaining two classes and being about to gain a third before his hard drive crashed and he lost Romi's learn file.

I googled reinforcement learning and found no connection to Pavlov's dog experiments in which he rewarded correct behavior and punished wrong behavior, except when Romi is mentioned.

My goal was to create computer chess learning that mimicked how humans learn. So I took two examples of that and adapted them for computer chess. Humans copy moves, and that is monkey-see-monkey-do learning, which gets its name from monkeys watching humans wash potatoes in a stream and then doing it themselves. The second one (reinforcement learning) is just staying with what is working, or trying something else if it is not. I think there was some originality somewhere in what I did?

corres · Post by **corres** » Fri Dec 08, 2017 1:15 am

I have some questions for you:
How many gigabytes was Romi's learning file, and what was its Elo at that tournament?
Thanks
Robert
