Alphazero news

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, chrisw, Rebel

IanO
Posts: 497
Joined: Wed Mar 08, 2006 9:45 pm
Location: Portland, OR

Re: Alphazero news

Post by IanO »

Even more exciting: they released the full game scores of the hundred-game matches for all three games: chess, shogi, and go!

https://deepmind.com/research/alphago/a ... resources/

I've been looking at some of the shogi games (hand-selected by Habu, the Kasparov of shogi), and they are utterly impenetrable. All known joseki (openings) and king-safety principles are thrown out the window! In some of these games, the king doesn't just sit undeveloped in the center but does the chess equivalent of heading out to the middle of the board in the middle game before coming back to the corner for safety and then winning. Astounding!

The DeepMind blog post is here: https://deepmind.com/blog/alphazero-she ... gi-and-go/
jp
Posts: 1473
Joined: Mon Apr 23, 2018 7:54 am

Re: Alphazero news

Post by jp »

matthewlai wrote: Fri Dec 07, 2018 2:15 am
Daniel Shawul wrote: Fri Dec 07, 2018 12:35 am While I sympathize with that statement, releasing A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing with supposedly fair conditions.
That would be good, but it would also be a lot of work for us (AZ is tightly-coupled with DM and Google's systems) for not really much value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily.
What were the best values/functions for CPUCT used for playing & training?
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai »

jp wrote: Fri Dec 07, 2018 2:20 am
matthewlai wrote: Fri Dec 07, 2018 2:15 am
Daniel Shawul wrote: Fri Dec 07, 2018 12:35 am While I sympathize with that statement, releasing A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing with supposedly fair conditions.
That would be good, but it would also be a lot of work for us (AZ is tightly-coupled with DM and Google's systems) for not really much value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily.
What were the best values/functions for CPUCT used for playing & training?
They are all in the pseudo-code in supplementary materials.

Code:

class AlphaZeroConfig(object):

  def __init__(self):
    ### Self-Play
    self.num_actors = 5000

    self.num_sampling_moves = 30
    self.max_moves = 512  # for chess and shogi, 722 for Go.
    self.num_simulations = 800

    # Root prior exploration noise.
    self.root_dirichlet_alpha = 0.3  # for chess, 0.03 for Go and 0.15 for shogi.
    self.root_exploration_fraction = 0.25

    # UCB formula
    self.pb_c_base = 19652
    self.pb_c_init = 1.25

    ### Training
    self.training_steps = int(700e3)
    self.checkpoint_interval = int(1e3)
    self.window_size = int(1e6)
    self.batch_size = 4096

    self.weight_decay = 1e-4
    self.momentum = 0.9
    # Schedule for chess and shogi, Go starts at 2e-2 immediately.
    self.learning_rate_schedule = {
        0: 2e-1,
        100e3: 2e-2,
        300e3: 2e-3,
        500e3: 2e-4
    }
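For reference, pb_c_base and pb_c_init feed into the UCB child-selection score in that same pseudo-code. Here is a minimal, self-contained sketch of that formula (plain functions instead of the actual node classes, which are an assumption here):

```python
import math

def ucb_score(parent_visits, child_visits, child_prior, child_value,
              pb_c_base=19652, pb_c_init=1.25):
    # Exploration multiplier: roughly pb_c_init for small searches,
    # growing logarithmically as parent_visits approaches pb_c_base.
    pb_c = math.log((parent_visits + pb_c_base + 1) / pb_c_base) + pb_c_init
    pb_c *= math.sqrt(parent_visits) / (child_visits + 1)
    # Prior-weighted exploration term plus the child's mean value.
    return pb_c * child_prior + child_value
```

Note that at 800 simulations per move, parent_visits is tiny relative to pb_c_base, so the multiplier stays very close to pb_c_init = 1.25.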
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
clumma
Posts: 186
Joined: Fri Oct 10, 2014 10:05 pm
Location: Berkeley, CA

Re: Alphazero news

Post by clumma »

Matthew: Congrats on the phenomenal success.

Can you help me locate the games AZ played against Brainfish? They don't seem to have their own file, and I don't see any identifying info in alphazero_vs_stockfish_all.pgn

Thank you!

-Carl
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai »

noobpwnftw wrote: Fri Dec 07, 2018 1:09 am I have a few questions:
TCEC SuFi used 120' + 15" TC and Division P used 90' + 10" TC. Since DM went for mimicking TCEC conditions this time, like using a 44-core machine and the same openings, why mess with the time controls (again)?

Also, a year ago the NPS of A0 was 80K; now it is only around 60K. That's about a 30% nerf. What happened, did people overclock the TPUs a year ago?
180' + 15" is the time control for Season 9 Superfinal.

In the preprint the NPS figures I believe were taken from the start position. For the peer-reviewed final paper we looked at all moves to compute those statistics.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Albert Silver
Posts: 3026
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: Alphazero news

Post by Albert Silver »

matthewlai wrote: Fri Dec 07, 2018 2:31 am
noobpwnftw wrote: Fri Dec 07, 2018 1:09 am I have a few questions:
TCEC SuFi used 120' + 15" TC and Division P used 90' + 10" TC. Since DM went for mimicking TCEC conditions this time, like using a 44-core machine and the same openings, why mess with the time controls (again)?

Also, a year ago the NPS of A0 was 80K; now it is only around 60K. That's about a 30% nerf. What happened, did people overclock the TPUs a year ago?
180' + 15" is the time control for Season 9 Superfinal.

In the preprint the NPS figures I believe were taken from the start position. For the peer-reviewed final paper we looked at all moves to compute those statistics.
When you ran it on 4 TPUs, was this the upper limit of what you could exploit, or just a standard you set? In other words, would you get a serious boost from 8 TPUs or more? Leela has hit a bit of a wall in how fast it can get, so that beyond a certain point it gains nothing. Were you able to solve this, and if so, how?
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai »

clumma wrote: Fri Dec 07, 2018 2:26 am Matthew: Congrats on the phenomenal success.

Can you help me locate the games AZ played against Brainfish? They don't seem to have their own file, and I don't see any identifying info in alphazero_vs_stockfish_all.pgn

Thank you!

-Carl
Thanks!

Only games from the primary evaluation and TCEC openings have been released (no opening books).
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
clumma
Posts: 186
Joined: Fri Oct 10, 2014 10:05 pm
Location: Berkeley, CA

Re: Alphazero news

Post by clumma »

matthewlai wrote: Fri Dec 07, 2018 3:45 am
clumma wrote: Fri Dec 07, 2018 2:26 am Can you help me locate the games AZ played against Brainfish? They don't seem to have their own file, and I don't see any identifying info in alphazero_vs_stockfish_all.pgn
Only games from the primary evaluation and TCEC openings have been released (no opening books).
D'oh!

Why?

I've been wanting to see AZ vs. BF since last year. The first thing I checked in this paper was whether you tried it, and 99% of my excitement about it is that you did.

Also, the results look really weird: white wins went down, but black wins went up??

-Carl
Geonerd
Posts: 79
Joined: Fri Mar 10, 2017 1:44 am

Re: Alphazero news

Post by Geonerd »

IanO wrote: Fri Dec 07, 2018 2:19 am Even more exciting: they released the full game scores of the hundred game matches for all three games, chess, shogi, and go!

https://deepmind.com/research/alphago/a ... resources/
Thank you for the link! :D
yanquis1972
Posts: 1766
Joined: Wed Jun 03, 2009 12:14 am

Re: Alphazero news

Post by yanquis1972 »

matthewlai wrote: Fri Dec 07, 2018 2:24 am
jp wrote: Fri Dec 07, 2018 2:20 am
matthewlai wrote: Fri Dec 07, 2018 2:15 am
Daniel Shawul wrote: Fri Dec 07, 2018 12:35 am While I sympathize with that statement, releasing A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing with supposedly fair conditions.
That would be good, but it would also be a lot of work for us (AZ is tightly-coupled with DM and Google's systems) for not really much value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily.
What were the best values/functions for CPUCT used for playing & training?
They are all in the pseudo-code in supplementary materials.

Code:

class AlphaZeroConfig(object):

  def __init__(self):
    ### Self-Play
    self.num_actors = 5000

    self.num_sampling_moves = 30
    self.max_moves = 512  # for chess and shogi, 722 for Go.
    self.num_simulations = 800

    # Root prior exploration noise.
    self.root_dirichlet_alpha = 0.3  # for chess, 0.03 for Go and 0.15 for shogi.
    self.root_exploration_fraction = 0.25

    # UCB formula
    self.pb_c_base = 19652
    self.pb_c_init = 1.25

    ### Training
    self.training_steps = int(700e3)
    self.checkpoint_interval = int(1e3)
    self.window_size = int(1e6)
    self.batch_size = 4096

    self.weight_decay = 1e-4
    self.momentum = 0.9
    # Schedule for chess and shogi, Go starts at 2e-2 immediately.
    self.learning_rate_schedule = {
        0: 2e-1,
        100e3: 2e-2,
        300e3: 2e-3,
        500e3: 2e-4
    }
I read that as code for training. Would the 1.25 value also apply to match play, and does it correlate to Lc0's cpuct search parameter?
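One thing worth noting about that question: in the pseudo-code above, pb_c_init is not a fixed cpuct. The effective exploration constant grows logarithmically with the parent node's visit count, so it only drifts away from 1.25 in very long searches. A minimal sketch (plain Python, not DeepMind code) showing how it scales:

```python
import math

def exploration_constant(parent_visits, pb_c_base=19652, pb_c_init=1.25):
    """Effective 'cpuct' implied by the AlphaZero pseudo-code's UCB formula."""
    return math.log((parent_visits + pb_c_base + 1) / pb_c_base) + pb_c_init

# Near the root of a fresh search it is essentially pb_c_init;
# it only grows noticeably once visits approach pb_c_base (19652).
for n in (1, 800, 19652, 1_000_000):
    print(n, round(exploration_constant(n), 3))
```

At 800 simulations per move (the training setting above), the constant is still about 1.29, which is presumably why 1.25 gets quoted as the single cpuct-like number.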