
Re: Alphazero news

Posted: Fri Dec 07, 2018 2:19 am
by IanO
Even more exciting: they released the full game scores of the hundred-game matches for all three games: chess, shogi, and Go!

https://deepmind.com/research/alphago/a ... resources/

I've been looking at some of the shogi games (hand-selected by Habu, the Kasparov of shogi), and they are utterly impenetrable. All known joseki (openings) and king-safety principles are thrown out the window! In some of these games, the king doesn't just sit undeveloped in the center but does the chess equivalent of heading out to the middle of the board in the middle game before coming back to the corner for safety and then winning. Astounding!

The DeepMind blog post is here: https://deepmind.com/blog/alphazero-she ... gi-and-go/

Re: Alphazero news

Posted: Fri Dec 07, 2018 2:20 am
by jp
matthewlai wrote: Fri Dec 07, 2018 2:15 am
Daniel Shawul wrote: Fri Dec 07, 2018 12:35 am While I sympathize with that statement, releasing the A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing under supposedly fair conditions.
That would be good, but it would also be a lot of work for us (AZ is tightly coupled with DeepMind's and Google's systems) for not much real value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily.
What were the best values/functions for CPUCT used for playing & training?

Re: Alphazero news

Posted: Fri Dec 07, 2018 2:24 am
by matthewlai
jp wrote: Fri Dec 07, 2018 2:20 am
matthewlai wrote: Fri Dec 07, 2018 2:15 am
Daniel Shawul wrote: Fri Dec 07, 2018 12:35 am While I sympathize with that statement, releasing the A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing under supposedly fair conditions.
That would be good, but it would also be a lot of work for us (AZ is tightly coupled with DeepMind's and Google's systems) for not much real value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily.
What were the best values/functions for CPUCT used for playing & training?
They are all in the pseudo-code in supplementary materials.

Code:

class AlphaZeroConfig(object):

  def __init__(self):
    ### Self-Play
    self.num_actors = 5000

    self.num_sampling_moves = 30
    self.max_moves = 512  # for chess and shogi, 722 for Go.
    self.num_simulations = 800

    # Root prior exploration noise.
    self.root_dirichlet_alpha = 0.3  # for chess, 0.03 for Go and 0.15 for shogi.
    self.root_exploration_fraction = 0.25

    # UCB formula
    self.pb_c_base = 19652
    self.pb_c_init = 1.25

    ### Training
    self.training_steps = int(700e3)
    self.checkpoint_interval = int(1e3)
    self.window_size = int(1e6)
    self.batch_size = 4096

    self.weight_decay = 1e-4
    self.momentum = 0.9
    # Schedule for chess and shogi, Go starts at 2e-2 immediately.
    self.learning_rate_schedule = {
        0: 2e-1,
        100e3: 2e-2,
        300e3: 2e-3,
        500e3: 2e-4
    }
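
For context, the exploration constant in that pseudo-code is not a single fixed CPUCT: the pb_c term starts near pb_c_init and grows slowly with the parent node's visit count via pb_c_base. A minimal sketch of the UCB scoring function, following the structure of the published pseudo-code (the Node fields are assumed here, and the comments are added):

Code:

import math

def ucb_score(config, parent, child):
    # Exploration weight: ~pb_c_init (1.25) at low visit counts,
    # growing logarithmically with the parent's visits via pb_c_base.
    pb_c = math.log((parent.visit_count + config.pb_c_base + 1) /
                    config.pb_c_base) + config.pb_c_init
    pb_c *= math.sqrt(parent.visit_count) / (child.visit_count + 1)

    prior_score = pb_c * child.prior  # network policy prior
    value_score = child.value()       # mean value of the child's subtree
    return prior_score + value_score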

Re: Alphazero news

Posted: Fri Dec 07, 2018 2:26 am
by clumma
Matthew: Congrats on the phenomenal success.

Can you help me locate the games AZ played against Brainfish? They don't seem to have their own file, and I don't see any identifying info in alphazero_vs_stockfish_all.pgn

Thank you!

-Carl

Re: Alphazero news

Posted: Fri Dec 07, 2018 2:31 am
by matthewlai
noobpwnftw wrote: Fri Dec 07, 2018 1:09 am I have a few questions:
TCEC SuFi used a 120' + 15" TC and Division P used a 90' + 10" TC. Since DM went for mimicking TCEC conditions this time, like using a 44-core machine and the same openings, why mess with the time controls (again)?

Also, a year ago the NPS of A0 was 80K; now it is only around 60K. That's about a 30% nerf. What happened, did people overclock the TPUs a year ago?
180' + 15" is the time control for Season 9 Superfinal.

In the preprint, I believe the NPS figures were taken from the start position. For the peer-reviewed final paper we looked at all moves to compute those statistics.
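
To illustrate why the two measurements come apart (made-up numbers, not from the paper):

Code:

# Hypothetical per-move search data: (nodes_searched, seconds_spent).
moves = [
    (4_800_000, 60.0),  # start position alone: 80k nodes/sec
    (3_000_000, 50.0),
    (2_400_000, 48.0),
]

start_pos_nps = moves[0][0] / moves[0][1]                          # 80000
overall_nps = sum(n for n, _ in moves) / sum(t for _, t in moves)  # ~64557
print(f"start position: {start_pos_nps:.0f} nps, all moves: {overall_nps:.0f} nps")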

Re: Alphazero news

Posted: Fri Dec 07, 2018 2:45 am
by Albert Silver
matthewlai wrote: Fri Dec 07, 2018 2:31 am
noobpwnftw wrote: Fri Dec 07, 2018 1:09 am I have a few questions:
TCEC SuFi used a 120' + 15" TC and Division P used a 90' + 10" TC. Since DM went for mimicking TCEC conditions this time, like using a 44-core machine and the same openings, why mess with the time controls (again)?

Also, a year ago the NPS of A0 was 80K; now it is only around 60K. That's about a 30% nerf. What happened, did people overclock the TPUs a year ago?
180' + 15" is the time control for Season 9 Superfinal.

In the preprint, I believe the NPS figures were taken from the start position. For the peer-reviewed final paper we looked at all moves to compute those statistics.
When you ran it on 4 TPUs, was this the upper limit of what you could exploit, or just a standard you set? Meaning, would you get a serious boost from 8 TPUs or more? Leela has hit a bit of a wall in how fast it can get, so that beyond a certain point more speed gains it nothing. Were you able to solve this, and if so, how?

Re: Alphazero news

Posted: Fri Dec 07, 2018 3:45 am
by matthewlai
clumma wrote: Fri Dec 07, 2018 2:26 am Matthew: Congrats on the phenomenal success.

Can you help me locate the games AZ played against Brainfish? They don't seem to have their own file, and I don't see any identifying info in alphazero_vs_stockfish_all.pgn

Thank you!

-Carl
Thanks!

Only games from the primary evaluation and the TCEC-openings matches have been released (none of the opening-book games).

Re: Alphazero news

Posted: Fri Dec 07, 2018 4:18 am
by clumma
matthewlai wrote: Fri Dec 07, 2018 3:45 am
clumma wrote: Fri Dec 07, 2018 2:26 am Can you help me locate the games AZ played against Brainfish? They don't seem to have their own file, and I don't see any identifying info in alphazero_vs_stockfish_all.pgn
Only games from the primary evaluation and the TCEC-openings matches have been released (none of the opening-book games).
D'oh!

Why?

I've been wanting to see AZ vs. BF since last year; the first thing I checked in this paper was whether you tried it, and 99% of my excitement about it is that you did.

Also, the results look really weird. White wins went down but black wins went up??

-Carl

Re: Alphazero news

Posted: Fri Dec 07, 2018 4:51 am
by Geonerd
IanO wrote: Fri Dec 07, 2018 2:19 am Even more exciting: they released the full game scores of the hundred-game matches for all three games: chess, shogi, and Go!

https://deepmind.com/research/alphago/a ... resources/
Thank you for the link! :D

Re: Alphazero news

Posted: Fri Dec 07, 2018 5:06 am
by yanquis1972
matthewlai wrote: Fri Dec 07, 2018 2:24 am
jp wrote: Fri Dec 07, 2018 2:20 am
matthewlai wrote: Fri Dec 07, 2018 2:15 am
Daniel Shawul wrote: Fri Dec 07, 2018 12:35 am While I sympathize with that statement, releasing the A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing under supposedly fair conditions.
That would be good, but it would also be a lot of work for us (AZ is tightly coupled with DeepMind's and Google's systems) for not much real value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily.
What were the best values/functions for CPUCT used for playing & training?
They are all in the pseudo-code in supplementary materials.

Code:

class AlphaZeroConfig(object):

  def __init__(self):
    ### Self-Play
    self.num_actors = 5000

    self.num_sampling_moves = 30
    self.max_moves = 512  # for chess and shogi, 722 for Go.
    self.num_simulations = 800

    # Root prior exploration noise.
    self.root_dirichlet_alpha = 0.3  # for chess, 0.03 for Go and 0.15 for shogi.
    self.root_exploration_fraction = 0.25

    # UCB formula
    self.pb_c_base = 19652
    self.pb_c_init = 1.25

    ### Training
    self.training_steps = int(700e3)
    self.checkpoint_interval = int(1e3)
    self.window_size = int(1e6)
    self.batch_size = 4096

    self.weight_decay = 1e-4
    self.momentum = 0.9
    # Schedule for chess and shogi, Go starts at 2e-2 immediately.
    self.learning_rate_schedule = {
        0: 2e-1,
        100e3: 2e-2,
        300e3: 2e-3,
        500e3: 2e-4
    }
I read that as code for training. Would the 1.25 value also apply to match play, and does it correspond to Lc0's CPUCT search parameter?
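
If I'm reading the UCB term right, the two pb_c constants can be folded into a single visit-dependent CPUCT, which sits close to 1.25 at self-play visit counts and grows slowly from there. A small illustration (this reading is my own, not from the paper):

Code:

import math

def effective_cpuct(parent_visits, pb_c_base=19652, pb_c_init=1.25):
    # The coefficient on the prior term in the UCB formula, rewritten
    # as one CPUCT-like constant that depends on the parent's visits.
    return pb_c_init + math.log((parent_visits + pb_c_base + 1) / pb_c_base)

print(effective_cpuct(800))      # ~1.29, close to a fixed CPUCT of 1.25
print(effective_cpuct(100_000))  # ~3.06, more exploration in long searches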