Alphazero news

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Harvey Williamson, bob

IanO
Posts: 464
Joined: Wed Mar 08, 2006 8:45 pm
Location: Portland, OR

Re: Alphazero news

Post by IanO » Fri Dec 07, 2018 1:19 am

Even more exciting: they released the full game scores of the hundred-game matches for all three games (chess, shogi, and Go)!

https://deepmind.com/research/alphago/a ... resources/

I've been looking at some of the shogi games (hand-selected by Habu, the Kasparov of shogi), and they are utterly impenetrable. All known joseki (openings) and king-safety principles are thrown out the window! In some of these games, the king doesn't just sit undeveloped in the center; it does the chess equivalent of marching to the middle of the board in the middlegame, then returning to the corner for safety and going on to win. Astounding!

The DeepMind blog post is here: https://deepmind.com/blog/alphazero-she ... gi-and-go/

jp
Posts: 259
Joined: Mon Apr 23, 2018 5:54 am

Re: Alphazero news

Post by jp » Fri Dec 07, 2018 1:20 am

matthewlai wrote:
Fri Dec 07, 2018 1:15 am
Daniel Shawul wrote:
Thu Dec 06, 2018 11:35 pm
While I sympathize with that statement, releasing A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing with supposedly fair conditions.
That would be good, but it would also be a lot of work for us (AZ is tightly coupled with DM and Google's systems) for not much value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily.
What were the best values/functions for CPUCT used for playing and training?

matthewlai
Posts: 778
Joined: Sun Aug 03, 2014 2:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai » Fri Dec 07, 2018 1:24 am

jp wrote:
Fri Dec 07, 2018 1:20 am
matthewlai wrote:
Fri Dec 07, 2018 1:15 am
Daniel Shawul wrote:
Thu Dec 06, 2018 11:35 pm
While I sympathize with that statement, releasing A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing with supposedly fair conditions.
That would be good, but it would also be a lot of work for us (AZ is tightly coupled with DM and Google's systems) for not much value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily.
What were the best values/functions for CPUCT used for playing and training?
They are all in the pseudo-code in the supplementary materials.


class AlphaZeroConfig(object):

  def __init__(self):
    ### Self-Play
    self.num_actors = 5000

    self.num_sampling_moves = 30
    self.max_moves = 512  # for chess and shogi, 722 for Go.
    self.num_simulations = 800

    # Root prior exploration noise.
    self.root_dirichlet_alpha = 0.3  # for chess, 0.03 for Go and 0.15 for shogi.
    self.root_exploration_fraction = 0.25

    # UCB formula
    self.pb_c_base = 19652
    self.pb_c_init = 1.25

    ### Training
    self.training_steps = int(700e3)
    self.checkpoint_interval = int(1e3)
    self.window_size = int(1e6)
    self.batch_size = 4096

    self.weight_decay = 1e-4
    self.momentum = 0.9
    # Schedule for chess and shogi, Go starts at 2e-2 immediately.
    self.learning_rate_schedule = {
        0: 2e-1,
        100e3: 2e-2,
        300e3: 2e-3,
        500e3: 2e-4
    }
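For readers wondering how the two "Root prior exploration noise" fields above are used: a minimal sketch, assuming the standard AlphaZero mixing rule where each root prior p becomes (1 - f) * p + f * n, with n drawn from a symmetric Dirichlet(alpha). Here alpha is root_dirichlet_alpha (0.3 for chess) and f is root_exploration_fraction (0.25). The function names and the three-move example root are hypothetical; the Dirichlet draw is built from normalized Gamma samples because the Python standard library has no Dirichlet sampler.

```python
import random
from typing import Dict, Hashable, List, Optional


def sample_symmetric_dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Draw one sample from a symmetric Dirichlet(alpha) over k outcomes."""
    # A Dirichlet sample is a vector of independent Gamma(alpha, 1) draws
    # normalized to sum to 1.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def add_root_exploration_noise(priors: Dict[Hashable, float],
                               alpha: float = 0.3,
                               fraction: float = 0.25,
                               rng: Optional[random.Random] = None) -> Dict[Hashable, float]:
    """Return new root priors: (1 - fraction) * prior + fraction * noise."""
    if rng is None:
        rng = random.Random()
    moves = list(priors)
    noise = sample_symmetric_dirichlet(alpha, len(moves), rng)
    return {m: (1 - fraction) * priors[m] + fraction * n
            for m, n in zip(moves, noise)}


if __name__ == "__main__":
    # Hypothetical policy priors for three legal root moves.
    root_priors = {"e2e4": 0.5, "d2d4": 0.3, "g1f3": 0.2}
    print(add_root_exploration_noise(root_priors, rng=random.Random(0)))
```

Since both the priors and the noise sum to 1, the mixed priors still sum to 1, and no move's prior can drop below 75% of its original value with a fraction of 0.25. A small alpha (0.03 for Go) concentrates the noise on a few moves, which seems to be why it is scaled down for games with more legal moves.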
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.

clumma
Posts: 165
Joined: Fri Oct 10, 2014 8:05 pm
Location: Berkeley, CA

Re: Alphazero news

Post by clumma » Fri Dec 07, 2018 1:26 am

Matthew: Congrats on the phenomenal success.

Can you help me locate the games AZ played against Brainfish? They don't seem to have their own file, and I don't see any identifying info in alphazero_vs_stockfish_all.pgn.

Thank you!

-Carl

matthewlai
Posts: 778
Joined: Sun Aug 03, 2014 2:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai » Fri Dec 07, 2018 1:31 am

noobpwnftw wrote:
Fri Dec 07, 2018 12:09 am
I have a few questions:
The TCEC SuFi used a 120' + 15" TC and Division P used 90' + 10". Since DM set out to mimic TCEC conditions this time, such as using a 44-core machine and the same openings, why mess with the time controls (again)?

Also, a year ago the NPS of A0 was 80K; now it is only around 60K. That's a nerf of about 30%. What happened, did people overclock the TPUs a year ago?
180' + 15" is the time control for the Season 9 Superfinal.

In the preprint, I believe the NPS figures were taken from the starting position. For the peer-reviewed final paper, we looked at all moves to compute those statistics.
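Taking the quoted figures of about 80K and about 60K nodes per second at face value, the size of the change depends on which figure is used as the base:

```latex
\frac{80\text{K} - 60\text{K}}{80\text{K}} = 0.25 \;\text{(a 25\% drop)}, \qquad
\frac{80\text{K}}{60\text{K}} \approx 1.33 \;\text{(the old figure is about 33\% higher)}
```

Either way, since one figure comes from the starting position only and the other from all moves, the two numbers may not be directly comparable.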

Albert Silver
Posts: 2779
Joined: Wed Mar 08, 2006 8:57 pm
Location: Rio de Janeiro, Brazil

Re: Alphazero news

Post by Albert Silver » Fri Dec 07, 2018 1:45 am

matthewlai wrote:
Fri Dec 07, 2018 1:31 am
noobpwnftw wrote:
Fri Dec 07, 2018 12:09 am
I have a few questions:
The TCEC SuFi used a 120' + 15" TC and Division P used 90' + 10". Since DM set out to mimic TCEC conditions this time, such as using a 44-core machine and the same openings, why mess with the time controls (again)?

Also, a year ago the NPS of A0 was 80K; now it is only around 60K. That's a nerf of about 30%. What happened, did people overclock the TPUs a year ago?
180' + 15" is the time control for the Season 9 Superfinal.

In the preprint, I believe the NPS figures were taken from the starting position. For the peer-reviewed final paper, we looked at all moves to compute those statistics.
When you ran it on 4 TPUs, was this the upper limit of what you could exploit, or just a standard you set? In other words, would you get a serious boost from 8 TPUs or more? Leela has hit a bit of a wall on how fast it can get, so that beyond a certain point it gains nothing. Were you able to solve this, and if so, how?
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."

matthewlai
Posts: 778
Joined: Sun Aug 03, 2014 2:48 am
Location: London, UK

Re: Alphazero news

Post by matthewlai » Fri Dec 07, 2018 2:45 am

clumma wrote:
Fri Dec 07, 2018 1:26 am
Matthew: Congrats on the phenomenal success.

Can you help me locate the games AZ played against Brainfish? They don't seem to have their own file, and I don't see any identifying info in alphazero_vs_stockfish_all.pgn.

Thank you!

-Carl
Thanks!

Only games from the primary evaluation and TCEC openings have been released (no opening books).

clumma
Posts: 165
Joined: Fri Oct 10, 2014 8:05 pm
Location: Berkeley, CA

Re: Alphazero news

Post by clumma » Fri Dec 07, 2018 3:18 am

matthewlai wrote:
Fri Dec 07, 2018 2:45 am
clumma wrote:
Fri Dec 07, 2018 1:26 am
Can you help me locate the games AZ played against Brainfish? They don't seem to have their own file, and I don't see any identifying info in alphazero_vs_stockfish_all.pgn.
Only games from the primary evaluation and TCEC openings have been released (no opening books).
D'oh!

Why?

I've been wanting to see AZ vs. BF since last year. The first thing I checked in this paper was whether you tried it, and 99% of my excitement about it is that you did.

Also, the results look really weird. White wins went down but Black wins went up??

-Carl

Geonerd
Posts: 48
Joined: Fri Mar 10, 2017 12:44 am

Re: Alphazero news

Post by Geonerd » Fri Dec 07, 2018 3:51 am

IanO wrote:
Fri Dec 07, 2018 1:19 am
Even more exciting: they released the full game scores of the hundred-game matches for all three games (chess, shogi, and Go)!

https://deepmind.com/research/alphago/a ... resources/
Thank you for the link! :D

yanquis1972
Posts: 1700
Joined: Tue Jun 02, 2009 10:14 pm

Re: Alphazero news

Post by yanquis1972 » Fri Dec 07, 2018 4:06 am

matthewlai wrote:
Fri Dec 07, 2018 1:24 am
jp wrote:
Fri Dec 07, 2018 1:20 am
matthewlai wrote:
Fri Dec 07, 2018 1:15 am
Daniel Shawul wrote:
Thu Dec 06, 2018 11:35 pm
While I sympathize with that statement, releasing A0 source code and networks for anyone to test sounds better.
Many will not be satisfied with in-house testing with supposedly fair conditions.
That would be good, but it would also be a lot of work for us (AZ is tightly coupled with DM and Google's systems) for not much value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily.
What were the best values/functions for CPUCT used for playing and training?
They are all in the pseudo-code in the supplementary materials.


class AlphaZeroConfig(object):

  def __init__(self):
    ### Self-Play
    self.num_actors = 5000

    self.num_sampling_moves = 30
    self.max_moves = 512  # for chess and shogi, 722 for Go.
    self.num_simulations = 800

    # Root prior exploration noise.
    self.root_dirichlet_alpha = 0.3  # for chess, 0.03 for Go and 0.15 for shogi.
    self.root_exploration_fraction = 0.25

    # UCB formula
    self.pb_c_base = 19652
    self.pb_c_init = 1.25

    ### Training
    self.training_steps = int(700e3)
    self.checkpoint_interval = int(1e3)
    self.window_size = int(1e6)
    self.batch_size = 4096

    self.weight_decay = 1e-4
    self.momentum = 0.9
    # Schedule for chess and shogi, Go starts at 2e-2 immediately.
    self.learning_rate_schedule = {
        0: 2e-1,
        100e3: 2e-2,
        300e3: 2e-3,
        500e3: 2e-4
    }
I just read that as code for training. Would the 1.25 value also apply to match play, and does it correspond to lc0's search variable?
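On the 1.25 question: under the "UCB formula" fields, pb_c_init does not act as a fixed constant on its own. A minimal sketch of a PUCT score in the style of the published pseudo-code, assuming (from memory of that pseudo-code, not from this thread) that the exploration coefficient is C(s) = log((N(s) + pb_c_base + 1) / pb_c_base) + pb_c_init, where N(s) is the parent's visit count. Node is a hypothetical minimal stand-in, not DeepMind's class. With these constants, C stays close to 1.25 for an 800-simulation search (about 1.29 at a root with 800 visits) and grows logarithmically only for much larger searches. Whether match play used the same constants is not stated here, and lc0's own parameter names are not assumed.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical minimal MCTS node, for illustration only."""
    prior: float = 0.0      # policy prior P(s, a)
    visit_count: int = 0    # N(s, a)
    value_sum: float = 0.0  # sum of backed-up values

    def value(self) -> float:
        # Mean value Q(s, a); unvisited nodes count as 0.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def exploration_coefficient(parent_visits: int,
                            pb_c_base: float = 19652,
                            pb_c_init: float = 1.25) -> float:
    # C(s): roughly pb_c_init for small searches, growing logarithmically.
    return math.log((parent_visits + pb_c_base + 1) / pb_c_base) + pb_c_init


def ucb_score(parent: Node, child: Node,
              pb_c_base: float = 19652, pb_c_init: float = 1.25) -> float:
    # PUCT: Q(s, a) + C(s) * P(s, a) * sqrt(N(s)) / (1 + N(s, a))
    pb_c = exploration_coefficient(parent.visit_count, pb_c_base, pb_c_init)
    pb_c *= math.sqrt(parent.visit_count) / (child.visit_count + 1)
    return pb_c * child.prior + child.value()


if __name__ == "__main__":
    # 800 matches num_simulations in the quoted config; the larger counts
    # are hypothetical stand-ins for long match-play searches.
    for visits in (800, 10_000, 100_000):
        print(visits, round(exploration_coefficient(visits), 3))
```

In this form, 1.25 behaves like a classic fixed cpuct only while the parent's visit count is small relative to pb_c_base (19652); at around 100,000 visits the coefficient is already above 3.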
