Even more exciting: they released the full game scores of the hundred-game matches for all three games (chess, shogi, and Go)!
https://deepmind.com/research/alphago/a ... resources/
I've been looking at some of the shogi games (hand-selected by Habu, the Kasparov of shogi), and they are utterly impenetrable. All known joseki (openings) and king-safety principles are thrown out the window! In some of these games, the king doesn't just sit undeveloped in the center: it does the chess equivalent of marching out to the middle of the board in the middlegame, then heads back to the corner for safety, and wins anyway. Astounding!
The DeepMind blog post is here: https://deepmind.com/blog/alphazero-she ... gi-and-go/
-
- Posts: 1473
- Joined: Mon Apr 23, 2018 7:54 am
Re: Alphazero news
Daniel Shawul wrote: ↑Fri Dec 07, 2018 12:35 am
While I sympathize with that statement, releasing A0 source code and networks for anyone to test sounds better. Many will not be satisfied with in-house testing with supposedly fair conditions.

matthewlai wrote: ↑Fri Dec 07, 2018 2:15 am
That would be good, but it would also be a lot of work for us (AZ is tightly-coupled with DM and Google's systems) for not really much value to the scientific community. We feel that it's our ideas and algorithms that are important, not our implementation. That's why we have published all the algorithms we developed in detail, with almost-runnable pseudo-code, so that they can be replicated easily.

What were the best values/functions for CPUCT used for playing & training?
-
- Posts: 793
- Joined: Sun Aug 03, 2014 4:48 am
- Location: London, UK
Re: Alphazero news
jp wrote: ↑Fri Dec 07, 2018 2:20 am
What were the best values/functions for CPUCT used for playing & training?

They are all in the pseudo-code in supplementary materials.
Code: Select all
class AlphaZeroConfig(object):
    def __init__(self):
        ### Self-Play
        self.num_actors = 5000

        self.num_sampling_moves = 30
        self.max_moves = 512  # for chess and shogi, 722 for Go.
        self.num_simulations = 800

        # Root prior exploration noise.
        self.root_dirichlet_alpha = 0.3  # for chess, 0.03 for Go and 0.15 for shogi.
        self.root_exploration_fraction = 0.25

        # UCB formula
        self.pb_c_base = 19652
        self.pb_c_init = 1.25

        ### Training
        self.training_steps = int(700e3)
        self.checkpoint_interval = int(1e3)
        self.window_size = int(1e6)
        self.batch_size = 4096

        self.weight_decay = 1e-4
        self.momentum = 0.9
        # Schedule for chess and shogi, Go starts at 2e-2 immediately.
        self.learning_rate_schedule = {
            0: 2e-1,
            100e3: 2e-2,
            300e3: 2e-3,
            500e3: 2e-4
        }
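Specifically, pb_c_base and pb_c_init replace a single fixed CPUCT: the exploration coefficient starts near pb_c_init and grows slowly with the parent node's visit count. A simplified sketch of the UCB score, adapted from the pseudo-code in the supplementary materials (the Node fields visit_count, prior, and value() are assumed here):

Code: Select all

import math

# Simplified sketch adapted from the supplementary pseudo-code.
# `parent` and `child` are search-tree nodes with `visit_count`,
# `prior`, and `value()` (fields assumed for this sketch).
def ucb_score(config, parent, child):
    # Visit-dependent exploration coefficient: roughly pb_c_init for
    # small searches, growing logarithmically with parent visits.
    pb_c = math.log((parent.visit_count + config.pb_c_base + 1)
                    / config.pb_c_base) + config.pb_c_init
    pb_c *= math.sqrt(parent.visit_count) / (child.visit_count + 1)

    prior_score = pb_c * child.prior  # policy-network prior
    value_score = child.value()       # mean value of the child's subtree
    return prior_score + value_score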
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
-
- Posts: 186
- Joined: Fri Oct 10, 2014 10:05 pm
- Location: Berkeley, CA
Re: Alphazero news
Matthew: Congrats on the phenomenal success.
Can you help me locate the games AZ played against Brainfish? They don't seem to have their own file, and I don't see any identifying info in alphazero_vs_stockfish_all.pgn
Thank you!
-Carl
-
- Posts: 793
- Joined: Sun Aug 03, 2014 4:48 am
- Location: London, UK
Re: Alphazero news
noobpwnftw wrote: ↑Fri Dec 07, 2018 1:09 am
I have a few questions:
TCEC SuFi used 120' + 15" TC and Division P used 90' + 10" TC. Since DM went for mimicking TCEC conditions this time, like using a 44-core machine and the same openings, why mess with the time controls (again)?
Also, a year ago the NPS of A0 was 80K; now it is only around 60K. That's about a 30% nerf. What happened, did people overclock the TPUs a year ago?

180' + 15" is the time control for the Season 9 Superfinal.

In the preprint, I believe the NPS figures were taken from the start position. For the peer-reviewed final paper we looked at all moves to compute those statistics.
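To illustrate the difference, a toy sketch (the per-move (nodes, seconds) record format is invented here for illustration, not our actual tooling):

Code: Select all

# Toy sketch: averaging NPS over every move of every game, versus
# sampling only the opening position of each game.
def mean_nps_all_moves(games):
    """games: list of games, each a list of (nodes, seconds) per move."""
    moves = [m for game in games for m in game]
    return sum(n for n, _ in moves) / sum(s for _, s in moves)

def start_position_nps(games):
    """NPS measured only on the first move of each game."""
    first = [game[0] for game in games]
    return sum(n for n, _ in first) / sum(s for _, s in first)

The start position is just one sample; averaging over all moves is more representative of the whole game.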
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
-
- Posts: 3026
- Joined: Wed Mar 08, 2006 9:57 pm
- Location: Rio de Janeiro, Brazil
Re: Alphazero news
matthewlai wrote: ↑Fri Dec 07, 2018 2:31 am
180' + 15" is the time control for the Season 9 Superfinal.
In the preprint, I believe the NPS figures were taken from the start position. For the peer-reviewed final paper we looked at all moves to compute those statistics.

When you ran it on 4 TPUs, was that the upper limit of what you could exploit, or just a standard you set? In other words, would you get a serious boost from 8 TPUs or more? Leela has hit a bit of a wall on how fast it can get: beyond a certain point it gains nothing. Were you able to solve this, and if so, how?
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
-
- Posts: 793
- Joined: Sun Aug 03, 2014 4:48 am
- Location: London, UK
Re: Alphazero news
Thanks!
Only games from the primary evaluation and TCEC openings have been released (no opening books).
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
-
- Posts: 186
- Joined: Fri Oct 10, 2014 10:05 pm
- Location: Berkeley, CA
Re: Alphazero news
matthewlai wrote: ↑Fri Dec 07, 2018 3:45 am
Only games from the primary evaluation and TCEC openings have been released (no opening books).

D'oh! Why?
I've been wanting to see AZ v BF since last year, and the first thing I checked in this paper was whether you tried it; 99% of my excitement about it is that you did.
Also, the results look really weird: white wins went down but black wins went up??
-Carl
-
- Posts: 79
- Joined: Fri Mar 10, 2017 1:44 am
Re: Alphazero news
IanO wrote: ↑Fri Dec 07, 2018 2:19 am
Even more exciting: they released the full game scores of the hundred-game matches for all three games (chess, shogi, and Go)!
https://deepmind.com/research/alphago/a ... resources/

Thank you for the link!
-
- Posts: 1766
- Joined: Wed Jun 03, 2009 12:14 am
Re: Alphazero news
matthewlai wrote: ↑Fri Dec 07, 2018 2:24 am
They are all in the pseudo-code in supplementary materials.

Code: Select all

        # UCB formula
        self.pb_c_base = 19652
        self.pb_c_init = 1.25

I just read that as code for training. Would the 1.25 value apply to match play, and does it correlate to Lc0's search variable?
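For what it's worth, plugging numbers into that formula (a quick illustrative check; the function name is mine, and this is adapted from the published pseudo-code, not from Lc0's actual code) suggests the coefficient stays close to 1.25 at AZ's 800 simulations per move:

Code: Select all

import math

# Illustrative check (my own code, adapted from the published pseudo-code):
# AZ's exploration coefficient as a function of the parent's visit count.
def exploration_coefficient(parent_visits, pb_c_base=19652, pb_c_init=1.25):
    return math.log((parent_visits + pb_c_base + 1) / pb_c_base) + pb_c_init

for visits in (1, 800, 19652, 200000):
    print(visits, round(exploration_coefficient(visits), 3))
# Roughly 1.25 at 1 visit, ~1.29 at 800, ~1.94 at 19652, ~3.66 at 200000:
# close to a fixed CPUCT of 1.25 for short searches, growing slowly beyond.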