Alpha Zero question
Moderator: Ras
-
Leo
- Posts: 1107
- Joined: Fri Sep 16, 2016 6:55 pm
- Location: USA/Minnesota
- Full name: Leo Anger
Re: Alpha Zero question
So maybe the big thing about Alpha Zero was its learning algorithm and its huge hardware.
Advanced Micro Devices fan.
-
syzygy
- Posts: 5807
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Alpha Zero question
supersharp77 wrote: ↑ Wed Feb 15, 2023 7:43 pm
> Well, you will have to go back to the original 2018 match links/debates... there was quite a lot of back and forth and push and pull about that "so-called match", as no one was actually able to reproduce the results!

There was certainly a lot of back and forth, and there were many people complaining that nobody would be able to reproduce the results based on DeepMind's papers. But while those people were complaining, others were busy reproducing the results based on DeepMind's papers.

supersharp77 wrote: ↑ Wed Feb 15, 2023 7:43 pm
> What I recall (just like it was yesterday) was that Google (the AlphaZero team) did not allow Stockfish an opening book, and there was no "Stockfish team" present during the so-called "match", just a base Stockfish 8. The "match" was 100 games, with 8 wins by AlphaZero (98 draws?), but most games were unpublished, and only the wins were published?! AlphaZero at this present time (5 years later) remains a mystery. Lc0 by most accounts may well be stronger than AlphaZero currently (although opinions will vary), so until someone gets a working copy of AlphaZero, we will never know. The original arguments were about CPU vs. TPU (now GPU), power-supply issues, and also time-control and opening-book issues. "Who knows for sure!!"

Yes, from a pure "which engine is the strongest" point of view, much was still unclear. But DeepMind had shown that their approach, which not only used NNs but amazingly did NOT use anything resembling alpha-beta, could compete with the very best engines. That it was anywhere near SF's strength was already a miracle.
-
M ANSARI
- Posts: 3734
- Joined: Thu Mar 16, 2006 7:10 pm
Re: Alpha Zero question
I think the question should be how SF today, without NNUE, would do against the AlphaZero that beat SF8. That would be more interesting, as it could be argued that SF with NNUE is in a sense SF with AlphaZero components. Personally I think SF with NNUE is just an amazing combination, and most likely a new Lc0 with some SF components would be equally amazing. If you look at some of AlphaZero's games, there are some obvious glaring flaws that are best described as "bugs" in software still in beta. The same flaws carried over to Lc0, but I think Lc0 has since ironed those things out. My guess is that a new AlphaZero would most likely be able to learn from the gains of Lc0 and play much stronger. Add to that the dramatic increase in AI hardware power, and you would get a pretty impressive chess-playing entity!
-
Werewolf
- Posts: 2058
- Joined: Thu Sep 18, 2008 10:24 pm
-
jkominek
- Posts: 98
- Joined: Tue Sep 04, 2018 5:33 am
- Full name: John Kominek
Re: Alpha Zero question
Decided to look it up. From their 2017 paper that sent shockwaves through the computer chess community, Mastering chess and shogi by self-play with a general reinforcement learning algorithm:

> Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs to generate self-play games and 64 second-generation TPUs to train the neural networks.

re. Evaluation: AlphaZero and the previous AlphaGo Zero used a single machine with 4 TPUs.

A year later, in December 2018, after responding to the referees with more thorough testing, they put the title words through the jumbler and published A general reinforcement learning algorithm that masters chess, shogi and Go through self-play. The reported training specs changed on one point:

> We trained separate instances of AlphaZero for chess, shogi and Go. Training proceeded for 700,000 steps (in mini-batches of 4,096 training positions) starting from randomly initialized parameters. During training only, 5,000 first-generation tensor processing units (TPUs) were used to generate self-play games, and 16 second-generation TPUs were used to train the neural networks. Training lasted for approximately 9 hours in chess, 12 hours in shogi and 13 days in Go.

re. Evaluation: AlphaZero and AlphaGo Zero used a single machine with four first-generation TPUs and 44 CPU cores.

It could be that the first claim of 64 second-generation TPUs for model training was a mis-statement, or resources could have been reduced during the follow-up work. The learning progress curve for chess (Figure 1) is identical in both papers, but different for shogi and Go.
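As a quick sanity check on the quoted training figures, a back-of-envelope sketch using only the numbers from the 2018 paper (the variable names are mine; "positions" counts mini-batch samples consumed by the trainer, not distinct self-play positions):

```python
# Rough arithmetic on the AlphaZero training specs quoted above.
steps = 700_000                 # optimizer steps
batch_size = 4_096              # training positions per mini-batch
positions = steps * batch_size  # total positions consumed by the trainer
print(f"{positions:,}")         # 2,867,200,000 (~2.9 billion positions)

# Chess training reportedly took about 9 hours.
chess_hours = 9
throughput = positions / (chess_hours * 3600)
print(f"~{throughput:,.0f} positions/s")  # ~88,494 positions/s
```

So the chess run churned through roughly 2.9 billion training samples at close to 90,000 positions per second on the 16 second-generation training TPUs.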
> Endnote 24. A first-generation TPU is roughly similar in inference speed to a Titan V GPU, although the architectures are not directly comparable.
Using the comparison to a Titan V, 4 TPUs for game playing is roughly equivalent to two Nvidia A100s, which is the GPU compute currently available in the TCEC competition.