They can say or omit anything. One article definitely stated that a database of human games was loaded. I don't know which article is correct. I just know from years of experience with that type of learning that the moves in the games by AlphaZero against Stockfish have the same type and feel. Take what I said with a grain of salt, but take what is said in those articles with a grain of salt as well.
Edit: Think about it in these terms. It took 4 hours for AlphaZero to amass all human knowledge about the game of chess. At one minute per move maybe 4 games of self play could have been accomplished. However, if 6 million games were loaded in and analysed then 4 hours is about the time it would take. Self play would take years to get to that level. 'Their' story does not add up!
AlphaGo Zero And AlphaZero, RomiChess done better
Moderators: hgm, Rebel, chrisw
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: AlphaGo Zero And AlphaZero, RomiChess done better
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
-
- Posts: 2129
- Joined: Thu May 29, 2008 10:43 am
Re: AlphaGo Zero And AlphaZero, RomiChess done better
[quote="Michael Sherwin"]
They can say or omit anything. One article definitely stated that a database of human games was loaded. I don't know which article is correct. I just know from years of experience with that type of learning that the moves in the games by AlphaZero against Stockfish have the same type and feel. Take what I said with a grain of salt, but take what is said in those articles with a grain of salt as well.
Edit: Think about it in these terms. It took 4 hours for AlphaZero to amass all human knowledge about the game of chess. At one minute per move maybe 4 games of self play could have been accomplished. However, if 6 million games were loaded in and analysed then 4 hours is about the time it would take. Self play would take years to get to that level. 'Their' story does not add up!
[/quote]
No human games were loaded. Learning was accomplished through millions of self-play games.
The Monte Carlo search algorithm simply chose the move in each position with the highest win probability.
                 Chess         Shogi         Go
Mini-batches     700k          700k          700k
Training Time    9h            12h           34h
Training Games   44 million    24 million    21 million
Thinking Time    800 sims      800 sims      800 sims
                 40 ms         80 ms         200 ms
They used 5,000 first-generation TPUs to generate self-play games and 64 second-generation TPUs to train the neural networks.
They ended up with 44 million training games.
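For anyone who wants to check whether those figures hang together, here is a rough back-of-the-envelope sketch. It only uses the numbers quoted above (44 million games, 9 hours, 5,000 self-play TPUs, about 40 ms per move) and assumes, purely for illustration, that each TPU plays one game at a time with no overhead. The function names are made up for this example.

```python
# Figures quoted in the post above (chess).
TRAINING_GAMES = 44_000_000
TRAINING_HOURS = 9
SELFPLAY_TPUS = 5_000
SECONDS_PER_MOVE = 0.040  # 800 simulations in about 40 ms


def implied_seconds_per_game(games: int, hours: float, workers: int) -> float:
    """Worker-seconds available per game.

    Assumption: one game per worker at a time, no idle time or overhead.
    """
    return workers * hours * 3600 / games


def implied_moves_per_game(seconds_per_game: float, seconds_per_move: float) -> float:
    """How many moves fit into one game at the quoted thinking time."""
    return seconds_per_game / seconds_per_move


sec_per_game = implied_seconds_per_game(TRAINING_GAMES, TRAINING_HOURS, SELFPLAY_TPUS)
moves_per_game = implied_moves_per_game(sec_per_game, SECONDS_PER_MOVE)

print(f"{sec_per_game:.2f} s per game, about {moves_per_game:.0f} moves per game")
```

Under those assumptions the budget comes out to a few seconds per game, which at 40 ms per move is on the order of ninety moves (plies) per game. That is a plausibility check only, not a statement about how the games were actually scheduled.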
-
- Posts: 2129
- Joined: Thu May 29, 2008 10:43 am
Re: AlphaGo Zero And AlphaZero, RomiChess done better
There's a version of AlphaZero available here:
https://github.com/Zeta36/chess-alpha-zero
for anyone interested
-
- Posts: 172
- Joined: Thu May 27, 2010 3:32 am
Re: AlphaGo Zero And AlphaZero, RomiChess done better
It's a completely different project that uses some of the techniques from Alpha Zero. Not the same thing.
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: AlphaGo Zero And AlphaZero, RomiChess done better
[quote="kranium"]
There's a version of AlphaZero available here:
https://github.com/Zeta36/chess-alpha-zero
for anyone interested
[/quote]
Please read the paragraph about "New Supervised Learning Pipeline" in the "readme.md".
You can read there that human games were used before starting the learning process, in the case of Zeta36 and of AlphaZero too.
"...maybe chess is too complicated for a self training alone.."
-
- Posts: 2129
- Joined: Thu May 29, 2008 10:43 am
Re: AlphaGo Zero And AlphaZero, RomiChess done better
[quote="schack"]
It's a completely different project that uses some of the techniques from Alpha Zero. Not the same thing.
[/quote]
yes that's why I said 'a version'
-
- Posts: 1222
- Joined: Wed Mar 08, 2006 8:28 pm
- Location: Florida, USA
Re: AlphaGo Zero And AlphaZero, RomiChess done better
I remember the experiments at the time. Could you briefly explain what you did? From memory I recall you did the following:
At the end of the game you parsed the list of moves and adjusted the score up or down a certain number of centipawns based on the outcome. You then hashed each position and stored it in a learning file. I assume this is then loaded into the hash table at the start of each game. Is this broadly correct?
Thanks,
Steve
http://www.chessprogramming.net - Maverick Chess Engine
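The scheme recalled in the post above (adjust each position's score by a few centipawns after the game, key it by a hash, keep it in a learning file) could be sketched roughly as follows. This is not RomiChess code and nothing here is confirmed by its author: the 5 centipawn step, the use of the FEN string in place of a real hash, the JSON file format and all function names are assumptions made for illustration.

```python
import json

STEP_CP = 5  # hypothetical adjustment per game, in centipawns


def position_key(fen: str) -> str:
    """Stand-in for a real position hash (e.g. Zobrist); here the FEN text is the key."""
    return fen


def update_learning(table: dict, positions: list, white_result: int,
                    step_cp: int = STEP_CP) -> dict:
    """Nudge the stored score of every position that occurred in the game.

    positions: (fen, side_to_move) pairs, side_to_move being "w" or "b".
    white_result: +1 if White won, 0 for a draw, -1 if White lost.
    Scores are stored from the point of view of the side to move.
    """
    for fen, side in positions:
        sign = white_result if side == "w" else -white_result
        key = position_key(fen)
        table[key] = table.get(key, 0) + sign * step_cp
    return table


def save_learning(table: dict, path: str) -> None:
    """Write the learning table to disk after a game."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(table, f)


def load_learning(path: str) -> dict:
    """Read the learning table back, e.g. to seed the hash table before a game."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Example: White wins a two-position "game".
table = update_learning({}, [("pos_a", "w"), ("pos_b", "b")], white_result=1)
```

After the example call the position with White to move has gained 5 centipawns and the one with Black to move has lost 5; a later loss for White would move both back toward zero.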
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: AlphaGo Zero And AlphaZero, RomiChess done better
[quote="kranium"]
No human games were loaded. Learning was accomplished thru millions of self-play games
The monte carlo search algorithm simply chose the move in each position with the highest win probability.
[/quote]
How do you explain these paragraphs from the paper:
"Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters"
"We represent the policy π(a|s) by a 8 × 8 × 73 stack of planes encoding a probability distribution over 4,672 possible moves. Each of the 8×8 positions identifies the square from which to “pick up” a piece."
"The number of games, positions, and thinking time varied per game due largely to different board sizes and game lengths, and are shown in Table S3."
So when playing self-played games positions used for training are taken from the games randomly (since position is part of set of training parameters). So what about starting positions of those 44 million training games? You think they were all random, or initial starting position and they had no chess knowledge in them????
Give me a break, thinking those ppl in Google are so stupid to train their network in such a lousy way, instead of sorting those 100'000 openings from the same chessbase they quote in the paper by probability of occurrence and using those statistics as starting positions for those self-played games.
Ofc in Table 2 they nicely show just percentages not actual numbers so you can't judge how many training games in total were from the starting position, because someone could be smart and sum up all those games from Table 2 and figure the number doesn't match 44 million...
Btw. 700'000 training iterations times 800 MCTS is already 56 million, not 44, so where did 12 million games disappear?
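The numbers quoted from the paper in this post can be multiplied out directly. The sketch below only does that arithmetic; the variable names are made up, and the reading in the comments (training steps and search simulations being different units from games) is an interpretation of the quoted text, not something the post states.

```python
# Policy representation quoted from the paper: an 8 x 8 x 73 stack of planes.
BOARD_SIZE = 8
MOVE_PLANES = 73
policy_size = BOARD_SIZE * BOARD_SIZE * MOVE_PLANES  # 4,672 possible moves

# Training figures quoted from the paper.
TRAINING_STEPS = 700_000   # mini-batch gradient steps
BATCH_SIZE = 4_096         # positions per mini-batch
SIMS_PER_MOVE = 800        # search simulations per move

# Positions drawn for training over the whole run (with repetition allowed).
positions_sampled = TRAINING_STEPS * BATCH_SIZE

# Steps multiplied by simulations per move, as in the post above.
steps_times_sims = TRAINING_STEPS * SIMS_PER_MOVE

print(policy_size, positions_sampled, steps_times_sims)
```

As far as the quoted text goes, 700,000 times 800 comes to 560 million rather than 56 million, and it multiplies training steps by simulations per move, so it does not obviously count games at all. Treat that as a reading of the quoted paragraphs, not as a settled answer.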
-
- Posts: 2129
- Joined: Thu May 29, 2008 10:43 am
Re: AlphaGo Zero And AlphaZero, RomiChess done better
[quote="Milos"]
[quote="kranium"]
No human games were loaded. Learning was accomplished thru millions of self-play games
The monte carlo search algorithm simply chose the move in each position with the highest win probability.
[/quote]
How do you explain these paragraphs from the paper:
"Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters"
"We represent the policy π(a|s) by a 8 × 8 × 73 stack of planes encoding a probability distribution over 4,672 possible moves. Each of the 8×8 positions identifies the square from which to “pick up” a piece."
"The number of games, positions, and thinking time varied per game due largely to different board sizes and game lengths, and are shown in Table S3."
So when playing self-played games positions used for training are taken from the games randomly (since position is part of set of training parameters). So what about starting positions of those 44 million training games? You think they were all random, or initial starting position and they had no chess knowledge in them????
Give me a break, thinking those ppl in Google are so stupid to train their network in such a lousy way, instead of sorting those 100'000 openings from the same chessbase they quote in the paper by probability of occurrence and using those statistics as starting positions for those self-played games.
Ofc in Table 2 they nicely show just percentages not actual numbers so you can't judge how many training games in total were from the starting position, because someone could be smart and sum up all those games from Table 2 and figure the number doesn't match 44 million...
Btw. 700'000 training iterations times 800 MCTS is already 56 million, not 44, so where did 12 million games disappear?
[/quote]
My understanding is that "randomly initialised parameters" is not the same as loading human games.
Yes I assume (because it has not been made clear by Google) that the self-play games all started from the traditional start position.
AlphaZero would quickly realize that it was winning more often after 1. d4 than after 1. f3 for ex.
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: AlphaGo Zero And AlphaZero, RomiChess done better
[quote="kranium"]
Yes I assume (because it has not been made clear by Google) that the self-play games all started from the traditional start position.
AlphaZero would quickly realize that it was winning more often after 1. d4 than after 1. f3 for ex.
[/quote]
They quote 100'000 games from chessbase and their batches are per 100'000 iterations of 800 MCTS simulations, another "coincidence"?
Again let me quote myself (btw. how many f3 openings do you think there are among those 100'000 openings from chessbase):
[quote="Milos"]
thinking those ppl in Google are so stupid to train their network in such a lousy way, instead of sorting those 100'000 openings from the same chessbase they quote in the paper by probability of occurrence and using those statistics as starting positions for those self-played games.
[/quote]