My understanding is that "randomly initialised parameters" is not the same as loading human games.

kranium wrote:
No human games were loaded. Learning was accomplished through millions of self-play games.
The Monte Carlo search algorithm simply chose the move in each position with the highest win probability.

Milos wrote:
How do you explain these paragraphs from the paper:
"Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters"
"We represent the policy π(a|s) by a 8 × 8 × 73 stack of planes encoding a probability distribution over 4,672 possible moves. Each of the 8×8 positions identifies the square from which to “pick up” a piece."
"The number of games, positions, and thinking time varied per game due largely to different board sizes and game lengths, and are shown in Table S3."
So when playing self-play games, the positions used for training are taken from the games randomly (since the position is part of the set of training parameters). So what about the starting positions of those 44 million training games? You think they were all random, or all the initial starting position, and they had no chess knowledge in them?
Give me a break, thinking those people at Google are so stupid as to train their network in such a lousy way, instead of sorting those 100'000 openings from the same ChessBase they quote in the paper by probability of occurrence and using those statistics as starting positions for those self-play games.
Of course, in Table 2 they nicely show just percentages, not actual numbers, so you can't judge how many training games in total were from the starting position, because someone could be smart, sum up all those games from Table 2, and figure out that the number doesn't match 44 million...
Btw. 700'000 training iterations times 800 MCTS is already 56 million, not 44, so where did 12 million games disappear?
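The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: the split of the 73 planes into 56 queen-style moves, 8 knight moves and 9 underpromotions is my reading of the paper, the "800" is assumed to mean MCTS simulations per move (a count of search simulations, not of games), and all variable names are made up for illustration. Note that 700,000 times 800 comes to 560 million, not 56 million.

```python
# Policy head: an 8 x 8 x 73 stack of planes, as quoted from the paper.
# Assumed plane breakdown (my reading of the paper, not stated in this thread):
QUEEN_MOVE_PLANES = 8 * 7       # 8 compass directions x 1..7 squares
KNIGHT_MOVE_PLANES = 8          # the 8 knight jumps
UNDERPROMOTION_PLANES = 3 * 3   # 3 pawn directions x (knight, bishop, rook)

PLANES = QUEEN_MOVE_PLANES + KNIGHT_MOVE_PLANES + UNDERPROMOTION_PLANES
POLICY_MOVES = 8 * 8 * PLANES   # one "pick up" square per board square

# Training figures quoted in the thread.
TRAINING_STEPS = 700_000
MINI_BATCH_SIZE = 4_096
SELF_PLAY_GAMES = 44_000_000

# Assumption: "800" means MCTS simulations per move, not games per step.
SIMULATIONS_PER_MOVE = 800

# Positions consumed by training: one mini-batch of positions per step.
positions_sampled = TRAINING_STEPS * MINI_BATCH_SIZE

# The product computed in the post; its unit is not "games".
steps_times_simulations = TRAINING_STEPS * SIMULATIONS_PER_MOVE

print(f"planes: {PLANES}, moves encoded: {POLICY_MOVES:,}")
print(f"positions sampled in training: {positions_sampled:,}")
print(f"steps x simulations: {steps_times_simulations:,}")
print(f"self-play games reported: {SELF_PLAY_GAMES:,}")
```

Training steps count mini-batches of positions, so on this reading neither product is directly comparable with the 44 million game count.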
Yes, I assume (because Google has not made it clear) that the self-play games all started from the traditional start position.
AlphaZero would quickly realize that it was winning more often after 1. d4 than after 1. f3, for example.
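To make the idea concrete, here is a toy sketch of preferring the first move with the best observed results. It is not AlphaZero's actual search, which combines a neural network with MCTS; the `MoveStats` class and the game results below are hypothetical and only illustrate how a preference for 1. d4 over 1. f3 could emerge from outcomes alone, with no opening knowledge supplied.

```python
from collections import defaultdict


class MoveStats:
    """Running win-rate table for candidate moves (illustrative only)."""

    def __init__(self):
        self.visits = defaultdict(int)
        self.score = defaultdict(float)

    def record(self, move, result):
        # result: 1.0 for a win, 0.5 for a draw, 0.0 for a loss
        self.visits[move] += 1
        self.score[move] += result

    def win_rate(self, move):
        n = self.visits[move]
        return self.score[move] / n if n else 0.0

    def best_move(self):
        # Pick the move with the highest observed win rate so far.
        return max(self.visits, key=self.win_rate)


stats = MoveStats()

# Hypothetical self-play results, invented for this example.
for result in (1.0, 0.5, 1.0, 0.5):
    stats.record("d4", result)
for result in (0.0, 0.5, 0.0, 0.0):
    stats.record("f3", result)

best = stats.best_move()
print(best, stats.win_rate("d4"), stats.win_rate("f3"))
```

With enough games from the normal start position, a table like this would favour the stronger first moves without any human games being loaded.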