You seem to think that A0 is like a human and can learn anything. What you write about training is pure science fiction and has absolutely nothing to do with reinforcement learning or with how A0 was actually trained, not to mention that TC is not one fixed number but can take a whole continuous spectrum of values.

Ovyron wrote:
Suppose that DeepMind created another Neural Network responsible for managing time in time-control games. It would play random games against itself, and soon learn that using too much time loses on time.
Then it would learn that playing too fast dramatically decreases the strength of its moves (say, an A0 that still had 2:30 left on the clock at the end of a 5-minute game would play only half as strong as one that was down to a few seconds).
Eventually you'd get the best time managers, and A0 would be equipped with the best one they were able to find for her match against Pablo.
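(For contrast with the quoted scenario: conventional engines need no learned network for this at all; time allocation is typically a small deterministic function of the clock state. A minimal sketch, with every constant purely illustrative and nothing taken from the A0 paper or any real engine:)

[code]
def time_for_move(remaining_s: float, increment_s: float,
                  moves_to_go: int = 30) -> float:
    """Toy time-allocation heuristic: spread the remaining clock over an
    assumed game horizon, spend most of each increment, and always keep
    a small safety buffer so the engine never flags."""
    safety_s = 0.05                      # flag-safety reserve
    base = remaining_s / moves_to_go     # even split over the horizon
    alloc = base + 0.8 * increment_s     # spend most of each increment
    return max(0.01, min(alloc, remaining_s - safety_s))
[/code]

Note that the same function covers any clock setting, which is also why TC is a spectrum of values rather than one fixed number.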
If you actually had a clue about A0, you'd know that in actual games (as opposed to self-play training) it has zero randomness apart from the SMP randomness of its parallel UCT implementation (which is not even mentioned in the paper).

In the actual match, the whole time-management question wouldn't even make a difference most of the time, since A0 wouldn't allow Pablo to lock the position in the first place...
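To see what that SMP randomness means in practice: when several threads share one search, the order in which their results land is decided by the OS scheduler, so two runs on the identical position can pick differently. A toy stand-in (not the actual A0 code; every name here is made up):

[code]
import threading

def parallel_pick(moves=("e4", "d4", "c4", "Nf3"), work=200_000):
    """Each thread does the same deterministic work for one move; only
    the scheduler decides which thread finishes first, so repeated runs
    on the same input can return different 'best' moves."""
    finished = []                 # shared list; append order = finish order
    lock = threading.Lock()

    def worker(move):
        x = 0
        for i in range(work):     # deterministic busy work
            x += i
        with lock:
            finished.append(move)

    threads = [threading.Thread(target=worker, args=(m,)) for m in moves]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished[0]            # "chosen" move = first finisher

if __name__ == "__main__":
    print([parallel_pick() for _ in range(5)])   # may differ run to run
[/code]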
Ovyron wrote:
But eventually, and with determination (and so far no draws or wins, just losses for Pablo), Pablo gets his holy grail and locks the position! Then he moves and premoves as fast as he can, trying to flag Alpha Zero, until...
Coming to this point: is it your claim that Pablo would be able to move faster than the best time management DeepMind could build? And by "decent chances" do you mean a single win after hundreds of lost games?
With fixed time per move, that SMP randomness disappears completely, and not only Pablo but practically anyone with a half-decent memory would be able to at least draw every single game against A0 after playing a hundred games against it, or even fewer.
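To make the "half-decent memory" point concrete: against a fully deterministic opponent you only need to remember one reply per position, because the opponent will repeat its moves, and each loss lets you deviate one move earlier in the next game. A toy sketch of such booking (hypothetical names, not a real engine interface):

[code]
book: dict[str, str] = {}   # position key (e.g. a FEN string) -> our reply

def choose_move(position_key: str, novelty: str) -> str:
    """Replay the stored reply if this position was seen before;
    otherwise play the new move and remember it for the next game."""
    return book.setdefault(position_key, novelty)

def forget_last_move(game_line: list[tuple[str, str]]) -> None:
    """After a loss, forget the reply played in the final position of
    that game, so the next game deviates there and probes a new branch."""
    if game_line:
        last_position, _ = game_line[-1]
        book.pop(last_position, None)
[/code]

Against an opponent with even a little SMP-level randomness, the same position can draw different replies and this kind of book breaks down, which is exactly the distinction at issue here.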