You seem to think that A0 is like a human and can learn anything. What you write about training is pure science fiction and has absolutely nothing to do with reinforcement learning or with how A0 was actually trained, not to mention that TC is not one fixed number but can take a whole continuous spectrum of values.

Ovyron wrote:
Suppose that DeepMind created another Neural Network responsible for managing time in time-control games. It would play random games against itself, and soon learn that using too much time loses on time.
Then it would learn that playing too fast dramatically decreases the strength of its moves (say, an A0 that still had 2:30 left on the clock at the end of a 5-minute game would play only half as strong as one that was down to a few seconds).
Eventually you'd get the best time managers, and A0 would be equipped with the best one they were able to find for her match against Pablo.
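(For contrast with the quoted scenario: conventional engines need no learned network for this at all; time allocation is typically a small deterministic function of the clock state. A minimal sketch, with every constant purely illustrative and nothing taken from the A0 paper or any real engine:)

[code]
def time_for_move(remaining_s: float, increment_s: float,
                  moves_to_go: int = 30) -> float:
    """Toy time-allocation heuristic: spread the remaining clock over an
    assumed game horizon, spend most of each increment, and always keep
    a small safety buffer so the engine never flags."""
    safety_s = 0.05                      # flag-safety reserve
    base = remaining_s / moves_to_go     # even split over the horizon
    alloc = base + 0.8 * increment_s     # spend most of each increment
    return max(0.01, min(alloc, remaining_s - safety_s))
[/code]

Note that the same function covers any clock setting, which is also why TC is a spectrum of values rather than one fixed number.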
If you actually had a clue about A0, you'd know that in actual games (as opposed to self-play training) it has zero randomness apart from the SMP randomness of its parallel UCT implementation (which is not even mentioned in the paper).

In the actual match, the whole time-management question wouldn't even make a difference most of the time, since A0 wouldn't allow Pablo to lock the position in the first place...
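To see what that SMP randomness means in practice: when several threads share one search, the order in which their results land is decided by the OS scheduler, so two runs on the identical position can pick differently. A toy stand-in (not the actual A0 code; every name here is made up):

[code]
import threading

def parallel_pick(moves=("e4", "d4", "c4", "Nf3"), work=200_000):
    """Each thread does the same deterministic work for one move; only
    the scheduler decides which thread finishes first, so repeated runs
    on the same input can return different 'best' moves."""
    finished = []                 # shared list; append order = finish order
    lock = threading.Lock()

    def worker(move):
        x = 0
        for i in range(work):     # deterministic busy work
            x += i
        with lock:
            finished.append(move)

    threads = [threading.Thread(target=worker, args=(m,)) for m in moves]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished[0]            # "chosen" move = first finisher

if __name__ == "__main__":
    print([parallel_pick() for _ in range(5)])   # may differ run to run
[/code]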
Ovyron wrote:
But eventually, and with determination (and so far no draws or wins, just losses for Pablo), Pablo gets his holy grail and locks the position! Then he moves and premoves as fast as he can, trying to flag Alpha Zero, until...
Coming to this point: is it your claim that Pablo would be able to move faster than the best time management DeepMind could build? And by "decent chances" do you mean a single win after hundreds of lost games?
With fixed time per move, that SMP randomness disappears completely, and not only Pablo but practically anyone with a half-decent memory would be able to at least draw every single game against A0 after playing a hundred games against it, or even fewer.
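To make the "half-decent memory" point concrete: against a fully deterministic opponent you only need to remember one reply per position, because the opponent will repeat its moves, and each loss lets you deviate one move earlier in the next game. A toy sketch of such booking (hypothetical names, not a real engine interface):

[code]
book: dict[str, str] = {}   # position key (e.g. a FEN string) -> our reply

def choose_move(position_key: str, novelty: str) -> str:
    """Replay the stored reply if this position was seen before;
    otherwise play the new move and remember it for the next game."""
    return book.setdefault(position_key, novelty)

def forget_last_move(game_line: list[tuple[str, str]]) -> None:
    """After a loss, forget the reply played in the final position of
    that game, so the next game deviates there and probes a new branch."""
    if game_line:
        last_position, _ = game_line[-1]
        book.pop(last_position, None)
[/code]

Against an opponent with even a little SMP-level randomness, the same position can draw different replies and this kind of book breaks down, which is exactly the distinction at issue here.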