Bad scaling of AlphaZero to long time control (LTC)?
Posted: Sun Dec 09, 2018 1:42 pm
Browsing the paper and the additional material again for 10 minutes, I stumbled upon this picture:
Initially, it didn't attract much attention from me as the description begins with:
(B) Scalability of AlphaZero with thinking time compared with Stockfish.
So, at first quick glance, I inferred that A0 scales better than SF8 at all time controls.
But now, heck no, that is not what it shows. They vary only the TC used by A0 (1/3, 1/10, 1/30, 1/100 of TCEC time), keeping SF8 at the fixed TCEC time control. As I read these diagrams and tables fast (I am a very bright person, you know), it instantly struck me that the scaling of A0 is bad, at least starting from 1/10 TCEC time, if not earlier. I don't have the results of these matches (maybe I missed them somewhere in the posted materials), only those of the 1:1 match of 1000 games from the single Initial Board position (pretty crazy and bad practice, as I will show later), but from the picture given:
1/10 TCEC time: A0 is ~14 Elo points stronger than SF8 TCEC time with ~86% draw rate
1/3 TCEC time: A0 is ~35 Elo points stronger than SF8 TCEC time with ~85% draw rate
1/1 TCEC time (here I saw raw numbers): A0 is 52 Elo points stronger than SF8 TCEC time with 84% draw rate.
From 1/10 TCEC TC to full TCEC TC, A0 improved by only some 40 Elo points, or some 12 Elo points per doubling.
This is bad. In TCEC conditions, SF and Komodo show at least 25, more likely 30 Elo points per doubling.
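The per-doubling figure can be checked with a few lines of Python. This is only a sketch: the Elo leads are the approximate values read off the graph above, and the function name `elo_per_doubling` is just an illustrative helper, not anything from the paper.

```python
import math

def elo_per_doubling(elo_short, elo_long, time_ratio):
    """Average Elo gained per doubling of thinking time.

    elo_short, elo_long: Elo lead over the fixed opponent at the shorter
    and the longer time control.
    time_ratio: longer thinking time divided by shorter thinking time.
    """
    doublings = math.log2(time_ratio)
    return (elo_long - elo_short) / doublings

# A0's lead over SF8 (SF8 always at full TCEC time), as read off the graph.
# These are eyeballed values, not official match results.
lead_tenth, lead_third, lead_full = 14, 35, 52

overall = elo_per_doubling(lead_tenth, lead_full, 10)    # 1/10 -> 1/1
last_step = elo_per_doubling(lead_third, lead_full, 3)   # 1/3  -> 1/1

print(f"1/10 -> 1/1: {overall:.1f} Elo per doubling")
print(f"1/3  -> 1/1: {last_step:.1f} Elo per doubling")
```

A tenfold increase in time is log2(10), about 3.3 doublings, so a gain of 38 Elo points comes to roughly 11 to 12 Elo points per doubling, in line with the estimate above.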
Also note that A0 conclusively beats SF8 at 1/10 TC, that is, at a "Leela Ratio" of 0.1. Lc0 needs a "Leela Ratio" of about 0.3 or higher to beat SF8 in the same conditions. So A0 is significantly stronger than Lc0 with good nets, but both scale similarly badly to LTC (see http://talkchess.com/forum3/viewtopic.php?f=2&t=69068 about Lc0).
One thing to consider: all the games in this "scaling graph" start from a single position, the Initial Board position. I suppose each match consisted of 1000 games at those long time controls from the same initial position. Pretty crazy. It's easy to verify that the Initial Board position is one of the most favorable starting positions for Lc0 (and probably A0). This set-up can distort the outcome pretty heavily in many ways, and should have been avoided. For example, using this one position, one can get the impression that A0 is almost unbeatable when it is the stronger side. The draw rate also seems inflated. Example for Lc0:
Leela Ratio is about 2, short time control:
Initial Board position:
Score of lc0_v19_11261 vs SF8: 18 - 0 - 22 [0.725] 40
Elo difference: 168.40 +/- 67.96
Finished match
Lc0 seems undefeatable here. But from
Adam Hair's opening 4-mover PGN
Score of lc0_v19_11261 vs SF8: 16 - 7 - 17 [0.613] 40
Elo difference: 79.53 +/- 84.63
Finished match
From the opening book, Lc0 performs almost 90 Elo points weaker (I showed that long ago in another thread) and has a significant number of losses, even though it is significantly stronger overall.
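For anyone who wants to recompute the Elo differences from the raw scores, here is a minimal Python sketch using the standard logistic Elo formula. The helper name `elo_diff` is my own, and it reproduces only the Elo difference, not the error margins printed by the match tool.

```python
import math

def elo_diff(wins, losses, draws):
    """Return (score fraction, Elo difference) for a win-loss-draw result."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    if not 0.0 < score < 1.0:
        # A 100% or 0% score has no finite Elo difference.
        raise ValueError("Elo difference is undefined for a score of 0 or 1")
    return score, 400.0 * math.log10(score / (1.0 - score))

# Results quoted above, from Lc0's point of view (wins - losses - draws).
initial_score, initial_elo = elo_diff(18, 0, 22)   # Initial Board position
book_score, book_elo = elo_diff(16, 7, 17)         # 4-mover opening PGN

print(f"Initial position: score {initial_score:.3f}, Elo {initial_elo:+.2f}")
print(f"Opening book:     score {book_score:.3f}, Elo {book_elo:+.2f}")
print(f"Gap: {initial_elo - book_elo:.1f} Elo")
```

The two results come out at about 168.4 and 79.5 Elo points, matching the match output above, so the gap between the two set-ups is close to 89 Elo points.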
Their scaling results were obtained in these skewed conditions, so it's possible that the picture of bad scaling of A0 to LTC is inaccurate.