mirek wrote: Exactly, and what's even more remarkable is that according to the A0 paper (figure 2), 4xTPUs will do about 80k playouts in 1s, and at 80k playouts A0 is only 100-150 Elo weaker than at 1 min/move (5000k playouts).
Also, 1x 1080Ti (11 TFLOPS) vs 4xTPU (180 TFLOPS) means nps gets reduced to 4.8k. Even if we assumed that the TPU is somehow more efficient, FLOP for FLOP, by a factor of 4x, the resulting 1080Ti playouts would still be close to 80k per minute. Thus it seems quite convincing to me that A0 on a 1080Ti would, with good confidence, be at most 150 Elo weaker at 1 min/move compared to the 4xTPU configuration (and most likely not more than 100 Elo weaker).
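The arithmetic of the quoted estimate can be reproduced with a quick sketch. All numbers are the ones given in the quote; the linear scaling of playouts with raw throughput and the 4x efficiency penalty are the quote's own assumptions, and the names below are made up for illustration.

```python
# Figures taken from the quoted post, not measured values.
TPU_NPS = 80_000          # playouts per second on 4 TPUs (quoted reading of figure 2)
TPU_TFLOPS = 180.0        # throughput the quote assigns to the 4-TPU setup
GPU_TFLOPS = 11.0         # throughput the quote assigns to one 1080Ti
EFFICIENCY_PENALTY = 4.0  # hypothetical extra TPU advantage per FLOP


def scaled_nps(nps, from_tflops, to_tflops, penalty=1.0):
    """Scale a playout rate linearly with raw throughput, then divide by a penalty."""
    return nps * (to_tflops / from_tflops) / penalty


gpu_nps = scaled_nps(TPU_NPS, TPU_TFLOPS, GPU_TFLOPS)
gpu_nps_penalized = scaled_nps(TPU_NPS, TPU_TFLOPS, GPU_TFLOPS, EFFICIENCY_PENALTY)
playouts_per_minute = gpu_nps_penalized * 60

print(f"1080Ti estimate: {gpu_nps:.0f} nps")
print(f"with 4x penalty: {gpu_nps_penalized:.0f} nps, "
      f"{playouts_per_minute:.0f} playouts per minute")
```

Under those assumptions the result is a bit under 5k nps, and roughly 73k playouts per minute after the 4x penalty.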
Gee, you got it almost all wrong. Mainly because figure 2 is totally bogus.
First, the scaling of SF is bogus.
It can easily be demonstrated that SF8 on 64 cores gains at least 40*6 = 240 Elo when going from TC = 1s/move to 1min/move (that is, about 6 doublings of thinking time at roughly 40 Elo each).
In the figure they show 60 Elo?!
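A minimal sketch of that scaling estimate, reading "40*6" as about 40 Elo per doubling of thinking time over roughly 6 doublings. The constant gain per doubling is a simplifying assumption (real engines show diminishing returns), and the function names are made up for illustration.

```python
import math


def doublings(t_short, t_long):
    """Number of time doublings between two thinking times (same units)."""
    return math.log2(t_long / t_short)


def elo_gain(t_short, t_long, elo_per_doubling):
    """Elo gain assuming a constant gain per doubling of thinking time."""
    return elo_per_doubling * doublings(t_short, t_long)


n = doublings(1.0, 60.0)                 # 1 s/move to 60 s/move
gain = elo_gain(1.0, 60.0, 40.0)         # with the post's ~40 Elo per doubling
implied_per_doubling = 60.0 / n          # what a 60 Elo total gain would imply

print(f"doublings: {n:.2f}")
print(f"estimated gain: {gain:.0f} Elo")
print(f"60 Elo in total implies {implied_per_doubling:.1f} Elo per doubling")
```

Going from 1s to 60s is just under 6 doublings, so the estimate lands slightly below 240 Elo, while a total gain of 60 Elo would correspond to only about 10 Elo per doubling.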
Second, A0 might scale better or worse than SF, but it will gain at least 200 Elo, and more probably over 300 Elo, when going from 1s/move to 1min/move.
Third, one second-generation TPU is 180 TFLOPS, so 4 TPUs means 4x180 TFLOPS (you are clearly confused by Google's misleading terminology about "Cloud TPUs", where each TPU contains 4 chips, but the actual second-generation TPU = Cloud TPU).
However, for the actual matches Google used first-gen TPUs (because 4 TPUs in the matches give exactly 4x the nps that the 1 TPU used for self-play gives: 800 sims in 40ms), which are actually around 92 TOPS each. So one 1080Ti is approximately 40x slower than the 4 TPUs used for playing the matches against SF.
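A rough sketch of the numbers behind this point. It reads the self-play figure as 800 simulations in 40 ms (an assumption about the intended unit) and divides 8-bit TOPS by fp32 TFLOPS directly, which is a crude comparison at best. The constant names are hypothetical.

```python
SIMS_PER_MOVE = 800        # self-play simulations per move, from the post
SECONDS_PER_MOVE = 0.040   # assumption: 40 ms per self-play move
N_MATCH_TPUS = 4           # TPUs used in the matches, from the post
TPU1_TOPS = 92.0           # first-gen TPU throughput, from the post
GPU_TFLOPS = 11.0          # 1080Ti throughput, from the quoted post

# Self-play rate on one TPU, then the match rate on four of them.
one_tpu_nps = SIMS_PER_MOVE / SECONDS_PER_MOVE
match_nps = N_MATCH_TPUS * one_tpu_nps

# Raw throughput ratio (TOPS vs TFLOPS, so only an order-of-magnitude guide).
raw_ratio = N_MATCH_TPUS * TPU1_TOPS / GPU_TFLOPS
gpu_nps = match_nps / raw_ratio

print(f"1 TPU: {one_tpu_nps:.0f} nps, 4 TPUs: {match_nps:.0f} nps")
print(f"raw throughput ratio: {raw_ratio:.1f}x")
print(f"1080Ti estimate: {gpu_nps:.0f} nps")
```

The 4-TPU rate comes out at 80k nps, matching the figure 2 reading, and the raw ratio is in the 30x to 40x range, which puts a 1080Ti at a few thousand nps.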
Finally, A0 on a 1080Ti might be 300 Elo weaker than current SFdev on 64 cores at 1min/move, but LC0 is at least 800 Elo weaker, which means that LC0 is atm at least 500 Elo weaker than A0 on the same hardware.
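To put the 500 Elo gap in perspective, here is a small sketch using the standard logistic Elo expectation formula. The 300 and 800 Elo gaps are the estimates from this post, not measured results, and the names are made up for illustration.

```python
def expected_score(elo_diff):
    """Expected score for a player rated elo_diff points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


A0_GAP_TO_SF = 300    # post's estimate: A0 on a 1080Ti vs SFdev on 64 cores
LC0_GAP_TO_SF = 800   # post's estimate: LC0 vs SFdev on 64 cores

lc0_gap_to_a0 = LC0_GAP_TO_SF - A0_GAP_TO_SF
lc0_score = expected_score(-lc0_gap_to_a0)

print(f"LC0 vs A0 gap: {lc0_gap_to_a0} Elo")
print(f"expected LC0 score: {lc0_score:.1%}")
```

A 500 Elo deficit corresponds to an expected score of only about 5% for the weaker side.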
Will LC0 ever reach performance of A0?
I strongly doubt it, because performance of A0 is simply bogus.