Scaling of Lc0 at high Leela Ratio
Posted: Tue Nov 27, 2018 8:54 am
The notion of "Leela Ratio" is overused and often misleading, but in general terms, with my very strong RTX 2070 GPU and a mediocre CPU (an OCed 4-core i7-4790), I have a high "Leela Ratio". The term is abused because it depends on the time control (Lc0 gains speed steadily with longer TC) and because it relates only to NPS. We don't know the effective speed-up on the 64 threads used to get the 70 million NPS reported for SF8 in the A0 paper, nor the effective speed-up on the 4 TPUs used to get 80 thousand NPS for A0. We also don't know on which positions these speeds were achieved.
But let's say my "effective Leela Ratio" is about 2-3 in the games I ran, depending somewhat on time control and positions. This is pretty high. In some range of time controls, Lc0 ID11261 is stronger than even the recently released Stockfish 10 dev on all 4 cores, making it the strongest engine overall on my PC, but only in a certain range of usual time controls. I tried to check how Lc0 ID11261 scales compared to SF10, and the result turned out to be non-trivial.
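For reference, here is a minimal sketch of how the raw "Leela Ratio" is usually computed, assuming the common definition: the Lc0-to-Stockfish NPS ratio on a given machine, divided by the A0-to-SF8 NPS ratio from the A0 paper (80 thousand / 70 million). The function name and the example NPS figures are hypothetical placeholders, not measured values.

```python
A0_NPS = 80_000        # A0 speed reported in the A0 paper
SF8_NPS = 70_000_000   # SF8 speed reported in the A0 paper


def leela_ratio(lc0_nps: float, sf_nps: float) -> float:
    """Raw Leela Ratio under the assumed definition:
    local Lc0/SF NPS ratio divided by the A0/SF8 NPS ratio."""
    return (lc0_nps / sf_nps) / (A0_NPS / SF8_NPS)


if __name__ == "__main__":
    # Hypothetical NPS figures, for illustration only.
    print(round(leela_ratio(20_000, 7_000_000), 2))
```

This captures only the NPS side of the story. It says nothing about the effective speed-up from threads or TPUs, which is why the raw number can mislead.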
Games:
Lc0 ID11261 on 2 CPU threads and RTX 2070 GPU.
SF10 on 4 threads on 4 i7-4790 OCed cores.
The engines use no Syzygy tablebases during the games, but the games are adjudicated in Cutechess-Cli with 6-men Syzygy. I kept Syzygy away from the engines because during long tests my machine with 16GB RAM can become unstable when using 6-men Syzygy from SSD. I use no swap file, because with one, when the engines use 6-men Syzygy, most other applications get moved from RAM to the HDD swap file, and the PC barely responds, apart from the playing chess engines. But without a swap file, after 1-2 days of tests with Syzygy, my PC can start giving warnings and even crash.
I have the following results at different time controls Lc0 vs SF10 (not very many games, though):
0.25m + 0.25s
Score of lc0_v19_11261 vs SF_10: 19 - 26 - 55 [0.465] 100
Elo difference: -24.36 +/- 45.86
1m + 1s
Score of lc0_v19_11261 vs SF_10: 20 - 17 - 63 [0.515] 100
Elo difference: 10.43 +/- 41.55
Finished match
4m + 4s
Score of lc0_v19_11261 vs SF_10: 7 - 12 - 81 [0.475] 100
Elo difference: -17.39 +/- 29.59
Finished match
16m + 16s
Score of lc0_v19_11261 vs SF_10: 0 - 4 - 16 [0.400] 20
Elo difference: -70.44 +/- 64.16
Finished match
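The Elo figures above can be roughly reproduced from the win-loss-draw counts. This is a minimal sketch, assuming the logistic Elo formula and a normal approximation for the 95% margin. The function names are my own, and Cutechess-Cli's exact computation may differ in small details, so expect the margins to agree only approximately.

```python
import math
from statistics import NormalDist


def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int, confidence: float = 0.95):
    """Return (elo, margin) from a W-L-D record using a normal approximation."""
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + 0.5 * d  # mean score per game
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = w * (1.0 - mu) ** 2 + l * (0.0 - mu) ** 2 + d * (0.5 - mu) ** 2
    se = math.sqrt(var / n)  # standard error of the mean score
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lo = elo_from_score(mu - z * se)
    hi = elo_from_score(mu + z * se)
    return elo_from_score(mu), (hi - lo) / 2.0


if __name__ == "__main__":
    # W-L-D records taken from the results listed above.
    for tc, wld in [("0.25m + 0.25s", (19, 26, 55)), ("1m + 1s", (20, 17, 63)),
                    ("4m + 4s", (7, 12, 81)), ("16m + 16s", (0, 4, 16))]:
        elo, margin = elo_with_margin(*wld)
        print(f"{tc}: {elo:+.2f} +/- {margin:.2f}")
```

Note how much the margin depends on the number of games and on the draw rate. The 20-game match at 16m + 16s has by far the widest interval.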
The time control increases in factors of 4x, from 0.25 minutes + 0.25 seconds increment up to 16 minutes + 16 seconds increment. In the latter LTC test, Lc0 searches about 0.5-1 million nodes per move, and even more in the first phase of the game, where more time per move is often allotted.
The plot of these performances is here:
We see that Lc0 scales better than SF 10 only at short TC, and it is the strongest engine (stronger than SF 10 on 4 cores) in the time control range of about 30s + 0.5s to 2m + 2s. Its scaling to longer TC is worse than that of SF 10, and it becomes significantly weaker than SF 10 at 16m + 16s (about -70 +/- 64 Elo, though from only 20 games). So, toward LTC, Lc0 seems to scale worse than SF 10. Interestingly, I observed the same kind of behavior for Komodo MCTS. Either both MCTS searches are tuned for fairly short time controls, or MCTS generally scales worse to LTC than a finely tuned and pruned AB search.
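Given the small samples, it can help to look at the likelihood of superiority (LOS) for each match. This is a minimal sketch using the usual normal approximation, in which draws drop out and only wins and losses matter. The function name is my own and the values are not part of the Cutechess-Cli output above.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that the first engine is the stronger one,
    from decisive games only (normal approximation)."""
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


if __name__ == "__main__":
    # Lc0 wins and losses vs SF10, taken from the results listed above.
    for tc, (w, l) in [("0.25m + 0.25s", (19, 26)), ("1m + 1s", (20, 17)),
                       ("4m + 4s", (7, 12)), ("16m + 16s", (0, 4))]:
        print(f"{tc}: LOS = {likelihood_of_superiority(w, l):.3f}")
```

A LOS near 0.5 means the match does not separate the engines. Values close to 0 or 1 indicate a result that is unlikely to be due to chance alone.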