AFAIK Leela uses 8 moves to fill her history planes, and her game-play depends on them. While people seem to have observed that this plays a role in test-suites, I wanted to check the importance of this aspect using regular 8-mover opening positions. I used GM2600.pgn to depth=16 plies in Cutechess-Cli and an EPD file which recorded the GM2600.pgn FEN positions after 16 plies, to compare the game-play of Leela on these two kinds of opening formats. Although the starting positions are the same, one file fills the history planes and the other does not. For regular engines it makes almost no difference (just in time control usage), but Leela might exhibit an improvement using the PGN file. All experiments use Lc0 v17 with ID11261.
Total 400+400=800 games for each case (PGN and EPD).
- gm2600.pgn, PGN file to depth=16 plies.
Score of lc0_v17 vs Andscacs: 167 - 118 - 115 [0.561] 400
Elo difference: 42.78 +/- 28.90
Finished match
Score of lc0_v17 vs Andscacs: 159 - 131 - 110 [0.535] 400
Elo difference: 24.36 +/- 29.01
Finished match
=============================
+33.6 +/- 10 1SD
- gm2600_16.epd, EPD file from the PGN after 16 plies.
Score of lc0_v17 vs Andscacs: 140 - 155 - 105 [0.481] 400
Elo difference: -13.03 +/- 29.28
Finished match
Score of lc0_v17 vs Andscacs: 134 - 150 - 116 [0.480] 400
Elo difference: -13.90 +/- 28.73
Finished match
==============================
-13.5 +/- 10 1SD
Difference in performance of Lc0 between PGN and EPD opening files (same positions):
+47 +/- 14 1SD
Significant improvement of Lc0 play by about 50 Elo points by using history planes for opening positions. That could explain some of the discrepancies occurring in testing Lc0.
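For anyone who wants to re-check the combined figures, here is a minimal Python sketch. It pools the two 400-game runs of each format into one 800-game result, converts the score to Elo with the usual logistic formula, and estimates 1SD from the per-game score variance (a delta-method approximation, which is an assumption of this sketch and not necessarily how the numbers above were obtained). The function names are made up for illustration.

```python
import math

def score_and_se(wins, losses, draws):
    """Mean score per game and its standard error (1SD of the mean)."""
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n
    var = (wins * (1 - s) ** 2 + losses * s ** 2 + draws * (0.5 - s) ** 2) / n
    return s, math.sqrt(var / n)

def elo(score):
    """Logistic Elo difference for a given expected score."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_sd(wins, losses, draws):
    """Elo difference and an approximate 1SD (delta method, an assumption)."""
    s, se = score_and_se(wins, losses, draws)
    slope = 400.0 / (math.log(10) * s * (1.0 - s))  # d(Elo)/d(score)
    return elo(s), slope * se

# Pooled results from the two runs of each opening format (Lc0's perspective).
pgn_elo, pgn_sd = elo_with_sd(167 + 159, 118 + 131, 115 + 110)
epd_elo, epd_sd = elo_with_sd(140 + 134, 155 + 150, 105 + 116)

# Difference of two independent estimates: 1SDs add in quadrature.
diff = pgn_elo - epd_elo
diff_sd = math.hypot(pgn_sd, epd_sd)

print(f"PGN: {pgn_elo:+.1f} +/- {pgn_sd:.1f} 1SD")
print(f"EPD: {epd_elo:+.1f} +/- {epd_sd:.1f} 1SD")
print(f"PGN - EPD: {diff:+.1f} +/- {diff_sd:.1f} 1SD")
```

Pooling the games gives values very close to, but not exactly, the average of the two per-run Elo figures, because the Elo formula is not linear in the score.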
2/ Check for scaling of Lc0 at fixed nodes
Fixed time scaling is somewhat hard to extrapolate from STC to LTC, because my NPS for Lc0 keeps increasing even after, say, a 15-second search, only stabilizing somewhat at 30s or more. As I cannot test at such long times per move, I used fixed nodes scaling, which plays a more significant role at LTC, where the speeds of both Lc0 and AB engines are pretty stable.
I used a 1:1000 node ratio compared to Stockfish, or a Leela Ratio of 0.875.
SF8:
- Lc0 v17 ID11261 versus Stockfish 8, fixed nodes.
100 : 100,000
Score of lc0_v17 vs SF8: 13 - 55 - 32 [0.290] 100
Elo difference: -155.54
W/L = 4.2
Normalized Elo = -0.59
1,000 : 1,000,000
Score of lc0_v17 vs SF8: 19 - 38 - 43 [0.405] 100
Elo difference: -66.82
W/L = 2.0
Normalized Elo = -0.26
10,000 : 10,000,000
Score of lc0_v17 vs SF8: 9 - 24 - 67 [0.425] 100
Elo difference: -52.51
W/L = 2.7
Normalized Elo = -0.27
http://hardy.uhasselt.be/Toga/normalized_elo.pdf
or by Wilo (Wins/Losses ratio, or drawless Elo). Or somewhere in between these two. Observing the results, one can say that Lc0 scales significantly better than SF8 going from 100 to 1,000 nodes, but scales similarly or even worse than SF8 going from 1,000 to 10,000 nodes. That is, with Leela ratio of 0.875.
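As a rough check of the two measures mentioned here, a minimal Python sketch follows. It assumes normalized Elo is taken as (score - 0.5) divided by the per-game standard deviation of the score, which reproduces the figures listed for the SF8 matches; the function names are hypothetical. Note that the "W/L" values reported in the results match losses divided by wins from Lc0's side (that is, Stockfish's wins over its losses), so the sketch computes it that way.

```python
import math

def normalized_elo(wins, losses, draws):
    """(score - 0.5) / per-game standard deviation of the score (assumed definition)."""
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n
    var = (wins * (1 - s) ** 2 + losses * s ** 2 + draws * (0.5 - s) ** 2) / n
    return (s - 0.5) / math.sqrt(var)

def opponent_win_loss_ratio(wins, losses):
    """Opponent's wins over opponent's losses, i.e. our losses / our wins."""
    return losses / wins

# Lc0 v17 vs SF8 at fixed nodes, as (wins, losses, draws) from Lc0's perspective.
sf8_matches = {
    "100 : 100,000": (13, 55, 32),
    "1,000 : 1,000,000": (19, 38, 43),
    "10,000 : 10,000,000": (9, 24, 67),
}

for nodes, (w, l, d) in sf8_matches.items():
    print(f"{nodes}: normalized Elo = {normalized_elo(w, l, d):.2f}, "
          f"W/L = {opponent_win_loss_ratio(w, l):.1f}")
```

The same two functions can be applied to the SF dev results by swapping in those win/loss/draw counts.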
SF dev:
- Lc0 v17 ID11261 versus Stockfish dev, fixed nodes.
100 : 100,000
Score of lc0_v17 vs SF_dev: 10 - 59 - 31 [0.255] 100
Elo difference: -186.25
W/L = 5.9
Normalized Elo = -0.73
1,000 : 1,000,000
Score of lc0_v17 vs SF_dev: 10 - 43 - 47 [0.335] 100
Elo difference: -119.11
W/L = 4.3
Normalized Elo = -0.51
10,000 : 10,000,000
Score of lc0_v17 vs SF_dev: 4 - 30 - 66 [0.370] 100
Elo difference: -92.46
W/L = 7.5
Normalized Elo = -0.50
All in all, going to longer TC, Lc0 doesn't seem to scale better than Stockfish (using Leela Ratio 0.875). This is observed in CCCC too: both the GPU and the CPU there have an effective speed about 8x higher than my PC, but Lc0 fares almost exactly as it does on my PC (GPU versus full CPU), at the level of Fire 7.1. There seems to be little scaling difference between Lc0 and regular engines when going to long TC / big hardware.