Some properties of Lc0 playing

Laskos · Post by **Laskos** » Fri Sep 14, 2018 9:31 pm

1/ Check for history planes usefulness and importance

AFAIK Leela uses 8 moves to fill her history planes, and her game-play depends on them. People seem to have observed that this plays a role in test-suites, so I wanted to check its importance using regular 8-mover opening positions. I used GM2600.pgn to depth=16 plies in Cutechess-Cli and an EPD file which recorded the GM2600.pgn FEN positions after 16 plies, to compare the game-play of Leela on these two kinds of opening formats. Although the starting positions are the same, one file fills the history planes and the other does not. For regular engines it makes almost no difference (just in time control usage), but Leela might play better with the PGN file. All experiments use Lc0 v17 with ID11261.

Total 400+400=800 games for each case (PGN and EPD).
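
To make the two setups concrete, here is a minimal sketch of how the two kinds of matches could be launched. It is not the exact command line used for these results: the engine paths (`lc0`, `andscacs`), the `order=random` choice and the split into 200 rounds of 2 games are assumptions for illustration, and `st=0.2` is the 0.2s/move time control of the first experiment. Only the `-openings` part differs between the two cases.

```python
def cutechess_cmd(openings_file, fmt, lc0="lc0", opponent="andscacs"):
    """Build a cutechess-cli argument list for one 400-game match.

    Engine paths and the rounds/games split are hypothetical placeholders.
    """
    openings = [f"file={openings_file}", f"format={fmt}", "order=random"]
    if fmt == "pgn":
        # Replay 16 plies of each PGN game, so the engines receive the moves
        # that led to the opening position (this is what fills the history).
        openings.append("plies=16")
    return [
        "cutechess-cli",
        "-engine", f"cmd={lc0}", "name=lc0_v17",
        "-engine", f"cmd={opponent}", "name=Andscacs",
        "-each", "proto=uci", "st=0.2",
        "-openings", *openings,
        "-games", "2", "-rounds", "200", "-repeat",
    ]


pgn_cmd = cutechess_cmd("gm2600.pgn", "pgn")      # history available
epd_cmd = cutechess_cmd("gm2600_16.epd", "epd")   # bare positions only

print(" ".join(pgn_cmd))
print(" ".join(epd_cmd))
```

With the EPD file the engines only get a bare position, so there are no earlier moves from which to fill the history planes.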

gm2600.pgn, PGN file to depth=16 plies.

Score of lc0_v17 vs Andscacs: 167 - 118 - 115 [0.561] 400
Elo difference: 42.78 +/- 28.90
Finished match

Score of lc0_v17 vs Andscacs: 159 - 131 - 110 [0.535] 400
Elo difference: 24.36 +/- 29.01
Finished match
=============================
+33.6 +/- 10 1SD

gm2600_16.epd, EPD file from the PGN after 16 plies.

Score of lc0_v17 vs Andscacs: 140 - 155 - 105 [0.481] 400
Elo difference: -13.03 +/- 29.28
Finished match

Score of lc0_v17 vs Andscacs: 134 - 150 - 116 [0.480] 400
Elo difference: -13.90 +/- 28.73
Finished match
==============================
-13.5 +/- 10 1SD

Difference in performance of Lc0 between PGN and EPD opening files (same positions):
+47 +/- 14 1SD
A significant improvement in Lc0's play, about 50 Elo points, from filling the history planes for the opening positions. That could explain some of the discrepancies occurring in Lc0 testing.
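
For anyone who wants to redo the arithmetic, here is a small sketch that recomputes the Elo figures from the win - loss - draw counts above. The logistic Elo formula is standard; the 1SD error uses a simple delta-method approximation and treats the matches as independent, which is an assumption and may differ slightly from how cutechess-cli computes its margins.

```python
import math


def score(w, l, d):
    """Fraction of points scored from wins, losses and draws."""
    return (w + 0.5 * d) / (w + l + d)


def elo(s):
    """Logistic Elo difference corresponding to a score fraction s."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def elo_sd(w, l, d):
    """Approximate 1SD error of the Elo estimate (delta method)."""
    n = w + l + d
    s = score(w, l, d)
    var = (w * (1 - s) ** 2 + d * (0.5 - s) ** 2 + l * s ** 2) / n
    se_score = math.sqrt(var / n)
    return se_score * 400.0 / (math.log(10) * s * (1 - s))


def mean_elo(matches):
    """Average Elo over several matches, with the 1SD error of that average."""
    elos = [elo(score(*m)) for m in matches]
    sds = [elo_sd(*m) for m in matches]
    return sum(elos) / len(elos), math.sqrt(sum(x * x for x in sds)) / len(sds)


# (wins, losses, draws) for Lc0, taken from the match results above
pgn_matches = [(167, 118, 115), (159, 131, 110)]
epd_matches = [(140, 155, 105), (134, 150, 116)]

pgn_elo, pgn_sd = mean_elo(pgn_matches)
epd_elo, epd_sd = mean_elo(epd_matches)
diff = pgn_elo - epd_elo
diff_sd = math.sqrt(pgn_sd ** 2 + epd_sd ** 2)

print(f"PGN: {pgn_elo:+.1f} +/- {pgn_sd:.1f} (1SD)")
print(f"EPD: {epd_elo:+.1f} +/- {epd_sd:.1f} (1SD)")
print(f"Difference: {diff:+.1f} +/- {diff_sd:.1f} (1SD)")
```

The averages come out close to the +33.6 and -13.5 quoted above, and the difference is several times larger than its 1SD error.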

2/ Check for scaling of Lc0 at fixed nodes

Fixed-time scaling is somewhat hard to extrapolate from STC to LTC, because my NPS for Lc0 keeps increasing even after, say, 15 seconds of search, only stabilizing somewhat at 30s or more. As I cannot test at such long times per move, I used fixed-nodes scaling, which plays a more significant role at LTC, where the speeds of both Lc0 and AB engines are pretty stable.

I used a 1:1000 node ratio against Stockfish, i.e. a Leela Ratio of 0.875.
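
As a quick check of how 1:1000 maps to 0.875, here is a sketch assuming the usual community definition of the Leela Ratio: the Lc0-to-Stockfish speed ratio scaled by 875, so that a ratio of 1.0 corresponds to roughly 1:875. The definition and the constant are assumptions here, not something stated in this post.

```python
LEELA_RATIO_SCALE = 875.0  # assumed normalisation: 1.0 means 1 Lc0 node per 875 SF nodes


def leela_ratio(lc0_nodes, sf_nodes):
    """Leela Ratio for a given pair of node budgets (or node speeds)."""
    return LEELA_RATIO_SCALE * lc0_nodes / sf_nodes


for lc0_nodes, sf_nodes in [(100, 100_000), (1_000, 1_000_000), (10_000, 10_000_000)]:
    print(lc0_nodes, sf_nodes, round(leela_ratio(lc0_nodes, sf_nodes), 3))
```

All three node pairings used below keep the same 1:1000 ratio, so only the absolute node budget changes between them.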

SF8:

Lc0 v17 ID11261 versus Stockfish 8, fixed nodes.

100 : 100,000

Score of lc0_v17 vs SF8: 13 - 55 - 32 [0.290] 100
Elo difference: -155.54
W/L = 4.2
Normalized Elo = -0.59

1,000 : 1,000,000

Score of lc0_v17 vs SF8: 19 - 38 - 43 [0.405] 100
Elo difference: -66.82
W/L = 2.0
Normalized Elo = -0.26

10,000 : 10,000,000

Score of lc0_v17 vs SF8: 9 - 24 - 67 [0.425] 100
Elo difference: -52.51
W/L = 2.7
Normalized Elo = -0.27

As we know, the scaling of chess engines is not so much a matter of Elo difference versus time control (or nodes), since the increasing draw rate at LTC compresses Elo differences. Studies have shown that scaling is best represented either by the Normalized Elo of Michel Van den Bergh
http://hardy.uhasselt.be/Toga/normalized_elo.pdf
or by Wilo (Wins/Losses ratio, or drawless Elo), or by something in between these two. Observing the results, one can say that Lc0 scales significantly better than SF8 going from 100 to 1,000 nodes, but scales similarly or even worse than SF8 going from 1,000 to 10,000 nodes. That is with a Leela Ratio of 0.875.
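
The two measures can be recomputed from the win - loss - draw counts with a short sketch like the one below. It assumes Normalized Elo is the score surplus over 50% divided by the per-game standard deviation of the result, and that Wilo is the ordinary Elo formula applied with draws discarded; both are my reading of the definitions, so treat them as assumptions. Note that the "W/L" values in the results above match losses divided by wins, so the sketch prints that ratio as well.

```python
import math


def normalized_elo(w, l, d):
    """(score - 0.5) divided by the per-game standard deviation of the result."""
    n = w + l + d
    s = (w + 0.5 * d) / n
    var = (w * (1 - s) ** 2 + d * (0.5 - s) ** 2 + l * s ** 2) / n
    return (s - 0.5) / math.sqrt(var)


def loss_win_ratio(w, l):
    """Losses per win, the ratio that matches the 'W/L' lines in the results."""
    return l / w


def wilo(w, l):
    """Drawless Elo: the logistic Elo formula with draws left out."""
    return 400.0 * math.log10(w / l)


# (Lc0 nodes, wins, losses, draws) for Lc0 versus SF8, from the results above
sf8_results = [(100, 13, 55, 32), (1_000, 19, 38, 43), (10_000, 9, 24, 67)]

for nodes, w, l, d in sf8_results:
    print(f"{nodes:>6} nodes: normalized Elo {normalized_elo(w, l, d):+.2f}, "
          f"L/W {loss_win_ratio(w, l):.1f}, Wilo {wilo(w, l):+.0f}")
```

Unlike plain Elo, neither measure shrinks automatically as the draw rate goes up, which is why they are better suited for comparing the three node budgets.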

SF dev:

Lc0 v17 ID11261 versus Stockfish dev, fixed nodes.

100 : 100,000

Score of lc0_v17 vs SF_dev: 10 - 59 - 31 [0.255] 100
Elo difference: -186.25
W/L = 5.9
Normalized Elo = -0.73

1,000 : 1,000,000

Score of lc0_v17 vs SF_dev: 10 - 43 - 47 [0.335] 100
Elo difference: -119.11
W/L = 4.3
Normalized Elo = -0.51

10,000 : 10,000,000

Score of lc0_v17 vs SF_dev: 4 - 30 - 66 [0.370] 100
Elo difference: -92.46
W/L = 7.5
Normalized Elo = -0.50

The same conclusion as with SF8.

All in all, going to longer TC, Lc0 doesn't seem to scale better than Stockfish (using Leela Ratio 0.875). This is observed in CCCC too: both the GPU and the CPU there have an effective speed about 8x higher than on my PC, yet Lc0 fares almost exactly as on my PC (GPU versus full CPU), at the level of Fire 7.1. There seems to be little scaling difference between Lc0 and regular engines when going to long TC / big hardware.

chrisw · Post by **chrisw** » Fri Sep 14, 2018 10:21 pm

Laskos wrote: ↑Fri Sep 14, 2018 9:31 pm 1/ Check for history planes usefulness and importance

Well, for starters, history provides trajectory information: where, and/or if, the position is going, move by move. Possibly it's a good idea to move towards the region you've already been building towards. Move = policy network. Maybe.

I haven't yet met one person in computer chess who took the following idea in any sort of positive way. It usually got the equivalent of "you are mad", followed by silence. However, I am sticking with it.

Chess positions provide two physical dimensions: mass (the amount of chess pieces) and distance (where they are on the board). In physics you can only do so much with mass and distance. Most concepts require a time dimension. Energy, momentum, bla di bla. History provides the time dimension. We have no idea what a neural net can do with mdt, but it opens up the possibility of more concepts parallel to those in physics, and strong chess players use concepts from physics.

Javier Ros · Post by **Javier Ros** » Fri Sep 14, 2018 11:06 pm

Laskos wrote: ↑Fri Sep 14, 2018 9:31 pm 1/ Check for history planes usefulness and importance

Very interesting experiments!

I have a question. In the first experiment, what was the time control of the games?
In the second experiment, which opening book was used?

Laskos · Post by **Laskos** » Fri Sep 14, 2018 11:20 pm

Javier Ros wrote: ↑Fri Sep 14, 2018 11:06 pm
Very interesting experiments!

I have a question. In the first experiment, what was the time control of the games?
In the second experiment, which opening book was used?

First experiment: 0.2s/move, a short time control.
Second experiment: a balanced 8-mover PGN book.

Laskos · Post by **Laskos** » Sat Sep 22, 2018 1:07 pm

Another feature: the graph of performance versus eval for Lc0 and SF dev, equalized in strength by adjusting the time control for each. I used 3s per move for Lc0 ID11261 on a GTX 1060 and 1.2s per move for SF dev on one i7 core. This is a longer time control than I used previously for these graphs, but I only managed to get 200 games:

Score of lc0_v17 vs SF_dev: 40 - 42 - 118 [0.495] 200
Elo difference: -3.47 +/- 30.64
Finished match

They are almost equal in strength in these conditions, so no distortions due to the difference in absolute strength are expected.

[Graph: performance versus eval for Lc0 and SF dev]

The eval graph of Lc0 is, as before, shifted to the right, meaning that its eval is too optimistic. It is not only shifted, it also has a slightly different shape compared to the SF dev eval curve.
In short:

A SF eval of +2.00 and above in the middlegame against Lc0 is an almost certain win for SF dev.
A Lc0 eval of +2.00 in the middlegame against SF dev means a performance of 67%, which is still closer to a draw than to a win. Say, 50% draw, 42% win and 8% loss. So, Lc0 converts only about 42% of its +2.00 middlegame situations against SF dev into wins.

A SF dev eval of -2.00 against Lc0 still doesn't mean a sure loss for SF dev; it shows 18% performance, which is, say, 68% loss, 28% draw, 4% win. So, with a -2.00 eval in the middlegame against Lc0, SF dev still manages not to lose 32% of the games.
A Lc0 eval of -2.00 and below in the middlegame against SF dev is an almost certain loss.

Again, keep in mind that I adjusted time controls to have Lc0 and SF dev at approximately same strength.
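
The performance figures above are just wins plus half the draws. A tiny sketch to check that the illustrative win/draw/loss splits (which are rough guesses, not measured values) are consistent with the 67% and 18% read from the graph:

```python
def performance(win, draw):
    """Expected score: a win counts 1 point, a draw half a point."""
    return win + 0.5 * draw


# Lc0 showing +2.00 against SF dev: say 42% win, 50% draw, 8% loss
lc0_plus2 = performance(win=0.42, draw=0.50)

# SF dev showing -2.00 against Lc0: say 4% win, 28% draw, 68% loss
sf_minus2 = performance(win=0.04, draw=0.28)
sf_not_lost = 0.04 + 0.28  # share of games SF dev does not lose

print(f"Lc0 at +2.00: {lc0_plus2:.0%} performance")
print(f"SF dev at -2.00: {sf_minus2:.0%} performance, not lost {sf_not_lost:.0%}")
```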
