LCZero: Progress and Scaling. Relation to CCRL Elo
Moderators: hgm, Rebel, chrisw
-
- Posts: 560
- Joined: Sun Nov 08, 2015 11:10 pm
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Are you sure that the new networks performing better at low node counts but scaling worse is not because of over-fitting?
-
- Posts: 143
- Joined: Wed Jan 17, 2018 1:26 pm
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
The value head of recent nets used to be still very slightly worse than that of 237. With 321, this may no longer be the case according to some measurements. The reduced quality of the value head after the regression phase was indeed due to over-fitting, but it has been recovering since 287 and getting better ever since. Once it has completely recovered (which may or may not already have happened), scaling properties should also be fully restored, and the new nets will be better at any time control.
noobpwnftw wrote: ↑Sun May 20, 2018 11:32 pm Are you sure that at low node count condition the new networks perform better but scale worse is not because of over-fitting?
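Why would value-head quality be "the deciding factor in determining scaling properties"? In an AlphaZero-style PUCT search, move choice at high node counts is dominated by the averaged value-head output, not the policy prior. A minimal sketch (a generic illustration of the PUCT formula, not lc0's actual code; all names here are made up):

```python
import math

# A minimal sketch (not lc0's actual code) of AlphaZero-style PUCT child
# selection. Q is the running average of value-head outputs backed up
# through the tree; U is an exploration bonus driven by the policy prior.
def puct_select(children, c_puct=1.5):
    """children: list of dicts with 'N' (visits), 'W' (summed value), 'P' (prior)."""
    total_n = sum(ch["N"] for ch in children)

    def score(ch):
        q = ch["W"] / ch["N"] if ch["N"] > 0 else 0.0              # value-head average
        u = c_puct * ch["P"] * math.sqrt(total_n) / (1 + ch["N"])  # exploration term
        return q + u

    return max(range(len(children)), key=lambda i: score(children[i]))

# As visits grow, U shrinks relative to Q, so the averaged value-head
# estimate dominates move choice, which is why value-head quality
# governs how well a net scales at long time controls.
kids = [
    {"N": 900, "W": 495.0, "P": 0.6},  # Q = 0.55, heavily explored
    {"N": 100, "W": 60.0,  "P": 0.4},  # Q = 0.60, less explored
]
print(puct_select(kids))  # → 1 (the higher-Q child)
```

At 800 nodes the prior still carries a lot of weight; at long time controls the Q term takes over, consistent with the observation that a net can win at fast TC yet scale worse.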
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
I hope the quality of the value head will continue to improve substantially, as for now it seems very strange that at nodes=1 NN320 is almost 200 Elo points stronger than NN237, yet scales worse and is still tactically weaker. There seems to be a lot of room for improvement with the 192x15 net and a bug-fixed engine. I already got a win from LC0 CUDA NN237 in game 7 against Komodo 10.2 at 15'+ 15'' time control; the score for LC0 against Komodo 10.2 is +1 -3 =3, or about a 100 Elo point difference, putting LC0 NN237 in these longer time control conditions above the 3200 CCRL 40/4' Elo level, as in the games in similar conditions against Houdini 1.5a. Keep in mind that the longer the TC, the better LC0's rating (at least with the NN237 value head), because it scales better. Here is the game won by LC0 against Komodo 10.2. Komodo is a tough opponent, as it has a very good eval of the initial phases of the game and of imbalances, where LC0 usually gains large advantages against weaker opponents. But in this game, Komodo 10.2 was pretty clueless about what was happening up to move 35, believing it had a large advantage while the game was proceeding very well for LC0. The match will end in 2-3 hours with a total of 10 games. Anyway, I am pretty amazed by these performances at longer TC against Houdini 1.5a and Komodo 10.2, some of the recent top dogs, and objections that the samples are too small don't impress me. All in all (I have some other games played), LC0 CUDA NN237 with Albert's settings on a GTX 1060 6GB on 2 CPU threads performs unexpectedly well at longer TC.
jkiliani wrote: ↑Sun May 20, 2018 10:14 pm There was a discussion about rollback on Discord yesterday, it isn't happening. At low node counts (800), current nets are far stronger than Id 237, although as you observed they don't scale quite as well (yet). But the quality of the value head is still improving, which is also the deciding factor in determining scaling properties.
I'm not too worried this won't fix itself in the end, since we're going to upgrade to a 256x20 network eventually when there's no more improvement on the 192x15 architecture. Lc0 beating Komodo on your setup may not be happening yet, but I'm optimistic that it will soon, either still on 192x15 or at the latest once we go 256x20 (the AlphaZero size).
Laskos wrote: ↑Sun May 20, 2018 8:24 pm It seems to be a bit more complicated. At 1'+ 1'' on GTX 1060 with LC0 CUDA, latest nets seem even stronger than NN237. But at 15'+ 15'', NN237 seems stronger. I left NN319 against Komodo 10.2, it lost 5 games in a row due to tactical blunders. Eval graph was also unstable. I interrupted the match and reverted to NN237, and in 5 games until now, there are 2 wins of Komodo and 3 draws. Only one game was lost due to blunder. Still waiting for one win of LC0 in 10 games. The sample is too small, but I saw a similar thing in games against Houdini 1.5a. It seems NN237 scales better with TC or playouts, having a better value head eval. It is strange, as at nodes=1, latest nets are some 150-200 Elo points stronger than NN237. Really, they have to roll back to v0.7 engine, the current nets are trained in some schizophrenic way with v0.10.
[Event "My Tournament"]
[Site "?"]
[Date "2018.05.20"]
[Round "7"]
[White "LC0_GPU_CUDA"]
[Black "Komodo 10.2"]
[Result "1-0"]
[FEN "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPPQPPP/RNB1KB1R w KQkq - 0 1"]
[PlyCount "151"]
[SetUp "1"]
[TimeControl "900+15"]
1. g3 {+0.09/2 49s} Bc5 {+0.10/25 58s} 2. c3 {+0.11/2 52s} Nf6 {+0.06/25 37s}
3. Bg2 {+0.14/2 34s} O-O {+0.07/26 42s} 4. b4 {+0.15/2 46s} Bb6 {+0.02/25 49s}
5. O-O {+0.14/2 63s} a6 {+0.09/26 44s} 6. d3 {+0.15/2 42s} Re8 {+0.10/26 49s}
7. Bg5 {+0.22/2 39s} d6 {+0.07/26 65s} 8. Nbd2 {+0.21/2 34s} h6 {+0.08/27 47s}
9. Bh4 {+0.14/2 60s} g5 {+0.74/24 25s} 10. Nxg5 {0.00/2 20s} hxg5 {+0.38/25 44s}
11. Bxg5 {0.00/2 8.8s} Bg4 {+0.58/24 15s} 12. Bf3 {+0.04/2 38s}
Be6 {+0.52/24 45s} 13. Nc4 {+0.20/2 38s} Ba7 {+0.41/25 62s}
14. Ne3 {+0.29/2 42s} Kg7 {+0.43/24 16s} 15. Ng4 {+0.25/2 22s}
Bxg4 {+0.57/25 20s} 16. Bxg4 {+0.22/2 6.7s} Rh8 {+0.49/25 30s}
17. Kg2 {+0.21/2 31s} Qe8 {+0.44/25 73s} 18. f4 {+0.20/2 34s}
Nxg4 {+0.69/25 18s} 19. Qxg4 {+0.73/2 32s} Qc8 {+0.50/26 31s}
20. f5 {+0.59/2 33s} Kf8 {+0.80/21 15s} 21. Rad1 {+0.46/2 44s}
Qd7 {+0.72/25 45s} 22. h4 {+0.86/2 34s} Rh7 {+0.74/24 17s} 23. a4 {+0.67/2 48s}
Re8 {+0.62/23 63s} 24. Qf3 {+0.54/2 48s} Rh8 {+0.62/25 45s}
25. Qe2 {+0.71/2 42s} Ne7 {+0.71/24 35s} 26. f6 {+0.66/2 32s} Nc8 {+0.64/23 18s}
27. Qd2 {+0.83/2 47s} Rd8 {+0.76/22 13s} 28. Bh6+ {+0.72/2 21s}
Ke8 {+0.92/24 10s} 29. Bg7 {+0.73/2 9.1s} Rh7 {+1.03/26 18s}
30. Qe2 {+0.81/2 12s} Qxa4 {+1.12/23 21s} 31. h5 {+1.02/2 28s}
Qd7 {+0.43/24 54s} 32. Qf3 {+1.03/2 24s} Qe6 {+0.95/20 13s} 33. h6 {+1.12/2 17s}
Nb6 {+1.12/23 39s} 34. d4 {+1.22/2 9.0s} c6 {+0.44/22 59s} 35. g4 {+1.42/2 26s}
exd4 {-0.18/25 78s} 36. cxd4 {+1.76/2 25s} Kd7 {-0.62/25 68s}
37. g5 {+1.67/2 17s} Rdh8 {-0.90/21 27s} 38. Qh3 {+2.49/2 36s}
Qxh3+ {-0.52/22 25s} 39. Kxh3 {+2.83/2 25s} Nc4 {-0.67/25 33s}
40. Rfe1 {+2.94/2 42s} Nb2 {-0.12/26 17s} 41. e5 {+3.18/2 17s}
Ke6 {-1.07/20 8.3s} 42. exd6+ {+2.95/2 33s} Kxd6 {-1.49/22 7.5s}
43. Rd2 {+3.05/2 5.0s} Nc4 {-1.40/24 5.7s} 44. Rdd1 {+3.22/2 5.6s}
Nb2 {-1.39/26 16s} 45. Rb1 {+4.05/2 20s} Bxd4 {-1.29/24 17s}
46. Re4 {+4.22/2 17s} c5 {-1.35/24 11s} 47. bxc5+ {+5.93/2 23s}
Kxc5 {-1.70/24 37s} 48. Rxd4 {+6.43/2 15s} Kxd4 {-2.95/19 4.4s}
49. Rxb2 {+6.61/2 8.8s} b5 {-4.63/22 27s} 50. Kg4 {+6.98/2 12s}
Rb8 {-4.35/22 7.1s} 51. g6 {+8.27/2 23s} fxg6 {-6.32/24 23s}
52. f7+ {+8.92/2 35s} Kd5 {-6.48/20 4.4s} 53. Re2 {+10.80/2 32s}
Rf8 {-7.31/22 12s} 54. Bxf8 {+11.30/2 16s} Rxf7 {-7.52/24 6.6s}
55. Bg7 {+11.45/2 7.9s} Rf1 {-12.01/25 36s} 56. Rh2 {+11.67/2 17s}
Rg1+ {-12.07/24 5.3s} 57. Kf4 {+12.07/2 31s} Rf1+ {-12.07/20 19s}
58. Kg5 {+12.06/2 20s} Rg1+ {-12.07/25 7.0s} 59. Kf6 {+12.08/2 10s}
Rf1+ {-12.07/21 16s} 60. Kxg6 {+12.91/2 22s} Rg1+ {-12.10/26 12s}
61. Kf7 {+13.08/2 14s} Rc1 {-250.00/25 22s} 62. Rh5+ {+12.87/2 33s}
Ke4 {-250.00/21 18s} 63. h7 {+12.75/2 19s} Rc8 {-M40/23 18s}
64. Re5+ {+16.10/2 27s} Kf3 {-M34/21 1.1s} 65. Re8 {+17.93/2 22s}
Rxe8 {-M32/21 0.95s} 66. Kxe8 {+17.83/2 13s} Ke3 {-M32/21 1.5s}
67. Kd7 {+19.28/2 31s} Kf4 {-M30/20 1.6s} 68. h8=Q {+20.38/2 42s}
b4 {-M26/20 2.6s} 69. Ke6 {+21.24/3 26s} Kg3 {-M16/20 3.1s}
70. Kf5 {+25.35/2 18s} Kf2 {-M10/35 1.7s} 71. Ke4 {+35.99/3 13s}
Kg2 {-M8/99 0.59s} 72. Be5 {+51.55/2 14s} b3 {-M6/99 0.037s}
73. Qh2+ {+M75/2 13s} Kf1 {-M4/5 0s} 74. Kf3 {+122.28/2 9.9s} a5 {-M2/99 0.008s}
75. Qe2+ {+127.03/2 19s} Kg1 {-M2/5 0s} 76. Qg2# {+128.00/2 12s, White mates}
1-0
-
- Posts: 3019
- Joined: Wed Mar 08, 2006 9:57 pm
- Location: Rio de Janeiro, Brazil
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Not sure why, but NN321 is now the best in tactics, beating NN237 in WAC Revised by one position. Also interesting is that the 20x256 net scored the same in tactics as NN237 when both were tested with LC0 Optimized; frankly, I had not expected this at half the speed.
Laskos wrote: ↑Mon May 21, 2018 12:05 am I hope the quality of the value head will continue to improve substantially, as for now it seems very strange that at nodes=1 the NN320 is almost 200 Elo points stronger than NN237, but scales worse and tactically it is still weaker.
jkiliani wrote: ↑Sun May 20, 2018 10:14 pm There was a discussion about rollback on Discord yesterday, it isn't happening. At low node counts (800), current nets are far stronger than Id 237, although as you observed they don't scale quite as well (yet). But the quality of the value head is still improving, which is also the deciding factor in determining scaling properties. I'm not too worried this won't fix itself in the end, since we're going to upgrade to a 256x20 network eventually when there's no more improvement on the 192x15 architecture. Lc0 beating Komodo on your setup may not be happening yet, but I'm optimistic that it will soon, either still on 192x15 or at the latest once we go 256x20 (the AlphaZero size).
Laskos wrote: ↑Sun May 20, 2018 8:24 pm It seems to be a bit more complicated. At 1'+ 1'' on GTX 1060 with LC0 CUDA, latest nets seem even stronger than NN237. But at 15'+ 15'', NN237 seems stronger. I left NN319 against Komodo 10.2, it lost 5 games in a row due to tactical blunders. Eval graph was also unstable. I interrupted the match and reverted to NN237, and in 5 games until now, there are 2 wins of Komodo and 3 draws. Only one game was lost due to blunder. Still waiting for one win of LC0 in 10 games. The sample is too small, but I saw a similar thing in games against Houdini 1.5a. It seems NN237 scales better with TC or playouts, having a better value head eval.
It is strange, as at nodes=1, latest nets are some 150-200 Elo points stronger than NN237. Really, they have to roll back to v0.7 engine, the current nets are trained in some schizophrenic way with v0.10.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
-
- Posts: 1470
- Joined: Mon Apr 23, 2018 7:54 am
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
What is the 20x256 Net??Albert Silver wrote: ↑Mon May 21, 2018 12:42 am Not sure why, but NN321 is now the best in tactics, beating NN237 in WAC Revised by one position. Also interesting, is that the 20x256 Net scored the same in tactics to NN237, when both are tested with LC0 Optimized. I had not expected this frankly, with half speed.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
The later NNs are already at some 3200 CCRL Elo level on a GTX 1060, even at the short 1m + 1s time control. Here is the result against Houdini 1.5a (3170 CCRL):
Albert Silver wrote: ↑Mon May 21, 2018 12:42 am Not sure why, but NN321 is now the best in tactics, beating NN237 in WAC Revised by one position. Also interesting, is that the 20x256 Net scored the same in tactics to NN237, when both are tested with LC0 Optimized. I had not expected this frankly, with half speed.
Laskos wrote: ↑Mon May 21, 2018 12:05 am I hope the quality of the value head will continue to improve substantially, as for now it seems very strange that at nodes=1 the NN320 is almost 200 Elo points stronger than NN237, but scales worse and tactically it is still weaker.
jkiliani wrote: ↑Sun May 20, 2018 10:14 pm
There was a discussion about rollback on Discord yesterday, it isn't happening. At low node counts (800), current nets are far stronger than Id 237, although as you observed they don't scale quite as well (yet). But the quality of the value head is still improving, which is also the deciding factor in determining scaling properties. I'm not too worried this won't fix itself in the end, since we're going to upgrade to a 256x20 network eventually when there's no more improvement on the 192x15 architecture. Lc0 beating Komodo on your setup may not be happening yet, but I'm optimistic that it will soon, either still on 192x15 or at the latest once we go 256x20 (the AlphaZero size).
Code: Select all
1m + 1s
Score of LC0_CUDA_NN322 vs Houdini 1.5a: 43 - 33 - 24 [0.550]
Elo difference: 34.86 +/- 60.09
100 of 100 games finished.
-
- Posts: 143
- Joined: Wed Jan 17, 2018 1:26 pm
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
You might try Id 329 sometime. A test of the value head today yielded, for 329, the best result of any network so far, which is a very promising indication that the net may also scale very well.
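The post doesn't say how the value head was tested; one plausible way (a hypothetical sketch, not the project's actual test harness) is to score the net's position evaluations against known game outcomes with a mean squared error:

```python
# Hypothetical sketch (not the project's actual test harness) of scoring a
# value head against known game outcomes: mean squared error between the
# net's evaluation of each position (here in [-1, 1], from the side to
# move's point of view) and the game's actual result.
def value_head_mse(predictions, outcomes):
    """Lower is better; predictions and outcomes are parallel lists."""
    assert len(predictions) == len(outcomes) and predictions
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

# Three positions whose games ended win, draw, loss for the side to move:
print(value_head_mse([0.8, 0.1, -0.6], [1.0, 0.0, -1.0]))
```

Under a metric like this, "best result of any network so far" would mean the lowest error over a fixed set of test positions.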
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Yes, I was curious myself, and upon arriving home I played LTC games to check a bit (they take time). Here is the result (I found it sufficiently interesting to post in a new thread):
http://talkchess.com/forum3/viewtopic.php?f=2&t=67537
It seems that by now the newest nets are the best at all time controls.
-
- Posts: 51
- Joined: Mon Feb 20, 2017 8:29 am
- Location: Rialto, Venice
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Laskos wrote: ↑Sun Apr 01, 2018 11:37 am
Hi Kai,
peter wrote: Hi Robin!
LC0 seems already close to very strong engines in this opening suite. At this pace of advancement in positional understanding, I will be very curious how it develops.
CheckersGuy wrote: That's indeed a very impressive result, but that's probably what neural nets are good at. It's kind of interesting. Weaker traditional alpha-beta engines are decent at tactics and suffer from bad positional play, while with Leela0 it's the other way around.
What about your latest experiments with this opening suite?
Are the latest LC0 nets (those >300) positionally better than Stockfish & Komodo?
-
- Posts: 3019
- Joined: Wed Mar 08, 2006 9:57 pm
- Location: Rio de Janeiro, Brazil
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Offhand, I'd say maybe, but that is a very speculative maybe. One cannot remove tactics from the equation, so oversights in its calculations will affect its decisions. An argument such as "this would be a great move if... it didn't lose a piece" holds no water in my book.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."