LCZero: Progress and Scaling. Relation to CCRL Elo

Laskos · Post by **Laskos** » Fri May 11, 2018 12:46 pm

peter wrote: ↑Fri May 11, 2018 12:42 pm
peter wrote: ↑Fri May 11, 2018 10:46 am Many clearly best moves could be denied then, mate in 2 against other anyhow anywhen also winning moves e.g., as Albert wrote about WAC nr.1. here:
viewtopic.php?p=761680#p761680
Same could be said about your position nr.8 and 17 e.g., further on i didn't check till now.

Yet I'd be confident with this one tactical suite too, (I was with WAC also anyhow), even if 60 positions isn't very much, that's what I meant before, not the easy level, that's ok, just comparable with WAC, I guess.

And if you think your result with LC0 with it would be 1800 Elo, as much as I doubt measuring that in such a way (you'd have to call it TCelo at least for Tactical Computer Elo, even more exact LTTCelo, LaskosTacticalTestCelo), your 1800 still would be about 1000 Elo less than what's said about LC0's game playing level at the moment, isn't it?

As for this one engine- engine- game- playing Celo- measurement, it depends on opening- books or starting-positions too very much of course.
Ever tried e.g. Jeroen Noomen's Gambit-Lines.ctg for testing LC0 playing against other engines?

That brings rather different results too, I can tell, having tried only a little till now, cause of course nobody would be interested in such results, at least not as for LC0 right now.

I purged some positions and added some new ones, for a total of 64; check it in the previous post. Sure, an 1800 Elo performance compared to AB engines is a very poor one, while the overall strength in normal games is about 3000 Elo points. It's no secret to most that LC0 is often caught out by tactical blunders.

jp · Post by jp » Fri May 11, 2018 1:11 pm

peter wrote: ↑Fri May 11, 2018 12:42 pm
Ever tried e.g. Jeroen Noomen's Gambit-Lines.ctg for testing LC0 playing against other engines? Or even better his Sharp Gambit Lines starting-positions-collection?
That brings rather different results too, I can tell, having tried only a little till now, cause of course nobody would be interested in such results, at least not as for LC0 right now.

I'm interested in all results. If you did try a little already, as I think you're saying, what were the results?

peter · Post by **peter** » Fri May 11, 2018 1:28 pm

Laskos wrote: ↑Fri May 11, 2018 12:39 pm Anyway, I tried to purge and to add some new positions. 2 or 3 were even worse than the one you have showed, at longer analysis. Here are 64 maybe better (more decisive) game-changers. Maybe you can find again some ambiguity as game-changing goes for some positions, I didn't analyze to such depths as yours.

Code: Select all

8/5K2/kp6/p1p5/P2p4/1P3P2/2P5/8 b - - bm b5; id "ECM.602";
8/2k3p1/2p4p/5P2/2K3PP/8/8/8 w - - bm g5; id "ECM.603";
8/1kp1b3/1p4K1/4P2p/P1P3p1/5pP1/P4P2/4B3 b - - bm h4; id "ECM.604";
8/8/3K1k2/5p1p/4p1p1/4P1P1/5PP1/8 b - - bm f4; id "ECM.606";
4r1k1/5p1p/3q2p1/1p1P4/1P6/2p4P/2Q1nPB1/4RK2 b - - bm Ng3+; id "ECM.612";
3q1k2/5p2/p5pN/1b2Q2P/8/8/5PPK/8 w - - bm Qh8+; id "ECM.622";
6k1/p3b1pp/4p3/4Pp2/Pp1r1P1P/1P4P1/2p2R2/5RK1 b - - bm Rc4; id "ECM.623";
rn1q2k1/pp3pb1/3p2pp/2pP2N1/3r1P2/7Q/PP4PP/R1B2RK1 w - - bm Nxf7; id "ECM.628";
8/6Bp/6p1/2k1p3/4PPP1/1pb4P/8/2K5 b - - bm b2+; id "ECM.629";
r4rk1/ppq3pp/2p1Pn2/4p1Q1/8/2N5/PP4PP/2KR1R2 w - - bm Rxf6; id "ECM.636";
6k1/p4pp1/Pp2r3/1QPq3p/8/6P1/2P2P1P/1R4K1 w - - bm cxb6; id "ECM.641";
6k1/p4pbp/Bp2p1p1/n2P4/q3P3/B1rQP3/P5PP/5RK1 w - - bm dxe6; id "ECM.642";
8/2k5/2p5/2pb2K1/pp4P1/1P1R4/P7/8 b - - bm Bxb3; id "ECM.646";
8/1R2P3/6k1/3B4/2P2P2/1p2r3/1Kb4p/8 w - - bm Be6; id "ECM.650";
2kr2r1/pp2bQ1p/2b1P3/2qN4/8/1B2p2P/PPP3P1/3R1R1K b - - bm e2; id "ECM.651";
r1b2rk1/1p2qppp/p3p3/2n5/3N4/3B1R2/PPP1Q1PP/R5K1 w - - bm Bxh7+; id "ECM.652";
6rk/3nrpbp/p1bq1npB/1p2p1N1/4P1PQ/P2B3R/1PP1N2P/5R1K w - - bm Nxh7; id "ECM.655";
1rb2rk1/3nqppp/p1n1p3/1p1pP3/5P2/2NBQN2/PPP3PP/2KR3R w - - bm Bxh7+; id "ECM.656";
2k5/ppp3pp/8/NQ2n2q/2Pp1n2/R4bP1/1P3P1P/4R1K1 b - - bm Qxh2+; id "ECM.657";
2r2r2/p2qppkp/3p2p1/3P1P2/2n2R2/7R/P5PP/1B1Q2K1 w - - bm Rxh7+; id "ECM.662";

I have only got this far with your new suite; here is nr. 20:

2r2r2/p2qppkp/3p2p1/3P1P2/2n2R2/7R/P5PP/1B1Q2K1 w - -

Rxh7 is surely the best move, yet 1.fxg6(?) seems to win too:

2r2r2/p2qppkp/3p2P1/3P4/2n2R2/7R/P5PP/1B1Q2K1 b - -

Engine: asmFishW_2018-05-07_popcnt0 (8192 MB)
by TypingALot

40/77 1:42 +3.01 1...hxg6 2.Bf5 Qb7 3.Bxc8 Rxc8
4.Rfh4 Qb2 5.Rh7+ Kg8 6.Rh8+ Qxh8
7.Rxh8+ Kxh8 8.Qd4+ Kg8 9.Qxa7 Kf8
10.a4 Rc5 11.Qb8+ Kg7 12.h3 Rxd5
13.Qc7 Rc5 14.Qxe7 Kg8 (3.236.869.938) 31512

I will go on checking later; if you don't read anything more about it, your new suite is OK for me, as your last-but-one suite was, Kai:
viewtopic.php?p=761846#p761846
I would be confident with almost any well-known tactical test suite, e.g. with ChessBase's Tactical Marathon too, which has shipped with the Fritz GUIs for several versions now, I guess at least since F10.

The only problem with this one is, again, the level. Marathon certainly isn't difficult: Zappa2 on 1 core solves 175 out of 210 at 10" per move, if I remember correctly, but LC0 with 10" and 24 CPU threads solves only 19 out of 210.

peter · Post by **peter** » Fri May 11, 2018 1:31 pm

Laskos wrote: ↑Fri May 11, 2018 12:46 pm I purged some positions and added some new ones, for a total of 64; check it in the previous post.

Cross-posting again:
viewtopic.php?p=761850#p761850

peter · Post by **peter** » Fri May 11, 2018 1:42 pm

jp wrote: ↑Fri May 11, 2018 1:11 pm
peter wrote: ↑Fri May 11, 2018 12:42 pm
Ever tried e.g. Jeroen Noomen's Gambit-Lines.ctg for testing LC0 playing against other engines? Or even better his Sharp Gambit Lines starting-positions-collection?
That brings rather different results too, I can tell, having tried only a little till now, cause of course nobody would be interested in such results, at least not as for LC0 right now.
I'm interested in all results. If you did try a little already, as I think you're saying, what were the results?

I didn't store many of the games and stopped after about 10 of them, but LC0 didn't win a single one. Zappa Mex2 scored more full points than draws, running on 2 cores against 18 CPU threads (HT on) for LC0. The starting positions were taken from Sharp Gambit Lines by Jeroen Noomen.
TC was 60'+5".
I tried Zappa because Thorsten Czub ran a long-TC LC0-Zappa match with quite good results for Leela (40/120, ending +8,=12,-6 for LC0).

In my attempt, in 4 cases Zappa won the game with both White and Black from the same starting position.
Showed them in CSS here:
http://forum.computerschach.de/cgi-bin/ ... #pid113098
and here:
http://forum.computerschach.de/cgi-bin/ ... #pid113037
To see the two games for each position stored together in one .pgn with evals, click "Zitieren" (quote) in the menu below the postings.
Thorsten's posting is two above mine.
Here:
http://forum.computerschach.de/cgi-bin/ ... #pid113096

Laskos · Post by **Laskos** » Fri May 11, 2018 2:43 pm

jp wrote: ↑Fri May 11, 2018 1:11 pm
peter wrote: ↑Fri May 11, 2018 12:42 pm
Ever tried e.g. Jeroen Noomen's Gambit-Lines.ctg for testing LC0 playing against other engines? Or even better his Sharp Gambit Lines starting-positions-collection?
That brings rather different results too, I can tell, having tried only a little till now, cause of course nobody would be interested in such results, at least not as for LC0 right now.
I'm interested in all results. If you did try a little already, as I think you're saying, what were the results?

It doesn't seem much different, at least at short TC:

From solid, non-tactical, balanced 3movesGM opening suite, 60 games:

Code: Select all

Games Completed = 60 of 60 (Avg game length = 97.193 sec)
Settings = Gauntlet/64MB/1000ms per move/M 9000cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 1617 sec elapsed, 0 sec remaining
 1.  LC0_09 MKL ID271         	36.0/60	28-16-16  	(L: m=16 t=0 i=0 a=0)	(D: r=12 i=2 f=1 s=0 a=1)	(tpm=961.8 d=11.70 nps=222)
 2.  Jabba 1.0                	24.0/60	16-28-16  	(L: m=28 t=0 i=0 a=0)	(D: r=12 i=2 f=1 s=0 a=1)	(tpm=802.3 d=9.40 nps=0)

From NoomenSharpGambit 2015, 30 positions (side and reversed), 60 games:

Code: Select all

Games Completed = 60 of 60 (Avg game length = 78.420 sec)
Settings = Gauntlet/64MB/1000ms per move/M 9000cp for 30 moves, D 150 moves/PGN:C:\LittleBlitzer\SharpGambits2015.pgn(30)
Time = 1305 sec elapsed, 0 sec remaining
 1.  LC0_09 MKL ID271         	33.5/60	28-21-11  	(L: m=21 t=0 i=0 a=0)	(D: r=10 i=1 f=0 s=0 a=0)	(tpm=954.8 d=11.71 nps=402)
 2.  Jabba 1.0                	26.5/60	21-28-11  	(L: m=28 t=0 i=0 a=0)	(D: r=10 i=1 f=0 s=0 a=0)	(tpm=803.9 d=9.12 nps=0)

The results are less drawish, but LC0's score in the second, more tactical Gambit opening suite is not much lower.
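To put the two gauntlet scores on a common scale, here is a minimal sketch that converts LC0's W-L-D records from the two LittleBlitzer tables into an Elo difference with a rough 95% interval. The function names `elo_diff` and `match_summary` are made up for illustration, the logistic Elo formula is the standard one, and the normal-approximation interval is an assumption that is only approximate at 60 games.

```python
import math


def elo_diff(score_fraction):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)


def match_summary(wins, losses, draws):
    """Return (score fraction, Elo estimate, (low, high) rough 95% Elo interval)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    # Normal approximation; assumes score +/- margin stays inside (0, 1).
    margin = 1.96 * math.sqrt(variance / games)
    return score, elo_diff(score), (elo_diff(score - margin), elo_diff(score + margin))


# W-L-D for LC0, copied from the two LittleBlitzer tables above.
solid = match_summary(28, 16, 16)   # 3moves_GM suite, 36.0/60
gambit = match_summary(28, 21, 11)  # SharpGambits2015 suite, 33.5/60

for name, (score, elo, (low, high)) in (("3movesGM", solid), ("SharpGambit", gambit)):
    print(f"{name}: score {score:.3f}, Elo {elo:+.0f} (about {low:+.0f} to {high:+.0f})")
```

The point estimates come out at roughly +70 and +41 Elo for LC0, and the two intervals overlap widely, so 60 games per suite cannot separate them.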

Laskos · Post by **Laskos** » Fri May 11, 2018 6:15 pm

peter wrote: ↑Fri May 11, 2018 1:42 pm

Thanks! I eliminated this one and another two or three, and added several more to keep the same total of 64 positions. I hope there are very few by now, maybe 1-2, which are not clear-cut unique game-changers. Interestingly, what seemed to be a serious tactical suite, namely ECM, contains about 70% non-unique, non-game-changing, or plainly wrong solutions. Here are these 64 cleaned positions:

Code: Select all

8/5K2/kp6/p1p5/P2p4/1P3P2/2P5/8 b - - bm b5; id "ECM.602";
8/2k3p1/2p4p/5P2/2K3PP/8/8/8 w - - bm g5; id "ECM.603";
8/1kp1b3/1p4K1/4P2p/P1P3p1/5pP1/P4P2/4B3 b - - bm h4; id "ECM.604";
8/8/3K1k2/5p1p/4p1p1/4P1P1/5PP1/8 b - - bm f4; id "ECM.606";
4r1k1/5p1p/3q2p1/1p1P4/1P6/2p4P/2Q1nPB1/4RK2 b - - bm Ng3+; id "ECM.612";
3q1k2/5p2/p5pN/1b2Q2P/8/8/5PPK/8 w - - bm Qh8+; id "ECM.622";
6k1/p3b1pp/4p3/4Pp2/Pp1r1P1P/1P4P1/2p2R2/5RK1 b - - bm Rc4; id "ECM.623";
rn1q2k1/pp3pb1/3p2pp/2pP2N1/3r1P2/7Q/PP4PP/R1B2RK1 w - - bm Nxf7; id "ECM.628";
8/6Bp/6p1/2k1p3/4PPP1/1pb4P/8/2K5 b - - bm b2+; id "ECM.629";
r4rk1/ppq3pp/2p1Pn2/4p1Q1/8/2N5/PP4PP/2KR1R2 w - - bm Rxf6; id "ECM.636";
6k1/p4pp1/Pp2r3/1QPq3p/8/6P1/2P2P1P/1R4K1 w - - bm cxb6; id "ECM.641";
6k1/p4pbp/Bp2p1p1/n2P4/q3P3/B1rQP3/P5PP/5RK1 w - - bm dxe6; id "ECM.642";
8/2k5/2p5/2pb2K1/pp4P1/1P1R4/P7/8 b - - bm Bxb3; id "ECM.646";
8/1R2P3/6k1/3B4/2P2P2/1p2r3/1Kb4p/8 w - - bm Be6; id "ECM.650";
2kr2r1/pp2bQ1p/2b1P3/2qN4/8/1B2p2P/PPP3P1/3R1R1K b - - bm e2; id "ECM.651";
r1b2rk1/1p2qppp/p3p3/2n5/3N4/3B1R2/PPP1Q1PP/R5K1 w - - bm Bxh7+; id "ECM.652";
6rk/3nrpbp/p1bq1npB/1p2p1N1/4P1PQ/P2B3R/1PP1N2P/5R1K w - - bm Nxh7; id "ECM.655";
1rb2rk1/3nqppp/p1n1p3/1p1pP3/5P2/2NBQN2/PPP3PP/2KR3R w - - bm Bxh7+; id "ECM.656";
2k5/ppp3pp/8/NQ2n2q/2Pp1n2/R4bP1/1P3P1P/4R1K1 b - - bm Qxh2+; id "ECM.657";
r4rk1/pp2q1p1/4b2p/2ppb3/6n1/2P3N1/PPQBBPPP/R4RK1 b - - bm Nxh2; id "ECM.667";
3rr1k1/1pq1nppp/p1p2b2/4pB2/2QPP3/P1P1B3/1P4PP/3R1RK1 w - - bm Bxh7+; id "ECM.680";
2rrn1k1/2q2ppp/p2pp3/1p2P1P1/4B3/P5Q1/1PP3PP/R4R1K w - - bm Bxh7+; id "ECM.682";
r2q3r/2pkb1p1/p2p1n2/4p1p1/Pp2P1P1/1QP5/1P1P2PP/RNB2RK1 b - - bm Rxh2; id "ECM.683";
r4rk1/pp1n1ppp/3qp3/3nN1P1/b2P4/P2B1Q2/3B1P1P/1R2R1K1 w - - bm Bxh7+; id "ECM.687";
r5k1/6bp/2q1p1p1/p2pP3/3P4/1rP2QP1/3B1PK1/2R4R w - - bm Rxh7; id "ECM.689";
r2qrnk1/4bppp/b1p5/1p1p2P1/p2P1N1P/2NBP3/PPQ2P2/2K3RR w - - bm Bxh7+; id "ECM.693";
rn1q1rk1/pppbb1pp/4p3/3pP1p1/3P3P/2NB4/PPP2PP1/R2QK2R w KQ - bm Bxh7+; id "ECM.694";
r2q1rk1/3n1ppp/8/1pbP2P1/p1N4P/PnBBPQ2/5P2/R3K2R w KQ - bm Bxh7+; id "ECM.697";
3r2k1/p1R2p2/4pQp1/1q5p/5P1P/1PR5/2Pr2P1/6K1 b - - bm Rxg2+; id "ECM.700";
3r2k1/pb5p/1p2qpp1/8/2p5/1P1nP3/P1N2PPP/1Q1R1R1K b - - bm Bxg2+; id "ECM.703";
4rrk1/2qb2pp/p5P1/1p2p3/1b2P3/2N5/PPPQ4/1K1R2R1 w - - bm gxh7+; id "ECM.704";
2r1r1k1/5ppp/p3pn2/1pb1N3/2P5/1PQ3R1/PB2qPPP/3R2K1 w - - bm Rxg7+; id "ECM.708";
r4rk1/p2n2p1/1q1Qpn1p/1P6/P6B/2p5/2B1KP1P/R5R1 w - - bm Rxg7+; id "ECM.711";
r1qb1r1k/2p3pp/p1n1bp2/1p1Np2Q/P3P3/1BP3R1/1P3PPP/R1B3K1 w - - bm Rxg7; id "ECM.717";
r2r3k/5bp1/2p2N2/5P1p/3q3Q/3B2R1/n5PP/3R3K w - - bm Rxg7; id "ECM.720";
r4rk1/1p1q1ppp/p1b4B/8/2R3R1/P2P4/1b1N1QPP/6K1 w - - bm Bxg7; id "ECM.723";
rq3rk1/3b1ppp/p2bp3/3pB2Q/8/1B5P/PP3PP1/2RR2K1 w - - bm Bxg7; id "ECM.724";
2rr2k1/4bppp/p1n1p3/3q4/1p1P2N1/2P3R1/P3QPPP/2B2RK1 w - - bm Nh6+; id "ECM.727";
rq1r1bk1/1b3pp1/3pn2p/1n2BN1P/1P2P3/3R1NP1/3Q1PB1/2R3K1 w - - bm Bxg7; id "ECM.728";
r1bqkbnr/pp2ppp1/2p4p/3n2N1/2BP4/5N2/PPP2PPP/R1BQK2R w KQkq - bm Nxf7; id "ECM.731";
r2qr1k1/1ppb1p1p/p1np2p1/7Q/3PP2b/1B2N2P/PP3PP1/R1B2RK1 w - - bm Bxf7+; id "ECM.732";
r3r1k1/1bq1nppp/p1np4/1ppBpN2/4P3/2PP1N2/PP3PPP/R2QR1K1 w - - bm Bxf7+; id "ECM.743";
2r1r1k1/1pq1bp1p/p3pnp1/P2n2N1/7R/2P4P/1PB1QPP1/2B1R1K1 w - - bm Nxf7; id "ECM.748";
r1bq2k1/pp1n1ppp/3b1n2/PQ1B3r/3N1P2/2N5/1PP3PP/R1B2RK1 w - - bm Bxf7+; id "ECM.749";
2r1r1k1/5ppp/pq3b2/2pB1P2/2p2B2/5Q1P/Pn3PP1/2R1R1K1 w - - bm Bxf7+; id "ECM.750";
r4rk1/ppRn1p2/6pb/2P1pq1p/3N4/P1QPn1Pb/1B1NPP1P/4R1KB b - - bm Qxf2+; id "ECM.751";
r3kr2/1b2qp2/pp2p2N/4p2Q/8/2n5/P3B1PP/3R1R1K w q - bm Nxf7; id "ECM.752";
b2r1rk1/pq2bpp1/1p2p2p/4N2n/2P2R2/1PB2N2/1P2QPPP/4R1K1 w - - bm Rxf7; id "ECM.753";
rqb1k2r/1p1nbp1p/p4pp1/8/1PBN1P2/P1N1P3/7P/2RQ1RK1 w kq - bm Bxf7+; id "ECM.754";
1r2q1k1/p3pp2/3p1bp1/2pP2N1/8/P5PB/2Q2PK1/1rBR4 w - - bm Nxf7; id "ECM.756";
1qr1b1k1/4bpp1/pn2p2p/1p1nN3/3P4/P2BBN1Q/1P3PPP/4R1K1 w - - bm Bxh6; id "ECM.772";
rr1q2k1/1p2bpp1/2p1p2p/P1Pn4/2NP4/3Q1RP1/5PKP/2B1R3 w - - bm Bxh6; id "ECM.773";
2r5/1p4bk/3p2rp/4pN2/1P2P1pR/2P2q2/QP6/1K5R w - - bm Rxh6+; id "ECM.775";
r1b1r3/pp2Npbk/3pp2p/q5p1/2QNPP2/6P1/PPP3P1/2KR3R w - - bm Ndf5; id "ECM.776";
4r1k1/p1pq1pp1/2p5/3p1b2/Q7/2P1B2P/P1P1rPP1/2R2RK1 b - - bm Bxh3; id "ECM.778";
6rk/3b1n1p/1p1q3b/1PpNp3/2P1Pp2/2Q2NrP/5RP1/2R2B1K b - - bm Bxh3; id "ECM.783";
r2q1rk1/ppp2pp1/1b2b2p/3n3Q/2Bp4/3P1N2/PPP2PPP/R1B1R1K1 w - - bm Bxh6; id "ECM.784";
r3rbk1/1bp1qpp1/p6p/np2p2Q/4P2N/1BP4P/PP3PP1/R1B1R1K1 w - - bm Bg5; id "ECM.785";
4q3/p2r1ppk/R6p/3n4/3B1Q2/4P2P/5PP1/6K1 w - - bm Rxh6+; id "ECM.786";
2r1r1k1/pb1n1pp1/1p1qpn1p/4N1B1/2PP4/3B4/P2Q1PPP/3RR1K1 w - - bm Bxh6; id "ECM.789";
r1b2rk1/pp2bpp1/4p2p/2q4Q/5nNB/2PB4/PP3PPP/2KR3R w - - bm Nxh6+; id "ECM.794";
r2r2k1/pp1n1bp1/2p2p1p/b4N2/q2BR3/2QB2PP/1PP5/2KR4 w - - bm Nxh6+; id "ECM.797";
3r2bk/1q4p1/p2P1N1p/2p1rP2/pb5R/7P/1P4P1/2Q2RK1 w - - bm Rxh6+; id "ECM.798";
6R1/6Q1/3q2p1/5p1p/P3p1k1/1P1r2P1/5PK1/8 b - - bm Rxg3+; id "ECM.800";
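The list above is in EPD format: four position fields (piece placement, side to move, castling rights, en passant square) followed by `;`-terminated operations, here `bm` (best move) and `id`. As a minimal sketch of how such a record could be read and scored, the helpers below (`parse_epd` and `is_solved` are hypothetical names, not part of any engine or GUI) split one line and compare a move against `bm`. It assumes the engine's move is already available in SAN and that no `;` appears inside a quoted operand.

```python
def parse_epd(line):
    """Split one EPD record into (position, operations)."""
    fields = line.strip().split(None, 4)
    if len(fields) < 4:
        raise ValueError(f"not an EPD record: {line!r}")
    position = " ".join(fields[:4])
    operations = {}
    tail = fields[4] if len(fields) == 5 else ""
    # Simplification: assumes no ';' inside quoted operands.
    for chunk in tail.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        opcode, _, operand = chunk.partition(" ")
        operations[opcode] = operand.strip().strip('"')
    return position, operations


def is_solved(engine_move, operations):
    """True if the SAN move matches any listed best move, ignoring check marks."""
    def normalize(move):
        return move.rstrip("+#")

    best_moves = {normalize(m) for m in operations.get("bm", "").split()}
    return normalize(engine_move) in best_moves


record = '3q1k2/5p2/p5pN/1b2Q2P/8/8/5PPK/8 w - - bm Qh8+; id "ECM.622";'
position, ops = parse_epd(record)
print(ops["id"], "|", position, "|", ops["bm"])
print(is_solved("Qh8", ops))
```

Counting `is_solved` hits over all 64 records at a fixed time per position gives the kind of solved/total figure reported in the results.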

Fast test results:

Code: Select all

                          0.2s          5s
                          
Stockfish 9   4 threads: 59/64       64/64
Fruit 2.1     1 thread:  40/64       55/64
BikJump 2.01  1 thread:  36/64       49/64
Pred 2.2.1    1 thread:  19/64       38/64
LC0_08 ID271  4 threads: 20/64       32/64

Fruit 2.1 is about 2700 CCRL Elo, BikJump is 2100, Predateur 1800. LC0 doesn't seem to scale better than AB engines, and is close to the 1800 CCRL Elo level on this suite. That the suite is very easy is not a problem for me: it's not easy for LC0, and I want to see whether it will improve. Also, even in serious LTC games, LC0 makes silly tactical blunders visible even to my naked eye. On my positional test suite, it is above the 3300 Elo level in CCRL conditions, and overall it is at about the 3000 CCRL Elo level. That's why, when playing normal games against opposition of similar strength, it often blunders like a patzer while otherwise playing very sound positional chess. Well, aside from the recently discovered bug. I hope they will fix it soon.
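One way to look at the scaling claim is to compare how much each engine gains between 0.2s and 5s. The sketch below uses only the counts from the "Fast test results" table. The second measure, `share_of_remaining`, is an illustrative choice rather than anything used in the thread, and a single pair of time controls on 64 positions is a small sample.

```python
SUITE_SIZE = 64

# (solved at 0.2 s, solved at 5 s), copied from the "Fast test results" table.
results = {
    "Stockfish 9": (59, 64),
    "Fruit 2.1": (40, 55),
    "BikJump 2.01": (36, 49),
    "Pred 2.2.1": (19, 38),
    "LC0_08 ID271": (20, 32),
}


def gain(counts):
    """Extra positions solved when going from 0.2 s to 5 s."""
    fast, slow = counts
    return slow - fast


def share_of_remaining(counts):
    """Fraction of positions unsolved at 0.2 s that are solved at 5 s."""
    fast, slow = counts
    return (slow - fast) / (SUITE_SIZE - fast)


for name, counts in results.items():
    print(f"{name:14s} +{gain(counts):2d} positions, "
          f"{share_of_remaining(counts):.0%} of previously unsolved")
```

With these numbers LC0 gains 12 positions, converting about 27% of what it missed at 0.2s, which is the lowest share of the five engines.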

yanquis1972 · Post by **yanquis1972** » Fri May 11, 2018 6:51 pm

any idea how much this would improve if hardware was at alphazero's level? i thought i'd read 80kn/s for AZ, but i have no idea how accurate that is or if it directly compares.

basically i'm wondering how poor the tactics really are compared to what can be realistically hoped; could AZ have been, for example, ~2100 tactically & ~3600 positionally? with or without hardware compensation.

Laskos · Post by **Laskos** » Fri May 11, 2018 7:40 pm

yanquis1972 wrote: ↑Fri May 11, 2018 6:51 pm any idea how much this would improve if hardware was at alphazero's level? i thought i'd read 80kn/s for AZ, but i have no idea how accurate that is or if it directly compares.

basically i'm wondering how poor the tactics really are compared to what can be realistically hoped; could AZ have been, for example, ~2100 tactically & ~3600 positionally? with or without hardware compensation.

It is plausible that A0, under the match conditions against SF8 presented in the paper, was below Fruit 2.1 level on these tactical test suites. But positionally it might have been extraordinarily good. The tactical positions in these test suites that Fruit does not solve at LTC rarely occur in "regular games", and without a sharp-lines opening book (in fact with no book at all, as the match was played), most of the games are "regular", not involving too many deep tactical shots. So we might be happy if, on a GTX 1060, LC0 someday reaches 2300 or so CCRL level on tactical-shot test suites while being extremely strong positionally. Let's see.

mhull · Post by **mhull** » Fri May 11, 2018 7:52 pm

Laskos wrote: ↑Fri May 11, 2018 7:40 pm
yanquis1972 wrote: ↑Fri May 11, 2018 6:51 pm any idea how much this would improve if hardware was at alphazero's level? i thought i'd read 80kn/s for AZ, but i have no idea how accurate that is or if it directly compares.

basically i'm wondering how poor the tactics really are compared to what can be realistically hoped; could AZ have been, for example, ~2100 tactically & ~3600 positionally? with or without hardware compensation.
It is plausible that A0 in the presented paper match conditions against SF8, on these tactical test suites was below Fruit 2.1 level. But positionally it might have been extraordinarily good. The tactical positions of these test suites not solved by Fruit at LTC are rarely occurring in "regular games", and without a sharp lines opening book, in fact with no book at all, as it was played, most of the games are "regular", not involving too much tactical shots. So, we might be happy if on GTX 1060, LC0 reaches sometime in the future 2300 or so CCRL level on tactical test suites, but being extremely strong positionally. Let's see.

Basically the same point I've made before. Test suites may not contain representative positions from typical games, especially from self-play training.

Another related issue is forcing L0 to play from positions it never played into, i.e. an imposed opening book. If L0 is tested with its preferred opening choices, do these tactical holes become less/more?

I have argued eloquently and in vain to allow L0 to play all its own moves and not impose book lines upon it (in testing gauntlets). Forced books will skew Elo estimates in unknown ways.

But people have the CCRL-style testing (stripped/hobbled-engine) deeply ingrained in their thinking and one cannot blast them out of it. There is no persuading them.

That's not to say there is no value in forcing L0 to play test positions but it should be compared to letting it play all moves of a game, not just middle/endgame. There would be value in that comparison.
