LCZero: Progress and Scaling. Relation to CCRL Elo
Moderators: hgm, Rebel, chrisw
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
peter wrote: ↑Fri May 11, 2018 12:42 pm
Same could be said about your position nr.8 and 17 e.g., further on i didn't check till now.
peter wrote: ↑Fri May 11, 2018 10:46 am
Many clearly best moves could be denied then, mate in 2 against other anyhow anywhen also winning moves e.g., as Albert wrote about WAC nr.1. here:
viewtopic.php?p=761680#p761680
Yet I'd be confident with this one tactical suite too, (I was with WAC also anyhow), even if 60 positions isn't very much, that's what I meant before, not the easy level, that's ok, just comparable with WAC, I guess.
And if you think your result with LC0 with it would be 1800 Elo, as much as I doubt measuring that in such a way (you'd have to call it TCelo at least for Tactical Computer Elo, even more exact LTTCelo, LaskosTacticalTestCelo), your 1800 still would be about 1000 Elo less than what's said about LC0's game playing level at the moment, isn't it?
As for this one engine- engine- game- playing Celo- measurement, it depends on opening- books or starting-positions too very much of course.
Ever tried e.g. Jeroen Noomen's Gambit-Lines.ctg for testing LC0 playing against other engines?
That brings rather different results too, I can tell, having tried only a little till now, cause of course nobody would be interested in such results, at least not as for LC0 right now.

I purged some positions and added some new ones, for a total of 64; check it in the previous post. Sure, an 1800 Elo performance compared to AB engines is very poor, while the overall strength in normal games is about 3000 Elo points. It's no secret to most that LC0 is often caught out by tactical blunders.
-
- Posts: 1470
- Joined: Mon Apr 23, 2018 7:54 am
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
peter wrote: ↑Fri May 11, 2018 12:42 pm
Ever tried e.g. Jeroen Noomen's Gambit-Lines.ctg for testing LC0 playing against other engines? Or even better his Sharp Gambit Lines starting-positions-collection?
That brings rather different results too, I can tell, having tried only a little till now, cause of course nobody would be interested in such results, at least not as for LC0 right now.

I'm interested in all results. If you did try a little already, as I think you're saying, what were the results?
-
- Posts: 3186
- Joined: Sat Feb 16, 2008 7:38 am
- Full name: Peter Martan
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Laskos wrote: ↑Fri May 11, 2018 12:39 pm
Anyway, I tried to purge and to add some new positions. 2 or 3 were even worse than the one you have showed, at longer analysis. Here are 64 maybe better (more decisive) game-changers. Maybe you can find again some ambiguity as game-changing goes for some positions, I didn't analyze to such depths as yours.
Code: Select all
8/5K2/kp6/p1p5/P2p4/1P3P2/2P5/8 b - - bm b5; id "ECM.602";
8/2k3p1/2p4p/5P2/2K3PP/8/8/8 w - - bm g5; id "ECM.603";
8/1kp1b3/1p4K1/4P2p/P1P3p1/5pP1/P4P2/4B3 b - - bm h4; id "ECM.604";
8/8/3K1k2/5p1p/4p1p1/4P1P1/5PP1/8 b - - bm f4; id "ECM.606";
4r1k1/5p1p/3q2p1/1p1P4/1P6/2p4P/2Q1nPB1/4RK2 b - - bm Ng3+; id "ECM.612";
3q1k2/5p2/p5pN/1b2Q2P/8/8/5PPK/8 w - - bm Qh8+; id "ECM.622";
6k1/p3b1pp/4p3/4Pp2/Pp1r1P1P/1P4P1/2p2R2/5RK1 b - - bm Rc4; id "ECM.623";
rn1q2k1/pp3pb1/3p2pp/2pP2N1/3r1P2/7Q/PP4PP/R1B2RK1 w - - bm Nxf7; id "ECM.628";
8/6Bp/6p1/2k1p3/4PPP1/1pb4P/8/2K5 b - - bm b2+; id "ECM.629";
r4rk1/ppq3pp/2p1Pn2/4p1Q1/8/2N5/PP4PP/2KR1R2 w - - bm Rxf6; id "ECM.636";
6k1/p4pp1/Pp2r3/1QPq3p/8/6P1/2P2P1P/1R4K1 w - - bm cxb6; id "ECM.641";
6k1/p4pbp/Bp2p1p1/n2P4/q3P3/B1rQP3/P5PP/5RK1 w - - bm dxe6; id "ECM.642";
8/2k5/2p5/2pb2K1/pp4P1/1P1R4/P7/8 b - - bm Bxb3; id "ECM.646";
8/1R2P3/6k1/3B4/2P2P2/1p2r3/1Kb4p/8 w - - bm Be6; id "ECM.650";
2kr2r1/pp2bQ1p/2b1P3/2qN4/8/1B2p2P/PPP3P1/3R1R1K b - - bm e2; id "ECM.651";
r1b2rk1/1p2qppp/p3p3/2n5/3N4/3B1R2/PPP1Q1PP/R5K1 w - - bm Bxh7+; id "ECM.652";
6rk/3nrpbp/p1bq1npB/1p2p1N1/4P1PQ/P2B3R/1PP1N2P/5R1K w - - bm Nxh7; id "ECM.655";
1rb2rk1/3nqppp/p1n1p3/1p1pP3/5P2/2NBQN2/PPP3PP/2KR3R w - - bm Bxh7+; id "ECM.656";
2k5/ppp3pp/8/NQ2n2q/2Pp1n2/R4bP1/1P3P1P/4R1K1 b - - bm Qxh2+; id "ECM.657";
2r2r2/p2qppkp/3p2p1/3P1P2/2n2R2/7R/P5PP/1B1Q2K1 w - - bm Rxh7+; id "ECM.662";

Having come so far till now only with your new suite, here nr.20 now:
2r2r2/p2qppkp/3p2p1/3P1P2/2n2R2/7R/P5PP/1B1Q2K1 w - -
For sure Rxh7 is best move, yet 1.fxg6(?) seems to win too:
2r2r2/p2qppkp/3p2P1/3P4/2n2R2/7R/P5PP/1B1Q2K1 b - -
Engine: asmFishW_2018-05-07_popcnt0 (8192 MB)
by TypingALot
40/77 1:42 +3.01 1...hxg6 2.Bf5 Qb7 3.Bxc8 Rxc8
4.Rfh4 Qb2 5.Rh7+ Kg8 6.Rh8+ Qxh8
7.Rxh8+ Kxh8 8.Qd4+ Kg8 9.Qxa7 Kf8
10.a4 Rc5 11.Qb8+ Kg7 12.h3 Rxd5
13.Qc7 Rc5 14.Qxe7 Kg8 (3.236.869.938) 31512
I'll go on checking later; if you don't read anything more about it, your new suite is OK for me, just as your last-but-one suite was, Kai:
viewtopic.php?p=761846#p761846
I would be confident with almost any well-known tactical test suite, e.g. ChessBase's Tactical Marathon too, which has shipped with the Fritz GUIs for several versions now, I guess at least since F10.
The only problem with this one is again the level. Marathon for sure isn't difficult: Zappa2 on 1 core solves 175 out of 210 at 10" per move, if I remember correctly, but LC0 with 10" and 24 CPU threads solves only 19 out of 210.
Peter.
-
- Posts: 3186
- Joined: Sat Feb 16, 2008 7:38 am
- Full name: Peter Martan
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
jp wrote: ↑Fri May 11, 2018 1:11 pm
I'm interested in all results. If you did try a little already, as I think you're saying, what were the results?
peter wrote: ↑Fri May 11, 2018 12:42 pm
Ever tried e.g. Jeroen Noomen's Gambit-Lines.ctg for testing LC0 playing against other engines? Or even better his Sharp Gambit Lines starting-positions-collection?
That brings rather different results too, I can tell, having tried only a little till now, cause of course nobody would be interested in such results, at least not as for LC0 right now.

Didn't store many of the games and stopped after about 10 of them already, but LC0 didn't win a single one; Zappa Mex2 scored more full points than draws, running on 2 cores against 18 CPU threads (HT on) for LC0, and the starting positions were taken from Sharp Gambit Lines by Jeroen Noomen.
TC was 60'+5".
Tried Zappa because Thorsten Czub had a long-TC match LC0-Zappa with quite good results for Leela (40/120, ending +8,=12,-6 for LC0).
In my try, in 4 cases Zappa won the game with both White and Black from the same starting position.
Showed them in CSS here:
http://forum.computerschach.de/cgi-bin/ ... #pid113098
and here:
http://forum.computerschach.de/cgi-bin/ ... #pid113037
To see each pair of games stored together in one .pgn with evals, click "Zitieren" (quote) in the menu below the postings.
Thorsten's posting is two above mine, here:
http://forum.computerschach.de/cgi-bin/ ... #pid113096
Peter.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
jp wrote: ↑Fri May 11, 2018 1:11 pm
I'm interested in all results. If you did try a little already, as I think you're saying, what were the results?
peter wrote: ↑Fri May 11, 2018 12:42 pm
Ever tried e.g. Jeroen Noomen's Gambit-Lines.ctg for testing LC0 playing against other engines? Or even better his Sharp Gambit Lines starting-positions-collection?
That brings rather different results too, I can tell, having tried only a little till now, cause of course nobody would be interested in such results, at least not as for LC0 right now.

Seems not so much different, at least at short TC:
From solid, non-tactical, balanced 3movesGM opening suite, 60 games:
Code: Select all
Games Completed = 60 of 60 (Avg game length = 97.193 sec)
Settings = Gauntlet/64MB/1000ms per move/M 9000cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 1617 sec elapsed, 0 sec remaining
1. LC0_09 MKL ID271 36.0/60 28-16-16 (L: m=16 t=0 i=0 a=0) (D: r=12 i=2 f=1 s=0 a=1) (tpm=961.8 d=11.70 nps=222)
2. Jabba 1.0 24.0/60 16-28-16 (L: m=28 t=0 i=0 a=0) (D: r=12 i=2 f=1 s=0 a=1) (tpm=802.3 d=9.40 nps=0)
From NoomenSharpGambit 2015, 30 positions (each side and reversed), 60 games:
Code: Select all
Games Completed = 60 of 60 (Avg game length = 78.420 sec)
Settings = Gauntlet/64MB/1000ms per move/M 9000cp for 30 moves, D 150 moves/PGN:C:\LittleBlitzer\SharpGambits2015.pgn(30)
Time = 1305 sec elapsed, 0 sec remaining
1. LC0_09 MKL ID271 33.5/60 28-21-11 (L: m=21 t=0 i=0 a=0) (D: r=10 i=1 f=0 s=0 a=0) (tpm=954.8 d=11.71 nps=402)
2. Jabba 1.0 26.5/60 21-28-11 (L: m=28 t=0 i=0 a=0) (D: r=10 i=1 f=0 s=0 a=0) (tpm=803.9 d=9.12 nps=0)
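For reference, the two match scores above can be turned into an Elo difference with the usual logistic formula. This is only a minimal sketch in Python: it assumes the LittleBlitzer result column is in W-L-D order (which is consistent with the 36.0/60 and 33.5/60 totals), it uses a plain normal approximation for the interval, and the function names are made up for illustration, not taken from any tool.

```python
import math


def match_score(wins, losses, draws):
    """Fraction of points scored; a draw counts as half a point."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def elo_from_score(score):
    """Elo difference implied by a score fraction under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins, losses, draws, z=1.96):
    """Approximate 95% interval, using a normal approximation on the score."""
    games = wins + losses + draws
    score = match_score(wins, losses, draws)
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    margin = z * math.sqrt(variance / games)
    # Clamp so the logarithm stays defined for lopsided results.
    low = max(score - margin, 1e-6)
    high = min(score + margin, 1.0 - 1e-6)
    return elo_from_score(low), elo_from_score(high)


if __name__ == "__main__":
    # LC0 results from the two 60-game gauntlets, as (wins, losses, draws).
    matches = (("3movesGM", (28, 16, 16)), ("SharpGambits2015", (28, 21, 11)))
    for label, wld in matches:
        elo = elo_from_score(match_score(*wld))
        low, high = elo_interval(*wld)
        print(f"{label}: {elo:+.0f} Elo (approx. 95% interval {low:+.0f} to {high:+.0f})")
```

With these inputs the point estimates come out at roughly +70 and +41 Elo for LC0, but with only 60 games per run each interval is more than 100 Elo wide, so the sample does not separate the two opening sets.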
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Thanks! I eliminated this one and another two or three, and added several more to keep the same total of 64 positions. I hope there are very few by now, maybe 1-2, which are not clear-cut unique game-changers. Interestingly, what seemed to be a serious tactical suite, namely ECM, contains about 70% non-unique, non-game-changing or plainly wrong solutions. Here are these 64 cleaned positions:
Code: Select all
8/5K2/kp6/p1p5/P2p4/1P3P2/2P5/8 b - - bm b5; id "ECM.602";
8/2k3p1/2p4p/5P2/2K3PP/8/8/8 w - - bm g5; id "ECM.603";
8/1kp1b3/1p4K1/4P2p/P1P3p1/5pP1/P4P2/4B3 b - - bm h4; id "ECM.604";
8/8/3K1k2/5p1p/4p1p1/4P1P1/5PP1/8 b - - bm f4; id "ECM.606";
4r1k1/5p1p/3q2p1/1p1P4/1P6/2p4P/2Q1nPB1/4RK2 b - - bm Ng3+; id "ECM.612";
3q1k2/5p2/p5pN/1b2Q2P/8/8/5PPK/8 w - - bm Qh8+; id "ECM.622";
6k1/p3b1pp/4p3/4Pp2/Pp1r1P1P/1P4P1/2p2R2/5RK1 b - - bm Rc4; id "ECM.623";
rn1q2k1/pp3pb1/3p2pp/2pP2N1/3r1P2/7Q/PP4PP/R1B2RK1 w - - bm Nxf7; id "ECM.628";
8/6Bp/6p1/2k1p3/4PPP1/1pb4P/8/2K5 b - - bm b2+; id "ECM.629";
r4rk1/ppq3pp/2p1Pn2/4p1Q1/8/2N5/PP4PP/2KR1R2 w - - bm Rxf6; id "ECM.636";
6k1/p4pp1/Pp2r3/1QPq3p/8/6P1/2P2P1P/1R4K1 w - - bm cxb6; id "ECM.641";
6k1/p4pbp/Bp2p1p1/n2P4/q3P3/B1rQP3/P5PP/5RK1 w - - bm dxe6; id "ECM.642";
8/2k5/2p5/2pb2K1/pp4P1/1P1R4/P7/8 b - - bm Bxb3; id "ECM.646";
8/1R2P3/6k1/3B4/2P2P2/1p2r3/1Kb4p/8 w - - bm Be6; id "ECM.650";
2kr2r1/pp2bQ1p/2b1P3/2qN4/8/1B2p2P/PPP3P1/3R1R1K b - - bm e2; id "ECM.651";
r1b2rk1/1p2qppp/p3p3/2n5/3N4/3B1R2/PPP1Q1PP/R5K1 w - - bm Bxh7+; id "ECM.652";
6rk/3nrpbp/p1bq1npB/1p2p1N1/4P1PQ/P2B3R/1PP1N2P/5R1K w - - bm Nxh7; id "ECM.655";
1rb2rk1/3nqppp/p1n1p3/1p1pP3/5P2/2NBQN2/PPP3PP/2KR3R w - - bm Bxh7+; id "ECM.656";
2k5/ppp3pp/8/NQ2n2q/2Pp1n2/R4bP1/1P3P1P/4R1K1 b - - bm Qxh2+; id "ECM.657";
r4rk1/pp2q1p1/4b2p/2ppb3/6n1/2P3N1/PPQBBPPP/R4RK1 b - - bm Nxh2; id "ECM.667";
3rr1k1/1pq1nppp/p1p2b2/4pB2/2QPP3/P1P1B3/1P4PP/3R1RK1 w - - bm Bxh7+; id "ECM.680";
2rrn1k1/2q2ppp/p2pp3/1p2P1P1/4B3/P5Q1/1PP3PP/R4R1K w - - bm Bxh7+; id "ECM.682";
r2q3r/2pkb1p1/p2p1n2/4p1p1/Pp2P1P1/1QP5/1P1P2PP/RNB2RK1 b - - bm Rxh2; id "ECM.683";
r4rk1/pp1n1ppp/3qp3/3nN1P1/b2P4/P2B1Q2/3B1P1P/1R2R1K1 w - - bm Bxh7+; id "ECM.687";
r5k1/6bp/2q1p1p1/p2pP3/3P4/1rP2QP1/3B1PK1/2R4R w - - bm Rxh7; id "ECM.689";
r2qrnk1/4bppp/b1p5/1p1p2P1/p2P1N1P/2NBP3/PPQ2P2/2K3RR w - - bm Bxh7+; id "ECM.693";
rn1q1rk1/pppbb1pp/4p3/3pP1p1/3P3P/2NB4/PPP2PP1/R2QK2R w KQ - bm Bxh7+; id "ECM.694";
r2q1rk1/3n1ppp/8/1pbP2P1/p1N4P/PnBBPQ2/5P2/R3K2R w KQ - bm Bxh7+; id "ECM.697";
3r2k1/p1R2p2/4pQp1/1q5p/5P1P/1PR5/2Pr2P1/6K1 b - - bm Rxg2+; id "ECM.700";
3r2k1/pb5p/1p2qpp1/8/2p5/1P1nP3/P1N2PPP/1Q1R1R1K b - - bm Bxg2+; id "ECM.703";
4rrk1/2qb2pp/p5P1/1p2p3/1b2P3/2N5/PPPQ4/1K1R2R1 w - - bm gxh7+; id "ECM.704";
2r1r1k1/5ppp/p3pn2/1pb1N3/2P5/1PQ3R1/PB2qPPP/3R2K1 w - - bm Rxg7+; id "ECM.708";
r4rk1/p2n2p1/1q1Qpn1p/1P6/P6B/2p5/2B1KP1P/R5R1 w - - bm Rxg7+; id "ECM.711";
r1qb1r1k/2p3pp/p1n1bp2/1p1Np2Q/P3P3/1BP3R1/1P3PPP/R1B3K1 w - - bm Rxg7; id "ECM.717";
r2r3k/5bp1/2p2N2/5P1p/3q3Q/3B2R1/n5PP/3R3K w - - bm Rxg7; id "ECM.720";
r4rk1/1p1q1ppp/p1b4B/8/2R3R1/P2P4/1b1N1QPP/6K1 w - - bm Bxg7; id "ECM.723";
rq3rk1/3b1ppp/p2bp3/3pB2Q/8/1B5P/PP3PP1/2RR2K1 w - - bm Bxg7; id "ECM.724";
2rr2k1/4bppp/p1n1p3/3q4/1p1P2N1/2P3R1/P3QPPP/2B2RK1 w - - bm Nh6+; id "ECM.727";
rq1r1bk1/1b3pp1/3pn2p/1n2BN1P/1P2P3/3R1NP1/3Q1PB1/2R3K1 w - - bm Bxg7; id "ECM.728";
r1bqkbnr/pp2ppp1/2p4p/3n2N1/2BP4/5N2/PPP2PPP/R1BQK2R w KQkq - bm Nxf7; id "ECM.731";
r2qr1k1/1ppb1p1p/p1np2p1/7Q/3PP2b/1B2N2P/PP3PP1/R1B2RK1 w - - bm Bxf7+; id "ECM.732";
r3r1k1/1bq1nppp/p1np4/1ppBpN2/4P3/2PP1N2/PP3PPP/R2QR1K1 w - - bm Bxf7+; id "ECM.743";
2r1r1k1/1pq1bp1p/p3pnp1/P2n2N1/7R/2P4P/1PB1QPP1/2B1R1K1 w - - bm Nxf7; id "ECM.748";
r1bq2k1/pp1n1ppp/3b1n2/PQ1B3r/3N1P2/2N5/1PP3PP/R1B2RK1 w - - bm Bxf7+; id "ECM.749";
2r1r1k1/5ppp/pq3b2/2pB1P2/2p2B2/5Q1P/Pn3PP1/2R1R1K1 w - - bm Bxf7+; id "ECM.750";
r4rk1/ppRn1p2/6pb/2P1pq1p/3N4/P1QPn1Pb/1B1NPP1P/4R1KB b - - bm Qxf2+; id "ECM.751";
r3kr2/1b2qp2/pp2p2N/4p2Q/8/2n5/P3B1PP/3R1R1K w q - bm Nxf7; id "ECM.752";
b2r1rk1/pq2bpp1/1p2p2p/4N2n/2P2R2/1PB2N2/1P2QPPP/4R1K1 w - - bm Rxf7; id "ECM.753";
rqb1k2r/1p1nbp1p/p4pp1/8/1PBN1P2/P1N1P3/7P/2RQ1RK1 w kq - bm Bxf7+; id "ECM.754";
1r2q1k1/p3pp2/3p1bp1/2pP2N1/8/P5PB/2Q2PK1/1rBR4 w - - bm Nxf7; id "ECM.756";
1qr1b1k1/4bpp1/pn2p2p/1p1nN3/3P4/P2BBN1Q/1P3PPP/4R1K1 w - - bm Bxh6; id "ECM.772";
rr1q2k1/1p2bpp1/2p1p2p/P1Pn4/2NP4/3Q1RP1/5PKP/2B1R3 w - - bm Bxh6; id "ECM.773";
2r5/1p4bk/3p2rp/4pN2/1P2P1pR/2P2q2/QP6/1K5R w - - bm Rxh6+; id "ECM.775";
r1b1r3/pp2Npbk/3pp2p/q5p1/2QNPP2/6P1/PPP3P1/2KR3R w - - bm Ndf5; id "ECM.776";
4r1k1/p1pq1pp1/2p5/3p1b2/Q7/2P1B2P/P1P1rPP1/2R2RK1 b - - bm Bxh3; id "ECM.778";
6rk/3b1n1p/1p1q3b/1PpNp3/2P1Pp2/2Q2NrP/5RP1/2R2B1K b - - bm Bxh3; id "ECM.783";
r2q1rk1/ppp2pp1/1b2b2p/3n3Q/2Bp4/3P1N2/PPP2PPP/R1B1R1K1 w - - bm Bxh6; id "ECM.784";
r3rbk1/1bp1qpp1/p6p/np2p2Q/4P2N/1BP4P/PP3PP1/R1B1R1K1 w - - bm Bg5; id "ECM.785";
4q3/p2r1ppk/R6p/3n4/3B1Q2/4P2P/5PP1/6K1 w - - bm Rxh6+; id "ECM.786";
2r1r1k1/pb1n1pp1/1p1qpn1p/4N1B1/2PP4/3B4/P2Q1PPP/3RR1K1 w - - bm Bxh6; id "ECM.789";
r1b2rk1/pp2bpp1/4p2p/2q4Q/5nNB/2PB4/PP3PPP/2KR3R w - - bm Nxh6+; id "ECM.794";
r2r2k1/pp1n1bp1/2p2p1p/b4N2/q2BR3/2QB2PP/1PP5/2KR4 w - - bm Nxh6+; id "ECM.797";
3r2bk/1q4p1/p2P1N1p/2p1rP2/pb5R/7P/1P4P1/2Q2RK1 w - - bm Rxh6+; id "ECM.798";
6R1/6Q1/3q2p1/5p1p/P3p1k1/1P1r2P1/5PK1/8 b - - bm Rxg3+; id "ECM.800";
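For anyone who wants to run these positions through their own script: each line is an EPD record, i.e. four position fields followed by semicolon-terminated operations (here `bm` and `id`). A minimal Python sketch of reading one record and checking a move against `bm` follows. It assumes the engine's move has already been converted to SAN (a UCI engine reports coordinate moves, so a real harness needs a chess library for that step), and the names `parse_epd` and `is_solved` are hypothetical.

```python
def parse_epd(line):
    """Split one EPD record into (position, operations).

    The position is the first four fields (pieces, side to move,
    castling rights, en passant square). The remainder is a list of
    semicolon-terminated operations such as bm and id.
    Assumption: no operand contains a semicolon itself.
    """
    fields = line.strip().split(None, 4)
    if len(fields) < 4:
        raise ValueError("not an EPD record: %r" % line)
    position = " ".join(fields[:4])
    operations = {}
    if len(fields) == 5:
        for chunk in fields[4].split(";"):
            chunk = chunk.strip()
            if not chunk:
                continue
            opcode, _, operand = chunk.partition(" ")
            operations[opcode] = operand.strip().strip('"')
    return position, operations


def is_solved(engine_move_san, operations):
    """True if the engine's SAN move matches any listed best move.

    Check and mate suffixes are ignored, so "Nxf7" matches "Nxf7+".
    """
    def strip_suffix(move):
        return move.rstrip("+#")

    best_moves = {strip_suffix(m) for m in operations.get("bm", "").split()}
    return strip_suffix(engine_move_san) in best_moves


if __name__ == "__main__":
    record = 'r3kr2/1b2qp2/pp2p2N/4p2Q/8/2n5/P3B1PP/3R1R1K w q - bm Nxf7; id "ECM.752";'
    position, operations = parse_epd(record)
    print(operations["id"], position, is_solved("Nxf7", operations))
```

Comparing on SAN text keeps the sketch short; comparing parsed moves would be more robust against notation differences such as disambiguation.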
Code: Select all
Engine                       0.2s     5s
Stockfish 9    4 threads:    59/64    64/64
Fruit 2.1      1 thread:     40/64    55/64
BikJump 2.01   1 thread:     36/64    49/64
Pred 2.2.1     1 thread:     19/64    38/64
LC0_08 ID271   4 threads:    20/64    32/64
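To compare the engines at a glance, the counts above can be converted to solve rates. A small Python sketch, with the numbers copied from the table; the dictionary layout and the function name are only illustrative:

```python
TOTAL_POSITIONS = 64

# Solved counts from the table above, as (at 0.2s, at 5s).
SOLVED = {
    "Stockfish 9 (4 threads)": (59, 64),
    "Fruit 2.1 (1 thread)": (40, 55),
    "BikJump 2.01 (1 thread)": (36, 49),
    "Pred 2.2.1 (1 thread)": (19, 38),
    "LC0_08 ID271 (4 threads)": (20, 32),
}


def solve_rates(solved, total):
    """Convert solved counts to percentages of the suite."""
    return {
        engine: tuple(100.0 * count / total for count in counts)
        for engine, counts in solved.items()
    }


if __name__ == "__main__":
    for engine, (fast, slow) in solve_rates(SOLVED, TOTAL_POSITIONS).items():
        print(f"{engine:26s} {fast:6.1f}% {slow:6.1f}%")
```

On these numbers LC0 solves 31.25% at 0.2s and 50% at 5s, which puts it just ahead of Pred 2.2.1 at the short limit and behind it at the long one.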
-
- Posts: 1766
- Joined: Wed Jun 03, 2009 12:14 am
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
any idea how much this would improve if hardware was at alphazero's level? i thought i'd read 80kn/s for AZ, but i have no idea how accurate that is or if it directly compares.
basically i'm wondering how poor the tactics really are compared to what can be realistically hoped; could AZ have been, for example, ~2100 tactically & ~3600 positionally? with or without hardware compensation.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
yanquis1972 wrote: ↑Fri May 11, 2018 6:51 pm
any idea how much this would improve if hardware was at alphazero's level? i thought i'd read 80kn/s for AZ, but i have no idea how accurate that is or if it directly compares.
basically i'm wondering how poor the tactics really are compared to what can be realistically hoped; could AZ have been, for example, ~2100 tactically & ~3600 positionally? with or without hardware compensation.

It is plausible that A0, under the match conditions against SF8 presented in the paper, was below Fruit 2.1 level on these tactical test suites. But positionally it might have been extraordinarily good. The tactical positions of these test suites that Fruit does not solve at LTC rarely occur in "regular games", and without a sharp-lines opening book (in fact with no book at all, as the match was played), most of the games are "regular", not involving too many deep tactical shots. So we might be happy if LC0 on a GTX 1060 sometime in the future reaches 2300 or so CCRL level on tactical-shot test suites while being extremely strong positionally. Let's see.
-
- Posts: 13447
- Joined: Wed Mar 08, 2006 9:02 pm
- Location: Dallas, Texas
- Full name: Matthew Hull
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Laskos wrote: ↑Fri May 11, 2018 7:40 pm
It is plausible that A0 in the presented paper match conditions against SF8, on these tactical test suites was below Fruit 2.1 level. But positionally it might have been extraordinarily good. The tactical positions of these test suites not solved by Fruit at LTC are rarely occurring in "regular games", and without a sharp lines opening book, in fact with no book at all, as it was played, most of the games are "regular", not involving too much tactical shots. So, we might be happy if on GTX 1060, LC0 reaches sometime in the future 2300 or so CCRL level on tactical test suites, but being extremely strong positionally. Let's see.
yanquis1972 wrote: ↑Fri May 11, 2018 6:51 pm
any idea how much this would improve if hardware was at alphazero's level? i thought i'd read 80kn/s for AZ, but i have no idea how accurate that is or if it directly compares.
basically i'm wondering how poor the tactics really are compared to what can be realistically hoped; could AZ have been, for example, ~2100 tactically & ~3600 positionally? with or without hardware compensation.

Basically the same point I've made before. Test suites may not contain representative positions from typical games, especially from self-play training.
Another related issue is forcing L0 to play from positions it would never have played into, i.e. an imposed opening book. If L0 is tested with its preferred opening choices, do these tactical holes become fewer or more numerous?
I have argued eloquently and in vain to allow L0 to play all its own moves and not impose book lines upon it (in testing gauntlets). Forced books will skew Elo estimates in unknown ways.
But people have the CCRL-style testing (stripped/hobbled-engine) deeply ingrained in their thinking and one cannot blast them out of it. There is no persuading them.
That's not to say there is no value in forcing L0 to play test positions but it should be compared to letting it play all moves of a game, not just middle/endgame. There would be value in that comparison.
Matthew Hull