LCZero: Progress and Scaling. Relation to CCRL Elo

Post by **peter** » Fri May 11, 2018 11:59 pm

mhull wrote: ↑Fri May 11, 2018 7:52 pm
I have argued eloquently and in vain to allow L0 to play all its own moves and not impose book lines upon it (in testing gauntlets). Forced books will skew Elo estimates in unknown ways.

But people have the CCRL-style testing (stripped/hobbled-engine) deeply ingrained in their thinking and one cannot blast them out of it. There is no persuading them.

That's not to say there is no value in forcing L0 to play test positions, but it should be compared to letting it play all the moves of a game, not just the middlegame/endgame. There would be value in that comparison.

I understand well that an AI engine might need the move history and has to reach positions on the board by itself to get its full performance, "positionally" and tactically.
Yet I'm of the old-fashioned kind, judging engines' worth by their usefulness for human requirements.
If I can have an engine only for engine-engine games, to watch and count the points afterwards to get rating lists, I'm not fully happy with it.

There's an old joke by Ephraim Kishon (who, by the way, wrote chess books too), in which he has "Jossele" tell of the invention of a machine that grows potatoes, harvests them, cooks them, peels them and eats them up.

Or, as Chrilly Donninger used to say: watching engines play against each other is like watching the washing machine wash the laundry.

If I can't "ask" an engine to judge opening, middlegame and endgame positions without waiting until the AI has brought those positions to the board on its own, I simply lose interest.

Post by **mhull** » Sat May 12, 2018 5:33 am

peter wrote: ↑Fri May 11, 2018 11:59 pm
mhull wrote: ↑Fri May 11, 2018 7:52 pm
I have argued eloquently and in vain to allow L0 to play all its own moves and not impose book lines upon it (in testing gauntlets). Forced books will skew Elo estimates in unknown ways.

But people have the CCRL-style testing (stripped/hobbled-engine) deeply ingrained in their thinking and one cannot blast them out of it. There is no persuading them.

That's not to say there is no value in forcing L0 to play test positions, but it should be compared to letting it play all the moves of a game, not just the middlegame/endgame. There would be value in that comparison.

I understand well that an AI engine might need the move history and has to reach positions on the board by itself to get its full performance, "positionally" and tactically.
Yet I'm of the old-fashioned kind, judging engines' worth by their usefulness for human requirements.
If I can have an engine only for engine-engine games, to watch and count the points afterwards to get rating lists, I'm not fully happy with it.

There's an old joke by Ephraim Kishon (who, by the way, wrote chess books too), in which he has "Jossele" tell of the invention of a machine that grows potatoes, harvests them, cooks them, peels them and eats them up.

Or, as Chrilly Donninger used to say: watching engines play against each other is like watching the washing machine wash the laundry.

If I can't "ask" an engine to judge opening, middlegame and endgame positions without waiting until the AI has brought those positions to the board on its own, I simply lose interest.

But objective testing cannot happen if assuaging human boredom is the priority.

Post by **Laskos** » Sat May 12, 2018 7:58 am

Post by **peter** » Sat May 12, 2018 8:53 am

mhull wrote: ↑Sat May 12, 2018 5:33 am
But objective testing cannot happen if assuaging human boredom is the priority.

Of course you're right, especially considering that the boredom threshold differs a lot from person to person.

And of course there's some thrill too in watching the development of an engine Elo-wise, especially the way LC0 develops.

What I just wanted to say (in a slightly witty way) was that any development of a chess engine, as objective as possible of course (and for me positional testing belongs to that too), has to keep in mind the "use" and the usefulness of the engine for humans (for whom else?).

If there isn't anything like efficient analysis of opening, middlegame and endgame positions compared to other engines, the Celo (computer Elo, whose objective measurement depends, for LC0 even more than for other engines, on opening book, hardware, time and opponents, and even more on the kind of test suites used for that kind of testing) doesn't count for as much to me per se.

Hash learning has worked quite well for a while now with modern AB engines, yet I must admit I haven't tried much with LC0 in that way of analysing positions; I'll have to give it a better try with forward-backward analysis.
Are there any experiences from testers and users about that already, compared to "normal" engines?

Post by jp » Sat May 12, 2018 11:06 am

peter wrote: ↑Sat May 12, 2018 8:53 am
mhull wrote: ↑Sat May 12, 2018 5:33 am
But objective testing cannot happen if assuaging human boredom is the priority.

What I just wanted to say (in a slightly witty way) was that any development of a chess engine, as objective as possible of course (and for me positional testing belongs to that too), has to keep in mind the "use" and the usefulness of the engine for humans (for whom else?).

If there isn't anything like efficient analysis of opening, middlegame and endgame positions compared to other engines, the Celo (computer Elo, whose objective measurement depends, for LC0 even more than for other engines, on opening book, hardware, time and opponents, and even more on the kind of test suites used for that kind of testing) doesn't count for as much to me per se.

Yes, this has nothing to do with boredom. Chess players don't want to be excited by the computer. They want it to help them understand positions of their choice. If the program cannot, it's fine to think it is narrow or deficient. That's not the only way to judge the program, but it is one way. It's best to test it in all ways that make sense.

Post by jp » Sat May 12, 2018 12:58 pm

mhull wrote: ↑Fri May 11, 2018 7:52 pm
But people have the CCRL-style testing (stripped/hobbled-engine) deeply ingrained in their thinking and one cannot blast them out of it.

If you're happy for the standard engines to have the opening books of their authors' choice too, then it's okay. Neither side is stripped/hobbled. It's just a different kind of test.

Post by **mhull** » Sat May 12, 2018 11:57 pm

peter wrote: ↑Sat May 12, 2018 8:53 am
mhull wrote: ↑Sat May 12, 2018 5:33 am
But objective testing cannot happen if assuaging human boredom is the priority.

Of course you're right, especially considering that the boredom threshold differs a lot from person to person.

And of course there's some thrill too in watching the development of an engine Elo-wise, especially the way LC0 develops.

What I just wanted to say (in a slightly witty way) was that any development of a chess engine, as objective as possible of course (and for me positional testing belongs to that too), has to keep in mind the "use" and the usefulness of the engine for humans (for whom else?).

If there isn't anything like efficient analysis of opening, middlegame and endgame positions compared to other engines, the Celo (computer Elo, whose objective measurement depends, for LC0 even more than for other engines, on opening book, hardware, time and opponents, and even more on the kind of test suites used for that kind of testing) doesn't count for as much to me per se.

I understand this of course. However, since the beginning of artificial NNs, the promise (and many times, the delivery) has been that the NN finds what we could not find otherwise: the hidden pattern, the secret of complex interactions. It could find it even though we could not understand the mechanics by looking at the NN; it had learned the nuances and found the way forward.

With chess, what is it that we don't know? How to play like God, that's what. But if I want to study opening XYZ, of what interest is that when the program thinks that's a wrong opening in the first place? You want to use an engine to analyze a wrong opening, that's fine. But what are we doing with gauntlet testing, to find out how good Leela is at studying positions it thinks are the wrong place to be in the first place? That's like studying the planet Mercury when the program has learned that the key to success is found on Europa.

We should be measuring a chess player first, and a utility second, IMHO. People's minds are stuck in cripple-engine mode. The first thing they do when they get an engine is turn off everything except the A/B searcher and then pit it against other crippled engines. Completely strange. What a waste.

peter wrote: ↑Sat May 12, 2018 8:53 am
Hash learning has worked quite well for a while now with modern AB engines, yet I must admit I haven't tried much with LC0 in that way of analysing positions; I'll have to give it a better try with forward-backward analysis.
Are there any experiences from testers and users about that already, compared to "normal" engines?

How about letting it play a gauntlet with no book, while the others can have whatever books (and learning) their authors think best for their projects? We could at least compare the results with the cripple-gauntlets, though if the error bars are too large anyway, maybe it wouldn't reveal anything. But it should be tried, I think.
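Whether a no-book gauntlet and a forced-book gauntlet would differ by more than the error bars can be checked roughly from the win/draw/loss totals. Below is a minimal sketch, assuming the usual logistic Elo model and a normal approximation on the mean score; the function names are hypothetical, and dedicated rating tools such as BayesElo or Ordo fit all games jointly and will give somewhat different numbers.

```python
import math


def elo_from_score(s: float) -> float:
    """Logistic Elo difference implied by an expected score s in (0, 1)."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def gauntlet_elo(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo_diff, low, high) for one engine against a gauntlet.

    Hypothetical helper: uses the per-game score variance (win=1, draw=0.5,
    loss=0) and a normal approximation for the confidence interval on the
    mean score, then maps the interval endpoints through the Elo curve.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games")
    score = (wins + 0.5 * draws) / n
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo is unbounded for a 0% or 100% score")
    # Variance of a single game's score around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    # Clamp so the Elo conversion stays finite at the interval ends.
    lo = max(score - z * se, 1e-9)
    hi = min(score + z * se, 1.0 - 1e-9)
    return elo_from_score(score), elo_from_score(lo), elo_from_score(hi)


if __name__ == "__main__":
    # Hypothetical W/D/L totals, not results from this thread.
    elo, low, high = gauntlet_elo(60, 20, 20)
    print(f"{elo:+.0f} Elo (95% CI {low:+.0f} .. {high:+.0f})")
```

Running it on the totals of a no-book gauntlet and of a forced-book gauntlet against the same opponents and comparing the two intervals gives a rough idea of whether the book changes anything beyond the noise; heavily overlapping intervals would mean the test was too short to tell.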

Post by jp » Sun May 13, 2018 12:16 am

mhull wrote: ↑Sat May 12, 2018 11:57 pm
With chess, what is it that we don't know? How to play like God, that's what. But if I want to study opening XYZ, of what interest is that when the program thinks that's a wrong opening in the first place? You want to use an engine to analyze a wrong opening, that's fine. But what are we doing with gauntlet testing, to find out how good Leela is at studying positions it thinks are the wrong place to be in the first place

This is extremely wrong. The engine is not playing like God. Not even close. The program thinking it's wrong is not the same as it being wrong.

Post by **mhull** » Sun May 13, 2018 5:17 am

jp wrote: ↑Sun May 13, 2018 12:16 am
mhull wrote: ↑Sat May 12, 2018 11:57 pm
With chess, what is it that we don't know? How to play like God, that's what. But if I want to study opening XYZ, of what interest is that when the program thinks that's a wrong opening in the first place? You want to use an engine to analyze a wrong opening, that's fine. But what are we doing with gauntlet testing, to find out how good Leela is at studying positions it thinks are the wrong place to be in the first place

This is extremely wrong. The engine is not playing like God. Not even close. The program thinking it's wrong is not the same as it being wrong.

I disagree. It is the same thing, because that is the goal: training it to play better chess than humans or A/B searchers have ever seen, and that means allowing it to play all its moves. So that is what needs to be looked for in testing: how well it plays chess, not how good it is as a cripple-bot utility landed in positions it might otherwise avoid.

It would be like forcing Capablanca to play the Dutch Defense all the time and then deciding his Elo based on that. That is neither logical nor an accurate appraisal of strength. Capablanca varied his openings very little. But if he were a machine, you lot would abuse him most abominably, and you cannot deny it.

IMHO.

Post by **shrapnel** » Sun May 13, 2018 5:27 am

Interesting to note that, left to itself, LCZero seems to prefer the English Opening every time, at least on my machine.
Also, its performance improves drastically the more time it is given, more so than with conventional engines.
If I remember correctly, even AlphaZero was given 1 minute/move while thrashing Stockfish.
Maybe short/blitz games are simply NOT the forte of LCZero and AlphaZero.
