LCZero: Progress and Scaling. Relation to CCRL Elo

jp · Post by jp » Thu May 10, 2018 5:27 pm

Albert Silver wrote: ↑Thu May 10, 2018 4:44 pm I think you need to reread what I wrote since I said nothing of the sort.

I know what you wrote. I did not mean you think LC0 is human-like.
I just put your name with Nay Lin Tun because of your:
"How often do we read about choosing the path of least resistance even if the most 'precise' move, which would allow all sorts of dangerous possibilities, is discarded?"

That's irrelevant to LC0 play or LC0 training but relevant to humans.

Perfectionism, which you talk about too, is not relevant to LC0 or human play or training, because both are so far below perfect.

That's all. I'm not suggesting you think they are. Just saying they're not.

Henk · Post by **Henk** » Thu May 10, 2018 5:40 pm

If LCZero's progress goes like Log(x), then they had better stop, because it is a waste of resources.

Albert Silver · Post by **Albert Silver** » Thu May 10, 2018 6:01 pm

jp wrote: ↑Thu May 10, 2018 5:27 pm
Albert Silver wrote: ↑Thu May 10, 2018 4:44 pm I think you need to reread what I wrote since I said nothing of the sort.
I know what you wrote. I did not mean you think LC0 is human-like.
I just put your name with Nay Lin Tun because of your:
"How often do we read about choosing the path of least resistance even if the most 'precise' move, which would allow all sorts of dangerous possibilities, is discarded?"

That's irrelevant to LC0 play or LC0 training but relevant to humans.

Perfectionism, which you talk about too, is not relevant to LC0 or human play or training, because both are so far below perfect.

That's all. I'm not suggesting you think they are. Just saying they're not.

Perfectionism here is about positions that are winning with multiple solutions, some more expedient than others. If LC0 (or any engine or person) fails to find the 'best' move, and instead opts for another path to victory, then there are two things:

1) It still won, and that is what matters ultimately
2) the position is a very poor choice to test with if it allows many choices.

Consider this position from a Leela test game today:

The winning move is 91.e5!! and 91.Ke6?? would be a draw. 91.Ke6 is a definite blunder as it changes the outcome of the game. However, any white bishop move is still winning since e5 can be played without a problem 1-2 moves later, so one cannot claim a 'failure' if an engine does play one of those bishop moves instead. That is the point.
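To make the distinction concrete, here is a minimal Python sketch of two ways a test harness might score this one position. The outcomes come from the post. The bishop-move labels and function names are hypothetical and are not taken from any real test-suite tool.

```python
# Outcomes of the candidate moves in the position above, as stated in the post:
# e5 wins, Ke6 draws, and any bishop move still wins.
# "bishop_move_a" and "bishop_move_b" are hypothetical placeholders.
OUTCOMES = {
    "e5": "win",
    "Ke6": "draw",
    "bishop_move_a": "win",
    "bishop_move_b": "win",
}
KEY_MOVE = "e5"  # the single "best move" a WAC-style key would list


def strict_score(engine_move: str) -> bool:
    """Best-move scoring: only the listed key move counts as solved."""
    return engine_move == KEY_MOVE


def outcome_score(engine_move: str) -> bool:
    """Outcome scoring: any move that keeps the win counts as solved."""
    return OUTCOMES.get(engine_move) == "win"


for move in OUTCOMES:
    print(move, "strict:", strict_score(move), "outcome:", outcome_score(move))
```

Under strict scoring every bishop move counts as a failure. Under outcome scoring only Ke6 does, and that difference is the whole disagreement about multi-solution positions.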

I will repeat it a last time: I am not defending Leela, I am criticizing the validity of WAC as a measuring tool.

jp · Post by jp » Thu May 10, 2018 6:28 pm

Albert Silver wrote: ↑Thu May 10, 2018 6:01 pm
jp wrote: ↑Thu May 10, 2018 5:27 pm Perfectionism, which you talk about too, is not relevant to LC0 or human play or training, because both are so far below perfect.
Perfectionism here is about positions that are winning with multiple solutions, some more expedient than others. If LC0 (or any engine or person) fails to find the 'best' move, and instead opts for another path to victory, then there are two things:

1) It still won, and that is what matters ultimately
2) the position is a very poor choice to test with if it allows many choices.

Consider this position from a Leela test game today:
...
I will repeat it a last time: I am not defending Leela, I am criticizing the validity of WAC as a measuring tool.

Sure. I don't think anyone is defending or attacking Leela here.
Kai's result suggests WAC is still useful as a measuring tool.

In your example, if it finds e5 later, then it's finding it slower.
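Test harnesses often make "finding it slower" measurable with a solution time: the moment from which the engine's best move is the key move and stays that way until the time limit. The exact rule differs between tools, so the Python sketch below is an assumption. The log format (a time-ordered list of seconds/best-move pairs) and the move labels are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def solution_time(log: Sequence[Tuple[float, str]], key_move: str) -> Optional[float]:
    """Return the time at which the engine settled on key_move for good.

    `log` is a hypothetical, time-ordered list of (seconds, best_move)
    updates taken from engine output. Returns None if the engine's final
    choice is not the key move.
    """
    settled_at: Optional[float] = None
    for seconds, best_move in log:
        if best_move == key_move:
            if settled_at is None:
                settled_at = seconds  # start of a run on the key move
        else:
            settled_at = None  # engine switched away, so the run is reset
    return settled_at


example_log = [(0.1, "bishop_move"), (0.4, "e5"), (0.9, "bishop_move"), (2.5, "e5")]
print(solution_time(example_log, "e5"))
```

A measure like this records only when the key move became and remained the engine's choice. It cannot tell whether a late switch means the key move was unseen or merely deferred.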

peter · Post by **peter** » Thu May 10, 2018 6:34 pm

Albert Silver wrote: ↑Thu May 10, 2018 6:01 pm I will repeat it a last time: I am not defending Leela, I am criticizing the validity of WAC as a measuring tool.

And I'd say that depends on what you're going to measure with it.

Absolute overall playing strength? No good test for that.
Absolute overall tactical playing strength? No test is good enough for such a high aim either.

Number of solutions in a given time compared to other engines? Yes, why not?
If you don't like one position or another, simply leave it out.

I would, e.g., grant that your example no. 2 is questionable enough: the next-best move is also good enough for a sure win, and the best move may not provably win so much sooner than the next-best one that it could be called unique by your definition (that would be the point to prove, if your standards are to be high enough and yet practicable). But even then, it's still only 140, 150, 161 out of 299, isn't it?
And other engines still have much better results and still "solve" nr.2 as an extra benefit.

If you really want to accept only best-move test positions with a single winning move, with no other candidate move that could also win at some game length, you won't find 300 positions at such an easy yet somewhat selective level any time soon.

If you have a better tactical test suite in which LC0 has any chance of solving some of its positions at a reasonable TC, let me know (of course there are many of higher difficulty; unfortunately, those are the only kind I have stored in my database).

BTW, I also tried LC0 on "Marathon" at 10" per move, because I don't think 15 or 20" would change much, and I don't want to run easy tactical sets at minutes per move.
Marathon is the suite that has come with the Fritz GUIs for several versions now.

LC0 got 19 out of 210 right (going by the given solutions).

Albert Silver · Post by **Albert Silver** » Thu May 10, 2018 6:52 pm

peter wrote: ↑Thu May 10, 2018 6:34 pm
Albert Silver wrote: ↑Thu May 10, 2018 6:01 pm I will repeat it a last time: I am not defending Leela, I am criticizing the validity of WAC as a measuring tool.
And I'd say that depends on what you're going to measure with it.

Absolute overall playing strength? No good test for that.
Absolute overall tactical playing strength? No test is good enough for such a high aim either.

Number of solutions in a given time compared to other engines? Yes, why not?
If you don't like one position or another, simply leave it out.

I would, e.g., grant that your example no. 2 is questionable enough: the next-best move is also good enough for a sure win, and the best move may not provably win so much sooner than the next-best one that it could be called unique by your definition (that would be the point to prove, if your standards are to be high enough and yet practicable). But even then, it's still only 140, 150, 161 out of 299, isn't it?

Is it? I took the very first two positions and found them problematic. Just the very first two. I did not look closely at the 298 others. For me, such an inauspicious start casts serious doubt on the rest.

If you have a better tactical test suite in which LC0 has any chance of solving some of its positions at a reasonable TC, let me know (of course there are many of higher difficulty; unfortunately, those are the only kind I have stored in my database).

BTW, I also tried LC0 on "Marathon" at 10" per move.
That's the suite that has come with the Fritz GUIs for several versions now.
LC0 got 19 out of 210 right (going by the given solutions).

I have not used tactical tests in many years for engines. Admittedly, this is because they got too strong for them as a rule. I'll take a look at WAC and prune out all the multi-solutions, and see what is left. I'm guessing some 200 out of 300 will survive.
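One way such pruning could be automated, assuming a strong engine's MultiPV evaluations are available for each position, is to treat a move as winning if its score clears a chosen threshold and keep only positions with exactly one such move. The threshold, data layout, and position names below are hypothetical. An eval cut-off is only a proxy for "winning", so borderline cases would still need checking by hand.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical cut-off: a move is treated as "winning" if the engine scores it
# at or above this many centipawns. This is a heuristic, not a proof of a win.
WIN_THRESHOLD_CP = 300


@dataclass
class TestPosition:
    name: str
    # Hypothetical MultiPV output: move -> evaluation in centipawns,
    # from the point of view of the side to move.
    multipv_cp: Dict[str, int] = field(default_factory=dict)

    def winning_moves(self) -> List[str]:
        """Moves whose evaluation clears the winning threshold."""
        return [m for m, cp in self.multipv_cp.items() if cp >= WIN_THRESHOLD_CP]


def prune_multi_solution(suite: List[TestPosition]) -> List[TestPosition]:
    """Keep only positions with exactly one move judged winning."""
    return [p for p in suite if len(p.winning_moves()) == 1]


suite = [
    TestPosition("pos_a", {"m1": 900, "m2": 20, "m3": -50}),   # unique winner
    TestPosition("pos_b", {"m1": 800, "m2": 750, "m3": 10}),   # two winners
    TestPosition("pos_c", {"m1": 100, "m2": 50}),              # no clear win
]
print([p.name for p in prune_multi_solution(suite)])
```

A position like the e5 example, where a bishop move also wins but more slowly, would be dropped here if both moves clear the threshold. Positions where nothing clears it are dropped as well, so the threshold choice directly affects how many of the 300 survive.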

Albert Silver · Post by **Albert Silver** » Thu May 10, 2018 6:57 pm

jp wrote: ↑Thu May 10, 2018 6:28 pm
Albert Silver wrote: ↑Thu May 10, 2018 6:01 pm
jp wrote: ↑Thu May 10, 2018 5:27 pm Perfectionism, which you talk about too, is not relevant to LC0 or human play or training, because both are so far below perfect.
Perfectionism here is about positions that are winning with multiple solutions, some more expedient than others. If LC0 (or any engine or person) fails to find the 'best' move, and instead opts for another path to victory, then there are two things:

1) It still won, and that is what matters ultimately
2) the position is a very poor choice to test with if it allows many choices.

Consider this position from a Leela test game today:
...
I will repeat it a last time: I am not defending Leela, I am criticizing the validity of WAC as a measuring tool.
Sure. I don't think anyone is defending or attacking Leela here.
Kai's result suggests WAC is still useful as a measuring tool.

In your example, if it finds e5 later, then it's finding it slower.

That's my point: it might not be finding it later; it might just be choosing to play it later, which is very different. It might see e5, see that there is no way of preventing it, and somehow decide that playing a bishop move first improves its evaluation by 0.01 pawns on some invisible evaluation scale. This is common in all engines, and I have seen tons of examples of an unimportant zwischenzug thrown in before the final killing blow.

For the record, Leela played e5 in a second.

jp · Post by jp » Thu May 10, 2018 7:44 pm

Albert Silver wrote: ↑Thu May 10, 2018 6:57 pm
That's my point: it might not be finding it later; it might just be choosing to play it later, which is very different. It might see e5, see that there is no way of preventing it, and somehow decide that playing a bishop move first improves its evaluation by 0.01 pawns on some invisible evaluation scale. This is common in all engines, and I have seen tons of examples of an unimportant zwischenzug thrown in before the final killing blow.

For the record, Leela played e5 in a second.

So we'd really need to look at the PVs. Is that possible with Leela?
But that's also true of normal engines. Do you think WAC does not rank normal engines accurately in practice?

peter · Post by **peter** » Thu May 10, 2018 7:55 pm

Albert Silver wrote: ↑Thu May 10, 2018 6:52 pm I have not used tactical tests in many years for engines. Admittedly, this is because they got too strong for them as a rule. I'll take a look at WAC and prune out all the multi-solutions, and see what is left. I'm guessing some 200 out of 300 will survive.

I don't normally use suites myself either, as I already wrote; I prefer to always test engine strength position by position, simply because any statistical measurement itself depends on the individual test positions, and for engine-engine games it's just the same.

Letting engines play against each other from certain starting positions, even very early ones, always tests the starting positions too: the closer the test positions get to the single initial chess position, whether they come from sets or books, the closer the match gets to a test of that one initial position, i.e. to a bookless match.

If you prune all the multi-solution positions out of any tactical test suite to make it more selective for your particular measurement aim, you'll of course get more selective results, finally ending up with more and more position-dependent testing.

Thanks in advance if you do this work for WAC; it's still the only suite I know of that is easy enough for LC0.
As a matter of fact, I haven't done that work myself for more than a few of its positions either, because it's simply a very old classic suite that has of course been discussed often and at length for quite a long time.

Then let's talk about the individual positions you doubt or are sure about, and see whether I or anybody else can still find some in which LC0 perhaps even "succeeds", but maybe for the wrong reasons, giving output lines with incorrect moves in the follow-up plies or quite wrong evals.

That's how I like to discuss engines' achievements on specific positions compared with other engines, and then you can use practically any position of interest, whether for tactical or "positional" play, if such a distinction is still meaningful for today's engines at all.

And then you don't need to stick to best-move positions with an indisputably unique "solution" at all, but you do still have to define your standards and measurements for each and every position in question.

Overall playing strength, tactical or "positional", is an illusion anyway in any kind of testing, whether rated in Elo (Celo) from engine-engine games or in the number of a suite's positions solved or not solved.
I like to call it Elosion.

Albert Silver · Post by **Albert Silver** » Thu May 10, 2018 9:20 pm

jp wrote: ↑Thu May 10, 2018 7:44 pm
Albert Silver wrote: ↑Thu May 10, 2018 6:57 pm
That's my point: it might not be finding it later; it might just be choosing to play it later, which is very different. It might see e5, see that there is no way of preventing it, and somehow decide that playing a bishop move first improves its evaluation by 0.01 pawns on some invisible evaluation scale. This is common in all engines, and I have seen tons of examples of an unimportant zwischenzug thrown in before the final killing blow.

For the record, Leela played e5 in a second.
So we'd really need to look at the PVs. Is that possible with Leela?

Of course, why would you think it wasn't?

But that's also true of normal engines. Do you think WAC does not rank normal engines accurately in practice?

WAC is like an IQ test: it does not test intelligence, but the ability to score in IQ tests. In this case it is a bit of a litmus test for easy tactics, but only so long as its positions have a concrete solution. If a position has three winning moves (for example), then you are not only testing whether the engine finds one of the three, but which one. As far as I am concerned, a proper tactics test does not allow for multiple solutions.
