LCZero Accomplishments and Goals Thus Far

Jhoravi · Post by **Jhoravi** » Thu May 03, 2018 1:06 pm

A sea lion that swims fast under water can also walk on land, and we know how it struggles to walk even a few inches. Just because it can walk doesn't mean it's fair to race it against your dogs. In the same way, the fact that your dog can swim doesn't mean it's fair to hold a swimming race against a sea lion.

hgm · Post by **hgm** » Thu May 03, 2018 3:17 pm

Well, suppose I am racing a seal for 400m then. Would it be fair if I required the race to be conducted on land? Would it be fair if the race was conducted in the water?

[edit] Ah, I see that the poster before me independently came to the same analogy!

Robert Flesher · Post by **Robert Flesher** » Thu May 03, 2018 3:41 pm

jp wrote:
mhull wrote:
Daniel Shawul wrote:Sigh..wake me up when it is 2800 elo running on a single CPU core, which is what every other engine uses in rating lists. As far as I am concerned, it is still a 2100 elo engine there.
Your demand for uniform platform comparison is commutative. Why not demand all the other engines run on a GPU?

Then it would be "equal".
So if I teach my pet frog to play chess, I should demand you play it underwater rather than on land, because demands should be "commutative"?

LOL!

Albert Silver · Post by **Albert Silver** » Thu May 03, 2018 3:43 pm

Jhoravi wrote:A sea lion that swims fast under water can also walk on land, and we know how it struggles to walk even a few inches. Just because it can walk doesn't mean it's fair to race it against your dogs. In the same way, the fact that your dog can swim doesn't mean it's fair to hold a swimming race against a sea lion.

Thus spoke La Fontaine.

Evert · Post by **Evert** » Thu May 03, 2018 3:59 pm

Albert Silver wrote: Remember that LCZ first began self-learning in February. It is now over 2900 CCRL. That is quite incredible if you ask me.

I wonder if trying to express the strength of Leela "on CCRL scale" isn't grossly misleading.
CCRL builds a rating list under specific conditions, which are scaled/tweaked in a specific way to correct for different hardware. That's fine as long as everyone uses the same hardware. For Leela, that's not the case. So in a sense, you cannot meaningfully express its strength in "CCRL Elo".

To meaningfully compare the strength of Leela to, say, Stockfish, one first has to agree on testing conditions for both engines (say, time control scaled with NPS for Stockfish, and scaled with FLOPS for Leela).
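One way to picture "time control scaled with hardware speed" is the minimal sketch below. It assumes each engine has a single measured speed figure (NPS for a CPU engine, some FLOPS-like throughput for a GPU engine) and a reference speed agreed on beforehand. The function name, the parameters, and the purely linear scaling are all illustrative assumptions, not how CCRL or any rating list actually does it.

```python
def scaled_time_seconds(base_seconds, reference_speed, measured_speed):
    """Scale a base time control by relative hardware speed.

    Hypothetical helper: hardware twice as fast as the reference gets
    half the time, hardware half as fast gets double the time.
    """
    if reference_speed <= 0 or measured_speed <= 0:
        raise ValueError("speeds must be positive")
    return base_seconds * reference_speed / measured_speed


# Example: a 300 s game on hardware measured at twice the reference speed.
print(scaled_time_seconds(300, 1_000_000, 2_000_000))  # 150.0
```

The hard part, as the post says, is agreeing on the two speed measures in the first place; the arithmetic itself is trivial once that is settled.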

Dariusz Orzechowski · Post by **Dariusz Orzechowski** » Thu May 03, 2018 4:36 pm

Albert Silver wrote:
MonteCarlo wrote:Well, the search algorithm plays the move with the most visits.

The initial visit is made to the move with the highest probability from the policy head.

With a 1 visit "search", the search will evaluate the position after the move with the highest probability from the policy head, but no matter what the evaluation is, it will play that move.

It has 1 visit, and everything else has 0.

So you're literally just playing against the policy head, as he indicated he liked to do
So it is essentially just evaluating the position with the highest value move according to its policy. Ok, well, needless to say, if you reduce the time to the human to impossible controls such as one minute or less with no increment, the human will lose in all likelihood at some juncture, but if that is qualifying the 'pure network' as 2000 or GM, we can say it is already a GM at g/10s. On the other hand, if you set the lower limit to g/5 (minutes) and use this 'pure network', it will never reach GM level, due to the tactics. Unless you handpick the GM I suppose...

You can play against the pure network by choosing Easy mode on play.lczero.org and see for yourself. I watched a YouTube player called kingscrusher play against LCZ on Easy, using a few minutes per game (so blitz rather than bullet TC). Kingscrusher's strength is around 2200, I think (he is a FIDE CM), and it was not an easy task for him to dispatch the LCZ network; he even lost at least one game, IIRC. Hence my rough estimate of the LCZ pure network's level at about 2000.
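The "1 visit means you are playing the policy head" point from the quote can be illustrated with a toy, root-only search. This is a sketch under loud assumptions, not LCZero's code: it keeps no tree below the root, `evaluate` is a hypothetical stand-in for the value head, unvisited moves are assumed to start at value 0, and the PUCT-style formula and `c_puct` constant are simplified for illustration.

```python
import math


def pick_move(policy, visits_budget, evaluate, c_puct=1.5):
    """Root-only toy search: return the move with the most visits.

    policy:   dict of move -> prior probability (stand-in for the policy head)
    evaluate: function move -> value in [-1, 1] for the side to move
              (hypothetical stand-in for the value head)
    """
    visits = {move: 0 for move in policy}
    value_sum = {move: 0.0 for move in policy}

    for _ in range(visits_budget):
        total = sum(visits.values())

        def score(move):
            # Mean value so far; unvisited moves are assumed to start at 0.
            q = value_sum[move] / visits[move] if visits[move] else 0.0
            # Exploration bonus proportional to the prior, so with no visits
            # anywhere the highest-prior move is selected first.
            u = c_puct * policy[move] * math.sqrt(total + 1) / (1 + visits[move])
            return q + u

        best = max(policy, key=score)
        visits[best] += 1
        value_sum[best] += evaluate(best)

    # The move played is the most-visited one, not the best-evaluated one.
    return max(policy, key=lambda move: visits[move])


priors = {"e4": 0.5, "d4": 0.3, "Nf3": 0.2}


def toy_value(move):
    # Pretend the top-prior move is actually a blunder.
    return -1.0 if move == "e4" else 0.5


print(pick_move(priors, 1, toy_value))   # e4: the evaluation is ignored
print(pick_move(priors, 50, toy_value))  # more visits let the search move away from e4
```

With a budget of 1, the single visit goes to the highest-prior move and that move is played whatever its evaluation turns out to be, which is exactly the behaviour described in the quote.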

As for tactics, a neural network can learn to recognize tactical motifs, and LCZ is learning, albeit slowly. I took network #125 (one that played in TCEC recently) and a more recent #234 and ran them on the ECM tactical test suite (879 positions).

Results on 1 playout (for comparison: SF9 depth 1 scores 124/879):

#125   111/879
#234   158/879

I also checked how it changes for net #234 with more playouts:

158/879  p 1
171/879  p 10
208/879  p 100
268/879  p 1000
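For readers who prefer percentages, the figures above can be converted with a few lines of Python. The numbers are copied from the table for net #234; the variable and function names are only illustrative.

```python
TOTAL_POSITIONS = 879

# Solved positions for net #234, keyed by number of playouts (from the table above).
solved_by_playouts = {1: 158, 10: 171, 100: 208, 1000: 268}


def solve_rate(solved, total=TOTAL_POSITIONS):
    """Return the share of solved positions as a percentage."""
    return 100.0 * solved / total


for playouts, solved in solved_by_playouts.items():
    print(f"p {playouts:>4}: {solved}/{TOTAL_POSITIONS} = {solve_rate(solved):.1f}%")

# Extra positions solved for each tenfold increase in playouts.
counts = list(solved_by_playouts.values())
gains = [later - earlier for earlier, later in zip(counts, counts[1:])]
print(gains)  # [13, 37, 60]
```

The gain per tenfold increase in playouts grows from 13 to 37 to 60 positions, so within this range the search is still paying off more with each step.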

My guess is that with more training and a bigger network, a lot of tactics will be recognized on the spot, and that may be enough to get to GM level with just 1 playout. With such strong priors, search will solve even more tactics, and that may be enough for 3000+ engines. In other aspects of the game (except the endgame), LCZ is already there, I think.

Albert Silver · Post by **Albert Silver** » Thu May 03, 2018 4:40 pm

Evert wrote:
Albert Silver wrote: Remember that LCZ first began self-learning in February. It is now over 2900 CCRL. That is quite incredible if you ask me.
I wonder if trying to express the strength of Leela "on CCRL scale" isn't grossly misleading.
CCRL builds a rating list under specific conditions, which are scaled/tweaked in a specific way to correct for different hardware. That's fine as long as everyone uses the same hardware. For Leela, that's not the case. So in a sense, you cannot meaningfully express its strength in "CCRL Elo".

To meaningfully compare the strength of Leela to, say, Stockfish, one first has to agree on testing conditions for both engines (say, time control scaled with NPS for Stockfish, and scaled with FLOPS for Leela).

You're right, of course, that you cannot make a straightforward comparison. So how about this? Leela is 2900 CCRL when paired with a GTX 1060. The CPU and the rest are identical, so that's the only difference. And since Leela is designed to make use of a GPU, without which it is quite clearly crippled, there's also little point in making a straight CPU to CPU comparison. Unless the purpose is to show how necessary a GPU is to Leela.

Albert Silver · Post by **Albert Silver** » Thu May 03, 2018 5:13 pm

Dariusz Orzechowski wrote:
Albert Silver wrote:
MonteCarlo wrote:Well, the search algorithm plays the move with the most visits.

The initial visit is made to the move with the highest probability from the policy head.

With a 1 visit "search", the search will evaluate the position after the move with the highest probability from the policy head, but no matter what the evaluation is, it will play that move.

It has 1 visit, and everything else has 0.

So you're literally just playing against the policy head, as he indicated he liked to do
So it is essentially just evaluating the position with the highest value move according to its policy. Ok, well, needless to say, if you reduce the time to the human to impossible controls such as one minute or less with no increment, the human will lose in all likelihood at some juncture, but if that is qualifying the 'pure network' as 2000 or GM, we can say it is already a GM at g/10s. On the other hand, if you set the lower limit to g/5 (minutes) and use this 'pure network', it will never reach GM level, due to the tactics. Unless you handpick the GM I suppose...
You can play against the pure network by choosing Easy mode on play.lczero.org and see for yourself. I watched a YouTube player called kingscrusher play against LCZ on Easy, using a few minutes per game (so blitz rather than bullet TC). Kingscrusher's strength is around 2200, I think (he is a FIDE CM), and it was not an easy task for him to dispatch the LCZ network; he even lost at least one game, IIRC. Hence my rough estimate of the LCZ pure network's level at about 2000.

As for tactics, a neural network can learn to recognize tactical motifs, and LCZ is learning, albeit slowly. I took network #125 (one that played in TCEC recently) and a more recent #234 and ran them on the ECM tactical test suite (879 positions).

Results on 1 playout (for comparison: SF9 depth 1 scores 124/879):
#125   111/879
#234   158/879
I also checked how it changes for net #234 with more playouts:
158/879  p 1
171/879  p 10
208/879  p 100
268/879  p 1000
My guess is that with more training and a bigger network, a lot of tactics will be recognized on the spot, and that may be enough to get to GM level with just 1 playout. With such strong priors, search will solve even more tactics, and that may be enough for 3000+ engines. In other aspects of the game (except the endgame), LCZ is already there, I think.

As mentioned in a reply to someone else, the real question, of course, is the time control. If you make it fast enough, it will beat anybody even now. But at 5 minutes it's a different story, since a good player should be able to bring in tactics at multiple ply levels, which can swing things radically. That said, I did just test LCZ in Easy mode in a 5-minute game, and though I didn't have too much trouble beating it, I was forced to think a lot more and be more attentive than I had expected, so my pessimism about its 1-playout strength was misplaced. I don't think it's a 2000 yet, at least not in five-minute games, but it plays a lot better than I expected, and its 'instinctual' positional moves compensate for a lot.

mhull · Post by **mhull** » Thu May 03, 2018 8:37 pm

jp wrote:
mhull wrote:
Daniel Shawul wrote:Sigh..wake me up when it is 2800 elo running on a single CPU core, which is what every other engine uses in rating lists. As far as I am concerned, it is still a 2100 elo engine there.
Your demand for uniform platform comparison is commutative. Why not demand all the other engines run on a GPU?

Then it would be "equal".
So if I teach my pet frog to play chess, I should demand you play it underwater rather than on land, because demands should be "commutative"?

You don't demand the frog should use equal hardware though, which is the point.

jp · Post by **jp** » Thu May 03, 2018 8:50 pm

mhull wrote:
jp wrote:...
So if I teach my pet frog to play chess, I should demand you play it underwater rather than on land, because demands should be "commutative"?
You don't demand the frog should use equal hardware though, which is the point.

No, that's not right. Neither you nor my frog are allowed to use any hardware. That would be cheating. My frog can play in two playing environments, but not equally comfortably. You can only play in one. If there's to be a match, which environment should it be in?

Other people desire other animal matches, though.
