LCZero: Progress and Scaling. Relation to CCRL Elo

jp
Posts: 522
Joined: Mon Apr 23, 2018 5:54 am

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by jp » Tue May 15, 2018 3:31 pm

mhull wrote:
Tue May 15, 2018 2:08 pm
jp wrote:
Tue May 15, 2018 9:39 am
mhull wrote:
Mon May 14, 2018 1:39 pm
The idea of computer chess is the asymptotic approach to best play. You can't measure that approach if much weaker humans are ALWAYS interjecting their moves into the test. Any resulting Elo measures are contaminated with human moves.
If we calculated human Elo using games composed partly of computer moves, we would call that cheating.
No current computer chess program tells us anything about the asymptotic approach to best play, because they are all so far below best play.
With all due respect, you wouldn't know. Human assessment of how close programs (which are hundreds of Elo better players than them) are approaching best play is likely of no value.
You don't know either. No one knows exactly how bad computers are. But obviously they aren't anywhere near perfect. No human is assessing the actual moves with their own chess skill to say that, so Elo is irrelevant.

You want to excuse bad play just because it's from a position that isn't the starting position. Do you think that excuse works for human players? Do you think there's no relation between how well Capablanca would play some random reasonable middlegame or endgame and his strength in a whole game? Do you think he would make a worse annotator of some bozo's game than the bozo because it's not his moves?

Your complaints would be more reasonable if there were no games with the conditions you like.

Seems you'll be unhappy unless all tests you don't like were banned. Just be happy that tests you do like exist.

Albert Silver
Posts: 2821
Joined: Wed Mar 08, 2006 8:57 pm
Location: Rio de Janeiro, Brazil

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Albert Silver » Tue May 15, 2018 3:50 pm

Laskos wrote:
Tue May 15, 2018 3:18 pm
JJJ wrote:
Tue May 15, 2018 2:57 pm
The good news is Leela is back on track. Only 25 Elo below its max! And progress is coming back very fast.
Don't look at their official self-play rating, it is only of some guidance. Look at matches against varied AB opposition. LC0 now is the strongest ever.
You mean as opposed to the normal builds, or are you referring to the normal builds? The LC0-cudnn builds are indeed the strongest, though they only run on machines equipped with Nvidia GPUs.

The self-attributed ratings for the NN are unreliable IMHO. I ran a 300-game match with v10 between NN223 and NN253, and they were about equal (facing each other). NN223 actually pulled fractionally ahead (+8 Elo) but well within the error margins obviously.
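
For anyone wanting to check the "within the error margins" part themselves, here is a rough sketch of how an Elo difference and a 95% interval can be estimated from a match result. It assumes the usual logistic Elo formula and a normal approximation; the function names and the W/D/L split are hypothetical, since only the game count (300) and the +8 Elo were given above.

```python
import math


def elo_from_score(score):
    """Logistic Elo difference implied by a score fraction in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match, using a normal approximation.

    Assumes score +/- z*se stays strictly inside (0, 1), which holds for
    any reasonably balanced match.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0 points per game).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (elo_from_score(score),
            elo_from_score(score - z * se),
            elo_from_score(score + z * se))


# Hypothetical W/D/L split chosen to give roughly +8 Elo over 300 games;
# the real split of the match was not posted.
elo, low, high = match_elo(wins=107, draws=93, losses=100)
print(f"{elo:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}]")
```

With a split like this the interval spans a few tens of Elo on either side of the estimate and includes zero, which is the sense in which a 300-game result of +8 cannot separate the two nets.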
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."

Laskos
Posts: 9043
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos » Tue May 15, 2018 4:02 pm

Albert Silver wrote:
Tue May 15, 2018 3:50 pm
Laskos wrote:
Tue May 15, 2018 3:18 pm
JJJ wrote:
Tue May 15, 2018 2:57 pm
The good news is Leela is back on track. Only 25 Elo below its max! And progress is coming back very fast.
Don't look at their official self-play rating, it is only of some guidance. Look at matches against varied AB opposition. LC0 now is the strongest ever.
You mean as opposed to the normal builds, or are you referring to the normal builds? The LC0-cudnn builds are indeed the strongest, though they only run on machines equipped with Nvidia GPUs.

The self-attributed ratings for the NN are unreliable IMHO. I ran a 300-game match with v10 between NN223 and NN253, and they were about equal (facing each other). NN223 actually pulled fractionally ahead (+8 Elo) but well within the error margins obviously.
I just posted in this thread the result for ID292, it is the strongest ever (the standard v0.10 CPU and GPU build on master).
Last edited by Laskos on Tue May 15, 2018 4:06 pm, edited 1 time in total.

mhull
Posts: 12005
Joined: Wed Mar 08, 2006 8:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by mhull » Tue May 15, 2018 4:05 pm

jp wrote:
Tue May 15, 2018 3:31 pm
mhull wrote:
Tue May 15, 2018 2:08 pm
jp wrote:
Tue May 15, 2018 9:39 am

No current computer chess program tells us anything about the asymptotic approach to best play, because they are all so far below best play.
With all due respect, you wouldn't know. Human assessment of how close programs (which are hundreds of Elo better players than them) are approaching best play is likely of no value.
You don't know either. No one knows exactly how bad computers are. But obviously they aren't anywhere near perfect. No human is assessing the actual moves with their own chess skill to say that, so Elo is irrelevant.
But that is using the unknown to assess the unknown, which only supports the point being made.
jp wrote:
Tue May 15, 2018 3:31 pm
You want to excuse bad play just because it's from a position that isn't the starting position.
I have never excused bad play. You are imposing this view on me to support your view. Since I'm not doing what you assume, your point is empty and void.
jp wrote:
Tue May 15, 2018 3:31 pm
Do you think that excuse works for human players? Do you think there's no relation between how well Capablanca would play some random reasonable middlegame or endgame and his strength in a whole game? Do you think he would make a worse annotator of some bozo's game than the bozo because it's not his moves?
It's fine if your measure is for ability at analysis of arbitrary positions or at playing partial games from arbitrary positions. Sure, give the engines a bunch of mate-in-x puzzles to start from and then measure the Elo. But to give an Elo for those abilities is to use Elo for something it was not designed for; it was designed for playing full games of chess. Just a fact.
jp wrote:
Tue May 15, 2018 3:31 pm
Your complaints would be more reasonable if there were no games with the conditions you like. Seems you'll be unhappy unless all tests you don't like were banned. Just be happy that tests you do like exist.
Completely wrong, otherwise I would not be making this case. There are no extensive, on-going gauntlets for L0 playing all moves itself across its evolution, tracking its Elo progress. Instead there are only CCRL-style, cripple-bot games in overflowing abundance.
Matthew Hull

jp
Posts: 522
Joined: Mon Apr 23, 2018 5:54 am

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by jp » Tue May 15, 2018 4:19 pm

mhull wrote:
Tue May 15, 2018 4:05 pm
jp wrote:
Tue May 15, 2018 3:31 pm
mhull wrote:
Tue May 15, 2018 2:08 pm
With all due respect, you wouldn't know. Human assessment of how close programs (which are hundreds of Elo better players than them) are approaching best play is likely of no value.
You don't know either. No one knows exactly how bad computers are. But obviously they aren't anywhere near perfect. No human is assessing the actual moves with their own chess skill to say that, so Elo is irrelevant.
But that is using the unknown to assess the unknown, which only supports the point being made.
No, you're not. You're not using the unknown parts to assess the unknown. Your argument is like saying I can only measure the speed of a car by running alongside it and knowing my own speed.

yanquis1972
Posts: 1762
Joined: Tue Jun 02, 2009 10:14 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by yanquis1972 » Tue May 15, 2018 4:46 pm

anyone have general guidelines for setting up LCZ for tournament play? finally became curious enough to try it, but i couldn't find any explanation of the parameters.

would like to test an 'optimal' vanilla setting vs same for cudnn and/or ensure they're identical (cudnn has several more parameters than the default)

Albert Silver
Posts: 2821
Joined: Wed Mar 08, 2006 8:57 pm
Location: Rio de Janeiro, Brazil

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Albert Silver » Tue May 15, 2018 5:14 pm

Laskos wrote:
Tue May 15, 2018 4:02 pm
Albert Silver wrote:
Tue May 15, 2018 3:50 pm
Laskos wrote:
Tue May 15, 2018 3:18 pm

Don't look at their official self-play rating, it is only of some guidance. Look at matches against varied AB opposition. LC0 now is the strongest ever.
You mean as opposed to the normal builds, or are you referring to the normal builds? The LC0-cudnn builds are indeed the strongest, though they only run on machines equipped with Nvidia GPUs.

The self-attributed ratings for the NN are unreliable IMHO. I ran a 300-game match with v10 between NN223 and NN253, and they were about equal (facing each other). NN223 actually pulled fractionally ahead (+8 Elo) but well within the error margins obviously.
I just posted in this thread the result for ID292, it is the strongest ever (the standard v0.10 CPU and GPU build on master).
If confirmed, that will be very promising, as it will clearly indicate not only that the rut the neural network was in was caused by the bug, but also that it is past it and finally making genuine progress.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."

yanquis1972
Posts: 1762
Joined: Tue Jun 02, 2009 10:14 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by yanquis1972 » Tue May 15, 2018 6:36 pm

i expect slow progress, myself. i've watched about a hundred games or fragments & while leela's play out of the opening is excellent, she loses the plot in the middlegame. her endgame evals are clueless but the play itself nevertheless manages to be interesting.

i think one of the reasons (& i hope, the only reason) for this is obvious; literally every game she's played has been from the opening. my guess is that will be resolved with volume (lots & lots of volume).

my worry is the tactics. she will build what looks to be an overwhelming attack but doesn't execute on it. this is where i'm hoping my total ignorance of the process has me completely wrong in worrying that this is an inherent problem that google was able to overcome by testing on their hardware & conditions.

i assume the theory is that there are consistent patterns to attack but she just hasn't played enough of the correct moves to have learned them yet.

Guenther
Posts: 2815
Joined: Wed Oct 01, 2008 4:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Guenther » Tue May 15, 2018 6:53 pm

Albert Silver wrote:
Tue May 15, 2018 5:14 pm
Laskos wrote:
Tue May 15, 2018 4:02 pm
Albert Silver wrote:
Tue May 15, 2018 3:50 pm


You mean as opposed to the normal builds, or are you referring to the normal builds? The LC0-cudnn builds are indeed the strongest, though they only run on machines equipped with Nvidia GPUs.

The self-attributed ratings for the NN are unreliable IMHO. I ran a 300-game match with v10 between NN223 and NN253, and they were about equal (facing each other). NN223 actually pulled fractionally ahead (+8 Elo) but well within the error margins obviously.
I just posted in this thread the result for ID292, it is the strongest ever (the standard v0.10 CPU and GPU build on master).
If confirmed, that will be very promising, as it will clearly indicate not only that the rut the neural network was in was caused by the bug, but also that it is past it and finally making genuine progress.
You know that there will be a rollback soon?
Current foe list count : [83]
http://rwbc-chess.de/chronology.htm

yanquis1972
Posts: 1762
Joined: Tue Jun 02, 2009 10:14 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by yanquis1972 » Tue May 15, 2018 7:03 pm

did want to add that just after i posted that, i watched a game against rybka 3 that quickly boiled down to an early endgame. rybka evaluated drawish & stayed there, leela +2 or so (ended drawn). i realized the answer there is pretty obvious too; not every game contains an endgame, but most would've, & the large majority probably ended decisively.

also forgot to mention the other hardware aspect to the tactical problem; while we're waiting for millions of games & hoping she stumbles upon the solution often enough to learn it, i'm guessing google used training h/w that could calculate several orders of magnitude beyond what lc0 does. but i'm hopefully wrong & it was volume-focused.

stumbled on this graph (re strength vs stockfish based on time per move) which is interesting as well
[Image: graph of strength vs. Stockfish as a function of time per move]