You can add him to your ignore (foe) list. I did this very soon after his first posts.
LCZero: Progress and Scaling. Relation to CCRL Elo
Moderator: Ras
- 
				Guenther
 - Posts: 4718
 - Joined: Wed Oct 01, 2008 6:33 am
 - Location: Regensburg, Germany
 - Full name: Guenther Simon
 
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
- 
				mar
 - Posts: 2668
 - Joined: Fri Nov 26, 2010 2:00 pm
 - Location: Czech Republic
 - Full name: Martin Sedlak
 
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Laskos wrote: ↑Thu May 17, 2018 12:22 am
Yes, the same for ID302 compared to ID292, no improvement (well, within error margins, so there is maybe at most 20 Elo points improvement). They see 130 Elo points improvement in self-games. Either something is wrong with my testing, or again something is fishy in their framework.

Well I'm playing some test games with ID303 and so far (20 games played) it seems not 100 elo stronger than ID 24x I played last time, but rather 100 elo weaker...
Still too early to draw conclusions, but 25% after 20 games when I expected Leela to be on par with Cheng according to their elo graph, so far a disappointment.
Note that I'm using 40 moves in 2 min now so the TC should be better for Leela than 40/1min I played before (note it's still the official OpenCL-based engine).
What exactly does their elo graph show anyway? Do they run regression tests from time to time or is it just delta from the previous version?
If so then that's pretty much random and useless if improvements are small.
Anyway, always the same story with Leela: blundering random moves like crazy,
losing to shallow tactics. I even saw Leela blunder twice in a single game, first throwing away a win then wasting a draw
- no way they can compete with the top dogs with this approach on consumer HW (not to mention that current SF should be on par with A0 elo-wise on Google HW).
I plan to play 200 games to get a rough idea of how strong the current engine + net is, I'll post the results here.
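For a rough sense of what "25% after 20 games" means in Elo terms, here is a minimal sketch using the standard logistic Elo model. The function names are made up for illustration, and the interval treats every game as a plain win or loss (draws are ignored, which overstates the spread), so take it as a ballpark only.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_interval(score, games, z=1.96):
    """Rough interval on the score fraction (normal approximation, draws ignored)."""
    se = math.sqrt(score * (1.0 - score) / games)
    low = max(score - z * se, 1e-6)          # clamp so elo_diff stays finite
    high = min(score + z * se, 1.0 - 1e-6)
    return low, high


if __name__ == "__main__":
    score, games = 0.25, 20                   # figures from the post above
    low, high = score_interval(score, games)
    print("point estimate: %.0f Elo" % elo_diff(score))
    print("rough range:    %.0f to %.0f Elo" % (elo_diff(low), elo_diff(high)))
```

A 25% score corresponds to roughly -191 Elo, but with only 20 games the range around it is very wide, which fits the "too early to draw conclusions" caveat.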
- 
				Laskos
														 - Posts: 10948
 - Joined: Wed Jul 26, 2006 10:21 pm
 - Full name: Kai Laskos
 
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
mar wrote: ↑Thu May 17, 2018 8:20 am
Laskos wrote: ↑Thu May 17, 2018 12:22 am
Yes, the same for ID302 compared to ID292, no improvement (well, within error margins, so there is maybe at most 20 Elo points improvement). They see 130 Elo points improvement in self-games. Either something is wrong with my testing, or again something is fishy in their framework.
Well I'm playing some test games with ID303 and so far (20 games played) it seems not 100 elo stronger than ID 24x I played last time, but rather 100 elo weaker...
Still too early to draw conclusions, but 25% after 20 games when I expected Leela to be on par with Cheng according to their elo graph, so far a disappointment.
Note that I'm using 40 moves in 2 min now so the TC should be better for Leela than 40/1min I played before (note it's still the official OpenCL-based engine).
What exactly does their elo graph show anyway? Do they run regression tests from time to time or is it just delta from the previous version?
If so then that's pretty much random and useless if improvements are small.
Anyway, always the same story with Leela: blundering random moves like crazy,
losing to shallow tactics. I even saw Leela blunder twice in a single game, first throwing away a win then wasting a draw
- no way they can compete with the top dogs with this approach on consumer HW (not to mention that current SF should be on par with A0 elo-wise on Google HW).
I plan to play 200 games to get a rough idea of how strong the current engine + net is, I'll post the results here.

So, some sort of confirmation, both on overall strength and on tactics. I was wondering about the validity of my test against one AB engine, LC0 being on CPU and with pretty low number of playouts. The Elo graph is here, since the first bigger net ID227:

Red lines are one standard deviation. There seems to have been an improvement, but I guess there are still critical bugs in their engine v0.10. They have been very careless, adding 100+ commits since v0.7 without any proper testing.
They see 130 Elo points of progress over the last 2 datapoints; I see no progress at all. They don't seem to run regression tests, and are just comparing to the previous version with "freezing temperature", if I understood correctly. Never mind that these small "gains" could be almost orthogonal when taken successively, so all in all they add up to nothing in a regression test.
We are also in agreement on easy tactics: it is worse now with ID302 than with the initial ID227. I used Albert's cleaned WAC201.epd tactical suite
at 6s/position on 4 CPU threads, equivalent to 1s/position on a GTX 1060:
ID227
score=84/201 [averages on correct positions: depth=11.1 time=0.96 nodes=178]
ID302
score=74/201 [averages on correct positions: depth=11.3 time=1.21 nodes=190]
So, even if it has gained Elo points since ID227, its easy tactics are even worse. I think they have to roll back to a less buggy engine, say v0.7 and older nets, and then accept commits only after severe vetting and testing (more or less the SF framework).
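As a rough check on how much weight the 84 vs. 74 difference can carry, here is a sketch of an unpaired two-proportion z-score. This is an assumption-laden shortcut: both nets ran the same 201 positions, so a paired test on the per-position results would be the better tool, but those are not given in the post. The function name is made up for illustration.

```python
import math


def two_proportion_z(solved_a, solved_b, total):
    """Unpaired z-score for the difference between two solve rates on `total` positions each."""
    p_a = solved_a / total
    p_b = solved_b / total
    pooled = (solved_a + solved_b) / (2.0 * total)
    se = math.sqrt(pooled * (1.0 - pooled) * (2.0 / total))
    return (p_a - p_b) / se


if __name__ == "__main__":
    # ID227 solved 84/201, ID302 solved 74/201 (figures from the post above)
    z = two_proportion_z(84, 74, 201)
    print("z = %.2f" % z)
```

Under these assumptions z comes out at about 1.0, i.e. roughly one standard deviation, so the suite result on its own supports "no better at tactics" more firmly than "clearly worse".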
- 
				Guenther
 - Posts: 4718
 - Joined: Wed Oct 01, 2008 6:33 am
 - Location: Regensburg, Germany
 - Full name: Guenther Simon
 
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
I have been running my own tests since April. ID 303 is currently being tested (ID 100 will be added later).
A Google spreadsheet with details, graph and conditions (+games) is prepared but not yet ready for publishing.
Each LCZero version always plays 10*30 games vs. the same 10 opponents.
Each 30-game batch is played from openings picked at random from a small PGN of ~1200 3-move lines, with reversed colours.
The TC is always 5+5 vs. 2+1, i.e. time odds of around 3.5-4.5:1 in favour of LCZero,
to mimic a better GPU. The GPU currently used is very weak but not old (around 70-80 nps with the current net size).
Actually I bought it for around 30€, because it is passively cooled (no fan) and thus absolutely silent.
(The one before slowly died, sometimes with a hell of a noise, due to a damaged fan.)
Below is the current CCRL 40/4 calibrated result, calculated with Ordo using 400 simulations.
A little note:
Counter 1.2-64 is still, and always was, outside the error window.
This was the reason why I asked whether it is 32 or 64 bit in CCRL.
(The result was that it was able to run both ways, but this was not
distinguished in the ratings.)
http://talkchess.com/forum3/viewtopic.php?f=2&t=67250
This means the CCRL rating for Counter_12-64 should likely be a bit higher,
and this would shift the ratings of all LCZero entries a bit higher in comparison.
			
			
									
						
										
#       PLAYER          RATING   ERROR  POINTS  PLAYED  (%)     CCRL 40/4(1)    CCRL 40/40(2)   Diff 1  Diff 2
1       Chronos_197     2631.48  49.51   73.0   150     48.7    2639            2639            -7.52    -7.52
2       Counter_12-64   2505.12  50.05   49.5   150     33.0    2446            2468            59.12    37.12
3       Danasah_70      2592.51  49.77   65.5   150     43.7    2596            2611            -3.49   -18.49
4       Glaurung_201-64 2720.07  51.65   90.0   150     60.0    2740            2745           -19.93   -24.93
5       Hermann_25-64   2510.87  50.68   50.5   150     33.7    2512            2496            -1.13    14.87
6       Jellyfish_11-64 2628.90  49.21   72.5   150     48.3    2608            2577            20.90    51.90
7       LCZero_07ID125  2509.54* 34.74  109.5   300     36.5    *               *                *       *
8       LCZero_07ID150  2518.65* 35.74  113.0   300     37.7    *               *                *       *
9       LCZero_07ID181  2669.39* 35.10  174.0   300     58.0    *               *                *       *
10      LCZero_07ID231  2740.88* 35.92  201.5   300     67.2    *               *                *       *
11      LCZero_010ID254 2767.63* 36.07  211.0   300     70.3    *               *                *       *
12      LCZero_010ID303 *        *      *       *       *       *               *                *       *
13      Monolith_04-64  2574.05  48.82   62.0   150     41.3    2597            2591            -22.95  -16.95
14      Rodent_10-64    2683.21  48.12   83.0   150     55.3    2692            2677             -8.79    6.21
15      Rotor_08        2613.37  45.95   69.5   150     46.3    2612            2628              1.37  -14.63
16      Tucano_400-64   2644.39  48.60   75.5   150     50.3    2662            2664            -17.61  -19.61
---------------------------------------------------------------------------------------------------------------
Gauntlet Opp Rating     2610.40                                 2610.40         2609.60           0.00    0.80
                        avg                                     adapted avg     avg               avg     avg   
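The Diff 1 column and the note about Counter can be rechecked from the table itself. The sketch below copies the opponents' measured rating, error and CCRL 40/4 rating from the table, recomputes the gauntlet average, and lists the opponents whose difference exceeds their error. The function and variable names are made up for illustration; this is not how Ordo itself works.

```python
# (name, measured rating, error, CCRL 40/4 rating), copied from the table above
OPPONENTS = [
    ("Chronos_197", 2631.48, 49.51, 2639),
    ("Counter_12-64", 2505.12, 50.05, 2446),
    ("Danasah_70", 2592.51, 49.77, 2596),
    ("Glaurung_201-64", 2720.07, 51.65, 2740),
    ("Hermann_25-64", 2510.87, 50.68, 2512),
    ("Jellyfish_11-64", 2628.90, 49.21, 2608),
    ("Monolith_04-64", 2574.05, 48.82, 2597),
    ("Rodent_10-64", 2683.21, 48.12, 2692),
    ("Rotor_08", 2613.37, 45.95, 2612),
    ("Tucano_400-64", 2644.39, 48.60, 2662),
]


def gauntlet_average(rows):
    """Mean measured rating of the gauntlet opponents."""
    return sum(rating for _, rating, _, _ in rows) / len(rows)


def outside_error_window(rows):
    """Names whose measured rating differs from CCRL 40/4 by more than the error."""
    return [name for name, rating, err, ccrl in rows if abs(rating - ccrl) > err]


if __name__ == "__main__":
    print("gauntlet average: %.2f" % gauntlet_average(OPPONENTS))
    for name, rating, err, ccrl in OPPONENTS:
        print("%-16s diff %+7.2f  error %.2f" % (name, rating - ccrl, err))
    print("outside error window:", outside_error_window(OPPONENTS))
```

With the numbers as posted, Counter_12-64 is the only opponent whose Diff 1 exceeds its error, and the average matches the 2610.40 shown in the last row.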
- 
				Guenther
 - Posts: 4718
 - Joined: Wed Oct 01, 2008 6:33 am
 - Location: Regensburg, Germany
 - Full name: Guenther Simon
 
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
They ran very few regression tests in the past and only one lately (233 vs. 292).
http://lczero.org/matches
Anyhow, as you have noticed and as has been mentioned for a long time, the self-play (SP) ratings are quite meaningless for various reasons.
Guenther
- 
				jp
 - Posts: 1485
 - Joined: Mon Apr 23, 2018 7:54 am
 
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
I have to check. David Xu, are you asking me that question? If you're not, please ignore the following.
If you are, I don't know why you resort to personal attacks.
Do you realize that once someone else wrongly grouped me with you and attacked me for your views? I didn't see you reply then to attack him or to tell him it was your views he was attacking, not mine. Another person attacked you then, again without you responding. Why not?
I have never attacked you. I have never attacked anyone here.
So you decide you have to butt in to a conversation with yanquis1972 and Albert and attack me?
May I ask what your special qualifications are?
You appear to be extremely intolerant of anyone saying anything you don't like, even if they are not speaking to you and even if they don't know you don't like what they say.
- 
				Milos
 - Posts: 4190
 - Joined: Wed Nov 25, 2009 1:47 am
 
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
You tell a typical anonymous troll to add someone to ignore list, gee.
That David Xu guy posted in total 40 posts on this forum out of which 38 are oneliners, and in most of those he just calls ppl names, stalks them, and posts meaningless BS. He is someone that is the best recommendation for ignore list.
Your judgement of ppl is problematic at best.
- 
				noobpwnftw
 - Posts: 694
 - Joined: Sun Nov 08, 2015 11:10 pm
 - Full name: Bojun Guo
 
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Milos wrote: ↑Thu May 17, 2018 11:56 am
You tell a typical anonymous troll to add someone to ignore list, gee.
That David Xu guy posted in total 40 posts on this forum out of which 38 are oneliners, and in most of those he just calls ppl names, stalks them, and posts meaningless BS. He is someone that is the best recommendation for ignore list.
Your judgement of ppl is problematic at best.

As far as I can tell, the last claim he made against me was the performance saturation can be nowhere close, now with the reality came should I enjoy returning the favor because "facts don't care about your feelings"?
Back to the topic, LC0 is doing okay, and it seems not likely to get another +400 real world ELO on an average hardware just by tossing more games into it, Zuck's team probably demonstrated that with a reasonable amount of hardware in the NN realm.
- 
				jkiliani
 - Posts: 143
 - Joined: Wed Jan 17, 2018 1:26 pm
 
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Laskos wrote: ↑Thu May 17, 2018 9:10 am
Red lines are one standard deviation. There seem to have been an improvement, but I guess there are still critical bugs in their engine v0.10. They are very careless adding 100+ commits since v0.7, without any proper testing.
They see in the last 2 datapoints a 130 Elo points progress, I see no progress at all. They don't seem to run regression tests, and are just comparing to previous version with "freezing temperature", if I understood. Never mind that these small "gains" could be almost orthogonal taken successively, so all in all add to nothing in a regression test.

Most recent commits are either changes to the lc0 implementation with multiple backends for neural net evaluation, bugfixes to original lczero, or diagnostic or server features. Commits that directly affect play are already handled much more conservatively now compared to a few weeks ago.
The discrepancies of self-play Elo to your testing could also stem from different methods: Afaik you test with opening books, is that correct? Self-play matches do not use a book, instead temperature (determining the chance to pick a move that did not receive the most visits) is used, mostly in the opening and much less later in game. That means that any new opening knowledge discovered, for instance which lines to prefer or to avoid, will be measured by self-play Elo but entirely missed by testing which uses a fixed book instead.
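To illustrate what temperature does to move selection, here is a minimal sketch assuming the AlphaZero-style rule where a move is picked with probability proportional to its visit count raised to 1/temperature. This is not taken from the lczero source; the function names and the visit counts are made up for illustration.

```python
import random


def visit_probabilities(visits, temperature):
    """Map visit counts to move probabilities; temperature 0 means always the most visited move."""
    best = max(visits.values())
    if temperature <= 0:
        winners = [move for move, n in visits.items() if n == best]
        return {move: (1.0 / len(winners) if move in winners else 0.0) for move in visits}
    # Divide by the largest count first so the power cannot overflow.
    weights = {move: (n / best) ** (1.0 / temperature) for move, n in visits.items()}
    total = sum(weights.values())
    return {move: w / total for move, w in weights.items()}


def pick_move(visits, temperature, rng=random):
    """Sample one move according to the temperature-adjusted probabilities."""
    probs = visit_probabilities(visits, temperature)
    moves = list(probs)
    return rng.choices(moves, weights=[probs[m] for m in moves], k=1)[0]


if __name__ == "__main__":
    visits = {"e2e4": 500, "d2d4": 400, "g1f3": 100}   # hypothetical root visit counts
    for t in (1.0, 0.5, 0.0):
        print(t, visit_probabilities(visits, t))
    print("sampled at T=1:", pick_move(visits, 1.0))
```

At temperature 1 the moves are picked in proportion to their visits, so side lines get played regularly; as the temperature drops towards 0 the choice collapses onto the most visited move, which is the "freezing temperature" setting mentioned earlier in the thread.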
- 
				main line
 - Posts: 60
 - Joined: Thu Jul 07, 2016 10:15 pm
 
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
noobpwnftw wrote: ↑Thu May 17, 2018 12:18 pm
Milos wrote: ↑Thu May 17, 2018 11:56 am
You tell a typical anonymous troll to add someone to ignore list, gee.
That David Xu guy posted in total 40 posts on this forum out of which 38 are oneliners, and in most of those he just calls ppl names, stalks them, and posts meaningless BS. He is someone that is the best recommendation for ignore list.
Your judgement of ppl is problematic at best.
As far as I can tell, the last claim he made against me was the performance saturation can be nowhere close, now with the reality came should I enjoy returning the favor because "facts don't care about your feelings"?
Back to the topic, LC0 is doing okay, and it seems not likely to get another +400 real world ELO on an average hardware just by tossing more games into it, Zuck's team probably demonstrated that with a reasonable amount of hardware in the NN realm.

What happens? Can LCZero beat a human?