You can add him to your ignore (foe) list. I have done this very soon after his first posts.
LCZero: Progress and Scaling. Relation to CCRL Elo
Moderators: hgm, Rebel, chrisw
-
- Posts: 4607
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
https://rwbc-chess.de
trollwatch:
Talkchess nowadays is a joke - it is full of trolls/idiots/people stuck in the pleistocene > 80% of the posts fall into this category...
-
- Posts: 2559
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Laskos wrote: ↑Thu May 17, 2018 12:22 am
Yes, the same for ID302 compared to ID292, no improvement (well, within error margins, so there is maybe at most 20 Elo points improvement). They see 130 Elo points improvement in self-games. Either something is wrong with my testing, or again something is fishy in their framework.

Well, I'm playing some test games with ID303 and so far (20 games played) it seems not 100 elo stronger than ID 24x I played last time, but rather 100 elo weaker...
Still too early to draw conclusions, but 25% after 20 games when I expected Leela to be on par with Cheng according to their elo graph, so far a disappointment.
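For a sense of scale, a 25% score maps to roughly -190 Elo under the usual logistic Elo model, but 20 games leave a very wide margin. A minimal sketch, assuming a normal approximation for the score and ignoring draws in the variance (so the margin is an upper bound); the function names are illustrative and not taken from any tool used in this thread:

```python
import math

def elo_from_score(p: float) -> float:
    """Elo difference implied by an expected score p (0 < p < 1), logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_interval(score: float, games: int, z: float = 1.96):
    """Rough interval: normal approximation on the score, endpoints mapped to Elo."""
    # Binomial standard error; draws would only make the true value smaller.
    se = math.sqrt(score * (1.0 - score) / games)
    lo = max(score - z * se, 1e-6)
    hi = min(score + z * se, 1.0 - 1e-6)
    return elo_from_score(lo), elo_from_score(score), elo_from_score(hi)

low, mid, high = elo_interval(0.25, 20)
print(f"{mid:+.0f} Elo, 95% interval roughly {low:+.0f} to {high:+.0f}")
```

With so few games the interval spans several hundred Elo, which is why a longer match such as the planned 200 games is needed before drawing conclusions.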
Note that I'm using 40 moves in 2 min now so the TC should be better for Leela than 40/1min I played before (note it's still the official OpenCL-based engine).
What exactly does their elo graph show anyway? Do they run regression tests from time to time or is it just delta from the previous version?
If so then that's pretty much random and useless if improvements are small.
Anyway, always the same story with Leela: blundering random moves like crazy,
losing to shallow tactics. I even saw Leela blunder twice in a single game, first throwing away a win then wasting a draw
- no way they can compete with the top dogs with this approach on consumer HW (not to mention that current SF should be on par with A0 elo-wise on Google HW).
I plan to play 200 games to get a rough idea of how strong the current engine + net is, I'll post the results here.
Martin Sedlak
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
mar wrote: ↑Thu May 17, 2018 8:20 am
Laskos wrote: ↑Thu May 17, 2018 12:22 am
Yes, the same for ID302 compared to ID292, no improvement (well, within error margins, so there is maybe at most 20 Elo points improvement). They see 130 Elo points improvement in self-games. Either something is wrong with my testing, or again something is fishy in their framework.
Well I'm playing some test games with ID303 and so far (20 games played) it seems not 100 elo stronger than ID 24x I played last time, but rather 100 elo weaker...
Still too early to draw conclusions, but 25% after 20 games when I expected Leela to be on par with Cheng according to their elo graph, so far a disappointment.
Note that I'm using 40 moves in 2 min now so the TC should be better for Leela than 40/1min I played before (note it's still the official OpenCL-based engine).
What exactly does their elo graph show anyway? Do they run regression tests from time to time or is it just delta from the previous version?
If so then that's pretty much random and useless if improvements are small.
Anyway, always the same story with Leela: blundering random moves like crazy,
losing to shallow tactics. I even saw Leela blunder twice in a single game, first throwing away a win then wasting a draw
- no way they can compete with the top dogs with this approach on consumer HW (not to mention that current SF should be on par with A0 elo-wise on Google HW).
I plan to play 200 games to get a rough idea of how strong the current engine + net is, I'll post the results here.

So, some sort of confirmation, both on overall strength and on tactics. I was wondering about the validity of my test against one AB engine, LC0 being on CPU and with pretty low number of playouts. The Elo graph is here, since the first bigger net ID227:
Red lines are one standard deviation. There seems to have been an improvement, but I guess there are still critical bugs in their engine v0.10. They are very careless, adding 100+ commits since v0.7 without any proper testing.
They see in the last 2 datapoints a 130 Elo points progress, I see no progress at all. They don't seem to run regression tests, and are just comparing to previous version with "freezing temperature", if I understood. Never mind that these small "gains" could be almost orthogonal taken successively, so all in all add to nothing in a regression test.
We are in agreement also on easy tactics: it is worse now with ID302 than with the initial ID227. I used Albert's cleaned WAC201.epd tactical suite
6s/position on 4 CPU threads, equivalent to 1s/position on GTX 1060:
ID227
score=84/201 [averages on correct positions: depth=11.1 time=0.96 nodes=178]
ID302
score=74/201 [averages on correct positions: depth=11.3 time=1.21 nodes=190]
So, even if it gained Elo points since ID227, the easy tactics are even worse. I think they have to roll back to a less buggy engine, say v0.7 and older nets, and then accept commits only after severe vetting and testing (more or less the SF framework).
-
- Posts: 4607
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
I have been running my own tests since April. ID 303 is currently being tested (ID 100 will be added later)
A google spreadsheet with details/graph and conditions(+games) is prepared but not ready yet for publishing.
Each LCZero version always plays 10*30 games vs. the same 10 opponents.
Each 30-game batch is played from openings picked at random from a small (~1200) 3-move PGN, with reversed colours.
The TC is always 5+5 vs. 2+1, thus time odds of around 3.5-4.5:1 in favour of LCZero,
to mimic a better GPU. The GPU currently used is very weak but not old (around 70-80 nps with the current net size).
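How large the time odds are depends on game length. A rough sketch, assuming "5+5" and "2+1" mean minutes of base time plus seconds of increment per move (an assumption about the notation, not something stated in the post); the helper names are hypothetical:

```python
def total_time(base_min: float, inc_sec: float, moves: int) -> float:
    """Seconds available to one side over a game lasting `moves` moves."""
    return base_min * 60.0 + inc_sec * moves

def time_odds(moves: int) -> float:
    """Ratio of LCZero's total time (5+5) to the opponent's (2+1)."""
    return total_time(5, 5, moves) / total_time(2, 1, moves)

for n in (40, 80, 120, 200):
    print(f"{n:3d} moves: {time_odds(n):.2f}:1")
```

Under that reading the ratio starts at 2.5:1, reaches 3.5:1 at 80 moves and approaches 5:1 only for very long games.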
Actually I bought it for around 30€ and the reason for it was, that it is cooled passive (no fan) and thus absolutely silent.
(The one before slowly died with a hell of a noise sometimes due to a damaged fan)
Below is the current result, calibrated to CCRL 40/4 and calculated with Ordo using 400 simulations.
A little note:
Counter 1.2-64 is still and always was outside the err window.
This was the reason why I asked for it being 32 or 64 bit in CCRL.
(the result was that it was able to run both ways, but this was not
distinguished in the ratings)
viewtopic.php?f=2&t=67250
This means the CCRL rating for Counter_12-64 should be likely a bit higher,
and this would shift ratings for all LCZero entities a bit higher in comparison.
 #  PLAYER            RATING    ERROR  POINTS  PLAYED   (%)  CCRL 40/4(1)  CCRL 40/40(2)  Diff 1  Diff 2
 1  Chronos_197       2631.48   49.51    73.0     150  48.7          2639           2639   -7.52   -7.52
 2  Counter_12-64     2505.12   50.05    49.5     150  33.0          2446           2468   59.12   37.12
 3  Danasah_70        2592.51   49.77    65.5     150  43.7          2596           2611   -3.49  -18.49
 4  Glaurung_201-64   2720.07   51.65    90.0     150  60.0          2740           2745  -19.93  -24.93
 5  Hermann_25-64     2510.87   50.68    50.5     150  33.7          2512           2496   -1.13   14.87
 6  Jellyfish_11-64   2628.90   49.21    72.5     150  48.3          2608           2577   20.90   51.90
 7  LCZero_07ID125    2509.54*  34.74   109.5     300  36.5             *              *       *       *
 8  LCZero_07ID150    2518.65*  35.74   113.0     300  37.7             *              *       *       *
 9  LCZero_07ID181    2669.39*  35.10   174.0     300  58.0             *              *       *       *
10  LCZero_07ID231    2740.88*  35.92   201.5     300  67.2             *              *       *       *
11  LCZero_010ID254   2767.63*  36.07   211.0     300  70.3             *              *       *       *
12  LCZero_010ID303   *             *       *       *     *             *              *       *       *
13  Monolith_04-64    2574.05   48.82    62.0     150  41.3          2597           2591  -22.95  -16.95
14  Rodent_10-64      2683.21   48.12    83.0     150  55.3          2692           2677   -8.79    6.21
15  Rotor_08          2613.37   45.95    69.5     150  46.3          2612           2628    1.37  -14.63
16  Tucano_400-64     2644.39   48.60    75.5     150  50.3          2662           2664  -17.61  -19.61
--------------------------------------------------------------------------------------------------------
Gauntlet Opp Rating   2610.40                                     2610.40        2609.60    0.00    0.80
                      avg adapted                                     avg            avg     avg     avg
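The remark about Counter being outside the error window can be checked directly from the table. A small sketch, assuming the ERROR column is the window meant in the post (what confidence level Ordo reports there is not stated here); the numbers are copied from the table, the function name is made up:

```python
# (player, rating, error, CCRL 40/4 rating) for the ten gauntlet opponents
opponents = [
    ("Chronos_197",     2631.48, 49.51, 2639),
    ("Counter_12-64",   2505.12, 50.05, 2446),
    ("Danasah_70",      2592.51, 49.77, 2596),
    ("Glaurung_201-64", 2720.07, 51.65, 2740),
    ("Hermann_25-64",   2510.87, 50.68, 2512),
    ("Jellyfish_11-64", 2628.90, 49.21, 2608),
    ("Monolith_04-64",  2574.05, 48.82, 2597),
    ("Rodent_10-64",    2683.21, 48.12, 2692),
    ("Rotor_08",        2613.37, 45.95, 2612),
    ("Tucano_400-64",   2644.39, 48.60, 2662),
]

def outside_window(rows):
    """Return (name, diff) where |rating - CCRL rating| exceeds the error."""
    return [(name, round(rating - ccrl, 2))
            for name, rating, err, ccrl in rows
            if abs(rating - ccrl) > err]

print(outside_window(opponents))
```

Against the CCRL 40/4 column only Counter_12-64 is flagged, with a difference of about 59 points against an error of about 50.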
-
- Posts: 4607
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
They did very few regression tests in the past and only one lately (233 vs. 292).
http://lczero.org/matches
Anyhow, as you have noticed and as has long been mentioned, the SP (self-play) ratings are quite meaningless for various reasons.
Guenther
-
- Posts: 1470
- Joined: Mon Apr 23, 2018 7:54 am
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
I have to check. David Xu, are you asking me that question? If you're not, please ignore the following.
If you are, I don't know why you resort to personal attacks.
Do you realize that once someone else wrongly grouped me with you and attacked me for your views? I didn't see you reply then to attack him or to tell him it was your views he was attacking, not mine. Another person attacked you then, again without you responding. Why not?
I have never attacked you. I have never attacked anyone here.
So you decide you have to butt in to a conversation with yanquis1972 and Albert and attack me?
May I ask what your special qualifications are?
You appear to be extremely intolerant of anyone saying anything you don't like, even if they are not speaking to you and even if they don't know you don't like what they say.
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
You tell a typical anonymous troll to add someone to ignore list, gee.
That David Xu guy posted in total 40 posts on this forum out of which 38 are oneliners, and in most of those he just calls ppl names, stalks them, and posts meaningless BS. He is someone that is the best recommendation for ignore list.
Your judgement of ppl is problematic at best.
-
- Posts: 560
- Joined: Sun Nov 08, 2015 11:10 pm
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Milos wrote: ↑Thu May 17, 2018 11:56 am
You tell a typical anonymous troll to add someone to ignore list, gee.
That David Xu guy posted in total 40 posts on this forum out of which 38 are oneliners, and in most of those he just calls ppl names, stalks them, and posts meaningless BS. He is someone that is the best recommendation for ignore list.
Your judgement of ppl is problematic at best.

As far as I can tell, the last claim he made against me was that performance saturation is nowhere close. Now that reality has arrived, should I enjoy returning the favor because "facts don't care about your feelings"?
Back to the topic, LC0 is doing okay, and it seems not likely to get another +400 real world ELO on an average hardware just by tossing more games into it, Zuck's team probably demonstrated that with a reasonable amount of hardware in the NN realm.
-
- Posts: 143
- Joined: Wed Jan 17, 2018 1:26 pm
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
Laskos wrote: ↑Thu May 17, 2018 9:10 am
Red lines are one standard deviation. There seems to have been an improvement, but I guess there are still critical bugs in their engine v0.10. They are very careless, adding 100+ commits since v0.7 without any proper testing.
They see in the last 2 datapoints a 130 Elo points progress, I see no progress at all. They don't seem to run regression tests, and are just comparing to previous version with "freezing temperature", if I understood. Never mind that these small "gains" could be almost orthogonal taken successively, so all in all add to nothing in a regression test.

Most recent commits are either changes to the lc0 implementation with multiple backends for neural net evaluation, bugfixes to original lczero, or diagnostic or server features. Commits that directly affect play are already handled much more conservatively now compared to a few weeks ago.
The discrepancies of self-play Elo to your testing could also stem from different methods: Afaik you test with opening books, is that correct? Self-play matches do not use a book, instead temperature (determining the chance to pick a move that did not receive the most visits) is used, mostly in the opening and much less later in game. That means that any new opening knowledge discovered, for instance which lines to prefer or to avoid, will be measured by self-play Elo but entirely missed by testing which uses a fixed book instead.
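To illustrate what temperature does to move selection, here is a minimal sketch of the scheme commonly described for AlphaZero-style engines, where a move is picked with probability proportional to its visit count raised to 1/temperature. Whether lczero's self-play matches used exactly this formula or schedule is not stated in this thread; the function name and the visit counts are made up:

```python
import random

def pick_move(visits: dict, temperature: float, rng: random.Random) -> str:
    """Pick a move with probability proportional to visits ** (1 / temperature)."""
    if temperature <= 0:
        # Temperature zero: always play the most-visited move.
        return max(visits, key=visits.get)
    top = max(visits.values())
    moves = list(visits)
    # Dividing by the largest count keeps the power from overflowing.
    weights = [(visits[m] / top) ** (1.0 / temperature) for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]

visits = {"e2e4": 400, "d2d4": 350, "c2c4": 50}  # hypothetical visit counts
rng = random.Random(0)
sample = [pick_move(visits, 1.0, rng) for _ in range(1000)]
print({m: sample.count(m) for m in visits})
```

At temperature 1 the second and third choices are played regularly, so a net's opening preferences show up in the results; with a fixed book those first moves are dictated from outside and that part of the net's knowledge is not measured.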
-
- Posts: 60
- Joined: Thu Jul 07, 2016 10:15 pm
Re: LCZero: Progress and Scaling. Relation to CCRL Elo
noobpwnftw wrote: ↑Thu May 17, 2018 12:18 pm
Milos wrote: ↑Thu May 17, 2018 11:56 am
You tell a typical anonymous troll to add someone to ignore list, gee.
That David Xu guy posted in total 40 posts on this forum out of which 38 are oneliners, and in most of those he just calls ppl names, stalks them, and posts meaningless BS. He is someone that is the best recommendation for ignore list.
Your judgement of ppl is problematic at best.
As far as I can tell, the last claim he made against me was that performance saturation is nowhere close. Now that reality has arrived, should I enjoy returning the favor because "facts don't care about your feelings"?
Back to the topic, LC0 is doing okay, and it seems not likely to get another +400 real world ELO on an average hardware just by tossing more games into it, Zuck's team probably demonstrated that with a reasonable amount of hardware in the NN realm.

What happens? Can LCZero beat a human?