Houston: We have lift off ...

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Harvey Williamson, bob

chrisw
Posts: 1550
Joined: Tue Apr 03, 2012 2:28 pm

Re: Houston: We have lift off ...

Post by chrisw » Fri Nov 16, 2018 8:58 pm

Laskos wrote:
Fri Nov 16, 2018 5:16 pm
chrisw wrote:
Fri Nov 16, 2018 4:54 pm
Laskos wrote:
Fri Nov 16, 2018 2:56 pm
I don't know why you are so enthusiastic. Runs 20xxx and 30xxx are pretty pathetic, especially considering how much resources they have eaten up. Some folks there have overdone something. Just a quick check with the latest engine (rc4) and one of the latest nets:

TC: 60'' + 1''

Code: Select all

Rank Name                          Elo     +/-   Games   Score   Draws
     SF8                           120      68      60   66.7%   43.3%
   
   1 lc0_v19_11261                   0     111      20   50.0%   50.0%
   2 lc0_v19_31214                -147     128      20   30.0%   40.0%
   3 lc0_v19_9155                 -241     127      20   20.0%   40.0%
Finished match
So, run 30xxx is still ~150 Elo points below run 10xxx, and barely ~100 Elo points above the 6x64 net 9155 (run 9xxx). Taking into account that games with the 6x64 net were 10-12 times faster, and given the hardware resources allocated, the whole run 9xxx could have been completed in less than a day. Lame runs, these newest ones. But I still hope that they will improve some 200 real Elo points over the current level, although this is not guaranteed at all.
is that a fact? 6x64 nets are only 100 elo below the best of the larger nets?
No, 250 Elo points below the 10xxx run with the large 20x256 net. But these 20xxx and 30xxx runs with large nets have run for months with only 100 Elo points better results than the small net. 20xxx is IIRC already abandoned after several LR drops, and I think that by now the computing effort is larger than DeepMind's learning effort with their enormous hardware for A0.
The start of this effort with the CUDA engine was much more promising, their initial runs achieving fast and remarkable results with modest resources.
I stopped paying attention to LC0 claims and chatter some time ago. Far too much noise, excited claims and over-estimation, quite similar to the mug’s eyeful graphic at the top of this thread. It’s self-play “elo” (known to be more or less ridiculous), the y-axis scale is cut to make it look dramatic, but it isn’t, and so on. But I’m very interested in the idea that the 6x64 nets are really not so far off the later larger ones. Is there some well tabulated raw data on this?
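For anyone wanting to check the Elo figures in the match table against the raw scores: under the standard logistic Elo model, the rating difference follows directly from the score fraction. A minimal sketch of that standard formula (this is my own illustration, not anything from the lc0 or tournament-manager tooling):

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Score fractions implied by the match table above:
# SF8 scored 66.7% of 60 games (40/60), lc0_v19_31214 scored 30% of 20 (6/20),
# and lc0_v19_9155 scored 20% of 20 (4/20).
for name, score in [("SF8", 40 / 60),
                    ("lc0_v19_31214", 6 / 20),
                    ("lc0_v19_9155", 4 / 20)]:
    print(f"{name}: {elo_from_score(score):+.0f} Elo")
```

These reproduce the +120, -147 and -241 figures in the table; with only 20-60 games per pairing the +/- error margins are of course very wide, as the table itself shows.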

jp
Posts: 595
Joined: Mon Apr 23, 2018 5:54 am

Re: Houston: We have lift off ...

Post by jp » Fri Nov 16, 2018 10:14 pm

Laskos wrote:
Fri Nov 16, 2018 5:16 pm
250 Elo points below the 10xxx run with the large 20x256 net. But these 20xxx and 30xxx runs with large nets have run for months with only 100 Elo points better results than the small net. 20xxx is IIRC already abandoned after several LR drops
Not abandoned, but put on lower priority. 30xxx is of course no longer "zero", so 20xxx is important if we care about being zero.

Andre van Ark
Posts: 88
Joined: Fri Mar 10, 2006 8:32 am
Location: Amersfoort
Contact:

Re: Houston: We have lift off ...

Post by Andre van Ark » Sat Nov 17, 2018 6:34 am

My self-play “elo” is 2785, but vs other players my elo is only around 1800 :? :roll: :|
http://www.vabs.nl/ (burnout counselling in Amersfoort)

chrisw
Posts: 1550
Joined: Tue Apr 03, 2012 2:28 pm

Re: Houston: We have lift off ...

Post by chrisw » Sat Nov 17, 2018 10:26 am

jp wrote:
Fri Nov 16, 2018 10:14 pm
Laskos wrote:
Fri Nov 16, 2018 5:16 pm
250 Elo points below the 10xxx run with the large 20x256 net. But these 20xxx and 30xxx runs with large nets have run for months with only 100 Elo points better results than the small net. 20xxx is IIRC already abandoned after several LR drops
Not abandoned, but put on lower priority. 30xxx is of course no longer "zero", so 20xxx is important if we care about being zero.
Sorry for being boring, and I ought really to know, but I don’t read the AZ Discord chat much. What aspects of 30xxx are no longer Zero?

User avatar
Laskos
Posts: 9118
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Houston: We have lift off ...

Post by Laskos » Sat Nov 17, 2018 11:09 am

chrisw wrote:
Fri Nov 16, 2018 8:58 pm
Laskos wrote:
Fri Nov 16, 2018 5:16 pm
chrisw wrote:
Fri Nov 16, 2018 4:54 pm
Laskos wrote:
Fri Nov 16, 2018 2:56 pm
I don't know why you are so enthusiastic. Runs 20xxx and 30xxx are pretty pathetic, especially considering how much resources they have eaten up. Some folks there have overdone something. Just a quick check with the latest engine (rc4) and one of the latest nets:

TC: 60'' + 1''

Code: Select all

Rank Name                          Elo     +/-   Games   Score   Draws
     SF8                           120      68      60   66.7%   43.3%
   
   1 lc0_v19_11261                   0     111      20   50.0%   50.0%
   2 lc0_v19_31214                -147     128      20   30.0%   40.0%
   3 lc0_v19_9155                 -241     127      20   20.0%   40.0%
Finished match
So, run 30xxx is still ~150 Elo points below run 10xxx, and barely ~100 Elo points above the 6x64 net 9155 (run 9xxx). Taking into account that games with the 6x64 net were 10-12 times faster, and given the hardware resources allocated, the whole run 9xxx could have been completed in less than a day. Lame runs, these newest ones. But I still hope that they will improve some 200 real Elo points over the current level, although this is not guaranteed at all.
is that a fact? 6x64 nets are only 100 elo below the best of the larger nets?
No, 250 Elo points below the 10xxx run with the large 20x256 net. But these 20xxx and 30xxx runs with large nets have run for months with only 100 Elo points better results than the small net. 20xxx is IIRC already abandoned after several LR drops, and I think that by now the computing effort is larger than DeepMind's learning effort with their enormous hardware for A0.
The start of this effort with the CUDA engine was much more promising, their initial runs achieving fast and remarkable results with modest resources.
I stopped paying attention to LC0 claims and chatter some time ago. Far too much noise, excited claims and over-estimation, quite similar to the mug’s eyeful graphic at the top of this thread. It’s self-play “elo” (known to be more or less ridiculous), the y-axis scale is cut to make it look dramatic, but it isn’t, and so on. But I’m very interested in the idea that the 6x64 nets are really not so far off the later larger ones. Is there some well tabulated raw data on this?
They scale differently, so it depends on time control and hardware. But in blitz on a reasonable GPU, the difference between the 6x64 run and the currently training 20x256 run is about 100 Elo points. Since yesterday evening they have improved their self-Elo by another 200 points, but in my check it amounts to some 50 real Elo points (against SF8). However, it does seem, with LR drops, that fairly soon 30xxx will become the strongest run, surpassing the early and successful 10xxx run. Right now the difference between them is only 100 real Elo points (at blitz with a GTX 1060). I am interested enough in this, as well as in Go programs, that I hope to have my RTX 2070 GPU by the end of next week; it is roughly 4 times faster than my current GTX 1060 in these machine learning applications. In Go, the positional strength of "Zero" programs is such that they are rewriting the textbooks and some "golden rules" which went almost unquestioned for decades and even centuries. But I am very, very weak at Go.

nabildanial
Posts: 104
Joined: Thu Jun 05, 2014 3:29 am
Location: Malaysia

Re: Houston: We have lift off ...

Post by nabildanial » Sat Nov 17, 2018 11:16 am

It is also important to note that it took 52 million games for the whole Test10 run to finish. Current Test30, by comparison, is still at 18.6 million games or so, not even halfway yet.

grahamj
Posts: 33
Joined: Thu Oct 11, 2018 12:26 pm
Full name: Graham Jones

Re: Houston: We have lift off ...

Post by grahamj » Sat Nov 17, 2018 1:00 pm

chrisw wrote:
Sat Nov 17, 2018 10:26 am
jp wrote:
Fri Nov 16, 2018 10:14 pm
Laskos wrote:
Fri Nov 16, 2018 5:16 pm
250 Elo points below the 10xxx run with the large 20x256 net. But these 20xxx and 30xxx runs with large nets have run for months with only 100 Elo points better results than the small net. 20xxx is IIRC already abandoned after several LR drops
Not abandoned, but put on lower priority. 30xxx is of course no longer "zero", so 20xxx is important if we care about being zero.
Sorry for being boring, and I ought really to know, but I don’t read the AZ Discord chat much. What aspects of 30xxx are no longer Zero?
From discord: Tilps Yesterday at 21:12 said
Test20 is suspended - all contributions are now pointing to Test30 by default.
Test30 uses "tablebase rescoring", explained here: http://blog.lczero.org/2018/09/tb-rescoring.html. Tablebase rescoring is arguably nonzero, and arguably still zero.
Graham Jones, www.indriid.com

jp
Posts: 595
Joined: Mon Apr 23, 2018 5:54 am

Re: Houston: We have lift off ...

Post by jp » Sat Nov 17, 2018 3:10 pm

grahamj wrote:
Sat Nov 17, 2018 1:00 pm
chrisw wrote:
Sat Nov 17, 2018 10:26 am
jp wrote:
Fri Nov 16, 2018 10:14 pm
Not abandoned, but put on lower priority. 30xxx is of course no longer "zero", so 20xxx is important if we care about being zero.
Sorry for being boring, and I ought really to know, but I don’t read the AZ Discord chat much. What aspects of 30xxx are no longer Zero?
From discord: Tilps Yesterday at 21:12 said
Test20 is suspended - all contributions are now pointing to Test30 by default.
Test30 uses "tablebase rescoring". Tablebase rescoring is arguably nonzero, and arguably still zero.
22201 is the last ID in their graph. If Test20 is totally suspended, with not even the option for contributions to it, that's a disaster, unless they can be convinced to replace it with a "zero" Test40. I hope they can be.

Test30 is only arguably "zero" because "zero" is not a technical term but a marketing buzzword, like "Coke Zero" (except the "zero" in Coke Zero is more meaningful). If you give "zero" a reasonable meaning, it's not zero at all. The people who claim it's still zero are twisting the word "zero" to mean "not fed human errors". Like Coke Zero, that's not what "zero" can sensibly mean. If you feed it non-human non-errors, that's still not zero. You're still feeding it outside food.

User avatar
Laskos
Posts: 9118
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Houston: We have lift off ...

Post by Laskos » Sat Nov 17, 2018 7:23 pm

Lol, 300 self-Elo points in one day, but only 50 real Elo points against SF8. At this pace they will need to reach some 6000 self-Elo points to be competitive with the 10xxx run, and maybe 7000 to reach the level of the A0 nets. They even seem to have large inversions in self-Elo versus real Elo. That plot is a joke, but I do think that run 30xxx will be the strongest one.

chrisw
Posts: 1550
Joined: Tue Apr 03, 2012 2:28 pm

Re: Houston: We have lift off ...

Post by chrisw » Sat Nov 17, 2018 9:05 pm

jp wrote:
Sat Nov 17, 2018 3:10 pm
grahamj wrote:
Sat Nov 17, 2018 1:00 pm
chrisw wrote:
Sat Nov 17, 2018 10:26 am
jp wrote:
Fri Nov 16, 2018 10:14 pm
Not abandoned, but put on lower priority. 30xxx is of course no longer "zero", so 20xxx is important if we care about being zero.
Sorry for being boring, and I ought really to know, but I don’t read the AZ Discord chat much. What aspects of 30xxx are no longer Zero?
From discord: Tilps Yesterday at 21:12 said
Test20 is suspended - all contributions are now pointing to Test30 by default.
Test30 uses "tablebase rescoring". Tablebase rescoring is arguably nonzero, and arguably still zero.
22201 is the last ID in their graph. If Test20 is totally suspended, with not even the option for contributions to it, that's a disaster, unless they can be convinced to replace it with a "zero" Test40. I hope they can be.

Test30 is only arguably "zero" because "zero" is not a technical term but a marketing buzzword, like "Coke Zero" (except the "zero" in Coke Zero is more meaningful). If you give "zero" a reasonable meaning, it's not zero at all. The people who claim it's still zero are twisting the word "zero" to mean "not fed human errors". Like Coke Zero, that's not what "zero" can sensibly mean. If you feed it non-human non-errors, that's still not zero. You're still feeding it outside food.
“They” see “themselves” doing a “community” effort, but “you” see “them”, an “in-group” in which “you” are not “in”, but “you” hope “they” will do whatever and so on.

Herein lies the problem. Groups have positive but also negative aspects. Something has gone wrong (in the sense that the inexorable expected rise in strength is now rather questionable). I’d be looking at groupthink and social dynamics rather than technical causes. There’ll be something dumb going on somewhere, someway, somehow, that nobody has thought of. That’s why I decided a few months ago not to “follow” this project; I think I prefer to watch the other ones, working by themselves or in small groups. All credit, however, to the central core at lc0 for setting a path.
