Something goes wrong with lc0 since yesterday?

ankan · Post by **ankan** » Thu Jul 19, 2018 2:25 pm

ID9155 came after several drops in the learning rate (4 or 5?).
Test10 has had only a single LR drop so far.
A bigger LR can help the network learn fast initially and avoid local minima, but it also prevents it from converging into a minimum (causing relatively large fluctuations in performance after every training iteration).
This time the devs are experimenting with keeping a bigger LR for longer (even when progress slows down or fluctuates), hoping that the network will finally converge to a better minimum.
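
As a rough illustration of what an "LR drop" means, here is a minimal sketch of a piecewise-constant (step) learning-rate schedule. The function name, step boundaries and rates are made-up placeholders, not the values actually used for test10 or for ID9155.

```python
from bisect import bisect_right

def step_lr(step, boundaries, rates):
    """Return the learning rate in effect at a given training step.

    boundaries: sorted training steps at which the LR drops (hypothetical).
    rates: one more entry than boundaries; rates[i] applies before boundaries[i].
    """
    if len(rates) != len(boundaries) + 1:
        raise ValueError("need exactly one more rate than boundaries")
    return rates[bisect_right(boundaries, step)]

# Hypothetical schedule: two drops, each by a factor of 10.
BOUNDARIES = [100_000, 200_000]
RATES = [0.1, 0.01, 0.001]

for s in (0, 99_999, 100_000, 250_000):
    print(s, step_lr(s, BOUNDARIES, RATES))
```

Keeping the bigger LR for longer simply means pushing the first boundary further out before the rate is allowed to fall.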

Another reason why ID9155 does better for you at the same time control could be that it's much faster (a 6x64 network is likely >20x faster than a 20x256 one), so you get a lot more nodes explored for each move. With an equal number of nodes, recent test10 networks are already breaking many records (e.g., see https://docs.google.com/spreadsheets/d/ ... =336087686).
A bigger network is expected to scale better with longer time controls.

ID10093 performing 50 Elo worse than ID10086 is probably a bit concerning. What were your time controls, number of games, etc.?

yanquis1972 · Post by **yanquis1972** » Thu Jul 19, 2018 2:39 pm

the theory from what i've read is that upgrading to larger nets can lower the ceiling, & that what appears to be oversaturation (stagnation before a reduction in learning rate) can't lower the ceiling but can raise it. i don't presume to understand it, but something good clearly happened w/ the first reduction. i assume we're in a saturation phase & the next reduction will see another spike, at which point they have one more LR bullet in the gun if trying to replicate A0. from the chart i'd guess deepmind used their 3rd LR reduction @ step 110k-125k.

i'm guessing the theory is something like: explosive growth over the first so many steps (A0 = ~2300 elo), then a series of spikes at each reduction, & slow, steady growth after the last reduction.

keep in mind 1s/move TC for A0 = ~80KN searched per move; full-strength, 44M games-trained A0 was slightly but significantly ahead of SF8 @ ~70MN/second at about that rate. fast TCs can't be used to benchmark the strength compared to small nets if you don't take scaling into account, but that's assuming the testnet scales as A0 did, which i haven't seen confirmed.

btw, also read someone is donating massive h/w & there will be (iirc) 600k training games/day. so as long as he's ok with another potential reset, i'm not worried rn.

yanquis1972 · Post by **yanquis1972** » Thu Jul 19, 2018 4:09 pm

ankan wrote: ↑Thu Jul 19, 2018 2:25 pm ID9155 came after several drops in the learning rate (4 or 5?).
Test10 has had only a single LR drop so far.
A bigger LR can help the network learn fast initially and avoid local minima, but it also prevents it from converging into a minimum (causing relatively large fluctuations in performance after every training iteration).
This time the devs are experimenting with keeping a bigger LR for longer (even when progress slows down or fluctuates), hoping that the network will finally converge to a better minimum.

Another reason why ID9155 does better for you at the same time control could be that it's much faster (a 6x64 network is likely >20x faster than a 20x256 one), so you get a lot more nodes explored for each move. With an equal number of nodes, recent test10 networks are already breaking many records (e.g., see https://docs.google.com/spreadsheets/d/ ... =336087686).
A bigger network is expected to scale better with longer time controls.

ID10093 performing 50 Elo worse than ID10086 is probably a bit concerning. What were your time controls, number of games, etc.?

i don't think it's worrisome, yet. there was a lot of fluctuation before the LR reduction, to the point i was nearly positive it was intended to be reset.

what i do find odd is that the peak at 100n is 10087, not 10086 (-20 elo). doesn't that mean too much noise in kai's results, he made a typo, or 100n isn't a solid indicator of strength....?

yanquis1972 · Post by **yanquis1972** » Thu Jul 19, 2018 5:00 pm

Elo for id10005-id10075 (100n):

id10005 1860
id10015 2412
id10025 2503
id10035 2562
id10045 2596
id10055 2601
id10065 2626
id10075 2610
-----

id10068 2642
id10066 2643
id10056 2646 [peak]
id10075 2610
id10076 2630
<<< LR reduction

id10077 2745
id10078 2751
id10079 2762
id10080 2774
id10081 2790
id10082 2786
id10083 2797
id10084 2801
id10085 2802
id10086 2795
id10087 2813 [current peak]
-----
id10088 incomplete
id10089 2809
id10090 incomplete
id10091 2809
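
To put a number on the jump at the LR reduction, the figures listed above can simply be differenced. A small sketch using only ratings quoted in this post (the dictionary and variable names are just for illustration):

```python
# 100-node Elo figures copied from the list above (net ID -> Elo).
elo_100n = {
    10076: 2630,  # last net before the LR reduction
    10077: 2745,  # first net after the LR reduction
    10087: 2813,  # current peak
}

jump_at_drop = elo_100n[10077] - elo_100n[10076]
gain_since_drop = elo_100n[10087] - elo_100n[10077]

print(f"jump across LR reduction: {jump_at_drop:+d} Elo")    # +115 Elo
print(f"further gain up to 10087: {gain_since_drop:+d} Elo")  # +68 Elo
```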

Laskos · Post by **Laskos** » Thu Jul 19, 2018 7:18 pm

yanquis1972 wrote: ↑Thu Jul 19, 2018 4:09 pm
i don't think it's worrisome, yet. there was a lot of fluctuation before the LR reduction, to the point i was nearly positive it was intended to be reset.

what i do find odd is that the peak at 100n is 10087, not 10086 (-20 elo). doesn't that mean too much noise in kai's results, he made a typo, or 100n isn't a solid indicator of strength....?

I haven't tested them all, just 10086 and 10093 lately. 200 games at 2+2 each, so even a 50 Elo regression is still within the 2SD error margins; it is only likely that there is a regression.
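
For readers who want to check the 2SD figure, here is a back-of-the-envelope sketch. It assumes the standard logistic Elo model, independent games, and a result near 50%; the draw ratios are assumed inputs, not figures from this thread. It also treats the 200 games as a single match, so if each net instead played its own 200 games against other opponents, the margin on the difference between the two would be larger (roughly by a factor of sqrt(2)).

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1), logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def two_sd_elo_margin(games, draw_ratio=0.0):
    """Approximate 2-SD Elo margin around an even (50%) match result.

    Assumes independent games with wins and losses equally likely;
    draw_ratio is an assumed input.
    """
    # Outcomes are 1, 0.5, 0 with mean 0.5: only decisive games add
    # variance, each with squared deviation 0.25.
    var_per_game = 0.25 * (1.0 - draw_ratio)
    se = math.sqrt(var_per_game / games)
    return elo_from_score(0.5 + 2.0 * se)

for d in (0.0, 0.3, 0.5):
    print(f"draw ratio {d:.1f}: +/- {two_sd_elo_margin(200, d):.0f} Elo")
```

With no draws at all the margin for 200 games comes out just under 50 Elo, and it shrinks as the draw ratio grows.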

CMCanavessi · Post by **CMCanavessi** » Fri Jul 20, 2018 1:34 am

Try 10101, it has set several records in Aloril's sheet.

Laskos · Post by **Laskos** » Fri Jul 20, 2018 9:48 am

CMCanavessi wrote: ↑Fri Jul 20, 2018 1:34 am Try 10101, it has set several records in Aloril's sheet.

I don't know; at short time control ID10104 is 20 Elo points weaker than ID10086 (within error margins), and on test-suites it is also weaker at longer time control (10s/position):

WAC200 tactical:
ID10086: 126/200
ID10104: 116/200

Openings200 positional:
ID10086: 111/200
ID10104: 108/200
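
For a rough sense of how much weight these gaps carry, one can treat the two scores on each suite as independent proportions. This is only an approximation: both nets saw the same 200 positions, so a paired comparison of per-position results would be more appropriate, but those are not given here. The function name is illustrative.

```python
import math

def two_proportion_z(solved_a, solved_b, n):
    """z-score for the difference of two solve rates over n positions each,
    treating the two samples as independent (a simplifying assumption)."""
    pa, pb = solved_a / n, solved_b / n
    pooled = (solved_a + solved_b) / (2 * n)
    se = math.sqrt(pooled * (1.0 - pooled) * 2.0 / n)
    return (pa - pb) / se

z_wac = two_proportion_z(126, 116, 200)
z_openings = two_proportion_z(111, 108, 200)
print(f"WAC200:      z = {z_wac:.2f}")
print(f"Openings200: z = {z_openings:.2f}")
```

Under this crude model both differences stay below 2 standard deviations, so on their own they do not establish a regression.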

gordonr · Post by **gordonr** » Fri Jul 20, 2018 10:44 am

Laskos wrote: ↑Fri Jul 20, 2018 9:48 am
I don't know; at short time control ID10104 is 20 Elo points weaker than ID10086 (within error margins), and on test-suites it is also weaker at longer time control (10s/position):

WAC200 tactical:
ID10086: 126/200
ID10104: 116/200

Openings200 positional:
ID10086: 111/200
ID10104: 108/200

I've never looked at any positional test suites, only tactical tests. So a question out of curiosity... how do we know the solution for a positional test case is definitely best? If e.g. AlphaZero failed on a given test case, would AlphaZero definitely be wrong? As I said, I know nothing about such tests, so this is genuine curiosity for my own understanding. Thanks.

peter · Post by **peter** » Fri Jul 20, 2018 12:12 pm

Laskos wrote: ↑Fri Jul 20, 2018 9:48 am WAC200 tactical:
ID10086: 126/200
ID10104: 116/200

That's a really bad regression from my pov.

On the CSS forum a user reported, a while ago, 168/200 as a best result with net 494 at 5"/move on 2x GTX 1080:

http://forum.computerschach.de/cgi-bin/ ... #pid113947

That's still 2 positions fewer than old Crafty24, but dropping back to only 116/200 with the latest net is simply disappointing.

Laskos · Post by **Laskos** » Fri Jul 20, 2018 1:52 pm

gordonr wrote: ↑Fri Jul 20, 2018 10:44 am
I've never looked at any positional test suites, only tactical tests. So a question out of curiosity... how do we know the solution for a positional test case is definitely best? If e.g. AlphaZero failed on a given test case, would AlphaZero definitely be wrong? As I said, I know nothing about such tests, so this is genuine curiosity for my own understanding. Thanks.

No, that's actually a good question. It's my custom test-suite of 200 opening positions. In my biased opinion, it's the most positional of the larger test-suites, more positional than STS, although my test-suite relates only to openings. I used large databases of human and computer games, trying to find the best moves, those which are significantly better in the statistics of outcomes. I also used engines to check that the solutions are not too easy or determined by tactics. Basically, the test-suite is based on accumulated knowledge of opening theory combined with some, not too much, engine analysis to eliminate tactics. From my point of view, STS suffers from over-analysis with Rybka and some other engines of its time.

The results do encourage me to suspect that I am onto something. For example, Komodo 12.1.1 comes out as the best solver at 10s/position on 4 threads with 146/200, followed by Stockfish dev. with 141/200. This is in accordance with some knowledge that Komodo, although by now significantly weaker overall than Stockfish dev., might be better positionally. Another encouraging fact is that Houdini Tactical comes significantly below Houdini default.

Going to longer time per position, say minutes, Komodo and Stockfish solve 160+ out of 200. In my estimation, 15 or so solutions are wrong and 15 or so are too hard to be solved by current engines in a reasonable amount of time. If this theory stands, I would expect A0 (on that DeepMind hardware) at 1 minute/move to solve 175-180 out of 200. But that would happen only if the accumulated opening theory is not actually wrong. If it is wrong, A0 will rewrite it, as I believe A0 is positionally a monster (on that hardware and TC). So, if A0 solves only 140/200 in those conditions, it would be almost certain that I have a significantly larger number of wrong solutions than my estimate of 15 or so.

ID395 from the main branch on a GTX 1060 at 10s/position was almost as strong as SF and Komodo on 4 threads on this test-suite, but since then the results have been weaker with the newer nets, be they main branch or testserver. I don't know if that means something; at least the new main-branch nets are stronger tactically than ID395.
