Lc0 51010

Discussion of anything and everything relating to chess playing software and machines.

Moderators: bob, hgm, Harvey Williamson

supersharp77
Posts: 763
Joined: Sat Jul 05, 2014 5:54 am
Location: Southwest USA

Re: Lc0 51010

Post by supersharp77 » Sun Mar 31, 2019 6:53 pm

lkaufman wrote:
Sun Mar 31, 2019 6:07 am
One question: does anyone know roughly how many 2080s would be needed to duplicate the training that this 51xxx series has averaged so far? It doesn't mean much to say that it has trained for two days without stating what the average resources used for the training were. I imagine that they were just a tiny percentage of the resources used to train AlphaZero in 9 hours or so.
jp wrote:
Sun Mar 31, 2019 6:37 am
NN 51010 has had 559441 games. The NN size is smaller (10x128) than the A0 one and 41xxx NNs (20x256), but the training param is visits=10000 (visits=800 for 41xxx).

I'll let others say what 2080 time that works out to.
Laskos wrote:
Sun Mar 31, 2019 8:58 am
I will try to estimate (writing on the phone, I am on a vacation :) ). About 0.2s/move on 2080, meaning games in about 20s. About 200 games per hour, so half a million games need 2500 hours, or about 100 days needed for training the NN 51010 on a single 2080. Or 100x 2080 GPUs needed to train it in one day.
The main unknown is the time to get 10000 visits, I took it as 0.2s on 2080. So this calculation is just for the order of magnitude.
lkaufman wrote:
Sun Mar 31, 2019 4:30 pm
Thanks. So the training resources were a lot more than I thought they were. It makes the amazing results at least easier to accept.
Albert Silver wrote:
Sun Mar 31, 2019 4:37 pm
I will repeat: the average nodes per visit in this training run is around 800 nodes, not 10 thousand. It is a complete misunderstanding of what is being done. The average nodes per move per game is around 800 still.
Laskos wrote:
Ok, then I would estimate a game on 2080 to take some 6 seconds. Meaning 30 days of 2080 training for half a million games, or 30x 2080 GPUs training for one day.
lkaufman wrote:
Much closer to what I imagined.
viewtopic.php?f=2&t=64066&hilit=report+on+draws

This is the problem with the "NN concept". In chess engine tourneys it is always "best" to try and limit the GUI opening book; that way you are testing overall engine strength and not just "opening book moves". I found this to be a problem with the Lokasoft, Chess Genius and Arena GUIs, and found a workaround by using the engines' own opening books (as long as they were not too large). Cerebellum books caused quite a stir for this very reason (too large). Most engine games go "out of book" quite early if left to their own devices, so if a few engines (2-3) had 'different' Cerebellum books it was not much of an issue, but it was a big problem if they all used the same book. Huge neural nets using millions and millions of games as a "basis for play" create a big, big problem. The CMC chess engine, I remember, used the entire "Week In Chess" database as its opening book. To me that was a problem (best to limit books to 7-8-9 moves). AR :D :wink:
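The "limit books to 7-8-9 moves" idea is easy to illustrate with a sketch. This is hypothetical: real opening books are binary formats (.bin, .ctg), so the snippet only shows the principle on a plain space-separated move list.

```python
MAX_BOOK_MOVES = 8  # "best to limit books 7-8-9 moves", per the post

def truncate_line(line: str, max_moves: int = MAX_BOOK_MOVES) -> str:
    """Keep only the first `max_moves` full moves (2 plies each) of a
    space-separated SAN move list without move numbers."""
    plies = line.split()
    return " ".join(plies[: 2 * max_moves])

# An 18-ply Najdorf line gets cut back to 16 plies (8 full moves):
line = "e4 c5 Nf3 d6 d4 cxd4 Nxd4 Nf6 Nc3 a6 Be2 e5 Nb3 Be7 O-O O-O Be3 Be6"
print(truncate_line(line))
```

Applied over a whole book, every engine leaves theory by move 8 and has to play chess on its own from there.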

jp
Posts: 836
Joined: Mon Apr 23, 2018 5:54 am

Re: Lc0 51010

Post by jp » Sun Mar 31, 2019 7:38 pm

Albert Silver wrote:
Sun Mar 31, 2019 3:42 pm
jp wrote:
Sun Mar 31, 2019 6:37 am
NN 51010 has had 559441 games. The NN size is smaller (10x128) than the A0 one and 41xxx NNs (20x256), but the training param is visits=10000 (visits=800 for 41xxx).
That is inaccurate. It is averaging around 800 visits per move too.
That info was taken directly from the Lc0 website. If it's inaccurate, then someone there should correct it. Why do you believe they have wrong info on their website?

jp
Posts: 836
Joined: Mon Apr 23, 2018 5:54 am

Re: Lc0 51010

Post by jp » Sun Mar 31, 2019 7:43 pm

Uri Blass wrote:
Sun Mar 31, 2019 6:27 pm
Ozymandias wrote:
Sun Mar 31, 2019 10:23 am
Uri Blass wrote:
Sun Mar 31, 2019 9:20 am
The claim is that TCEC used some opening book to reduce the number of draws, and that with a good opening book we are going to get almost 100% draws.

Larry's words about it:
"It seems pretty clear that if you take the strongest Stockfish and the strongest NN engine on TCEC type hardware, and have them play a long match at TCEC time controls, the results will depend heavily on the openings. If you give each side a really good, deep opening book and have them play only the optimum or near optimum openings, nearly every game will end in a draw."

I am not sure about that, and I have 2 problems:
1) How do you define whether a book is a good, deep opening book?
2) How do you define whether an opening is the optimum or a near-optimum opening?
I'm surprised to see this topic still being debated after so long; maybe you missed the draw death of Freestyle chess. The answer to your questions is there.
The draw death of Freestyle chess proves nothing, because it is possible that the participants were simply at an equal level and that it would have been possible to beat them by playing better than they did.

There are also OTB tournaments in which most of the games are drawn, and not because the players make no mistakes.
There's still the match A0 vs crippled SF8+book, which was mostly draws.

jp
Posts: 836
Joined: Mon Apr 23, 2018 5:54 am

Re: Lc0 51010

Post by jp » Sun Mar 31, 2019 7:47 pm

Albert Silver wrote:
Sun Mar 31, 2019 4:37 pm
lkaufman wrote:
Sun Mar 31, 2019 4:30 pm
Laskos wrote:
Sun Mar 31, 2019 8:58 am
jp wrote:
Sun Mar 31, 2019 6:37 am
lkaufman wrote:
Sun Mar 31, 2019 6:07 am
NN 51010 has had 559441 games. The NN size is smaller (10x128) than the A0 one and 41xxx NNs (20x256), but the training param is visits=10000 (visits=800 for 41xxx).

I'll let others say what 2080 time that works out to.
I will try to estimate (writing on the phone, I am on a vacation :) ). About 0.2s/move on 2080, meaning games in about 20s. About 200 games per hour, so half a million games need 2500 hours, or about 100 days needed for training the NN 51010 on a single 2080. Or 100x 2080 GPUs needed to train it in one day.
The main unknown is the time to get 10000 visits, I took it as 0.2s on 2080. So this calculation is just for the order of magnitude.
Thanks. So the training resources were a lot more than I thought they were. It makes the amazing results at least easier to accept.
I will repeat: the average nodes per visit in this training run is around 800 nodes, not 10 thousand. It is a complete misunderstanding of what is being done. The average nodes per move per game is around 800 still.
I will repeat: visits=10000 for NN 51010 (visits=800 for 41xxx) is what the lc0 website says.
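Either way, the arithmetic behind the two competing estimates in this chain is easy to reproduce. A minimal sketch; the seconds-per-game figures are Laskos's guesses for a single RTX 2080, not measured numbers.

```python
# Reproduce the two back-of-envelope training-cost estimates.
# 20 s/game corresponds to ~10000 visits/move at ~0.2 s/move;
# 6 s/game is the revised guess for ~800 visits/move.
GAMES = 559_441  # training games for NN 51010, per the thread

for label, sec_per_game in [("10000 visits/move", 20), ("800 visits/move", 6)]:
    gpu_hours = GAMES * sec_per_game / 3600
    print(f"{label}: {gpu_hours:.0f} 2080-hours ~= {gpu_hours / 24:.0f} 2080-days")
```

So the order of magnitude swings from roughly 100 GPU-days down to roughly 40 depending on which visit count you believe, which is exactly why the visits=10000 vs. visits=800 question matters.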

nabildanial
Posts: 104
Joined: Thu Jun 05, 2014 3:29 am
Location: Malaysia

Re: Lc0 51010

Post by nabildanial » Sun Mar 31, 2019 8:13 pm

jp wrote:
Sun Mar 31, 2019 7:47 pm
I will repeat: visits=10000 for NN 51010 (visits=800 for 41xxx) is what the lc0 website says.
Correct me if I'm wrong, but as far as I know, starting from T50 there's another experiment that enables Lc0 to dynamically use different visit counts depending on the complexity of each position. visits=10000 is the upper bound; Lc0 might use as few as 1 visit on a position, but the average is still around 800 visits per move in training.
Last edited by nabildanial on Sun Mar 31, 2019 8:18 pm, edited 1 time in total.

nabildanial
Posts: 104
Joined: Thu Jun 05, 2014 3:29 am
Location: Malaysia

Re: Lc0 51010

Post by nabildanial » Sun Mar 31, 2019 8:17 pm

nabildanial wrote:
Sun Mar 31, 2019 8:13 pm
Correct me if I'm wrong, but as far as I know, starting from T50 there's another experiment that enables Lc0 to dynamically use different visit counts depending on the complexity of each position. visits=10000 is the upper bound; Lc0 might use as few as 1 visit on a position, but the average is still around 800 visits per move in training.

jp
Posts: 836
Joined: Mon Apr 23, 2018 5:54 am

Re: Lc0 51010

Post by jp » Sun Mar 31, 2019 8:21 pm

nabildanial wrote:
Sun Mar 31, 2019 8:17 pm
Correct me if I'm wrong, but as far as I know, starting from T50 there's another experiment that enables Lc0 to dynamically use different visit counts depending on the complexity of each position. visits=10000 is the upper bound; Lc0 might use as few as 1 visit on a position, but the average is still around 800 visits per move in training.
That would make sense. And before T50 was it fixed to be exactly 800?

I'd guess there'd have to be another parameter for that.
What is "--minimum-kldgain-per-node=0.000008"?

Maybe there's something about this on Discord, is there?

nabildanial
Posts: 104
Joined: Thu Jun 05, 2014 3:29 am
Location: Malaysia

Re: Lc0 51010

Post by nabildanial » Sun Mar 31, 2019 8:35 pm

jp wrote:
Sun Mar 31, 2019 8:21 pm
nabildanial wrote:
Sun Mar 31, 2019 8:17 pm
Correct me if I'm wrong, but as far as I know, starting from T50 there's another experiment that enables Lc0 to dynamically use different visit counts depending on the complexity of each position. visits=10000 is the upper bound; Lc0 might use as few as 1 visit on a position, but the average is still around 800 visits per move in training.
That would make sense. And before T50 was it fixed to be exactly 800?

I'd guess there'd have to be another parameter for that.
Maybe there's something on Discord, is there?

What is "kld"?
Correct. You can read more about that here, under the "kldgain thresholding in training" topic.
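For anyone wondering what the flag actually measures, here is a minimal sketch of kldgain thresholding. This is an illustration only, not Lc0's actual implementation: the idea is that the search periodically compares the root's visit distribution against the previous check, and stops early once the Kullback-Leibler divergence gained per added visit drops below --minimum-kldgain-per-node.

```python
import math

def kld(p, q):
    """Kullback-Leibler divergence D(p || q) between two distributions
    (terms with p_i = 0 contribute nothing; q_i assumed > 0 here)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def should_stop(prev_visits, cur_visits, min_kldgain_per_node=8e-6):
    """Stop the search early if the root visit distribution has
    stabilized, i.e. KLD gained per extra visit is below the threshold.
    Mimics --minimum-kldgain-per-node=0.000008; not Lc0's code."""
    prev_total, cur_total = sum(prev_visits), sum(cur_visits)
    if cur_total <= prev_total:
        return False
    p = [v / prev_total for v in prev_visits]
    q = [v / cur_total for v in cur_visits]
    gain = kld(p, q) / (cur_total - prev_total)
    return gain < min_kldgain_per_node

# Distribution barely moved between checks -> quiet position, stop early:
print(should_stop([400, 300, 100], [500, 375, 125]))  # True
# Distribution still shifting -> complex position, keep searching:
print(should_stop([400, 300, 100], [700, 80, 20]))    # False
```

This would explain the behavior described above: quiet positions stop after very few visits, complex ones run on toward the 10000-visit cap, and the average lands near 800.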

Albert Silver
Posts: 2867
Joined: Wed Mar 08, 2006 8:57 pm
Location: Rio de Janeiro, Brazil

Re: Lc0 51010

Post by Albert Silver » Mon Apr 01, 2019 2:10 am

nabildanial wrote:
Sun Mar 31, 2019 8:13 pm
Correct me if I'm wrong, but as far as I know, starting from T50 there's another experiment that enables Lc0 to dynamically use different visit counts depending on the complexity of each position. visits=10000 is the upper bound; Lc0 might use as few as 1 visit on a position, but the average is still around 800 visits per move in training.
You're not wrong. The KLD value is regularly updated because otherwise the average nodes per move over the course of the game drops to under 800.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."

jp
Posts: 836
Joined: Mon Apr 23, 2018 5:54 am

Re: Lc0 51010

Post by jp » Mon Apr 01, 2019 4:32 am

nabildanial wrote:
Sun Mar 31, 2019 8:35 pm
Correct. You can read more about that here under "kldgain thresholding in training" topic.
Thanks for this link.
