LCZero: Progress and Scaling. Relation to CCRL Elo

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Laskos »

Albert Silver wrote:
Robert Flesher wrote:
AdminX wrote:
Laskos wrote:
AdminX wrote:
Laskos wrote:
Werewolf wrote:Very early days, but current result for LCZero 127 on NVidia 1060 vs Colossus 2008b @ 15 sec / move below. Colossus is single Intel Broadwell core @ 4.2 GHz.

4 wins
2 losses
1 draw

for LCZero

:twisted:
Wow, on a good GPU and a longer time control, LC0 rocks. It scales completely differently from standard engines: give it strong hardware and LTC, and it soars.
Stop saying things like this; I am trying very hard over here not to upgrade my GPU! You have no idea how tempted I am to spend money on this right now. :lol: I'm like a drug addict going through withdrawal.
:lol: :lol: :lol:
Pretty much the same here; I need at least an Nvidia 1060, but maybe even higher. Bad times :lol:.
I folded, Kai, and just couldn't take it anymore. Went and bought a GTX 1060, only to find out I needed a SATA 15-pin male to dual 4-pin Molex female Y-splitter, which I thought I already had; if I do, it's hiding from me. :lol: So I ordered a new one, since I did not feel like driving back to Microcenter to get it. It just kept calling me and calling me. This stuff has me addicted. :oops:
+1. I see an Nvidia Titan V coming home with me really soon.
My testing here, still early, is showing just how enormous a difference the GPU level makes. I don't mean the CPU vs GPU results, which are known, but from a good GPU to a very good one. To do this, I am running an identical match (and openings) between Leela NN202 on my desktop and laptop. The CPU is almost identical (Nebula scores 8900 KNPS vs 9050 KNPS) while the GPU has Leela performing about 1950 NPS on the laptop (GTX980M 8GB) and 2270 on the desktop (GTX1060 6GB). The scaling is not even close to comparable. On the desktop, Leela clobbered Nebula 72-28 (+166 Elo), but on the laptop they are very close so far. Will report once it is done.
Probably reasons other than benchmark speed. A 15% difference in speed cannot account for that gap in performance; that would take a doubling or so, depending on TC.
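For reference, the +166 Elo figure quoted for a 72-28 score follows (up to rounding and draw handling) from the standard logistic Elo model; a minimal sketch, pure arithmetic and nothing engine-specific:

```python
import math

def elo_diff(score_fraction: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)

# 72 points out of 100 implies roughly +164 Elo, close to the quoted +166;
# small discrepancies usually come from how draws are modeled.
print(round(elo_diff(72 / 100)))
```

The same formula run in reverse is how rating tools like BayesElo turn a match score into an Elo estimate, though they also attach error bars that a 100-game sample badly needs.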
Albert Silver
Posts: 3026
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Albert Silver »

Laskos wrote:
Albert Silver wrote: My testing here, still early, is showing just how enormous a difference the GPU level makes. I don't mean the CPU vs GPU results, which are known, but from a good GPU to a very good one. To do this, I am running an identical match (and openings) between Leela NN202 on my desktop and laptop. The CPU is almost identical (Nebula scores 8900 KNPS vs 9050 KNPS) while the GPU has Leela performing about 1950 NPS on the laptop (GTX980M 8GB) and 2270 on the desktop (GTX1060 6GB). The scaling is not even close to comparable. On the desktop, Leela clobbered Nebula 72-28 (+166 Elo), but on the laptop they are very close so far. Will report once it is done.
Probably reasons other than benchmark speed. A 15% difference in speed cannot account for that gap in performance; that would take a doubling or so, depending on TC.
I'm perfectly willing to run the same test on another engine, under identical conditions and with a similar rating. It will certainly help verify that this isn't just Nebula being unable to cope. The speeds benchmarked are not in doubt.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
Albert Silver
Posts: 3026
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Albert Silver »

Leo wrote:It looks like LCzero is at 2500 Elo.
On your rig?
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Milos »

Albert Silver wrote: My testing here, still early, is showing just how enormous a difference the GPU level makes. I don't mean the CPU vs GPU results, which are known, but from a good GPU to a very good one. To do this, I am running an identical match (and openings) between Leela NN202 on my desktop and laptop. The CPU is almost identical (Nebula scores 8900 KNPS vs 9050 KNPS) while the GPU has Leela performing about 1950 NPS on the laptop (GTX980M 8GB) and 2270 on the desktop (GTX1060 6GB). The scaling is not even close to comparable. On the desktop, Leela clobbered Nebula 72-28 (+166 Elo), but on the laptop they are very close so far. Will report once it is done.
That's because you don't quite understand how things work.
The 980M is just a tad slower than the regular 980, which is a much more powerful card than the 1060, especially for ML.
So your obviously worse performance despite almost the same NPS is due to thermal throttling of the GPU in your laptop. Just disable it and you'll have the same performance ;).
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Milos »

Laskos wrote: It's necessary anyway, and from published benchmarks, the GTX 1060 comes out as one of the best price-performance-wise. These guys are already testing 192x15 nets, and probably soon 256x20 nets will appear, which will render CPUs completely obsolete. Even a strong full i9 CPU will be 10 times slower than a strong GPU. I also need to change my 600W PSU; I guess more is needed to avoid frying it (or the motherboard). In less than a month I will buy all that; no way around it, this thing is too exciting to miss.

Imagine in a year or so GPU races, with guys having arrays of 4-8 super-GPUs/TPUs :lol: .
The card is important, but much more important is the implementation.
Gian-Carlo's pathetic OpenCL implementation is obviously a tremendous bottleneck.
The TensorFlow implementation already brings a factor of 3, while the cuDNN implementation brings another factor of 3 on top of that, i.e. almost a factor of 10 compared to Gian-Carlo's implementation.
If someone ports it to TensorRT, we'll probably see another factor-of-2 speed-up ;).
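Taking the claimed factors at face value, the backend speed-ups compound multiplicatively; a quick sanity check (the factors are the post's estimates relative to the OpenCL baseline, not measurements):

```python
# Claimed per-backend speed-ups relative to the OpenCL baseline
factors = {"TensorFlow": 3.0, "cuDNN": 3.0, "TensorRT (projected)": 2.0}

speedup = 1.0
for backend, factor in factors.items():
    speedup *= factor
    print(f"{backend}: {speedup:.0f}x over OpenCL")
```

So the "almost a factor of 10" in the post is the 3 x 3 = 9x cuDNN stage, and the projected TensorRT port would take the chain to roughly 18x.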
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Milos »

Laskos wrote: I think there is a trade-off between the size of the net and the number of games (and playouts) needed for learning per unit time. Larger nets are slower. I guess, until it starts to saturate, it's better to keep nets smaller, for faster games. The A0 team had huge hardware for the learning phase with an initially very big net, and they had 40+ million games with very slow playouts, so the path is a bit different from LC0's. But the final point, I guess, will be similar.
I also think the A0 team had many tries before finding the fastest way to reach that level with adequate hardware, tries not mentioned in the paper.
Increasing the net size gives better performance, but with diminishing returns.
At some point, a larger fully trained net will gain less over the previous size than what you lose in NPS by running that larger net.
I am pretty sure the Google people are not stupid: they experimented with larger nets (since they were not a bottleneck for TPU memory) and found their optimal net size (which is exactly the one in the paper).
Then they simply used as many TPUs in parallel as needed to surpass SF.
If they had had a stronger net, they could have run it on 1 TPU instead of 4; with a weaker one they would have needed 8 or 16 TPUs, but either way they would only publish results where they surpass SF.
Also bear in mind that Google's inference implementation is far superior to what publicly available TF offers on NVIDIA cards.
The level of optimization is probably close to what NVIDIA now achieves with TensorRT.
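The trade-off described here can be made concrete with a toy model: assume evaluation strength saturates as the net grows, while NPS falls in proportion to net size and each lost doubling of speed costs a fixed amount of Elo. Every constant below is an arbitrary illustrative assumption, not a measured value; the point is only the shape of the curve, which peaks at some intermediate net size.

```python
import math

ELO_PER_SPEED_DOUBLING = 60.0   # assumed Elo per doubling of NPS
MAX_EVAL_ELO = 1200.0           # assumed ceiling of evaluation strength

def net_strength(size: float) -> float:
    """Assumed evaluation Elo of a fully trained net: saturating in size."""
    return MAX_EVAL_ELO * (1.0 - 2.0 ** (-size))

def effective_elo(size: float, base_nps: float = 10_000.0) -> float:
    """Evaluation gain minus the search Elo lost to running a slower net."""
    nps = base_nps / size                       # bigger net -> fewer NPS
    speed_elo = ELO_PER_SPEED_DOUBLING * math.log2(nps)
    return net_strength(size) + speed_elo

# Strength rises with net size, then falls once the NPS loss dominates
for size in (1, 2, 4, 8, 16):
    print(size, round(effective_elo(size)))
```

Under these made-up constants the optimum lands around a size multiplier of 8; with different constants the peak moves, which is exactly why the optimal net size has to be found experimentally, as the post argues Google did.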
mar
Posts: 2659
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by mar »

Leo wrote:It looks like LCzero is at 2500 Elo.
On my machine (980Ti), Leela (net 217) seems to perform above 2800 CCRL.
It's very good in the opening/middlegame, but tactically it's a disaster! (Some endgame knowledge is also missing, but it's obvious why; not a big deal.)

I saw two games where Leela lost a won game to the dumbest knight fork, losing a queen/rook for no compensation.
No alpha-beta searcher would ever play such a blunder, even at depth 1 (assuming a good QS with checks at depth 0).
I also saw Leela move into a mate in 2, twice.

So Leela misses simple 4-ply tactics, which is really sad, as this costs her at least 100 Elo.

Nobody says the DeepMind approach can't be improved, so I have high hopes for the future; there is no doubt that she improves. I wouldn't be surprised if Leela hits 3000 Elo in a month or two.

Nevertheless, it's fun to watch, as Leela isn't afraid of sacrificing material. An amazing project.
Albert Silver
Posts: 3026
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Albert Silver »

Milos wrote:
Albert Silver wrote: My testing here, still early, is showing just how enormous a difference the GPU level makes. I don't mean the CPU vs GPU results, which are known, but from a good GPU to a very good one. To do this, I am running an identical match (and openings) between Leela NN202 on my desktop and laptop. The CPU is almost identical (Nebula scores 8900 KNPS vs 9050 KNPS) while the GPU has Leela performing about 1950 NPS on the laptop (GTX980M 8GB) and 2270 on the desktop (GTX1060 6GB). The scaling is not even close to comparable. On the desktop, Leela clobbered Nebula 72-28 (+166 Elo), but on the laptop they are very close so far. Will report once it is done.
That's because you don't quite understand how things work.
The 980M is just a tad slower than the regular 980, which is a much more powerful card than the 1060, especially for ML.
So your obviously worse performance despite almost the same NPS is due to thermal throttling of the GPU in your laptop. Just disable it and you'll have the same performance ;).
It pains me to admit it, but you were right about the throttling. I'll test it on the desktop, just disable a couple of CPU cores, and then compare. That said, it is fascinating to know that NN202, which is what I had tested, performs at 2900 CCRL on a dinky old Sandy Bridge i5 with a GTX 1060.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
Werewolf
Posts: 2001
Joined: Thu Sep 18, 2008 10:24 pm

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by Werewolf »

Milos wrote:
The 980M is just a tad slower than the regular 980, which is a much more powerful card than the 1060, especially for ML.
Are you basing that on their GFLOPS processing power?

The 1060 is roughly 3800 GFLOPS and the 980 is roughly 4600 GFLOPS.
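By those quoted figures alone, the 980's raw FP32 advantage over the 1060 works out to about 21%:

```python
# Quoted FP32 throughput figures from the post (approximate vendor specs)
gtx_1060_gflops = 3800
gtx_980_gflops = 4600

advantage = (gtx_980_gflops / gtx_1060_gflops - 1) * 100
print(f"GTX 980 vs GTX 1060: +{advantage:.0f}% FP32 throughput")
```

Raw GFLOPS is only a rough proxy, though: memory bandwidth, the NN framework's backend, and (as noted above for laptops) thermal throttling can easily swamp a 21% paper advantage.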
nabildanial
Posts: 126
Joined: Thu Jun 05, 2014 5:29 am
Location: Malaysia

Re: LCZero: Progress and Scaling. Relation to CCRL Elo

Post by nabildanial »

Werewolf wrote:
Milos wrote:
The 980M is just a tad slower than the regular 980, which is a much more powerful card than the 1060, especially for ML.
Are you basing that on their GFLOPS processing power?

The 1060 is roughly 3800 GFLOPS and the 980 is roughly 4600 GFLOPS.
I have both a 970 and a 1060, and their performance is the same for gaming, video rendering, and running Leela. A 980 is around 20%-30% faster than a 970, so it should hold the same edge over the 1060 too.