LCZero Accomplishments and Goals Thus Far

Daniel Shawul · Post by **Daniel Shawul** » Mon Apr 30, 2018 9:11 pm

Uri Blass wrote:
Daniel Shawul wrote:Sigh..wake me up when it is 2800 elo running on a single CPU core, which is what every other engine uses in rating lists. As far as I am concerned, it is still a 2100 elo engine there.
I see so many excited people giving a hardware advantage to LCzero; CCLS, for instance, uses a GPU for LCzero and a single CPU core for the rest of the engines.

I don't think anybody misses the fact that given a hardware advantage that will basically make the evaluation free, you can increase your elo to your satisfaction.
If Stockfish had its evaluation FPGA'ed it may be a 4000 elo engine, but who cares about that anyway? The fact is Stockfish has been throwing away evaluation features for the sake of speed throughout the years.

Daniel
I wonder 2100 at what time control, and whether the rating is not higher at longer time control.

Edit: I am not sure if the parts that Stockfish threw away were productive for Stockfish even with no price in speed. I guess that Stockfish had counterproductive code in the evaluation not because it made it slower, but because something that humans believe helps simply does not help.

Edit 2: I believe that Stockfish scales better than weaker engines, including engines with slower evaluation. When I say scales better, I mean that if the weaker engine needs 10 minutes against 1 minute of Stockfish to get 50%, then it needs more than 20 minutes against 2 minutes of Stockfish to get 50%.

Generally, engines that are stronger at STC scale better, and AlphaZero is an exception to the rule, if Google is to be believed.

I went from game in 20sec+0.1sec up to game in 10min+10sec, IIRC each using 2 threads, and it performed at 2100 elo on average. I didn't use a GPU.

It may scale better than other engines, as TCEC showed. To get from 2100 elo to being remotely competitive at TCEC, even though it lost almost all games, is fantastic; I recall even the developers were surprised at that.

Daniel

noobpwnftw · Post by **noobpwnftw** » Mon Apr 30, 2018 9:14 pm

hgm wrote:Indeed, the PCI bus is a lot slower than CPU-DRAM traffic. But remember that AlphaZero needed only 80knps to beat Stockfish' 70Mnps. So I guess you don't care so much if things are a factor 100 slower, if you have a factor 1000 more time.

I have this concern about the way their MCTS search was implemented: it is still depth dependent because of the way it explores children based on node visits, and this tends to be heavily IO bound if massively parallelized.

If SMP is introduced then it might have a cut-off waste ratio similar to a/b, and the wasted visits are more expensive considering the slower visit per node.

hgm · Post by **hgm** » Mon Apr 30, 2018 9:21 pm

Joost Buijs wrote:How do you know how many transistors AlphaZero needs to play its games?

The TPUs are described in good detail in some paper. This mentions chip surface, and the technology used (i.e. how many nm resolution for the lithography).

The article is very vague about everything. They used 4 TPUs (gen1 or gen2?); do they mean 4 gen2 TPU modules with 4 ASICs each?

For playing they use gen1 TPUs; the gen2 TPUs are only needed for training the network. I think the paper was quite explicit about that. (Or perhaps the AlphaGo Zero paper.)

Are you very sure about the level of 3500 Elo? They only show you 10 games it won against a crippled version of Stockfish 8, and for the rest you have to take their word for it.

I don't see how Stockfish was crippled. But if you think they are lying, everything stops. TPUs could be as real as phasers, the 10 games could have been generated by having Houdini play them at 1 hour/move, etc. I can just as easily show you 1000 fabricated games as 10.

They don't tell you how many matches of 100 games they played

Yes they do. They say they play 'a match', meaning one. Every scientist knows that selecting a result when you are supposed to measure something will be considered plain lying.

and how many matches they lost. Statistically the error margin on 100 games is rather high, and since the network always plays the same move there is a possibility that many games were repeated; maybe the SMP randomness in Stockfish gave some variation, I don't know.

This is indeed a concern. But even beating Stockfish 7-3 without any losses is very strong evidence for a sizable Elo gap. So you can put a question mark next to the error bar.
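
To put rough numbers on the error bar under discussion, here is a minimal sketch that converts a win/draw/loss record into an Elo difference with the standard logistic formula and attaches a normal-approximation 95% interval. The function names are hypothetical, and the 4 wins / 6 draws / 0 losses split is only one way to read '7-3 without any losses'; the normal approximation is crude for so few games.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a mean score under the logistic model."""
    # Clamp so the logarithm stays defined for 0% or 100% scores.
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval(wins, draws, losses, z=1.96):
    """Return (low, mid, high) Elo estimates from a win/draw/loss record.

    Assumption: game results are independent and the mean score is
    approximately normal, which is only a rough guide for short matches.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(var / games)
    return (elo_from_score(score - z * stderr),
            elo_from_score(score),
            elo_from_score(score + z * stderr))

if __name__ == "__main__":
    low, mid, high = elo_interval(wins=4, draws=6, losses=0)
    print(f"Elo gap: {mid:+.0f} (95% interval roughly {low:+.0f} to {high:+.0f})")
```

Under these assumptions the interval is very wide but stays above zero, which fits the reading that a gap exists while its size remains uncertain.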

Time will tell I guess, LCZero is already past 10 million training games, Google seems to have used 44 million, with the current speed it will take another 3 months for LCZero to reach 44 million, and I wonder if it is able to close the gap of 600 Elo with Stockfish within 3 months.

You should not expect LCZero to surpass Stockfish as AlphaZero did, as a single GPU is a quite poor substitute for 4 TPUs. But of course you could emulate AlphaZero by giving sufficient time odds to have it search the same number of nodes.
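
The 3-month estimate quoted above implies a self-play rate that is easy to check. The sketch below only redoes that arithmetic from the figures in the quote (10 million games so far, 44 million as the target); the 30-day month and the function name are assumptions for illustration.

```python
# Figures taken from the quoted post; the 30-day month is an assumption.
GAMES_SO_FAR = 10_000_000
GAMES_TARGET = 44_000_000
MONTHS_REMAINING = 3
DAYS_PER_MONTH = 30

def implied_games_per_day(done, target, months, days_per_month=DAYS_PER_MONTH):
    """Average daily self-play games needed to reach `target` in `months`."""
    return (target - done) / (months * days_per_month)

if __name__ == "__main__":
    rate = implied_games_per_day(GAMES_SO_FAR, GAMES_TARGET, MONTHS_REMAINING)
    print(f"Implied rate: about {rate:,.0f} games per day")
```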

hgm · Post by **hgm** » Mon Apr 30, 2018 9:34 pm

noobpwnftw wrote:I have this concern about the way their MCTS search was implemented: it is still depth dependent because of the way it explores children based on node visits, and this tends to be heavily IO bound if massively parallelized.

If SMP is introduced then it might have a cut-off waste ratio similar to a/b, and the wasted visits are more expensive considering the slower visit per node.

Why would you want SMP to do 80knps? You think one CPU cannot handle that?

Only a single position for every 'simulation' is presented to the NN (and thus requires I/O), namely the leaf that you reach. I suppose that the 80knps is determined completely by the number of NN evaluations you can do per second, and that running the NN completely ties up the TPU. (I.e. there is no pipelining; you have to wait for the result before presenting the next position to the NN.)
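
The point that each simulation costs exactly one network evaluation can be illustrated with a toy search loop. This is a minimal sketch assuming an AlphaZero-style PUCT selection rule as commonly described; it is not LCZero's or DeepMind's code. The class and function names, the constant `c_puct=1.5`, and the `CountingEvaluator` stand-in (uniform priors over three fake moves, no real chess position) are all hypothetical.

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior      # policy probability assigned by the evaluator
        self.visits = 0         # number of simulations through this node
        self.value_sum = 0.0    # sum of backed-up values (side to move's view)
        self.children = {}      # move -> Node; empty until expanded

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """Pick the child maximizing Q + U; U shrinks as the child's visits grow."""
    sqrt_parent = math.sqrt(max(node.visits, 1))
    def score(child):
        u = c_puct * child.prior * sqrt_parent / (1 + child.visits)
        return -child.q() + u   # child's value is stored from the opponent's view
    return max(node.children.values(), key=score)

def simulate(root, evaluate):
    """One simulation: descend to a leaf, evaluate it once, back up the value."""
    path = [root]
    node = root
    while node.children:
        node = select_child(node)
        path.append(node)
    priors, value = evaluate(node)   # the only evaluator call in this simulation
    node.children = {move: Node(p) for move, p in priors.items()}
    for n in reversed(path):
        n.visits += 1
        n.value_sum += value
        value = -value               # flip perspective at each ply

class CountingEvaluator:
    """Stand-in for the neural network; counts how often it is called."""
    def __init__(self):
        self.calls = 0

    def __call__(self, node):
        self.calls += 1
        return {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}, 0.0

if __name__ == "__main__":
    net = CountingEvaluator()
    root = Node(prior=1.0)
    for _ in range(100):
        simulate(root, net)
    print(net.calls, root.visits)
```

In this toy, the number of evaluator calls equals the number of simulations, so the search speed is bounded by how fast the evaluator answers, which is the argument made in the post.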

Daniel Shawul · Post by **Daniel Shawul** » Mon Apr 30, 2018 9:34 pm

Daniel Shawul wrote:
Uri Blass wrote:
Daniel Shawul wrote:Sigh..wake me up when it is 2800 elo running on a single CPU core, which is what every other engine uses in rating lists. As far as I am concerned, it is still a 2100 elo engine there.
I see so many excited people giving a hardware advantage to LCzero; CCLS, for instance, uses a GPU for LCzero and a single CPU core for the rest of the engines.

I don't think anybody misses the fact that given a hardware advantage that will basically make the evaluation free, you can increase your elo to your satisfaction.
If Stockfish had its evaluation FPGA'ed it may be a 4000 elo engine, but who cares about that anyway? The fact is Stockfish has been throwing away evaluation features for the sake of speed throughout the years.

Daniel
I wonder 2100 at what time control, and whether the rating is not higher at longer time control.

Edit: I am not sure if the parts that Stockfish threw away were productive for Stockfish even with no price in speed. I guess that Stockfish had counterproductive code in the evaluation not because it made it slower, but because something that humans believe helps simply does not help.

Edit 2: I believe that Stockfish scales better than weaker engines, including engines with slower evaluation. When I say scales better, I mean that if the weaker engine needs 10 minutes against 1 minute of Stockfish to get 50%, then it needs more than 20 minutes against 2 minutes of Stockfish to get 50%.

Generally, engines that are stronger at STC scale better, and AlphaZero is an exception to the rule, if Google is to be believed.
I went from game in 20sec+0.1sec up to game in 10min+10sec, IIRC each using 2 threads, and it performed at 2100 elo on average. I didn't use a GPU.

It may scale better than other engines, as TCEC showed. To get from 2100 elo to being remotely competitive at TCEC, even though it lost almost all games, is fantastic; I recall even the developers were surprised at that.

Daniel

I am copy-pasting CEGT's test matches for LCZero so that you understand I am not talking nonsense.

Code: Select all

Testmatch
(startposition 87kns) (166 = 5169.70)
1   LCZero 0.7/166 1CPU   2057      011½10½0½0½½111½10½0  10.5/20
2   Anechka 0.08          2040     100½01½1½1½½000½01½1    9.5/20

Lczero 187= 5297.69 (startposition 87kns)
1   LCZero 0.7 1CPU  2166  +126   100½1111111010½1½½½1  13.5/20
2   Anechka 0.08     2040  -126  011½0000000101½0½½½0    6.5/20

Lczero 181= 5267.33 (startposition 87kns)
1   EveAnn 1.72 2071      +108   0½1110½110111010½11½  13.0/20
2   LCZero 0.7/181 1CPU 1963  -108  1½0001½001000101½00½    7.0/20

Lczero 187 = 5297.69 and GPU Intel Graphics 630 (startposition 145kns)
1   LCZero GPU v0.7/187 1CPU 2124   +53   ½010½01½111½1100101½  11.5/20
2   EveAnn 1.72 2071             -53  ½101½10½000½0011010½    8.5/20

Lczero 217 = 5405.12 1CPU (startposition 87kns)
1   LCZero/217 0.7 1CPU  2197  +126   1010100111111101½½½1  13.5/20
2   EveAnn 1.72      2071  -126  0101011000000010½½½0    6.5/20

Lczero 187 = 5297.69  (startposition 87kns)
1   LCZero 0.7 1CPU 2123   010½001101100½111½0½  10.0/20  100.00
2   Popochin 4.0 2123     101½110010011½000½1½   10.0/20  100.00

Lczero 217 = 5405.12 1CPU (startposition 87kns)
1   LCZero 0.7 1CPU  2504  +381   1111111111½111111½10  18.0/20 !?
2   Popochin 4.0     2123  -381  0000000000½000000½01    2.0/20


Lczero 187 = 5297.69  (startposition 365 kns)
1   The Baron 3.42 x64 1CPU  2669  +190   11110½101111110½0111  15.0/20
2   LCZero 0.7 CPU 8Threads  2479  -190  00001½010000001½1000    5.0/20

Lczero 187 = 5297.69  (startposition 572 kns)
1   lczero v0.7 4CPU 2598   +89   ½1010½1111000101½111  12.5/20
2   Djinn 1.021 x64 2509    -89  ½0101½0000111010½000    7.5/20

Lczero 217 = 5405.12 1CPU (startposition 87kns) (+100 in selfplay = +50 against other opponents)
1 Djinn 1.021 x64 2509 +190 ½½11½1½101110½11111½ 15.0/20
2 LCZero 0.7 1CPU 2319 -190 ½½00½0½010001½00000½ 5.0/20

That is the true elo, and ID 217 may be the latest network..
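
The +/- figures in the matches above look consistent with the standard logistic Elo relation between match score and rating difference, with the performance rating being the opponent's rating plus that difference (for example 2040 + 126 = 2166). The sketch below checks this against the 20-game results listed; it is an observation about the numbers, not a statement of how CEGT computes them, and `elo_diff` is a hypothetical helper name.

```python
import math

def elo_diff(points, games):
    """Rating difference implied by a match score under the logistic Elo model."""
    score = points / games
    return 400.0 * math.log10(score / (1.0 - score))

# (points out of 20, difference printed in the list) copied from the matches above
listed = [(13.5, 126), (13.0, 108), (11.5, 53), (18.0, 381), (15.0, 190), (12.5, 89)]

if __name__ == "__main__":
    for points, shown in listed:
        print(f"{points:4.1f}/20 -> {elo_diff(points, 20):6.1f}  (listed: +{shown})")
```

With only 20 games per match, each of these differences carries a large uncertainty, so individual results such as the 18.0/20 against Popochin should be read with care.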

Werner · Post by **Werner** » Mon Apr 30, 2018 9:51 pm

Werner wrote:Does anybody know how to run the engine on Google Colab now with the new net?
This line is no longer true I think:
!echo '0;XgemmBatched;128;16;128;16;

Edit: just saw it uses the newest net

noobpwnftw · Post by **noobpwnftw** » Mon Apr 30, 2018 9:57 pm

hgm wrote:
noobpwnftw wrote:I have this concern about the way their MCTS search was implemented: it is still depth dependent because of the way it explores children based on node visits, and this tends to be heavily IO bound if massively parallelized.

If SMP is introduced then it might have a cut-off waste ratio similar to a/b, and the wasted visits are more expensive considering the slower visit per node.
Why would you want SMP to do 80knps? You think one CPU cannot handle that?

Only a single position for every 'simulation' is presented to the NN (and thus requires I/O), namely the leaf that you reach. I suppose that the 80knps is determined completely by the number of NN evaluations you can do per second, and that running the NN completely ties up the TPU. (I.e. there is no pipe-lining, you have to wait for the result before presenting the next position to the NN.)

I recall there was a mention in AG0 or earlier that they used more than one CPU thread to drive the search. Even for Leela, one CPU thread can't seem to feed a 1080 Ti to full load; that yields some 2k NPS.

Albert Silver · Post by **Albert Silver** » Mon Apr 30, 2018 9:59 pm

Daniel Shawul wrote:
Albert Silver wrote:
Daniel Shawul wrote:Sigh..wake me up when it is 2800 elo running on a single CPU core, which is what every other engine uses in rating lists. As far as I am concerned, it is still a 2100 elo engine there.
I see so many excited people giving a hardware advantage to LCzero; CCLS, for instance, uses a GPU for LCzero and a single CPU core for the rest of the engines.
Well, to begin with, I remember when Rybka was the first engine to take advantage of the 64-bit environment when every standard OS was 32-bit. It had a big speed-up, and no one was able to do the same at first. Shredder 64-bit came out a couple of months later but with zero speedup. I don't recall people saying that it needed to run in a 32-bit environment like everyone else to be 'fair'.

The advantage you complain about is just sour grapes in my book. For one thing, if CCLS or whoever offers a GPU, then it is up to the authors to take advantage of it; it is not for the one who is able to do so to dumb down his machine for 'fairness'.

Leela is designed to use a GPU for best performance. it is inherent in its design. If it reaches 100 Elo better than everyone else on my computer because it alone can use the GPU to best advantage, while all others are weaker because they are only able to use the CPU, guess how much I (and everyone who analyzes with engines) will care?
If it is designed solely for the GPU,

I never said anything of the sort. I said it takes advantage of the GPU.

On the other hand, Stockfish can perform 3200+ elo on a mobile processor.

Your point is what exactly?

Daniel Shawul · Post by **Daniel Shawul** » Mon Apr 30, 2018 10:05 pm

Albert Silver wrote:
Daniel Shawul wrote:
Albert Silver wrote:
Daniel Shawul wrote:Sigh..wake me up when it is 2800 elo running on a single CPU core, which is what every other engine uses in rating lists. As far as I am concerned, it is still a 2100 elo engine there.
I see so many excited people giving a hardware advantage to LCzero; CCLS, for instance, uses a GPU for LCzero and a single CPU core for the rest of the engines.
Well, to begin with, I remember when Rybka was the first engine to take advantage of the 64-bit environment when every standard OS was 32-bit. It had a big speed-up, and no one was able to do the same at first. Shredder 64-bit came out a couple of months later but with zero speedup. I don't recall people saying that it needed to run in a 32-bit environment like everyone else to be 'fair'.

The advantage you complain about is just sour grapes in my book. For one thing, if CCLS or whoever offers a GPU, then it is up to the authors to take advantage of it; it is not for the one who is able to do so to dumb down his machine for 'fairness'.

Leela is designed to use a GPU for best performance. it is inherent in its design. If it reaches 100 Elo better than everyone else on my computer because it alone can use the GPU to best advantage, while all others are weaker because they are only able to use the CPU, guess how much I (and everyone who analyzes with engines) will care?
If it is designed solely for the GPU,
I never said anything of the sort. I said it takes advantage of the GPU.

On the other hand, Stockfish can perform 3200+ elo on a mobile processor.
Your point is what exactly?

Lol, don't you even recall what you wrote a second ago... Here

Leela is designed to use a GPU for best performance. it is inherent in its design

My point again, if it can percolate through your thick head, is it is not designed for the GPU only.

Guenther · Post by **Guenther** » Mon Apr 30, 2018 10:05 pm

Daniel Shawul wrote:
Code: Select all
Testmatch
(startposition 87kns) (166 = 5169.70)
1   LCZero 0.7/166 1CPU   2057      011½10½0½0½½111½10½0  10.5/20
2   Anechka 0.08          2040     100½01½1½1½½000½01½1    9.5/20

...
That is the true elo and ID 217 maybe the latest latest network ..

What is 'startposition 87knps' ?? I guess 87 nps is meant, which is still nearly 3 times lower than my 34 Euro gpu.
