LCzero sacs a knight for nothing

Laskos · Post by **Laskos** » Fri Apr 20, 2018 2:13 pm

noobpwnftw wrote:Latest SFdev scaling test results:
32 threads vs 384 threads w/ HT @ 60+0.6, 2GB hash, still ongoing:

ELO: -129.80 +-36.8 (95%) LOS: 0.0%
Total: 112 W: 1 L: 41 D: 70

So where did the idea that a/b engines can't scale well come from?

Apparently, with such hardware odds, one does not need a lame config with SF8 to get zero losses in 100 games.

With Lazy SMP, the received wisdom about a/b parallelization has changed a bit. In your conditions a doubling of time seems to be worth about 50-60 Elo points for SF, so the 12x thread factor gives you 2-2.5 time doublings, i.e. a 4x-6x effective speedup. That is very good for such a huge number of threads, but it is still a loss of a factor of 2-3 compared to the 12x increase in thread count.
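The arithmetic behind these figures can be reproduced with the standard logistic Elo model. A minimal sketch, assuming the usual formula Elo = -400·log10(1/score - 1) and treating the 50-60 Elo per time doubling quoted above as an input rather than a measured constant; `elo_from_wld` and `effective_speedup` are hypothetical helper names.

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a W/L/D record under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # a draw counts as half a point
    return -400.0 * math.log10(1.0 / score - 1.0)


def effective_speedup(elo_gain: float, elo_per_doubling: float) -> float:
    """Time-odds factor equivalent to an Elo gain, given Elo per time doubling."""
    doublings = elo_gain / elo_per_doubling
    return 2.0 ** doublings


# 32 threads vs 384 threads, seen from the 32-thread side: W 1, L 41, D 70.
elo = elo_from_wld(1, 41, 70)
print(f"Elo: {elo:.2f}")  # -129.80

# Assumed Elo value of one time doubling (the 50-60 range from the post).
for per_doubling in (50.0, 60.0):
    speedup = effective_speedup(-elo, per_doubling)
    print(f"{per_doubling:.0f} Elo/doubling -> {speedup:.1f}x effective, "
          f"factor {12 / speedup:.1f} lost vs. 12x threads")
```

With these inputs the 12x thread increase works out to roughly a 4.5x-6x effective speedup, in line with the estimate in the post.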

LC0 probably scales significantly better Elo-wise with time (or hardware).

noobpwnftw · Post by **noobpwnftw** » Fri Apr 20, 2018 2:26 pm

It's significant enough for SF, considering its strength at 32 cores.

On current common hardware, LC0's scaling is on the upward part of the curve while SF's is very likely on the downward part, but neither has reached the point where better hardware gains almost nothing.

CMCanavessi · Post by **CMCanavessi** » Fri Apr 20, 2018 2:28 pm

Daniel Shawul wrote:
Why this argument again? How many times have I already mentioned that A0 on a 1080 Ti at 1 min/move should be of comparable strength to SF8 on 64 cores at 1 min/move?
A GTX 1080 Ti is 11 TFLOPS, and 64 cores are 1 TFLOPS, so that is an 11X hardware advantage. Why not use the same 64 CPU cores for it and see if it will beat Stockfish?

Wow, the Stockfish team must be a really really stupid bunch of programmers if they have 11 TFlops at their disposal in a 1080ti but they don't make the engine take advantage of those and still insist on using those inefficient crappy CPUs that provide 11X less performance...

Milos · Post by **Milos** » Fri Apr 20, 2018 2:42 pm

Laskos wrote:Checking the time used times the nps it indeed gives p=1 for LC0 and checking the depth, it is indeed depth=1 for SF9. All games from different positions.

Actually, if you check the actual log you'll see that the number of nodes is 2. One node is cached from the previous move (the root node) and one extra node is evaluated. So it is indeed one node evaluated, but not the root node, and therefore the strength is a bit higher than if it were really just the root node. But OK, that is just nitpicking.
Anyway, here is my result with 500 balanced openings from my testing collection, limited to 6 moves.
In total, 15 of the 1000 games were duplicates, and those were removed.

```
   # PLAYER                 : RATING  ERROR   POINTS  PLAYED    (%)
   1 SF180418_depth=1       :   88.0    7.6    613.5     985   62.3% (+511,=205,-269)
   2 Lczero_cpu_id150_p=1   :    0.0    7.6    371.5     985   37.7% (+269,=205,-511)

Games        :    985 (finished)

White Wins   :    401 (40.7 %)
Black Wins   :    379 (38.5 %)
Draws        :    205 (20.8 %)
Unfinished   :      0

White Perf.  : 51.1 %
Black Perf.  : 48.9 %

ECO A =    183 Games (18.6 %)
ECO B =    286 Games (29.0 %)
ECO C =    208 Games (21.1 %)
ECO D =    177 Games (18.0 %)
ECO E =    131 Games (13.3 %)
```

jkiliani · Post by **jkiliani** » Fri Apr 20, 2018 2:48 pm

Milos wrote: Anyway, here is my result with 500 balanced openings from my testing collection, limited to 6 moves.

Well, that's some data at least, and it's consistent with what I got for Id 150:

Score of Id_150 vs sf_d1: 52 - 159 - 189 [0.366] 400
Elo difference: -95.26 +/- 24.78
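The cutechess-style "Elo difference" line can be approximately reproduced from the 52 - 159 - 189 score. A minimal sketch, assuming a normal approximation to the per-game score with a 95% interval mapped through the logistic Elo curve; this is similar in spirit to what cutechess-cli reports, but its exact formula is an assumption here, so the margin may differ slightly in the last digits. `elo_with_margin` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def logistic_elo(score: float) -> float:
    """Elo difference for an expected score under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int, confidence: float = 0.95):
    """Return (elo, error_margin, score) for a W/L/D record."""
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + 0.5 * d  # mean score per game
    # Per-game variance of the result, where a game scores 1, 0 or 0.5.
    var = w * (1.0 - mu) ** 2 + l * (0.0 - mu) ** 2 + d * (0.5 - mu) ** 2
    stderr = math.sqrt(var / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    lo = logistic_elo(mu - z * stderr)
    hi = logistic_elo(mu + z * stderr)
    return logistic_elo(mu), (hi - lo) / 2.0, mu


elo, margin, score = elo_with_margin(52, 159, 189)
print(f"score {score:.3f}, Elo {elo:.2f} +/- {margin:.2f}")
```

The mapping through the logistic curve is why the interval is not symmetric in score space but is reported as a single half-width.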

But note that Id_153 scored a lot better than this in my tests. By the way, I did use arg="--visits=1" this time, so unless you have an issue with the Komodo opening book, we have established that LCZero with its raw net plays in the same ballpark as Stockfish at depth 1.

Milos · Post by **Milos** » Fri Apr 20, 2018 2:54 pm

mirek wrote:My GTX970 gets around 2k nps and it's 3.9 TFLOPS, while a 1080 Ti is 11 TFLOPS, so I would expect more like 5k+ nps from a 1080 Ti (with the current LC0 network size).

Also, I was under the impression that the 43-core TCEC machine was giving about 2-3k nps. I am not sure about it, though. Since you are reporting 2k nps for 16 cores, does that mean the TCEC machine was actually pushing something like 5k+ nps? Can someone comment on this?

You seem not to be reporting nps from the opening and early middlegame, but to be including it from the endgame.
If you follow Jjoshua2's Twitch channel, you can see what nps he gets in real time with a 1080 Ti.

Regarding the TCEC machine: they don't use HT, which would give them at least 50% more, and as I said, OpenBLAS really sucks for such a configuration. One should use the Intel MKL library instead, i.e. recompile LC0 with it.
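mirek's 5k+ figure comes from scaling nps linearly with peak TFLOPS. A sketch of that naive extrapolation, assuming linear scaling with peak FP32 throughput; as noted in the reply above, real nps also varies by game phase (and on CPU by BLAS backend), so this is only a rough upper-level estimate. `scaled_nps` is a hypothetical helper.

```python
def scaled_nps(base_nps: float, base_tflops: float, target_tflops: float) -> float:
    """Naive estimate: assume nps grows linearly with peak TFLOPS."""
    return base_nps * target_tflops / base_tflops


# GTX 970 (~3.9 TFLOPS, ~2k nps) extrapolated to a GTX 1080 Ti (~11 TFLOPS).
estimate = scaled_nps(2000, 3.9, 11.0)
print(f"{estimate:.0f} nps")  # 5641 nps
```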

jdart · Post by **jdart** » Fri Apr 20, 2018 3:47 pm

Still, if on current TCEC hardware it is dropping a piece, and if you gave L0 10x the CPU power it has on the TCEC system, it still seems likely to me that it would fail to find tactics that Stockfish does find, especially on that hardware.

A0 on the other hand outplayed Stockfish rather convincingly in the games that I saw. It was on big custom hardware, but I have to think it must have had a different/better algorithm too.

--Jon

Michel · Post by **Michel** » Fri Apr 20, 2018 3:59 pm

Still, if on current TCEC hardware it is dropping a piece

Saying that it was dropping a piece is an exaggeration. LC0 made a nice move but missed an 8-ply tactic.

Nobody will dispute that LC0 is weak on tactics but it is still improving.

CMCanavessi · Post by **CMCanavessi** » Fri Apr 20, 2018 4:18 pm

jdart wrote:Still, if on current TCEC hardware it is dropping a piece, and if you gave L0 10x the CPU power it has on the TCEC system, it still seems likely to me that it would fail to find tactics that Stockfish does find, especially on that hardware.

A0 on the other hand outplayed Stockfish rather convincingly in the games that I saw. It was on big custom hardware, but I have to think it must have had a different/better algorithm too.

--Jon

Well, remember it's playing with a rather old network that had bugged matches in part of its training... Only now are we starting to see it regain (and surpass) its previous strength. We'll see how it plays next season; it wouldn't surprise me if it just sweeps the entire Division 4.

George Tsavdaris · Post by **George Tsavdaris** » Fri Apr 20, 2018 5:36 pm

Laskos wrote: I can somewhat confirm this with LittleBlitzer and InBetween, from a 3-mover balanced book:

```
Games Completed = 30 of 100 (Avg game length = 2.370 sec)
Settings = RR/64MB/1000ms per move/M 9000cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 195 sec elapsed, 455 sec remaining
 1.  LCZero CPU ID153 p=1    12.0/30  7-13-10  (L: m=13 t=0 i=0 a=0)  (D: r=7 i=0 f=2 s=1 a=0)  (tpm=33.3 d=6.09 nps=35)
 2.  SF9 depth=1             18.0/30  13-7-10  (L: m=7 t=0 i=0 a=0)   (D: r=7 i=0 f=2 s=1 a=0)  (tpm=10.9 d=1.00 nps=43940)
```

Can you repeat EXACTLY the same test, but with ID154, which in self-play gives +50 Elo compared to ID153?
I want to see how the +50 Elo from self-play translates, even in this short match.

PS: To do this, you set Stockfish to do a 1-ply search (how? I forgot about all these things), and for LC0 you just add "-p 1" to the parameters, or is something else also needed?
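For Stockfish, a 1-ply search is the standard UCI command `go depth 1`; match tools send it for you when a depth limit is configured. A minimal sketch of the command sequence a GUI would send, assuming a plain UCI engine driven over stdin; `uci_depth_commands` is a hypothetical helper and no engine is actually launched. For LCZero, the thread above mentions arg="--visits=1"; whether a given LCZero build also honors the UCI `go nodes` limit is not established here.

```python
def uci_depth_commands(moves: list[str], depth: int = 1) -> list[str]:
    """UCI commands asking an engine for a fixed-depth search from startpos."""
    position = "position startpos"
    if moves:
        position += " moves " + " ".join(moves)  # moves in UCI long algebraic
    return [
        "uci",          # engine answers with "uciok"
        "ucinewgame",
        "isready",      # wait for "readyok" before searching
        position,
        f"go depth {depth}",  # search only `depth` plies
    ]


for cmd in uci_depth_commands(["e2e4", "e7e5"]):
    print(cmd)
```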
