So where did the claim that a/b engines can't scale well come from?
Apparently with such hardware odds, one does not need a lame config with SF8 to get 0 loss in 100 matches.
With Lazy SMP, the wisdom about a/b parallelization changed a bit. A doubling of time in your conditions seems to be worth about 50-60 Elo points for SF, so the 12x thread factor gives you 2-2.5 time doublings, or a 4x-6x effective speedup. That is very good for such a huge number of threads, but still a loss of a factor of 2-3 compared to the 12x increase in thread count.
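The arithmetic behind those figures can be sketched as follows. This is only an illustration of the numbers quoted in the post (2-2.5 time doublings on 12x threads); the function name is made up for the example.

```python
def effective_speedup(time_doublings: float) -> float:
    """Convert a number of time doublings into an equivalent speedup factor."""
    return 2.0 ** time_doublings


THREAD_FACTOR = 12  # thread increase quoted in the post

for doublings in (2.0, 2.5):
    speedup = effective_speedup(doublings)
    # How much of the 12x thread increase is lost to imperfect scaling.
    loss_factor = THREAD_FACTOR / speedup
    print(f"{doublings} doublings -> {speedup:.2f}x speedup, "
          f"loss factor {loss_factor:.2f}")
```

With 2 doublings the effective speedup is 4x (a loss factor of 3), and with 2.5 doublings it is roughly 5.7x (a loss factor of about 2.1), which is where the "4x-6x" and "factor of 2-3" come from.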
LC0 probably scales significantly better Elo-wise with time (or hardware).
It's significant enough for SF considering its strength at 32 cores.
On current common hardware, LC0's scaling is on the upward part of the curve while SF's is very likely on the downward part, but neither has reached the point where better hardware gains almost nothing.
Why this argument again? How many times have I already mentioned that A0 on 1080TI + 1min / move should be of comparable strength to SF8@64 cores 1min / move?
A GTX 1080 Ti is 11 TFLOPs, and 64 cores is 1 TFLOPs, so that is an 11x hardware advantage. Why not run it on the same 64 CPU cores and see if it beats Stockfish?
Wow, the Stockfish team must be a really really stupid bunch of programmers if they have 11 TFlops at their disposal in a 1080ti but they don't make the engine take advantage of those and still insist on using those inefficient crappy CPUs that provide 11X less performance...
Laskos wrote:Checking the time used times the nps does indeed give p=1 for LC0, and checking the depth, it is indeed depth=1 for SF9. All games are from different positions.
Actually, if you check the actual log you'd see that the number of nodes is 2: 1 node is cached from the previous move (the root node) and 1 is an extra node that is evaluated. So it is indeed 1 node evaluated, but not the root node, and therefore the strength is higher than if it were really just the root node. But OK, that is just nitpicking.
Anyway, here is my result with 500 balanced openings from my testing collection, limited to 6 moves.
In total, 15 of the 1000 games were duplicates, and those were removed.
# PLAYER               : RATING  ERROR  POINTS  PLAYED    (%)
1 SF180418_depth=1     :   88.0    7.6   613.5     985  62.3%  (+511,=205,-269)
2 Lczero_cpu_id150_p=1 :    0.0    7.6   371.5     985  37.7%  (+269,=205,-511)
Games : 985 (finished)
White Wins : 401 (40.7 %)
Black Wins : 379 (38.5 %)
Draws : 205 (20.8 %)
Unfinished : 0
White Perf. : 51.1 %
Black Perf. : 48.9 %
ECO A = 183 Games (18.6 %)
ECO B = 286 Games (29.0 %)
ECO C = 208 Games (21.1 %)
ECO D = 177 Games (18.0 %)
ECO E = 131 Games (13.3 %)
Well that's some data at least, and it's consistent with what I got for Id 150:
Score of Id_150 vs sf_d1: 52 - 159 - 189 [0.366] 400
Elo difference: -95.26 +/- 24.78
But note that Id_153 scored a lot better than this in my tests. By the way, I used arg="--visits=1" this time, so unless you have an issue with the Komodo opening book, we have established that LCZero with its raw net plays in the same ballpark as Stockfish at depth 1.
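For anyone who wants to check the Elo figure against the raw score, here is a minimal sketch using the standard logistic Elo formula. It assumes the "52 - 159 - 189" line is wins - losses - draws from Id_150's point of view; the function names are only for illustration, and the error margin is not reproduced.

```python
import math


def score_fraction(wins: int, losses: int, draws: int) -> float:
    """Fraction of points scored, counting a draw as half a point."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# Id_150 vs sf_d1, as reported above: 52 wins, 159 losses, 189 draws.
s = score_fraction(52, 159, 189)
print(f"score = {s:.3f}, Elo difference = {elo_from_score(s):+.2f}")
```

This gives a score of about 0.366 and an Elo difference of roughly -95, in line with the reported output.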
mirek wrote:My GTX970 gets around 2k nps and it's 3.9 TFLOPs, while the 1080Ti is 11 TFLOPs, so I would expect more like 5k+ nps from a 1080Ti (on the current LC0 network size).
Also, I was under the impression that the 43 core TCEC machine was giving about 2-3k nps. I am not sure about it though. Since you are reporting 2k nps for 16 cores, does that mean the TCEC machine was actually pushing nps in the 5k+ range? Can someone comment on this?
You seem not to be reporting nps from the opening and early middlegame phase only, but to be including the ending as well.
If you follow Jjoshua2's Twitch channel you can see what nps he is getting in real time with a 1080Ti.
Regarding the TCEC machine, they don't use HT, which would benefit them by at least 50%, and as I said, for such a configuration OpenBLAS really sucks. One should use the Intel MKL library instead, i.e. recompile LC0 with it.
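The "5k+ nps" expectation in the quoted post is a simple proportional extrapolation, which can be sketched as below. It assumes nps scales linearly with peak TFLOPs, which is a rough assumption at best (real throughput also depends on batch size, memory bandwidth and the BLAS/GPU backend), and the function name is invented for the example.

```python
def scale_nps(measured_nps: float, measured_tflops: float,
              target_tflops: float) -> float:
    """Extrapolate nps to other hardware, assuming nps is proportional to TFLOPs."""
    return measured_nps * target_tflops / measured_tflops


# Figures from the quoted post: GTX 970 at ~2k nps and 3.9 TFLOPs,
# 1080 Ti at 11 TFLOPs.
estimate = scale_nps(2000, 3.9, 11.0)
print(f"estimated 1080 Ti nps: {estimate:.0f}")
```

Under that assumption the estimate comes out a bit above 5.6k nps, consistent with the "5k+" in the quote.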
jdart wrote:Still, if on current TCEC hardware it is dropping a piece, and if you gave L0 10x the CPU power it has on the TCEC system, it still seems likely to me that it would fail to find tactics that Stockfish does find, especially on that hardware.
A0 on the other hand outplayed Stockfish rather convincingly in the games that I saw. It was on big custom hardware, but I have to think it must have had a different/better algorithm too.
--Jon
Well, remember it's playing with a rather old network that had bugged matches in part of its training... just now we're starting to see it regain (and surpass) its previous strength. We'll see how it plays next season; it wouldn't surprise me if it just sweeps the entire Division 4.
Games Completed = 30 of 100 (Avg game length = 2.370 sec)
Settings = RR/64MB/1000ms per move/M 9000cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 195 sec elapsed, 455 sec remaining
1. LCZero CPU ID153 p=1 12.0/30 7-13-10 (L: m=13 t=0 i=0 a=0) (D: r=7 i=0 f=2 s=1 a=0) (tpm=33.3 d=6.09 nps=35)
2. SF9 depth=1 18.0/30 13-7-10 (L: m=7 t=0 i=0 a=0) (D: r=7 i=0 f=2 s=1 a=0) (tpm=10.9 d=1.00 nps=43940)
Can you repeat EXACTLY the same test but with ID154, which in selfplay gives +50 Elo compared to 153?
I want to see how the +50 Elo from selfplay translates even in this short match.
PS: To do this, you set Stockfish to do a 1-ply search (how? I forgot about all these things), and for LC0 you just put "-p 1" in the parameters, or is something else needed as well?
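On the "how?" part: at the UCI protocol level, a fixed-depth search is requested with `go depth N` and a node-limited search with `go nodes N`. A minimal sketch of building those commands is below. The helper name is hypothetical, and how a particular GUI or tournament manager exposes these limits, or whether LC0's "-p 1" / "--visits=1" behaves exactly like a one-node limit, is not confirmed here.

```python
from typing import Optional


def uci_go(depth: Optional[int] = None, nodes: Optional[int] = None) -> str:
    """Build a UCI 'go' command with a depth and/or node limit."""
    if depth is None and nodes is None:
        raise ValueError("give at least one of depth or nodes")
    parts = ["go"]
    if depth is not None:
        parts += ["depth", str(depth)]
    if nodes is not None:
        parts += ["nodes", str(nodes)]
    return " ".join(parts)


# Command a GUI would send to an alpha-beta engine for a 1-ply search.
print(uci_go(depth=1))
# Command for a one-node limit (assumed analogue of p=1 for LC0).
print(uci_go(nodes=1))
```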