LCzero sacs a knight for nothing

Milos · Post by **Milos** » Fri Apr 20, 2018 12:25 pm

jkiliani wrote:
Milos wrote:
jkiliani wrote:Not sure what you're talking about here. Leela (as in Leela 0.11) certainly has tactical weaknesses, but that's an MCTS engine with a neural net, not a pure NN engine like Leela Zero.

And while Leela Zero may still have some tactical vulnerabilities, they're getting really hard to exploit, certainly for humans.

Agreed that policy guided search has some similarity to Alpha-Beta on a mature, larger neural net.
Cut the crap. Return here when LC0 network alone (single playout) is able to beat SF depth 1 search.
I'm pretty confident that will not happen any time soon, especially if you don't increase the size of NN.
Just tested exactly that, with Id 150, against Stockfish with fixed depth 1:

./cutechess-cli -rounds 400 -tournament round-robin -concurrency 2 -pgnout results_tuning.pgn \
-engine name=Id_152 cmd=lczero_tunenew2 arg="--threads=1" arg="--weights=$WDR/weights_152.txt" arg="--noponder" nodes=1 tc=inf \
-engine name=sf_d1 cmd=stockfish_x86-64 option.Threads=1 depth=1 tc=inf \
-each proto=uci

Result: 1-1-0. Obviously I ran more games than two, but it turns out that both Stockfish and Lc0 are deterministic at these settings, so the end result was 200-200-0.

Unless you can come up with a way to make Stockfish non-deterministic at fixed depth 1, I consider this point now proven.

Edit: Id 150 actually wins with 1-0-1, Id 129 also scores equal with 1-1-0, only Id 125 loses with 0-1-1.

So in total, we have Lc0 performing comparably to Stockfish Depth 1. Any further questions?

Btw, you don't even know how to test only the NN eval of LC0, do you?
It is not enough to force nodes=1 through cutechess-cli, because LC0 will ignore it. You have to use the "-p 1" argument specifically, and even then a single thread still gives you 2 playouts, not one. But that's the best you can get without changing the actual LC0 code.

Milos · Post by **Milos** » Fri Apr 20, 2018 12:30 pm

jkiliani wrote:
Milos wrote:
Michel wrote:
Unless you can come up with a way to make Stockfish non-deterministic at fixed depth 1, I consider this point now proven.
Use a book.
Or your precious Dirichlet noise parameter. Do you know it even exists?
I know more than you think, and fyi --noise doesn't add enough variability to alter the PV in almost all cases by now. I tried the match with noise, result was exactly the same.

Your turn to produce data now.

Just use a book, for God's sake. Have you never tested engines in your life?
You know, whenever people test on a single core they always use what is called an opening book or, even better, a collection of openings, almost as a rule with reversed colors.

jkiliani · Post by **jkiliani** » Fri Apr 20, 2018 12:34 pm

Milos wrote:
jkiliani wrote:
Milos wrote:
Michel wrote:
Unless you can come up with a way to make Stockfish non-deterministic at fixed depth 1, I consider this point now proven.
Use a book.
Or your precious Dirichlet noise parameter. Do you know it even exists?
I know more than you think, and fyi --noise doesn't add enough variability to alter the PV in almost all cases by now. I tried the match with noise, result was exactly the same.

Your turn to produce data now.
Just use a book, for God's sake. Have you never tested engines in your life?
You know, whenever people test on a single core they always use what is called an opening book or, even better, a collection of openings, almost as a rule with reversed colors.

Then why don't you do it if you're so experienced? I only hear "this will never work" from you, but see no test results.

Michel · Post by **Michel** » Fri Apr 20, 2018 12:59 pm

Then why don't you do it if you're so experienced? I only hear "this will never work" from you, but see no test results.

Everyone can test whatever he/she wants but considering a claim "proven" after 2 games is just nonsense.

The only thing we can say is that the issue is still open and that no one has provided any credible evidence yet in either direction.

jkiliani · Post by **jkiliani** » Fri Apr 20, 2018 1:12 pm

Michel wrote:
Then why don't you do it if you're so experienced? I only hear "this will never work" from you, but see no test results.
Everyone can test whatever he/she wants but considering a claim "proven" after 2 games is just nonsense.

The only thing we can say is that the issue is still open and that no one has provided any credible evidence yet in either direction.

I agree with your sentiment, but had problems with Milos' way of only arguing without actually contributing test data.

I now installed the Komodo opening book to use with Stockfish at depth 1. Results now: Score of Id_152 vs sf_d1: 92 - 92 - 216 [0.500] 400
Elo difference: 0.00 +/- 23.09

I tested the independence of games with cat results_15.pgn | grep "PlyCount", and found that there may be a few duplicate games in there, but they are clearly not the majority.
Two more runs at identical settings gave

Score of Id_152 vs sf_d1: 75 - 94 - 231 [0.476] 400
Elo difference: -16.52 +/- 22.12

Score of Id_152 vs sf_d1: 86 - 95 - 219 [0.489] 400
Elo difference: -7.82 +/- 22.90

Stockfish might have a very narrow margin, depending on how the book is randomly sampled. I hope that's sufficient to prove that the raw net of Lc0 can play at the level of Stockfish depth 1 to everyone's satisfaction.
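
The PlyCount grep mentioned above only shows that game lengths differ. A stricter duplicate check compares the moves themselves. The following is a minimal sketch, assuming a PGN file in which every game starts with an [Event tag and engine comments sit in curly braces; the function names are hypothetical and this is not what was actually run in the thread.

```python
import re
from collections import Counter

def split_games(pgn_text):
    """Split PGN text into games; assumes each game starts with an [Event tag."""
    chunks = re.split(r"\n(?=\[Event )", pgn_text)
    return [c for c in chunks if c.strip()]

def movetext_key(game):
    """Reduce one game to its bare moves so identical games compare equal."""
    # Drop the tag section (player names, dates, PlyCount, ...).
    body = "\n".join(l for l in game.splitlines() if not l.startswith("["))
    # Drop engine comments such as {+0.23/1 0.01s}; timings differ between runs.
    body = re.sub(r"\{[^}]*\}", " ", body)
    return re.sub(r"\s+", " ", body).strip()

def count_duplicates(pgn_text):
    """Number of games whose move sequence already occurred earlier in the file."""
    counts = Counter(movetext_key(g) for g in split_games(pgn_text))
    return sum(n - 1 for n in counts.values() if n > 1)

sample = (
    '[Event "t"]\n[White "A"]\n\n1. e4 {+0.2/1 0.01s} e5 2. Nf3 1/2-1/2\n\n'
    '[Event "t"]\n[White "B"]\n\n1. e4 {+0.3/1 0.02s} e5 2. Nf3 1/2-1/2\n\n'
    '[Event "t"]\n\n1. d4 d5 1-0\n'
)
print(count_duplicates(sample))
```

Because the tags are stripped, two games with the same moves but swapped engine names also count as duplicates, which is the relevant case when both engines are deterministic.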

mirek · Post by **mirek** » Fri Apr 20, 2018 1:17 pm

Milos wrote: LC0 on GTX 1080Ti gets around 2.5knps on average.
LC0 on my (old Sandybridge) 16 cores machine running on 32 threads gets 2knps on average.

My GTX970 gets around 2k nps at 3.9 TFLOPs, while the 1080Ti is 11 TFLOPs, so I would expect more like 5k+ nps from a 1080Ti (at the current LC0 network size).

Also, I was under the impression that the 43-core TCEC machine was giving about 2-3k nps. I am not sure about that, though: since you are reporting 2k nps for 16 cores, does it mean the TCEC machine was actually pushing nps into the 5k+ range? Can someone comment on this?

jkiliani · Post by **jkiliani** » Fri Apr 20, 2018 1:31 pm

Last results were with the Komodo book and bookdepth=10. With bookdepth=15, the result not surprisingly tilts a bit more in Stockfish's direction:

Score of Id_152 vs sf_d1: 221 - 294 - 485 [0.464] 1000
Elo difference: -25.41 +/- 15.44
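
For anyone who wants to check the "Elo difference" lines in this thread by hand: they follow from the win/loss/draw counts under the usual logistic Elo model. Below is a minimal sketch of that point estimate, assuming the standard formula; elo_diff is a hypothetical helper name, and the error margin is deliberately left out because its exact calculation in cutechess-cli is not reproduced here.

```python
import math

def elo_diff(wins, losses, draws):
    """Elo difference implied by a match score under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games   # a draw counts as half a point
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    return -400.0 * math.log10(1.0 / score - 1.0)

# Results quoted in the thread, from the point of view of Lc0.
print(f"{elo_diff(221, 294, 485):+.2f}")   # Id_152, bookdepth=15
print(f"{elo_diff(176, 122, 102):+.2f}")   # Id_153, bookdepth=15
```

The two calls reproduce the reported -25.41 and 47.19 when rounded to two decimals.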

Laskos · Post by **Laskos** » Fri Apr 20, 2018 1:36 pm

jkiliani wrote:
Michel wrote:
Then why don't you do it if you're so experienced? I only hear "this will never work" from you, but see no test results.
Everyone can test whatever he/she wants but considering a claim "proven" after 2 games is just nonsense.

The only thing we can say is that the issue is still open and that no one has provided any credible evidence yet in either direction.
I agree with your sentiment, but had problems with Milos' way of only arguing without actually contributing test data.

I now installed the Komodo opening book to use with Stockfish at depth 1. Results now: Score of Id_152 vs sf_d1: 92 - 92 - 216 [0.500] 400
Elo difference: 0.00 +/- 23.09

I tested the independence of games with cat results_15.pgn | grep "PlyCount", and found that there may be a few duplicate games in there, but they are clearly not the majority.
Two more runs at identical settings gave

Score of Id_152 vs sf_d1: 75 - 94 - 231 [0.476] 400
Elo difference: -16.52 +/- 22.12

Score of Id_152 vs sf_d1: 86 - 95 - 219 [0.489] 400
Elo difference: -7.82 +/- 22.90

Stockfish might have a very narrow margin, depending on how the book is randomly sampled. I hope that's sufficient to prove that the raw net of Lc0 can play at the level of Stockfish depth 1 to everyone's satisfaction.

I can more or less confirm this with LittleBlitzer and InBetween, using a 3-mover balanced book:

Games Completed = 30 of 100 (Avg game length = 2.370 sec)
Settings = RR/64MB/1000ms per move/M 9000cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 195 sec elapsed, 455 sec remaining
 1.  LCZero CPU ID153 p=1     	12.0/30	7-13-10  	(L: m=13 t=0 i=0 a=0)	(D: r=7 i=0 f=2 s=1 a=0)	(tpm=33.3 d=6.09 nps=35)
 2.  SF9 depth=1              	18.0/30	13-7-10  	(L: m=7 t=0 i=0 a=0)	(D: r=7 i=0 f=2 s=1 a=0)	(tpm=10.9 d=1.00 nps=43940)

Multiplying the time per move by the nps does give p=1 for LC0, and the reported depth confirms depth=1 for SF9. All games started from different positions.
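
The time-times-nps check can be written out as a small calculation. This is only a sketch, and it assumes that the tpm column in the LittleBlitzer output is the average time per move in milliseconds, which the output itself does not state; nodes_per_move is a hypothetical helper name.

```python
def nodes_per_move(tpm_ms, nps):
    """Average nodes searched per move, given time per move (ms) and nodes per second."""
    return tpm_ms / 1000.0 * nps

# Values taken from the LittleBlitzer table above.
print(round(nodes_per_move(33.3, 35), 2))     # LCZero CPU ID153 p=1
print(round(nodes_per_move(10.9, 43940), 1))  # SF9 depth=1
```

Under that assumption LCZero averages roughly one node per move, consistent with p=1, while Stockfish at depth 1 still visits a few hundred nodes per move.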

jkiliani · Post by **jkiliani** » Fri Apr 20, 2018 1:49 pm

jkiliani wrote:Last results were with the Komodo book and bookdepth=10. With bookdepth=15, the result not surprisingly tilts a bit more in Stockfish's direction:

Score of Id_152 vs sf_d1: 221 - 294 - 485 [0.464] 1000
Elo difference: -25.41 +/- 15.44

Newest net actually wins here (still on bookdepth=15):

Score of Id_153 vs sf_d1: 176 - 122 - 102 [0.568] 400
Elo difference: 47.19 +/- 29.60

noobpwnftw · Post by **noobpwnftw** » Fri Apr 20, 2018 2:05 pm

Latest SFdev scaling test results:
32 threads vs 384 threads w/ HT @ 60+0.6, 2GB hash, still ongoing:

ELO: -129.80 +-36.8 (95%) LOS: 0.0%
Total: 112 W: 1 L: 41 D: 70

So where did the claim that a/b engines can't scale well come from?

Apparently with such hardware odds, one does not need a lame config on SF8 to get 0 loss in 100 games.
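
The LOS (likelihood of superiority) figure in the result above can be approximated from the win and loss counts alone. Here is a minimal sketch using the commonly quoted normal approximation; it is an assumption that the testing framework uses this exact formula, and los is a hypothetical helper name.

```python
import math

def los(wins, losses):
    """Likelihood of superiority from decisive games only (draws carry no information here)."""
    if wins + losses == 0:
        return 0.5  # no decisive games, no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

# W: 1 L: 41 from the 32-thread side of the scaling test above.
print(f"{100.0 * los(1, 41):.1f}%")
```

With 1 win against 41 losses the value is far below 0.1%, which rounds to the reported LOS of 0.0%.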
