Not at all - I think you've raised some very interesting points! MCTS averaging does seem fundamentally mismatched to Chess. That's why I was so amazed A0 actually worked.
It seems it's not that good in Go either: as recent Leela games show, it blunders tactically in Go on a regular basis. Go is a bit different in that it's possible to beat humans there without much tactical awareness, but winning against humans in a board game isn't exactly a high bar these days. In engine-vs-engine matches it's clear that MCTS in its pure form isn't working.
There is some kind of component missing: right now you can get to 100k playouts, discover that the line is a total disaster (losing by force), and it will still take a long while for the search to prefer a different move.
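To make the "slow to change its mind" point concrete, here is a minimal Python sketch (toy numbers, not Leela's actual backup code) of why a node's averaged Q creeps toward the true losing value only slowly once a refutation is found:

```python
# Toy running-average backup (not Leela's code): once a node has piled up
# optimistic playouts, its averaged Q moves toward the true losing value
# only slowly, so the search keeps preferring the refuted move for a while.

def backed_up_q(n_old, q_old, n_new, q_new):
    """Mean value after n_new extra playouts that each return q_new."""
    return (n_old * q_old + n_new * q_new) / (n_old + n_new)

# 100k playouts believed the line was fine (Q = +0.30); then every new
# playout returns the forced loss (-1.0).
after_10k = backed_up_q(100_000, 0.30, 10_000, -1.0)    # ~ +0.18: still looks "fine"
after_100k = backed_up_q(100_000, 0.30, 100_000, -1.0)  # ~ -0.35: still far from -1
```

Even after doubling the total playout count with pure refutations, the averaged value is nowhere near the true -1, whereas a minimax-style backup would flip immediately.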
I personally believe policy-guided search, with the policy trained on many games, will work, but that the move selection itself will end up more in line with alpha-beta. Overall I am excited (and wish I had time away from programming engines for card games to participate). If anything, Leela plays like a naive, optimistic human of 2200-2300 Elo, and that's really cool to have.
Hi Piotr,
Slightly off-topic, but I'm curious: what could you recommend as a good program/AI for hold'em (or poker in general)?
And yes, the self-play match games between networks on the main site are terrible and misleading.
I wonder why that is. Now that the matches are no longer used for gating and there is much more opening variety, the graph should in principle be correct on average.
So it seems that Elo is not additive in this case.
One possible explanation might be that buggy engines do not satisfy the Elo model. This was an observation by HGM in a slightly different context. Of course, it is a bit unclear how to define a buggy engine...
During the bug (underpromotion), weren't the nets self-tested against another, almost identical underpromoting engine? Even I would have seen progress, with a book and fixed time. Then, AFAIK, they only slowly re-trained the net (shouldn't they just have started all over again from ID124, or even from the "smallnet" ID122?). And then they tested non-buggy engine against non-buggy engine, which slowly started to promote to queens again, so progress was again assured (on average)? There could or should have been one drop, but due to the slow changes it is barely visible; there are many lower and higher results all the way.
Probably one could imagine certain bugs (say, a certain rate of time losses) and construct a gedanken experiment showing that three engines don't satisfy the additivity underlying the Elo model.
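The gedanken experiment can be made numeric. A hedged Python sketch, with made-up head-to-head scores: under the Elo model each expected score implies a rating difference, and additivity demands those implied differences compose, which these numbers refuse to do:

```python
import math

def elo_diff(score):
    """Rating difference implied by an expected score under the Elo model."""
    return -400 * math.log10(1 / score - 1)

# Made-up head-to-head scores for three engines A, B, C, where a bug
# (say, time losses that only trigger against C) skews the A-C result:
d_ab = elo_diff(0.60)  # A over B: ~ +70 Elo
d_bc = elo_diff(0.60)  # B over C: ~ +70 Elo
d_ac = elo_diff(0.55)  # A over C: ~ +35 Elo, not the ~ +141 additivity demands
```

No single rating per engine can fit all three results at once, so fitting these engines into one Elo list necessarily misrepresents at least one of the pairings.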
CMCanavessi wrote:My tests arrive at very similar numbers to yours, Kai (though I've tested 150 as the strongest, and have not tested any newer network... maybe 156 will be the next one). And yes, the self-play match games between networks on the main site are terrible and misleading.
My gauntlet numbers:
The calculated Elo:
It seems there is a large jump, outside the error margins, with ID156; maybe you can tell the devs. It is now the best net, beyond the error margins.
I'm running the gauntlet right now with 156, and it's proving to be the best network so far, but not by much (early estimates of around 40 elo). Only 30% of the games played, so it has some way to go.
Is Leela really regaining the lost knowledge since the fix of the promotion bug?
I suspect that the newer training just skips the positions leading to the promotion instead of retraining on them, because the buggy network already tells the search that it's losing.
As a result, the succeeding networks seem to get better and better relative to the previous buggy one, because they don't have to deal with those positions against each other.
But when faced with a network from before the bug, like ID125, it's not much better.
CMCanavessi wrote:I'm running the gauntlet right now with 156, and it's proving to be the best network so far, but not by much (early estimates of around 40 elo). Only 30% of the games played, so it has some way to go.
Our error margins are large with only 200 games, but I can confirm that ID159 comes close to that high result of ID156, so it was not a 2.5-standard-deviation fluke. By now, we can both confirm that the new nets are the best ones, and they will probably keep getting better and better.
OTOH, I couldn't see a significant jump on either the opening positional or the middlegame tactical suites, just a small improvement over, say, ID147. I don't know why; maybe some other aspects of the gameplay improved, say endgames.
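For reference, the error margin of a 200-game match can be sketched like this (hypothetical W/D/L numbers; the formula is the usual normal approximation on the per-game score, propagated through the logistic Elo curve):

```python
import math

def elo_and_error(wins, draws, losses, z=1.96):
    """Elo estimate and ~95% margin from a match result (normal approximation)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # sample variance of the per-game score (win=1, draw=0.5, loss=0)
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0 - score) ** 2) / (n - 1)
    se = math.sqrt(var / n)
    elo = -400 * math.log10(1 / score - 1)
    # propagate the score's standard error through the logistic curve
    margin = z * se * 400 / (math.log(10) * score * (1 - score))
    return elo, margin
```

With a hypothetical 90-60-50 result over 200 games this gives roughly +70 Elo with a margin around ±41, which is why a 40 Elo jump in such a gauntlet is barely outside the noise.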
Do you have a list of the results of different versions of LCZero in tactics?
Yes, I have some sort of list. For the ECM200.epd middlegame tactical suite (200 positions), analyzed at 20s/position: at this time control and on my hardware, LC0 performs overall (Elo-wise) comparably to GreKo 6.5, a 2330 Elo CCRL standard A/B engine, which fares much better tactically (but much worse positionally). And on this tactical middlegame suite, ID124 still seems to be the best of the nets.
gladius wrote:But the entire process is designed to have it solve tactics. The policies are trained to match the output of an 800 node search, so it's being trained to take the tactics into account. Even modern chess evaluation features do this (with eg. huge penalties for queen under threat, and restricting queen mobility to "safe" squares).
Don't you think that the network can learn to predict tactics?
What I don't quite get is what the move probabilities are supposed to stand for.
If the move probabilities are supposed to single out "good" moves, then a move that simply looks bad but happens to have a deep (or even shallow) tactic behind it would score badly and would not guide the search toward discovering the tactic.
If the move probabilities are supposed to single out "unclear" moves, then things could work. But I don't really see how the whole updating process would work towards identifying "unclear" moves.
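A toy simulation of AlphaZero-style PUCT selection illustrates the worry: if the prior on a "bad-looking" move is tiny, the exploration term may never give it enough visits for the deep tactic behind it to surface. All constants below (c_puct, the visit threshold, the Q values) are made up for illustration and are not Leela's actual parameters:

```python
import math

# Toy PUCT selection (assumed constants, NOT Leela's actual parameters).
# A move whose shallow evaluation looks bad (Q = -0.2) only reveals its
# winning tactic after DEEP_VISITS visits (then its Q jumps to +0.4).
C_PUCT = 1.5
DEEP_VISITS = 50  # hypothetical search effort needed to see the tactic

def puct_score(child, parent_visits):
    """AlphaZero-style score: Q plus prior-weighted exploration bonus."""
    u = C_PUCT * child["prior"] * math.sqrt(parent_visits) / (1 + child["n"])
    return child["q"] + u

def simulate(tactic_prior, playouts=3000):
    """Return how many visits the 'tactical' move receives."""
    normal = {"prior": 1.0 - tactic_prior, "q": 0.10, "n": 0}
    tactic = {"prior": tactic_prior, "q": -0.20, "n": 0}
    for t in range(1, playouts + 1):
        pick = max((normal, tactic), key=lambda c: puct_score(c, t))
        pick["n"] += 1
        if tactic["n"] >= DEEP_VISITS:
            tactic["q"] = 0.40  # the deep tactic is now reflected in its value
    return tactic["n"]

low = simulate(0.02)   # tiny prior: the tactic is starved of visits
high = simulate(0.30)  # decent prior: the tactic gets searched deeply
```

With a prior of 0.02 the tactical move collects only a handful of visits in 3000 playouts and never crosses the discovery threshold; with a prior of 0.30 it is searched deeply enough for the win to show up in its Q. So for the search to find such tactics, the priors effectively have to single out "worth exploring" moves, not just "obviously good" ones.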