LCzero sacs a knight for nothing

Michel · Post by **Michel** » Sat Apr 21, 2018 7:56 pm

Laskos wrote:
Michel wrote:
If the move probabilities are supposed to single out "unclear" moves, then things could work. But I don't really see how the whole updating process would work towards identifying "unclear" moves.
Well we will have to wait to see how good (or bad) LC0 will eventually become at tactics. I am hoping that the majority of chess tactics actually depend on fairly standard patterns and that the NN (value head and policy head) can learn to recognize those patterns. This would be similar to how humans handle tactics.

Recent experiments (by Kai and Killiani) show that the policy network of LC0 is on par with SF at depth 1 (with quiescence search). This might mean that LC0 already statically recognizes some recapture patterns. Unfortunately it may also mean that SF simply prunes too much at depth 1 to be competitive...
I don't think SF9 at depth=1 is excessively weak or misses much compared to older, less aggressively pruning engines. An older test showing depth=1 results in round-robin games from regular openings:
Rank Name                        ELO   Games   Score   Draws 
   1 Komodo 8                     92    1000     63%     18% 
   2 Houdini 4                    78    1000     61%     26% 
   3 Hannibal 1.4                 56    1000     58%     23% 
   4 SF 14122014                  43    1000     56%     20% 
   5 Hiarcs 14                   -22    1000     47%     16% 
   6 Shredder 6PB (2002)        -302    1000     15%     14%
Finished match
And IIRC the newer results are not that different.

Ah ok. Thanks!

Albert Silver · Post by **Albert Silver** » Sat Apr 21, 2018 8:21 pm

noobpwnftw wrote: For a hybrid approach, I have an idea: couldn't we run some MCTS threads and use their simulations for root move ordering? Say we can feed one GPU with 2 CPU threads; then we have them running independently, and from time to time we could reorder root moves by their eval scores scaled to a win-rate estimate. It may help with favoring moves that score a few centipawns less but are more favorable in the NN's view.

How exactly is the search working? I have been trying to glean information, but had no luck. Is it using the tree search described by the Deep Mind paper, or is it something else entirely?
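The root-reordering step in noobpwnftw's idea can be sketched roughly as below. This is a hypothetical Python illustration, not LC0 or Stockfish code: the logistic centipawn-to-win-rate mapping with a 400-point scale, the `RootMove` record, and the linear blend weight `nn_weight` are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class RootMove:
    """Hypothetical per-root-move record shared between A/B and MCTS threads."""
    uci: str            # move in UCI notation, e.g. "e2e4"
    ab_score_cp: int    # alpha-beta score in centipawns
    nn_win_rate: float  # mean MCTS value for this move, in [0, 1]

def cp_to_win_rate(cp: float, scale: float = 400.0) -> float:
    """Map a centipawn score to an expected score (assumed logistic curve)."""
    return 1.0 / (1.0 + 10.0 ** (-cp / scale))

def reorder_root_moves(moves: list[RootMove], nn_weight: float = 0.5) -> list[RootMove]:
    """Sort root moves best-first by a linear blend of both win-rate estimates."""
    def blended(m: RootMove) -> float:
        return (1.0 - nn_weight) * cp_to_win_rate(m.ab_score_cp) + nn_weight * m.nn_win_rate
    return sorted(moves, key=blended, reverse=True)

if __name__ == "__main__":
    moves = [RootMove("e2e4", 30, 0.52), RootMove("d2d4", 25, 0.60)]
    print([m.uci for m in reorder_root_moves(moves)])  # ['d2d4', 'e2e4']
```

With `nn_weight=0.0` the order falls back to the pure A/B scores (e2e4 first here); raising the weight lets a move that is a few centipawns worse but preferred by the NN move ahead, which is the effect the post is after.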

Uri Blass · Post by **Uri Blass** » Sat Apr 21, 2018 8:43 pm

noobpwnftw wrote:I think the approach is trying to summarize simulation results, which is good at handling general cases.

Those tactical lines are isolated incidents which it can never solve, while with minimax the search will develop that line deep and fast enough to see it.

It is not like we cannot write better evaluation code, but once in a while it turns out that a simplification actually gains Elo because the search runs faster. LC0 is doing the opposite, and people seem to ignore the fact that its evaluation is just slow and blame the hardware for the poor performance. Even with A0's hardware you get some 80k NPS. Converting that naively to CPU: 1 TPU ~= 10x 1080 Ti, and one 1080 Ti ~= 32 CPU cores, so for A0 that's 4*10*32 = 1280 CPU cores.

Given that many CPU cores, I'm sure I could get that result too, more than +100 Elo against a 64-core SF8.
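The naive conversion in the quoted post is plain chained multiplication. A minimal Python sketch using only the ratios stated there (4 TPUs, 1 TPU ~= 10x 1080 Ti, one 1080 Ti ~= 32 cores, ~80k NPS); these are rough forum estimates, not measurements, and the per-core figure at the end is merely derived from them for illustration:

```python
TPUS_USED_BY_A0 = 4            # as stated in the post
GTX1080TI_PER_TPU = 10         # rough estimate from the post
CPU_CORES_PER_GTX1080TI = 32   # rough estimate from the post
A0_NPS = 80_000                # "some 80k NPS", as stated in the post

def cpu_core_equivalent(tpus: int, gpus_per_tpu: int, cores_per_gpu: int) -> int:
    """Chain the two ratios: TPUs -> 1080 Ti cards -> CPU cores."""
    return tpus * gpus_per_tpu * cores_per_gpu

cores = cpu_core_equivalent(TPUS_USED_BY_A0, GTX1080TI_PER_TPU, CPU_CORES_PER_GTX1080TI)
nps_per_core = A0_NPS / cores  # derived: NN evaluations per core-equivalent per second

print(cores)         # 1280
print(nps_per_core)  # 62.5
```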

You assume that simplifications mean worse evaluation.

It is not clear and sometimes something that is more complicated is simply not better.

I believe that the problem of LC0 is not slower evaluation but a bad search algorithm and with the same speed of evaluation it can be significantly stronger with a better search algorithm.

Laskos · Post by **Laskos** » Sat Apr 21, 2018 9:11 pm

Werewolf wrote:
Laskos wrote:
George Tsavdaris wrote:
Laskos wrote: Yes, some sort of list. For ECM200.epd middlegame tactical suite (200 positions), analyzed for 20s/position. At this time control and my hardware, LC0 performs overall (Elo-wise) comparably to GreKo 6.5 2330 Elo CCRL standard A/B engine, which fares much better tactically (but much worse positionally). And it seems on this tactical middlegame suite ID124 is still the best of the nets.

Having watched 100+ games of ID150+ and ~40 games of ID156 against opponents rated 2100-3100 CCRL Elo, I see that LC0 (with those IDs as well as with previous ones) completely outplays the other engines positionally in many, many cases, only to miss, in at least 80% of them, a tactical hit that costs LC0 the win or even the draw, and it loses.

LC0 is, I dare say, on par with Stockfish dev in evaluation, but of course it is ultra weak in tactics. From what I have seen, it's even better than Stockfish in king attacks, that is, in placing its pieces to attack. Not in executing the attack, since in that aspect it fails miserably due to bad tactics. The pattern recognition its NNs offer for seeing how to attack the king seems extremely fruitful.

Meanwhile ID160 had a good jump in self-play ELO.
Yes, ID160 seems the strongest (at least in my test). Now I am checking its scaling; it seems to scale nicely from 1s/move to 4s/move compared to Jabba 1.0, which is similar in strength (in my conditions).
I'd love to see your tactical results on ID 160.

My own tests are no longer totally negative, but are very mixed.

Below is the "easiest" position in my test suite, which I've posted many times, but LCZero ID 160 still cannot find the solution in 20 minutes.

[pgn] 1.e4 e5 2.Nf3 d6 3.Nc3 g6 4.Bc4 Bg4 5.Ne5 [/pgn]

And yet, curiously, it gets the following position, which is MUCH harder for alpha-beta engines.

[pgn] 1. d4 d5 2. c4 dxc4 3. Nc3 e5 4. e3 exd4 5. exd4 Nf6 6. Bxc4 Be7 7. Nf3 Nbd7 8. Bxf7+ [/pgn]

This position was a real challenge until around 1993-1995 because dedicated chess computers thought 8.Ng5 was a simpler way to win (it's not), and 8.Bxf7+ requires seeing quite deeply in one line.

Yet LCZero 160 finds this in 12 seconds!

Interesting. I tested on a tactical middlegame suite at 20s/position, and ID160 indeed improved significantly compared to earlier nets, but it is still far behind a standard A/B engine of similar strength under these conditions:


ID143:
ECM200
score=63/200 [averages on correct positions: depth=12.8 time=2.56 nodes=791]

ID148:
ECM200
score=67/200 [averages on correct positions: depth=11.9 time=1.84 nodes=567]

ID156:
ECM200
score=68/200 [averages on correct positions: depth=12.6 time=2.44 nodes=944]

ID160:
ECM200
score=75/200 [averages on correct positions: depth=13.2 time=3.15 nodes=1107]

==============================================

Compare with a standard A/B engine of similar strength:

GreKo 6.5 (2330 CCRL):
ECM200
score=143/200 [averages on correct positions: depth=7.3 time=1.91 nodes=4718200]

I also tested the scaling against standard A/B engine.

Games at 1s/move:


Games Completed = 200 of 200 (Avg game length = 102.240 sec)
Settings = Gauntlet/64MB/1000ms per move/M 9000cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 5426 sec elapsed, 0 sec remaining
 1.  LCZero CPU ID160         	80.0/200	62-102-36  	(L: m=102 t=0 i=0 a=0)	(D: r=26 i=6 f=3 s=0 a=1)	(tpm=953.0 d=12.46 nps=185)
 2.  Jabba 1.0                	120.0/200	102-62-36  	(L: m=62 t=0 i=0 a=0)	(D: r=26 i=6 f=3 s=0 a=1)	(tpm=802.5 d=9.30 nps=0)

-70 Elo points for ID160

Games at 4s/move (2 doublings):


Games Completed = 200 of 200 (Avg game length = 448.253 sec)
Settings = Gauntlet/64MB/4000ms per move/M 9000cp for 30 moves, D 150 moves/EPD:C:\LittleBlitzer\3moves_GM_04.epd(817)
Time = 22769 sec elapsed, 0 sec remaining
 1.  LCZero CPU ID160         	116.5/200	96-63-41  	(L: m=63 t=0 i=0 a=0)	(D: r=24 i=8 f=8 s=1 a=0)	(tpm=2904.2 d=14.53 nps=430)
 2.  Jabba 1.0                	83.5/200	63-96-41  	(L: m=96 t=0 i=0 a=0)	(D: r=24 i=8 f=8 s=1 a=0)	(tpm=3802.0 d=10.74 nps=0)

+58 Elo points for ID160
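Both Elo figures follow from the match scores under the standard logistic Elo model, with draws already counted as half points in the totals. A minimal Python sketch; the formula is the usual one, and whether these exact figures were produced with it is an assumption, though it reproduces both:

```python
import math

def elo_diff(points: float, games: int) -> float:
    """Elo difference implied by a match score under the logistic Elo model."""
    score = points / games
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

print(round(elo_diff(80.0, 200)))   # -70 (1s/move match)
print(round(elo_diff(116.5, 200)))  # 58 (4s/move match)
```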

Therefore, just a factor of 4 in time control (or hardware) (2 doublings) gives a boost of 128 Elo points compared to standard A/B engine. Or 64 Elo points per doubling. One can extrapolate:
On one CPU core at 4 s/move, from this match LC0 ID160 is about 2100 CCRL Elo. A top GPU, say Nvidia 1080 Ti, is faster by a factor of 25 compared to 1 CPU core. Tournament TC is about 40 times longer than 4s/move. So, all in all, a total of a factor of 1000 time-hardware wise, or 10 doublings. So, ID160 on a top GPU and LTC would be about 2750 CCRL Elo. And on DeepMind hardware used in exhibition match, would be 3100+ CCRL Elo.
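The extrapolation above is linear in doublings. A Python sketch of that arithmetic, assuming (as the post does) that the Elo gain per doubling measured between 1s and 4s per move stays constant across the whole factor of 1000; the 25x and 40x factors are the post's own estimates:

```python
import math

def extrapolate_elo(base_elo: float, elo_per_doubling: float, speedup: float) -> float:
    """Add a constant Elo gain for every doubling of time/hardware (an assumption)."""
    return base_elo + elo_per_doubling * math.log2(speedup)

elo_per_doubling = (58 - (-70)) / 2   # 4s/move vs 1s/move results, 2 doublings apart
speedup = 25 * 40                     # 1080 Ti vs one core, times LTC vs 4s/move

print(elo_per_doubling)                                         # 64.0
print(round(math.log2(speedup), 2))                             # 9.97
print(round(extrapolate_elo(2100, elo_per_doubling, speedup)))  # 2738
```

Rounding the speedup to a whole 10 doublings gives 2100 + 640 = 2740, i.e. the "about 2750" above.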

I am even beginning to suspect that they didn't release to the general consumer their products because on a normal i7 CPU and average GPU, the performance of their AlphaZero Go and Chess programs would be not that impressive, or even pretty lame (compared to the hype), especially in Chess.

Werewolf · Post by **Werewolf** » Sat Apr 21, 2018 9:37 pm

Laskos wrote:
Therefore, just a factor of 4 in time control (or hardware) (2 doublings) gives a boost of 128 Elo points compared to standard A/B engine. Or 64 Elo points per doubling. One can extrapolate:
On one CPU core at 4 s/move, from this match LC0 ID160 is about 2100 CCRL Elo. A top GPU, say Nvidia 1080 Ti, is faster by a factor of 25 compared to 1 CPU core. Tournament TC is about 40 times longer than 4s/move. So, all in all, a total of a factor of 1000 time-hardware wise, or 10 doublings. So, ID160 on a top GPU and LTC would be about 2750 CCRL Elo. And on DeepMind hardware used in exhibition match, would be 3100+ CCRL Elo.

Very, very interesting.

New Volta cards are due out in Q3 this year, and ID 160 could therefore be CCRL 2800 Elo on one of those. Allowing for progress in the meantime, it could be very strong.

I think the tactics could slowly come. But I also think it will have "holes" in irregular positions which are not thematic.

Albert Silver · Post by **Albert Silver** » Sat Apr 21, 2018 9:38 pm

Laskos wrote:I am even beginning to suspect that they didn't release to the general consumer their products because on a normal i7 CPU and average GPU, the performance of their AlphaZero Go and Chess programs would be not that impressive, or even pretty lame (compared to the hype), especially in Chess.

I suspect they could not care less about releasing them as general consumer products, nor was that ever even on the table.

duncan · Post by **duncan** » Sun Apr 22, 2018 12:06 am

Laskos wrote:
Therefore, just a factor of 4 in time control (or hardware) (2 doublings) gives a boost of 128 Elo points compared to standard A/B engine. Or 64 Elo points per doubling. One can extrapolate:
On one CPU core at 4 s/move, from this match LC0 ID160 is about 2100 CCRL Elo. A top GPU, say Nvidia 1080 Ti, is faster by a factor of 25 compared to 1 CPU core. Tournament TC is about 40 times longer than 4s/move. So, all in all, a total of a factor of 1000 time-hardware wise, or 10 doublings. So, ID160 on a top GPU and LTC would be about 2750 CCRL Elo. And on DeepMind hardware used in exhibition match, would be 3100+ CCRL Elo.

So if AlphaZero's Elo is 3200, it is only 100 Elo stronger than LC0, which means LC0 will soon stall?

https://imgur.com/a/c04yc

Robert Pope · Post by **Robert Pope** » Sun Apr 22, 2018 1:23 am

duncan wrote:
So if AlphaZero's Elo is 3200, it is only 100 Elo stronger than LC0, which means LC0 will soon stall?

https://imgur.com/a/c04yc

Well, first off, that is a pretty big "if". Whenever you extrapolate upwards like this, there is a real risk that your assumptions won't continue to hold.

duncan · Post by **duncan** » Sun Apr 22, 2018 3:17 am

Robert Pope wrote:
Well, first off, that is a pretty big "if". Whenever you extrapolate upwards like this, there is a real risk that your assumptions won't continue to hold.

Do you have an estimate for AlphaZero's Elo?

mirek · Post by **mirek** » Sun Apr 22, 2018 3:35 am

Robert Pope wrote:
Well, first off, that is a pretty big "if". Whenever you extrapolate upwards like this, there is a real risk that your assumptions won't continue to hold.

Exactly. What's even more remarkable is that, according to the A0 paper (Figure 2), 4 TPUs do about 80k playouts in 1 s, and at 80k playouts A0 is only 100-150 Elo weaker than at 1 min/move (about 5000k playouts).

Also, 1x 1080 Ti (11 TFLOPS) vs 4x TPU (180 TFLOPS) means the nps drops to roughly 4.9k. Even if we assumed that a TPU is somehow 4x more effective FLOP for FLOP, the 1080 Ti would still reach close to 80k playouts per minute. So it seems quite convincing to me that A0 on a 1080 Ti at 1 min/move would, with good confidence, be at most 150 Elo weaker than the 4x TPU configuration (and most likely no more than 100 Elo weaker).
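The estimate above is a FLOPS ratio applied to the playout rate. A rough Python sketch, assuming (as the post does) that playouts scale linearly with raw TFLOPS; the 80k, 180 and 11 TFLOPS figures are the ones quoted in the post, and `tpu_efficiency` is a hypothetical parameter standing in for the 4x allowance:

```python
A0_PLAYOUTS_PER_SEC = 80_000   # 4 TPUs, per the post's reading of the A0 paper
TPU4_TFLOPS = 180              # as quoted in the post
GTX1080TI_TFLOPS = 11          # as quoted in the post

def scaled_playouts_per_sec(base_pps: float, base_tflops: float,
                            target_tflops: float, tpu_efficiency: float = 1.0) -> float:
    """Scale a playout rate by the raw TFLOPS ratio, then divide by an assumed
    factor for how much more effective each TPU FLOP is than a GPU FLOP."""
    return base_pps * (target_tflops / base_tflops) / tpu_efficiency

pps = scaled_playouts_per_sec(A0_PLAYOUTS_PER_SEC, TPU4_TFLOPS, GTX1080TI_TFLOPS)
worst = scaled_playouts_per_sec(A0_PLAYOUTS_PER_SEC, TPU4_TFLOPS, GTX1080TI_TFLOPS, 4.0)

print(round(pps))             # 4889 playouts/s on one 1080 Ti
print(round(worst * 60))      # 73333 playouts in 1 min with the 4x penalty
print(A0_PLAYOUTS_PER_SEC * 60)  # 4800000 playouts in 1 min on 4 TPUs
```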

Since LC0 is tactically much weaker than A0, maybe it would scale better, but unless someone measures it I would be quite skeptical that the difference could be 350+ Elo (instead of something like 150-200 at most). Still, actual data (not extrapolations) on LC0 scaling from, e.g., 1000 to 5 million playouts per move would be quite nice.
