Lc0 with latest test30 nets is vastly superior positionally

yanquis1972
Posts: 1762
Joined: Tue Jun 02, 2009 10:14 pm

Re: Lc0 with latest test30 nets is vastly superior positionally

Post by yanquis1972 » Sun Feb 03, 2019 1:51 am

Kai, have you tested t40 yet? I used drastically different parameters (60s/position, abandon after 10 or more plies+1 if best move) and stopped after 95 positions, but I wouldn’t be surprised if it’s already well ahead of AB engines. I can only guess that a lot of essential knowledge from test35 was retained, because its opening and especially endgame play already seem comparable to T30 as of not all that long ago.

Laskos
Posts: 9440
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Lc0 with latest test30 nets is vastly superior positionally

Post by Laskos » Sun Feb 03, 2019 2:09 pm

yanquis1972 wrote:
Sun Feb 03, 2019 1:51 am
Kai, have you tested t40 yet? I used drastically different parameters (60s/position, abandon after 10 or more plies+1 if best move) and stopped after 95 positions, but I wouldn’t be surprised if it’s already well ahead of AB engines. I can only guess that a lot of essential knowledge from test35 was retained, because its opening and especially endgame play already seem comparable to T30 as of not all that long ago.
Yes, test40 well surpassed all AB engines positionally on openings, but it's still significantly behind best test30 nets.

Openings200beta7 repeated 5 times (total 1000 positions) from 1s to 2s per position:

Lc0 v20.1 ID32819: 763/1000
Lc0 v20.1 ID32458: 712/1000
Lc0 v20.1 ID40627: 675/1000
Houdini 6.03:      558/1000
Komodo 12.3:       556/1000
Stockfish 10:      524/1000
Booot 6.3.1:       494/1000
Andscacs 0.95:     484/1000
Ethereal 11.00:    457/1000
Fire 7.1:          431/1000
Texel 1.07:        419/1000

On the very tactical WAC200, repeated 5 times under the same conditions, test40 is already close to the late test30 nets:

Lc0 v20.1 ID32819: 737/1000
Lc0 v20.1 ID40627: 732/1000

It has been known for a long time that test30 advanced more positionally than tactically (it might even have regressed tactically); maybe test40 will produce tactically much stronger nets. There are hints that test40 nets might scale better than test30 to longer TC, as test40 discovers more solutions with longer testing time.
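
As a rough check on how large the gaps in the two tables are relative to sampling noise, here is a minimal sketch that treats every position as an independent pass/fail trial. That independence is an assumption: the 1000 positions are 200 positions repeated 5 times and both nets saw the same positions, so the true uncertainty is probably larger than this estimate. The function names `solve_rate` and `z_score` are made up for illustration.

```python
import math

def solve_rate(solved, total):
    """Return (rate, standard error), assuming independent pass/fail trials."""
    p = solved / total
    se = math.sqrt(p * (1 - p) / total)
    return p, se

def z_score(solved_a, solved_b, total):
    """Difference of two solve rates in units of its combined standard error."""
    pa, se_a = solve_rate(solved_a, total)
    pb, se_b = solve_rate(solved_b, total)
    return (pa - pb) / math.sqrt(se_a ** 2 + se_b ** 2)

# Openings200beta7: ID32819 (test30) vs ID40627 (test40)
openings_z = z_score(763, 675, 1000)
# WAC200: the same two nets
wac_z = z_score(737, 732, 1000)

print(f"openings z = {openings_z:.2f}, WAC z = {wac_z:.2f}")
```

Under these assumptions the openings gap comes out at more than four standard errors, while the WAC200 gap is well under one. If the five repeats are strongly correlated, the effective sample is closer to 200 positions and both values shrink by roughly a factor of sqrt(5).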

yanquis1972
Posts: 1762
Joined: Tue Jun 02, 2009 10:14 pm

Re: Lc0 with latest test30 nets is vastly superior positionally

Post by yanquis1972 » Sun Feb 03, 2019 4:48 pm

i was wondering about scaling based on its CCC performance; hopefully that's the case. for whatever reason it's been my understanding that both t10 & t30 suffered in tactical tests over time (after first LR drop), so if that reverses it'd be huge as well.

fwiw i got 153/200 with my settings; logs attached. most solved positions are found in under 1s but a handful take 20-30s; nothing found from 32-60s. (GTX 1080)
Attachments
openings200T40.7z
(109.29 KiB)

M ANSARI
Posts: 3408
Joined: Thu Mar 16, 2006 6:10 pm

Re: Lc0 with latest test30 nets is vastly superior positionally

Post by M ANSARI » Mon Feb 04, 2019 7:18 am

Yes, I still don't understand why Lc0 can play so incredibly strongly and yet fall for one-move tactical blunders. Fortunately that happens quite rarely, but I think it is something that should and probably will be fixed very soon. Seeing a shallow tactic that is only 3 ply deep cannot use up too much hardware, yet at the moment Lc0 misses some very simple tactics even if you leave it running for hours on powerful hardware. Once that is fixed, this will no longer be an issue. In the meantime, I think it would be a good idea to collect all the positions in which Lc0 misses shallow tactics and use them as a measure of how things are progressing. If we collect about 100 or even 1000 positions like that, they will be very useful in fixing this issue: any new network can be run through this test set to see whether the tactical bug is fixed.
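
The regression-suite idea above could look something like the following minimal sketch. Everything in it is hypothetical: `TacticalCase`, `run_suite` and the `pick_move` callable are illustrative names, and the two positions are textbook mates (Fool's mate and Scholar's mate) standing in for real positions that Lc0 misses. In a real run, `pick_move` would wrap a UCI engine loaded with the network under test; here a plain dictionary plays that role so the sketch stays self-contained.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

class TacticalCase(NamedTuple):
    fen: str               # position to test, in FEN
    best_moves: frozenset  # accepted solutions in UCI notation

# Toy stand-ins for "shallow tactics"; not actual Lc0 failures.
CASES = [
    TacticalCase(
        "rnbqkbnr/pppp1ppp/8/4p3/6P1/5P2/PPPPP2P/RNBQKBNR b KQkq g3 0 2",
        frozenset({"d8h4"}),  # Fool's mate: ...Qh4#
    ),
    TacticalCase(
        "r1bqkb1r/pppp1ppp/2n2n2/4p2Q/2B1P3/8/PPPP1PPP/RNB1K1NR w KQkq - 4 4",
        frozenset({"h5f7"}),  # Scholar's mate: Qxf7#
    ),
]

def run_suite(
    cases: List[TacticalCase],
    pick_move: Callable[[str], Optional[str]],
) -> Tuple[int, List[str]]:
    """Return (number solved, FENs of the positions that were missed)."""
    failed = [c.fen for c in cases if pick_move(c.fen) not in c.best_moves]
    return len(cases) - len(failed), failed

# Stub "engine" that only knows the first answer; replace with a real engine call.
stub_answers = {CASES[0].fen: "d8h4"}
solved, failed = run_suite(CASES, stub_answers.get)
print(f"solved {solved}/{len(CASES)}; missed: {failed}")
```

Keeping the list of missed FENs, rather than only a count, makes it easy to see whether a new network fixes old failures or merely trades them for new ones.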

Uri Blass
Posts: 8586
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Lc0 with latest test30 nets is vastly superior positionally

Post by Uri Blass » Mon Feb 04, 2019 8:17 am

M ANSARI wrote:
Mon Feb 04, 2019 7:18 am
Yes, I still don't understand why Lc0 can play so incredibly strongly and yet fall for one-move tactical blunders. Fortunately that happens quite rarely, but I think it is something that should and probably will be fixed very soon. Seeing a shallow tactic that is only 3 ply deep cannot use up too much hardware, yet at the moment Lc0 misses some very simple tactics even if you leave it running for hours on powerful hardware. Once that is fixed, this will no longer be an issue. In the meantime, I think it would be a good idea to collect all the positions in which Lc0 misses shallow tactics and use them as a measure of how things are progressing. If we collect about 100 or even 1000 positions like that, they will be very useful in fixing this issue: any new network can be run through this test set to see whether the tactical bug is fixed.
The question is whether Lc0 misses the tactic in its search or has a hole in its evaluation, so that it simply evaluates the side that loses material as better.

Does Lc0 see the problem immediately after the relevant 3 plies have been played, where "immediately" means even at 1 node per position?

yanquis1972
Posts: 1762
Joined: Tue Jun 02, 2009 10:14 pm

Re: Lc0 with latest test30 nets is vastly superior positionally

Post by yanquis1972 » Mon Feb 04, 2019 2:30 pm

M ANSARI wrote:
Mon Feb 04, 2019 7:18 am
Yes, I still don't understand why Lc0 can play so incredibly strongly and yet fall for one-move tactical blunders. Fortunately that happens quite rarely, but I think it is something that should and probably will be fixed very soon. Seeing a shallow tactic that is only 3 ply deep cannot use up too much hardware, yet at the moment Lc0 misses some very simple tactics even if you leave it running for hours on powerful hardware. Once that is fixed, this will no longer be an issue. In the meantime, I think it would be a good idea to collect all the positions in which Lc0 misses shallow tactics and use them as a measure of how things are progressing. If we collect about 100 or even 1000 positions like that, they will be very useful in fixing this issue: any new network can be run through this test set to see whether the tactical bug is fixed.
One thing I found out just a bit ago is that t40 still has temperature (randomness) throughout its training games, whereas AlphaZero cut it off after move 15 (correct me if I’m wrong about either/both). I had thought it was beyond question that DeepMind's approach would be copied with t40, based on this quote from the Leela blog:

CPUCT used was 2.4, plus more details on the formula used
Deep Mind set temperature to 0 after 15 moves (from both players), ensuring only the best alternative was selected from then on. Leela had used a constant temperature throughout the game, trying to find something that gave diversity during the opening but not blunder too much in the end game. Temperature settings are the main culprit behind Leelas ... sub-optimal... end game play.
First Player Urgency, FPU, was revealed to be "assume any move you haven't evaluated as losing". Leela had until now tried to estimate this value based on the parent node's evaluation.


The paper launched a range of experiments lasting roughly until December 17th. We learned that:

"Policy sharpening is bad"
"AlphaZero parameters are good"
Changing parameters mid-run gives results that might be hard to interpret.
So if I’m correct, then unless t40 is clearly better than AlphaZero after its training, the question of whether the correct approach was used here will still be open. I’m sure the Leela team isn’t going by pure speculation, but it seems so blatantly obvious to me that the first goal should be to replicate A0 and only from there try to better it. It’s much too early in t40's training to draw conclusions, but a couple of the endgames vs Houdini on CCC today make me skeptical. Perhaps without temp there’d be more direct play and, if nothing else, fewer moves in which to blunder or walk into a perpetual.
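
For readers unfamiliar with the term, here is a minimal sketch of what "temperature" means when a self-play move is chosen from search visit counts. It assumes the usual AlphaZero-style rule that a move is sampled with probability proportional to N^(1/tau), with tau = 0 meaning "always play the most visited move". The 15-move cutoff follows the blog quote above; the function names, the example visit counts and the reading of "15 moves from both players" as 30 plies are illustrative assumptions, not Lc0's actual code or settings.

```python
import random

def pick_move(visits, temperature, rng=random):
    """Choose a move from a dict of move -> visit count.

    temperature == 0 picks the most visited move deterministically;
    otherwise moves are sampled with weight visits ** (1 / temperature).
    """
    if temperature == 0:
        return max(visits, key=visits.get)
    moves = list(visits)
    weights = [visits[m] ** (1.0 / temperature) for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]

def temperature_for_ply(ply, cutoff_moves=15, tau=1.0):
    """Cutoff schedule: tau while each side has made fewer than
    cutoff_moves moves, then 0 for the rest of the game."""
    return tau if ply < 2 * cutoff_moves else 0.0

def constant_temperature(ply, tau=1.0):
    """Constant schedule: the same tau on every ply."""
    return tau

# Hypothetical visit counts for three candidate moves.
visits = {"e2e4": 800, "d2d4": 150, "g1f3": 50}
for ply in (0, 29, 30, 80):
    tau = temperature_for_ply(ply)
    print(ply, tau, pick_move(visits, tau))
```

With a constant temperature the second and third choices keep being played occasionally, even deep in the endgame. That is the mechanism the quoted blog post blames for sub-optimal endgame play in the training games.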

jp
Posts: 815
Joined: Mon Apr 23, 2018 5:54 am

Re: Lc0 with latest test30 nets is vastly superior positionally

Post by jp » Mon Feb 04, 2019 4:54 pm

Laskos wrote:
Sun Feb 03, 2019 2:09 pm
yanquis1972 wrote:
Sun Feb 03, 2019 1:51 am
Kai, have you tested t40 yet?
Yes, test40 well surpassed all AB engines positionally on openings, but it's still significantly behind best test30 nets.
What about positionally not on openings? Maybe this is hard to test...
It's hard to get too excited about Lc0 being good at openings (the ones it 'likes').

Laskos
Posts: 9440
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Lc0 with latest test30 nets is vastly superior positionally

Post by Laskos » Mon Feb 04, 2019 5:41 pm

jp wrote:
Mon Feb 04, 2019 4:54 pm
Laskos wrote:
Sun Feb 03, 2019 2:09 pm
yanquis1972 wrote:
Sun Feb 03, 2019 1:51 am
Kai, have you tested t40 yet?
Yes, test40 well surpassed all AB engines positionally on openings, but it's still significantly behind best test30 nets.
What about positionally not on openings? Maybe this is hard to test...
It's hard to get too excited about Lc0 being good at openings (the ones it 'likes').
There are no large databases of middlegame positions from human play, quiet position by quiet position. One would need a very strong human to build a test suite, and even then I am not sure of the reliability of a suite built in such a way.
The openings are varied human openings; if Leela likes the same openings, it's not intentional.

S.Taylor
Posts: 8338
Joined: Thu Mar 09, 2006 2:25 am
Location: Jerusalem Israel

Re: Lc0 with latest test30 nets is vastly superior positionally

Post by S.Taylor » Mon Feb 04, 2019 8:46 pm

Uri Blass wrote:
Tue Jan 08, 2019 6:08 pm
Lion wrote:
Tue Jan 08, 2019 9:19 am
This doesn't surprise me, although it's good to have some data showing it.

As you said, it's tactically that Lc0 has some weaknesses, which gives it a little human-like behaviour: it can lose to SF dev in a 25-move tactical combination, and then the following game shows a positional masterpiece vs SF dev.

It's a welcome new era of computer chess...

rgds
Humans are weaker both tactically and positionally relative to chess engines.
If Lc0's weaknesses in tactics give it a little human-like behaviour, then Stockfish's weakness in positional play also gives it a little human-like behaviour.
I believe Carlsen is very good at chess, and he says he doesn't like using chess engines because it makes a human less good.
Maybe if machines were just a little weaker, they would be as great as Carlsen, and could teach us to play better.
REALLY, can't human chess books teach us better chess than what computers can play?

jp
Posts: 815
Joined: Mon Apr 23, 2018 5:54 am

Re: Lc0 with latest test30 nets is vastly superior positionally

Post by jp » Mon Feb 04, 2019 9:16 pm

Laskos wrote:
Mon Feb 04, 2019 5:41 pm
jp wrote:
Mon Feb 04, 2019 4:54 pm
Laskos wrote:
Sun Feb 03, 2019 2:09 pm
test40 well surpassed all AB engines positionally on openings
What about positionally not on openings? Maybe this is hard to test...
It's hard to get too excited about Lc0 being good at openings (the ones it 'likes').
There are no large databases of middlegame positions from human play, quiet position by quiet position. One would need a very strong human to build a test suite, and even then I am not sure of the reliability of a suite built in such a way.
The openings are varied human openings; if Leela likes the same openings, it's not intentional.
Yep, I totally understand. (That's why I said maybe it'd be hard to test.) Given the way it's trained, I just don't think it's all that exciting being good at openings. It just seems to get weaker and weaker the longer the game goes on.
