LC0 on the WAC test suite, bad results

CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: LC0 on the WAC test suite, bad results

Post by CMCanavessi »

Looks like the new 192x15 network (still under testing, with parallel training ongoing) has made a HUGE jump in tactical strength.

I'll let the graph speak for itself:

Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
Uri Blass
Posts: 10279
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Post by Uri Blass »

syzygy wrote:
Uri Blass wrote:
carldaman wrote:
Albert Silver wrote:
Damir wrote: Leela's king safety and tactical ability are very weak. :( :( This must be improved. :o :o
I find Leela's understanding of king safety very advanced actually. This is seen not only in her impressive attacking ability, but the evaluations she displays when one king is exposed or stuck in the center.
+1

And the tactics are "weak" only relative to other strong engines. Human masters have a tough time outplaying Leela tactically. This means its tactics are really not that weak.
No, it does not mean that.

The way to test whether it is weak tactically relative to humans is not through games against humans.

The way to test it is to have it compete with humans at solving tactical chess problems.

Of course, you need to define what tactics are, and you could argue that WAC is not the right test of tactics because tactics are not only sacrifices.
If you define "tactically weak relative to humans" as "human masters can easily outplay it tactically", then Carl's observation that "human masters have a tough time outplaying Leela tactically" shows that it is not tactically weak relative to humans.

And I think that definition makes a lot of sense.

As to tactical problems, if you give me a position and tell me that there is some tactic to be found, I am more likely to find it than when I would encounter the same position in a game.
Winning in chess does not mean being better at everything in chess.

I understand your objection to tactical test suites, but I also think you can win every game and still be weaker than your opponents in some area of chess knowledge. A player who does not know how to mate with bishop and knight can beat a player who does, as long as the right positions never arise in their games.

The question is how you define tactics and how you measure tactical strength.
Without answering it, a discussion about whether LCzero is weaker or stronger than humans at tactics is meaningless.
peter
Posts: 3185
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Post by peter »

Uri Blass wrote: The question is how you define tactics and how you measure tactical strength.
Without answering it, a discussion about whether LCzero is weaker or stronger than humans at tactics is meaningless.
The same goes for the discussion about whether it is weaker or stronger tactically than other engines.

What you can say for sure is whether LC0 performs worse or better on a certain test suite, and that's what this thread is about.
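"Performance on a certain test suite" in practice means counting how many positions an engine solves. EPD suites such as WAC typically mark the expected solution with a `bm` (best move) opcode, so a minimal scoring sketch might look like this. It assumes every record carries `bm` and `id` opcodes, that the engine's answers were already collected as SAN strings keyed by `id`, and it compares moves as plain text without any legality check. The helper names and the two positions (`demo.1`, `demo.2`) are hypothetical, not taken from WAC, Nolot, or WM-Test.

```python
"""Minimal sketch: score an engine's answers against an EPD best-move suite.

Assumptions (hypothetical, not from the thread): every EPD record carries
a 'bm' opcode in SAN and an 'id' opcode, and the engine's chosen moves have
already been collected as SAN strings keyed by that id. No legality check
is done; moves are compared as text.
"""


def normalize_san(move: str) -> str:
    """Drop check/mate marks and annotations so 'Qg6+' matches 'Qg6'."""
    return move.strip().rstrip("+#!?")


def parse_epd(line: str) -> tuple[str, dict[str, str]]:
    """Split an EPD line into its four position fields and its opcodes.

    Simplification: opcodes are split on ';', so a quoted operand that
    itself contains ';' is not handled.
    """
    fields = line.split(maxsplit=4)
    position = " ".join(fields[:4])
    ops: dict[str, str] = {}
    rest = fields[4] if len(fields) > 4 else ""
    for op in rest.split(";"):
        op = op.strip()
        if not op:
            continue
        name, _, operand = op.partition(" ")
        ops[name] = operand.strip().strip('"')
    return position, ops


def score_suite(epd_lines: list[str], engine_moves: dict[str, str]) -> tuple[int, int]:
    """Return (solved, total): a position counts as solved when the engine's
    move is one of the moves listed in its 'bm' opcode."""
    solved = total = 0
    for line in epd_lines:
        _, ops = parse_epd(line)
        if "bm" not in ops:
            continue  # this sketch only scores best-move records
        total += 1
        best = {normalize_san(m) for m in ops["bm"].split()}
        move = engine_moves.get(ops.get("id", ""))
        if move is not None and normalize_san(move) in best:
            solved += 1
    return solved, total


# Hypothetical two-position suite and engine answers, for illustration only.
suite = [
    '6k1/5ppp/8/8/8/8/5PPP/3R2K1 w - - bm Rd8#; id "demo.1";',
    'r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - bm Bb5 Bc4; id "demo.2";',
]
answers = {"demo.1": "Rd8#", "demo.2": "Nc3"}
print(score_suite(suite, answers))  # (1, 2)
```

A raw solved count like this says nothing about time control or hardware, which is exactly why results from different setups are hard to compare.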

I have even more doubts about most opening test suites as a means of positional testing than about most tactical best-move tests.
Still, I'd like to see how LC0 performs on, e.g., an opening test suite full of gambit openings.

Of course, I wouldn't conclude that it is positionally weaker or stronger than human masters or other engines based on its performance on a single opening test suite, but that is what one reads now and then, here and elsewhere: people say it is rather strong positionally because of its results on opening test suites.
So what?
Watching it play gambit lines against other engines is certainly not only about tactics, but so far it (she? :)) hasn't shown me any positional strength in such games either.

Celo (computer Elo), like Elo, is a matter of how you measure and rate.

It is a matter of opponents, openings, and other test positions; at least, that is what LC0's development has shown us again and again, very impressively.
Peter.
mirek
Posts: 52
Joined: Sat Mar 24, 2018 4:18 pm

Post by mirek »

peter wrote:
Uri Blass wrote: The question is how you define tactics and how you measure tactical strength.
Without answering it, a discussion about whether LCzero is weaker or stronger than humans at tactics is meaningless.
The same goes for the discussion about whether it is weaker or stronger tactically than other engines.
Indeed. Here someone posted results on the Nolot test suite, whose positions could also be considered tactical puzzles.

https://groups.google.com/forum/#!topic ... 3EGJQrH-Zw
https://en.wikipedia.org/wiki/Nolot

From what I have tried so far with engines like Fritz11, Stockfish 1.3.1, and Crafty 23.05x64, it actually seems that lczero performs better on the Nolot test suite than those A/B engines. So I would conclude that while lczero definitely has huge tactical blind spots, saying that it simply sucks at tactics would be too much of an oversimplification. Using patterns it has already learned, it can play brilliant tactics. But when a position requires patterns it doesn't know yet, lczero can be bested even by very basic A/B exhaustive-search engines (well below lczero's strength), especially when only simple, shallow tactics are involved.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Post by Laskos »

mirek wrote:
peter wrote:
Uri Blass wrote: The question is how you define tactics and how you measure tactical strength.
Without answering it, a discussion about whether LCzero is weaker or stronger than humans at tactics is meaningless.
The same goes for the discussion about whether it is weaker or stronger tactically than other engines.
Indeed. Here someone posted results on the Nolot test suite, whose positions could also be considered tactical puzzles.

https://groups.google.com/forum/#!topic ... 3EGJQrH-Zw
https://en.wikipedia.org/wiki/Nolot

From what I have tried so far with engines like Fritz11, Stockfish 1.3.1, and Crafty 23.05x64, it actually seems that lczero performs better on the Nolot test suite than those A/B engines. So I would conclude that while lczero definitely has huge tactical blind spots, saying that it simply sucks at tactics would be too much of an oversimplification. Using patterns it has already learned, it can play brilliant tactics. But when a position requires patterns it doesn't know yet, lczero can be bested even by very basic A/B exhaustive-search engines (well below lczero's strength), especially when only simple, shallow tactics are involved.
Confirmed with the WM-Test test suite: 100 positions that can be called tactical, selected from games of strong humans.
LC0 scores significantly higher than Fruit 2.1 at 20s/position on CPU.
LC0 seems to know some tactics selectively, probably the types it encountered most in its self-play games.
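The thread doesn't say how "significantly" was established. When two engines are run on the same positions, one common check is an exact McNemar (sign) test on the positions that only one engine solves. The sketch assumes that test, which may not be what was actually used, and runs it on made-up solved/unsolved vectors, not on the WM-Test results; the function name `mcnemar_exact` is hypothetical.

```python
"""Minimal sketch: exact McNemar (sign) test for two engines run on the
same test-suite positions.

Assumption (not stated in the thread): 'significantly' is judged only on
the discordant positions, i.e. those solved by exactly one engine. Under
the null hypothesis that both engines are equally strong on the suite,
each discordant position is a fair coin flip.
"""
from math import comb


def mcnemar_exact(solved_a: list[bool], solved_b: list[bool]) -> tuple[int, int, float]:
    """Return (only_a, only_b, two_sided_p) for paired per-position results."""
    if len(solved_a) != len(solved_b):
        raise ValueError("both engines must be scored on the same positions")
    only_a = sum(1 for a, b in zip(solved_a, solved_b) if a and not b)
    only_b = sum(1 for a, b in zip(solved_a, solved_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return only_a, only_b, 1.0  # no discordant positions: no evidence either way
    k = min(only_a, only_b)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return only_a, only_b, min(1.0, 2 * tail)


# Hypothetical per-position results on a 100-position suite (not real data):
# engine A solves 12 positions that B misses, B solves 3 that A misses,
# and both agree on the remaining 85.
a = [True] * 12 + [False] * 3 + [True] * 50 + [False] * 35
b = [False] * 12 + [True] * 3 + [True] * 50 + [False] * 35
print(mcnemar_exact(a, b))  # (12, 3, 0.03515625)
```

Positions both engines solve (or both miss) carry no information about which one is stronger, which is why only the discordant ones enter the test.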