LCZero update

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LCZero update

Post by jkiliani »

Uri Blass wrote:
Leo wrote:It's logical to try LCZero to see if it works. I am skeptical but not a pessimist.
I look at the games and I see a lot of stupid one-ply blunders that lose material.
I do not know what they do, but if it still plays like that after many thousands of games, then I do not believe in it.
Just look up Fig.1 in the AlphaZero paper. Deepmind needed around 22k training steps, equivalent to around 1.3 million training games, to get to 800 Elo (which is roughly where LCZero is now). LCZero achieved the same with currently 360k games.

Reinforcement learning needs a lot of data to learn good strategies, that's always been the main problem with it, especially if you start from random. Just give it time.
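For a rough sense of scale, here is the back-of-the-envelope arithmetic behind the comparison. All numbers are taken from the post above, not re-derived from the paper:

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
a0_steps = 22_000        # AlphaZero training steps to ~800 Elo (Fig. 1)
a0_games = 1_300_000     # self-play games those steps correspond to
lczero_games = 360_000   # LCZero's game count at the time of writing

games_per_step = a0_games / a0_steps   # roughly 59 games per training step
ratio = a0_games / lczero_games        # LCZero used roughly 3.6x fewer games

print(f"{games_per_step:.0f} games/step, {ratio:.1f}x fewer games for LCZero")
```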
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: LCZero update

Post by Uri Blass »

CMCanavessi wrote:
Uri Blass wrote:
Leo wrote:It's logical to try LCZero to see if it works. I am skeptical but not a pessimist.
I look at the games and I see a lot of stupid one-ply blunders that lose material.
I do not know what they do, but if it still plays like that after many thousands of games, then I do not believe in it.
Of course it will make mistakes like that, the learning phase started 1 week ago. You can't expect it to become the new stockfish in 1 week. It has gained like 700 elo since it started training...
I did not expect it to become the new Stockfish in one week, but I did expect it to be at least better than the best humans after one week, and I believe there are intelligent adults who are new to chess and can learn within a week not to give away pieces through one-ply mistakes.
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LCZero update

Post by jkiliani »

Uri Blass wrote:I did not expect it to become the new Stockfish in one week, but I did expect it to be at least better than the best humans after one week, and I believe there are intelligent adults who are new to chess and can learn within a week not to give away pieces through one-ply mistakes.
Possible, if you have the resources of a big company, and bug-free code before you start the training process. Since LCZero is entirely a volunteer project with the compute generated by enthusiasts with no institutional backing, that just won't work for us.

About humans learning not to give away pieces for free in one week: you're misunderstanding what the "Zero" approach actually means. At no point do we tell the engine that specific pieces have a certain value and that it has to protect them to win. It has to figure that out by itself, from the statistical observation that whoever loses more, and stronger, pieces is more likely to lose the game. A human learning chess, by contrast, is told the approximate values of the pieces by whoever teaches him.
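The "statistical observation" part can be made concrete with a toy sketch. This is purely illustrative and not LCZero's actual training code (LCZero learns from full board positions with a deep net, not from material counts): we fabricate synthetic game results driven by hidden piece values, then show that plain logistic regression on outcomes alone recovers their ranking, without the learner ever being told the values.

```python
import math
import random

# Toy illustration: learn piece values purely from game outcomes.
# Synthetic "games" are generated with a hidden material balance; the
# learner fits one weight per piece type and never sees TRUE_VALUES.
random.seed(42)
TRUE_VALUES = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0}
PIECES = list(TRUE_VALUES)

def random_game():
    # Material difference (white minus black) per piece type; the result
    # is drawn with probability depending on the weighted advantage.
    diff = {p: random.randint(-2, 2) for p in PIECES}
    advantage = sum(TRUE_VALUES[p] * diff[p] for p in PIECES)
    p_win = 1 / (1 + math.exp(-0.5 * advantage))
    return diff, (1.0 if random.random() < p_win else 0.0)

games = [random_game() for _ in range(2000)]

# Plain batch gradient descent on the logistic log-likelihood.
weights = {p: 0.0 for p in PIECES}
for _ in range(150):
    grad = {p: 0.0 for p in PIECES}
    for diff, result in games:
        z = sum(weights[p] * diff[p] for p in PIECES)
        pred = 1 / (1 + math.exp(-z))
        for p in PIECES:
            grad[p] += (result - pred) * diff[p]
    for p in PIECES:
        weights[p] += 0.5 * grad[p] / len(games)

# The fitted weights recover the ranking pawn < knight < rook < queen.
relative = {p: weights[p] / weights["P"] for p in PIECES}
```

The learner only ever sees who won, which is exactly the kind of signal a zero-knowledge engine has to work with.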
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: LCZero update

Post by Uri Blass »

jkiliani wrote:
Milos wrote:
jkiliani wrote:
Milos wrote:
CMCanavessi wrote:So from Gen 6 to Gen 8, LCZero got +190 elo, in 2 days. Imagine if the project catches up and more people help train it. It would blow our minds. I believe that Gen 10-12 will already be around 1000 elo, by the weekend.
Like most of the ppl you don't quite understand how DCNN training works.
Sooner or later you'll hit a plateau and the training will saturate. The higher you go, the harder it becomes to improve a DCNN.
While you are correct in principle, you're overlooking that there are ways to deal with that, as Leela Zero already demonstrated. Once you reach a plateau, you can simply use the self-play games you already have to bootstrap a larger neural net, which will usually achieve a significant initial jump and be able to train to a higher level. Once you stall again, rinse and repeat.
In some cases a larger net would help, in others not, which is most probably the case with A0.
Training a larger net requires far more resources for self-play games. In the case of LCZero, they are already struggling to get a decent number of games even with the relatively small net they have now.
So enthusiasm regarding LCZero is pretty much in vain.
A larger net always helps, since more weights give a better representation of the search output, and a ResNet no longer suffers from vanishing gradients the way earlier network architectures did. Leela Zero got a big boost with every network expansion, and you can't tell me that chess is somehow so fundamentally different that the same wouldn't apply here.

Sure, self-play speed will go down with every network expansion, but it will also go up a lot as the project gains traction and more people contribute. The stronger it gets, the more publicity there will be.

Enthusiasm for LCZero is very well founded, and shared by everyone who read the Deepmind papers and understood them at least at some level. No-one is forcing you to contribute, feel free to watch the downfall of the Alpha-Beta engines from the sidelines.
I do not believe in the downfall of alpha-beta.
I believe that it is possible to reach a better level by improving alpha-beta instead of trying to replace it.

I believe that, basically, if you have Stockfish play against itself as part of the evaluation function of an engine, you may get a stronger engine than Stockfish at long time control with the right search rules, without adding more knowledge to the evaluation.

The idea is that the new engine can simply have Stockfish play against itself at fixed depth d1(n,i) for ply i of the game (1<=i<d2(n,i)),
and use the resulting sequence of d2 evaluations to calculate the evaluation function when it searches iteration n.

d1 and d2 are going to be bigger when n is bigger.

I believe that with the right tuning of d1 and d2 you should get a stronger engine than Stockfish (at small depths, of course, d2(n,i)=0, so the engine is identical to Stockfish: it does not play against itself and just uses the static evaluation function).

I believe that this idea can help to detect fortresses at huge depths, because when the depth is big enough, playing against yourself leads to a draw.

Note that the first step should be to try replacing the static evaluation by a depth-1 search (at iterations bigger than some n) to see if it helps.

I believe it should help when n is big enough. You lose at most a constant speed factor between a static evaluation and a depth-1 search, and in exchange you get a smarter evaluation (for example, there may be cases where the program does not see a stalemate and thinks one side has a big advantage because the stalemate is evaluated wrongly; the search then finds ways to delay the stalemate, and with a one-ply search this simply does not happen).
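The d1/d2 idea can be sketched on a toy game. Everything here is illustrative: a trivial "subtract 1-3 stones, take the last stone to win" game stands in for chess, plain negamax stands in for Stockfish, and the deliberately weak static evaluation plays the role of an evaluation that misjudges deep positions:

```python
# Toy sketch of the d1/d2 idea. In the subtraction game, positions with
# n % 4 == 0 are lost for the side to move, but the weak static
# evaluation below cannot see that.
WIN = 1000

def static_eval(n):
    # Deliberately weak heuristic: only sees that an immediate win is near.
    return 1 if 1 <= n <= 3 else 0

def negamax(n, depth):
    if n == 0:
        return -WIN                      # side to move has already lost
    if depth == 0:
        return static_eval(n)
    return max(-negamax(n - m, depth - 1) for m in range(1, min(3, n) + 1))

def best_move(n, depth):
    return max(range(1, min(3, n) + 1),
               key=lambda m: -negamax(n - m, depth - 1))

def selfplay_eval(n, d1, d2):
    """Evaluate n by letting the engine play against itself for up to d2
    plies at fixed search depth d1, then scoring the final position.
    The score is from the perspective of the side to move at n."""
    sign = 1
    for _ in range(d2):
        if n == 0:
            return -sign * WIN           # side to move in the playout lost
        n -= best_move(n, d1)
        sign = -sign
    return sign * (-WIN if n == 0 else static_eval(n))
```

With d1=3 and d2=10, `selfplay_eval(8, 3, 10)` returns -WIN, while `static_eval(8)` and even a depth-1 search both return 0: the short self-play continuation reveals the loss that the static evaluation cannot see, which is the effect the post is describing.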
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: LCZero update

Post by Uri Blass »

Here is an example of what I mean

[D]7k/7P/8/8/6p1/8/2B4P/K7 b - - 0 1

Stockfish without tablebases cannot see a draw score, because at some nodes the search always reaches g3 (forced) hxg3 without understanding that hxg3 is a stalemate.

Thinking about it, a one-ply search will not help to see the draw here, because a one-ply search may only stop one ply earlier, but I still feel a more accurate evaluation can help, even if it is only a one-ply search instead of the static evaluation.

It can also help in pruning decisions, because a mate evaluation means that you do not need to search further.
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LCZero update

Post by jkiliani »

Uri Blass wrote:I do not believe in the downfall of the alpha-beta.
I believe that it is possible to achieve better level by improving alpha-beta instead of trying to replace it.
You're probably right that an outright replacement of alpha-beta won't happen, but what will definitely happen is widespread adoption of deep neural networks as part of the search, by combining them with alpha-beta in some way. A common mindset in the chess community seems to be that MCTS+NN could never be better than alpha-beta, mainly because alpha-beta worked best in the past. The success of AlphaZero showed that this has changed. The future belongs to hybrid engines combining traditional search methods with neural nets trained by self-play.
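For readers unfamiliar with how MCTS+NN engines choose moves, here is a sketch of the AlphaZero-style PUCT selection rule they use in place of alpha-beta: each candidate move is scored by its averaged value plus an exploration bonus weighted by the policy net's prior. The constant c_puct, the node layout, and the example numbers are made up for this sketch, not taken from LCZero:

```python
import math

# Illustrative PUCT selection as used by AlphaZero-style MCTS.
def puct_score(q, prior, visits, parent_visits, c_puct=1.5):
    # q: mean value of the child so far; prior: policy-net probability.
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)

def select_child(children):
    # children: list of dicts with "q", "prior" and "visits" keys.
    parent_visits = sum(c["visits"] for c in children)
    return max(children,
               key=lambda c: puct_score(c["q"], c["prior"],
                                        c["visits"], parent_visits))

# An unvisited move with a strong prior outranks a mildly positive,
# already well-explored one -- this is how the policy net steers search.
children = [
    {"move": "e2e4", "q": 0.0, "prior": 0.6, "visits": 0},
    {"move": "d2d4", "q": 0.1, "prior": 0.2, "visits": 10},
    {"move": "g1f3", "q": 0.0, "prior": 0.2, "visits": 5},
]
```

Note the contrast with alpha-beta: nothing is pruned outright; low-prior moves just get visited less often, which is what makes the neural net's judgment so central.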
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: LCZero update

Post by Milos »

jkiliani wrote:The success of AlphaZero showed that this has changed. The future belongs to hybrid engines combining traditional search methods with neural nets trained by self-play.
The "success" of AlphaZero only created a delusion and suddenly many ppl started behaving as if they had a crystal ball which obviously they don't.
koedem
Posts: 105
Joined: Fri Mar 18, 2016 10:45 pm

Re: LCZero update

Post by koedem »

But human adults aren't zero-knowledge. They don't start with random play: every human, even if you just show him the game, will deduce that it's better to have more pieces than fewer. LCZero obviously doesn't know that. It needs to more or less try out "Is it good to give away a pawn on a2? No?! What about on b2? No?!" and so on.
Obviously that takes forever, but once it has learned all that, it knows not only that giving away material is a bad idea but also exactly why, which in particular means it also knows when it can be a good idea (and can then play brilliant long-term sacrifices like in some of the games against Stockfish).

As for the question whether alpha-beta will die out: I would guess eventually yes, but we probably don't have the hardware yet to get there. Even A0 on super-fast special Google hardware was inferior to SF at very short time controls. With slow, non-optimized hardware, even very well tuned LCZeros on much larger nets than the current one will probably only beat SF at huge time controls. But I wouldn't be surprised if, over the years, as hardware gets faster and TPUs possibly become cheap enough to use, alpha-beta slowly fell out of favor.
David Xu
Posts: 47
Joined: Mon Oct 31, 2016 9:45 pm

Re: LCZero update

Post by David Xu »

From my perspective, it's people like you who are trapped in the old paradigm and refuse to update your views until the evidence literally forces you to do so.

For those of us who like to come to correct conclusions before said conclusions are staring us in the face, your way of thinking is less useful.
jkiliani
Posts: 143
Joined: Wed Jan 17, 2018 1:26 pm

Re: LCZero update

Post by jkiliani »

Milos wrote:
jkiliani wrote:The success of AlphaZero showed that this has changed. The future belongs to hybrid engines combining traditional search methods with neural nets trained by self-play.
The "success" of AlphaZero only created a delusion and suddenly many ppl started behaving as if they had a crystal ball which obviously they don't.
What is your problem with AlphaZero? Even if it had lost the match against Stockfish narrowly, it would still be a phenomenal achievement to hack together an engine without domain knowledge or explicit parameter tuning, by just coding in the game rules and a representation of input/output, and letting it train itself from that point to a strength rivalling the strongest engines developed over years of work. If you think the Stockfish they used was that crippled, then why don't you replicate the match: measure the strength of Stockfish 8 on 64 threads, with 1 GB hash and no book or tablebases, against Stockfish 9 with any hardware or tuning you can think of. At that point you would have a basis for discussing the assertion that the Stockfish in their match was crippled, but not before.

AlphaZero simply initiated a paradigm shift in computer chess programming, by showing that something completely different works as well or even better with some refinement.