LCZero update

CMCanavessi · Post by **CMCanavessi** » Tue Mar 20, 2018 2:53 am

Oh, I forgot to remove the extension! Please rename the file to network, or edit the Play.bat file and add .txt to the file name there.

Mind you, it only affects the engine when you use it to play matches or run tournaments. For training, the client always downloads the latest network automatically, so no problem there.

noobpwnftw · Post by **noobpwnftw** » Tue Mar 20, 2018 6:46 am

Running one instance of the client on a 1080 Ti uses 100% of one CPU thread and 55% of the GPU; running two instances gets about 85% GPU usage. A game usually takes about a minute or less to finish, so that would be around 2000 games a day at most.
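The games-per-day figure can be sanity-checked with a little arithmetic. This is only a rough sketch: the function names are made up for illustration, and it assumes games run back to back with no idle time and that throughput scales linearly with the number of client instances, neither of which the post states. The post also does not say whether the estimate is for one instance or two.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def games_per_day(seconds_per_game, instances=1):
    """Upper bound on daily games, assuming back-to-back games and linear scaling."""
    if seconds_per_game <= 0:
        raise ValueError("seconds_per_game must be positive")
    return instances * SECONDS_PER_DAY / seconds_per_game


def seconds_per_game_for(target_games_per_day, instances=1):
    """Average game length needed to reach a given daily total."""
    if target_games_per_day <= 0:
        raise ValueError("target_games_per_day must be positive")
    return instances * SECONDS_PER_DAY / target_games_per_day


if __name__ == "__main__":
    print(games_per_day(60))             # one instance, one minute per game
    print(games_per_day(60, 2))          # two instances, one minute per game
    print(seconds_per_game_for(2000))    # game length implied by 2000 games/day
```

Under these assumptions, one instance at a full minute per game gives 1440 games a day, and 2000 games a day from a single instance corresponds to an average of about 43 seconds per game.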

On the other hand, there unfortunately seems to be no working AMD OpenCL driver for Linux kernel 4.15, which is what I use.

Nay Lin Tun · Post by **Nay Lin Tun** » Tue Mar 20, 2018 6:48 am

Thanks Carlos, I could set up my laptop very easily with your files, and I can review my games easily as well. But I still don't understand how the current Leela Zero is playing. On my machine, it is still moving pieces seemingly at random.

It is understandable that in the early phase of learning, a machine may not know whether 1. e4 or 1. a3 has the higher winning chance.

But it should be able to see one-ply moves that capture free pieces. Should additional knowledge or extra basic rules be added?

koedem · Post by **koedem** » Tue Mar 20, 2018 7:29 am

Nay Lin Tun wrote: But it should be able to see one-ply moves that capture free pieces. Should additional knowledge or extra basic rules be added?

Well, the whole point of this is to not give the program any additional knowledge. Of course, that means it will need a long time to even start playing half-decent chess, but ideally it leads to a stronger program in the long run.

koedem · Post by **koedem** » Tue Mar 20, 2018 7:37 am

When I run lczero.exe on Windows, the program crashes as soon as it receives a go command (e.g. go wtime 900000 btime 900000). The crash report contains:
<ProblemSignatures>
<EventType>APPCRASH</EventType>
<Parameter0>lczero.exe</Parameter0>
<Parameter1>0.0.0.0</Parameter1>
<Parameter2>5aaff4fa</Parameter2>
<Parameter3>ucrtbase.DLL</Parameter3>
<Parameter4>10.0.10586.0</Parameter4>
<Parameter5>5632d193</Parameter5>
<Parameter6>40000015</Parameter6>
<Parameter7>000000000006990f</Parameter7>
</ProblemSignatures>

Any idea what the problem could be?
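As a starting point for reading the report, here is a minimal Python sketch that names the fields and decodes the hex values. The field layout is the usual one for Windows Error Reporting APPCRASH events and is an assumption here, as are the helper names. The lookup table has a single entry, 0x40000015, which is STATUS_FATAL_APP_EXIT in the NTSTATUS list; that code is commonly seen when the C runtime terminates the process (for example via abort()), so it narrows things down but does not identify the root cause.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

REPORT = """<ProblemSignatures>
<EventType>APPCRASH</EventType>
<Parameter0>lczero.exe</Parameter0>
<Parameter1>0.0.0.0</Parameter1>
<Parameter2>5aaff4fa</Parameter2>
<Parameter3>ucrtbase.DLL</Parameter3>
<Parameter4>10.0.10586.0</Parameter4>
<Parameter5>5632d193</Parameter5>
<Parameter6>40000015</Parameter6>
<Parameter7>000000000006990f</Parameter7>
</ProblemSignatures>"""

# Assumed APPCRASH layout (not stated in the post).
FIELDS = {
    "Parameter0": "app_name",
    "Parameter1": "app_version",
    "Parameter2": "app_timestamp",
    "Parameter3": "fault_module",
    "Parameter4": "fault_module_version",
    "Parameter5": "fault_module_timestamp",
    "Parameter6": "exception_code",
    "Parameter7": "exception_offset",
}

# Deliberately tiny table; extend it with other NTSTATUS values as needed.
KNOWN_CODES = {0x40000015: "STATUS_FATAL_APP_EXIT"}


def decode_report(xml_text):
    """Return the crash parameters under readable names, with hex fields decoded."""
    root = ET.fromstring(xml_text)
    raw = {child.tag: (child.text or "").strip() for child in root}
    named = {FIELDS[tag]: value for tag, value in raw.items() if tag in FIELDS}
    code = int(named["exception_code"], 16)
    named["exception_name"] = KNOWN_CODES.get(code, "unknown")
    # The timestamp is the executable's link time as hex seconds since the epoch.
    named["app_build_time"] = datetime.fromtimestamp(
        int(named["app_timestamp"], 16), tz=timezone.utc
    )
    return named


if __name__ == "__main__":
    for key, value in decode_report(REPORT).items():
        print(f"{key}: {value}")
```

Since the faulting module is the C runtime rather than lczero.exe itself, running the engine from a console and watching for an error message before the crash may show what triggered the exit.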

lantonov · Post by **lantonov** » Tue Mar 20, 2018 10:02 am

Wouldn't it be faster to train against Stockfish initially? Once the learning curve levels off, you can continue with self-play.

jkiliani · Post by **jkiliani** » Tue Mar 20, 2018 10:22 am

lantonov wrote: Wouldn't it be faster to train against Stockfish initially? Once the learning curve levels off, you can continue with self-play.

I don't see how training against Stockfish is practical, since the reinforcement learning algorithm relies on self-play. Training on Stockfish games is possible of course, and has already been done before starting the reinforcement learning at all. From what I recall it resulted in decent amateur level play.

Starting reinforcement learning from such a supervised net would likely also have been possible, but this would no longer be the "Zero human knowledge" approach. I think that in the long run, it likely wouldn't matter, and reinforcement learning would converge to the same results whether starting from a random net as we did, or from a supervised learning net trained on Stockfish games.

Nay Lin Tun · Post by **Nay Lin Tun** » Tue Mar 20, 2018 10:26 am

koedem wrote:
Nay Lin Tun wrote: But it should be able to see one-ply moves that capture free pieces. Should additional knowledge or extra basic rules be added?
Well, the whole point of this is to not give the program any additional knowledge. Of course, that means it will need a long time to even start playing half-decent chess, but ideally it leads to a stronger program in the long run.

I understand what you mean. But without knowing the basic logic that "you can't beat your opponent without a material or positional advantage", how will Leela Zero choose a better move? Without that knowledge, it may draw false conclusions.

For example, suppose Leela Zero played 1. e4 Nf6 2. Qh5 Nxh5 and eventually lost the game, so 1. e4 ends up with a winning percentage of about 40%. At the same time, it played 1. a4 b5 2. axb5 Nc6 and eventually won the game, so 1. a4 ends up with a winning chance of about 60%.

The NN will then wrongly conclude that 1. a4 has a higher winning chance than 1. e4.

jkiliani · Post by **jkiliani** » Tue Mar 20, 2018 10:36 am

Nay Lin Tun wrote: I understand what you mean. But without knowing the basic logic that "you can't beat your opponent without a material or positional advantage", how will Leela Zero choose a better move? Without that knowledge, it may draw false conclusions.

For example, suppose Leela Zero played 1. e4 Nf6 2. Qh5 Nxh5 and eventually lost the game, so 1. e4 ends up with a winning percentage of about 40%. At the same time, it played 1. a4 b5 2. axb5 Nc6 and eventually won the game, so 1. a4 ends up with a winning chance of about 60%.

The NN will then wrongly conclude that 1. a4 has a higher winning chance than 1. e4.

The net would draw false conclusions like that if it had too little data, but each new network is based on tens of thousands of new self-play games in addition to games from older network generations. It will eventually draw the correct conclusions about opening moves, but the opening is one part of the game where we have to expect it to converge later than midgame and endgame.

For Leela Zero it was the same: The program could already do pretty decent midgame fighting but still played horrible opening moves. It just takes time, that's all.
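The point about too little data can be illustrated with a toy simulation. This is only a sketch under strong assumptions that are not part of the project: each opening move is treated as a biased coin with a fixed, made-up win probability (0.55 for the better move, 0.45 for the worse one), games are independent, and there are no draws. The function name is hypothetical. It counts how often the worse move looks at least as good as the better one for a given number of games per move.

```python
import random


def misranking_rate(p_better, p_worse, games_per_move, trials, seed=0):
    """Fraction of trials in which the truly worse move scores at least as many wins."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        wins_better = sum(rng.random() < p_better for _ in range(games_per_move))
        wins_worse = sum(rng.random() < p_worse for _ in range(games_per_move))
        if wins_worse >= wins_better:
            wrong += 1
    return wrong / trials


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        rate = misranking_rate(0.55, 0.45, n, trials=500)
        print(f"{n:>5} games per move: worse move looks at least as good "
              f"in {rate:.1%} of trials")
```

With one game per move the ordering is wrong most of the time, and the error rate shrinks as the number of games grows. Real training is messier than coin flips, but the same effect is why a single lost game after 1. e4 does not decide how the net rates the move.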

CMCanavessi · Post by **CMCanavessi** » Tue Mar 20, 2018 11:16 am

lantonov wrote: Wouldn't it be faster to train against Stockfish initially? Once the learning curve levels off, you can continue with self-play.

Yes, it would be faster, way faster. It was already tried, it worked fine, and Leela was indeed quite strong. But that's not the idea of the project; it was just done as a quick test to check that everything was working.

The idea of the project is to have an engine that learns from zero, by playing itself: pure self-play learning, not supervised learning from another engine's games. That's why the "Zero". It starts from zero knowledge. If you introduce Stockfish, then it's not zero anymore.
