LCZero update (2)

Rein Halbersma · Post by **Rein Halbersma** » Sun Mar 25, 2018 3:08 pm

[Moderation] This thread was split off from the original LCZero update thread ( http://talkchess.com/forum/viewtopic.ph ... &start=260 ) and is meant to continue the discussion, because the original was getting unmanageably long.

lucasart wrote: Sorry if that's a dumb newbie question. I'm not familiar at all with NN.

But wouldn't it save a lot of time to:
* first train the net on high quality games (i.e. Stockfish level)
* then you'd start with a reasonably strong net to improve with reinforcement learning

The original AlphaGo approach was indeed to train a NN using supervised learning on top-level human games, and improve this NN with self-play using reinforcement learning (it's more complicated than that, since they had NN for both the move selection and the position evaluation).

The AlphaGoZero and AlphaZero approaches started from scratch using self-play reinforcement learning. The claim is that this ultimately leads to better results. Apparently, top-human-level play has some biases that prevent or slow down progress beyond a certain level of play. Another surprising claim is that the full self-play algorithm actually requires fewer games to get to the top level.

Uri Blass · Post by **Uri Blass** » Sun Mar 25, 2018 8:08 pm

Rein Halbersma wrote:
lucasart wrote: Sorry if that's a dumb newbie question. I'm not familiar at all with NN.

But wouldn't it save a lot of time to:
* first train the net on high quality games (i.e. Stockfish level)
* then you'd start with a reasonably strong net to improve with reinforcement learning
The original AlphaGo approach was indeed to train a NN using supervised learning on top-level human games, and improve this NN with self-play using reinforcement learning (it's more complicated than that, since they had NN for both the move selection and the position evaluation).

The AlphaGoZero and AlphaZero approaches started from scratch using self-play reinforcement learning. The claim is that this ultimately leads to better results. Apparently, top-human-level play has some biases that prevent or slow down progress beyond a certain level of play. Another surprising claim is that the full self-play algorithm actually requires fewer games to get to the top level.

If bias is a difference from perfect play, then random play has a bigger bias than top human level.

Fixing the errors that random play has seems to me a harder task than fixing the errors of top humans.

David Xu · Post by **David Xu** » Sun Mar 25, 2018 9:31 pm

Uri Blass wrote:
Rein Halbersma wrote:
lucasart wrote: Sorry if that's a dumb newbie question. I'm not familiar at all with NN.

But wouldn't it save a lot of time to:
* first train the net on high quality games (i.e. Stockfish level)
* then you'd start with a reasonably strong net to improve with reinforcement learning
The original AlphaGo approach was indeed to train a NN using supervised learning on top-level human games, and improve this NN with self-play using reinforcement learning (it's more complicated than that, since they had NN for both the move selection and the position evaluation).

The AlphaGoZero and AlphaZero approaches started from scratch using self-play reinforcement learning. The claim is that this ultimately leads to better results. Apparently, top-human-level play has some biases that prevent or slow down progress beyond a certain level of play. Another surprising claim is that the full self-play algorithm actually requires fewer games to get to the top level.
If bias is a difference from perfect play, then random play has a bigger bias than top human level.

Fixing the errors that random play has seems to me a harder task than fixing the errors of top humans.

"Bias" does not mean the same thing as "imperfection". A bias is a systematic deviation from perfect play, i.e. a deviation that has a directional component. This is pertinent because it could cause the network to get stuck in a local optimum some distance away from the global optimum.

CMCanavessi · Post by **CMCanavessi** » Sun Mar 25, 2018 9:41 pm

In the end, TSCP proved to still be too much for poor Leela Gen 20, even at 40/40. The match ended 8-2 in favor of TSCP.

I'll upload the full PGN in a minute.

CMCanavessi · Post by **CMCanavessi** » Sun Mar 25, 2018 9:57 pm

Here are the 10 games: http://www.mediafire.com/file/yydmm8u5j ... nchmark.7z

Werewolf · Post by **Werewolf** » Sun Mar 25, 2018 11:33 pm

Sorry to hear that, Carlos. But your engine will get stronger and catch up.

Milos · Post by **Milos** » Sun Mar 25, 2018 11:55 pm

CMCanavessi wrote:In the end, TSCP proved to still be too much for poor Leela Gen 20, even at 40/40. The match ended 8-2 in favor of TSCP.

I'll upload the full pgn in a minute

TSCP is rated about 1700 Elo on CCRL 40/4. The error margins are huge, but an 8:2 score means at least a 200 Elo advantage for TSCP.
That means LeelaZero is more than 2000 Elo behind SF9.
Still an extremely long way to go.
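
For readers who want to check the arithmetic behind the "at least 200 Elo" figure, here is a minimal sketch that uses the standard logistic Elo model to convert a match score into a rating difference. The function name `elo_diff_from_score` is made up for illustration, and with only 10 games the result is a rough point estimate, as the post itself notes.

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model.

    `score` is the fraction of points won (wins + draws / 2, divided by the
    number of games). The name of this helper is hypothetical.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# The 10-game match above ended 8-2 in favor of TSCP.
tscp_score = 8 / 10
diff = elo_diff_from_score(tscp_score)
print(f"Implied Elo difference: {diff:.0f}")
```

Under this model an 80% score corresponds to roughly 240 Elo, which is consistent with the "at least 200" estimate given the small sample.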

JJJ · Post by **JJJ** » Mon Mar 26, 2018 1:02 am

I can still win against it at 200 ms per move, but I wouldn't try at a longer time control; I wouldn't stand a chance.
[pgn]
1. d4 d5
2. c4 dxc4
3. e3 Nf6
4. Bxc4 e6
5. Nf3 Be7
6. O-O O-O
7. Nc3 c5
8. Bd3 cxd4
9. Nxd4 e5
10. Nb3 a6
11. Qc2 Nc6
12. Ne4 Nb4
13. Nxf6+ Bxf6
14. Bxh7+ Kh8
15. Qb1 Bh4
16. Bf5 Qf6
17. Bxc8 Raxc8
18. g3 Nc2
19. gxh4 Nxa1
20. Qxa1 Qc6
21. Bd2 Qf3
22. Qd1 Qxd1
23. Rxd1 Rfd8
24. Kf1 Rc4
25. Ke2 Rxh4
26. Rh1 Rh5
27. Bc3 Kh7
28. Nd2 b5
29. Ne4 Re8
30. Nd6 Rf8
31. h4 f6
32. Kf3 Rd8
33. Ne4 Kg6
34. a3 Kf7
35. Ng3 Rhh8
36. h5 Rhe8
37. Nf5 g6
38. hxg6+ Kxg6
39. e4 Kg5
40. Rg1+ Kh5
41. Ng7+ Kh4
42. Nxe8 Rxe8
43. Rd1 Kg5
44. Rd6 Rc8
45. Rxa6 Rd8
46. Ke3 Rc8
47. Rb6 Rg8
48. Rxb5 Rc8
49. a4 Ra8
50. a5 Kh5
51. Rb6 Kg6
52. a6 Kg5
53. b4 Kg4
54. b5 Rc8
55. Bb4 Ra8
56. Rb7 f5
57. exf5 Kxf5
58. a7 Ke6
59. b6 Kd5
60. Rb8 Rxb8
61. axb8=Q Kc6
62. Qxe5 Kb7
63. Qc7+ Ka8
64. Qa7# [/pgn]

Jhoravi · Post by **Jhoravi** » Mon Mar 26, 2018 5:44 am

CMCanavessi wrote:Here are the 10 games: http://www.mediafire.com/file/yydmm8u5j ... nchmark.7z

Thanks. But how are you able to make LCZero vary its openings when it doesn't have an opening book?

David Xu · Post by **David Xu** » Mon Mar 26, 2018 5:49 am

LCZero possesses a "-noise" command that applies Dirichlet noise to its move selection, thereby causing randomness in its play.
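
As a rough illustration of what Dirichlet noise on move selection can look like, here is a minimal sketch modeled on the root-noise scheme described for AlphaZero, where the network's move priors are mixed with a Dirichlet sample. How LCZero's "-noise" option is actually implemented is not stated in this thread, so the function names, the mixing weight `epsilon=0.25`, and the concentration `alpha=0.3` are all assumptions for illustration.

```python
import random


def sample_dirichlet(alpha: float, n: int, rng: random.Random) -> list[float]:
    """Draw one sample from a symmetric Dirichlet(alpha) over n outcomes.

    Uses the standard construction: normalize n independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    if total == 0.0:  # extremely unlikely; fall back to a uniform vector
        return [1.0 / n] * n
    return [d / total for d in draws]


def add_root_noise(priors, epsilon=0.25, alpha=0.3, rng=None):
    """Mix move priors with Dirichlet noise: (1 - epsilon) * p + epsilon * noise.

    epsilon and alpha are assumed values, not taken from LCZero's source.
    """
    rng = rng or random.Random()
    noise = sample_dirichlet(alpha, len(priors), rng)
    return [(1.0 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


# Hypothetical priors for three candidate first moves.
moves = ["e2e4", "d2d4", "g1f3"]
priors = [0.5, 0.3, 0.2]
noisy = add_root_noise(priors, rng=random.Random(0))
for move, p, q in zip(moves, priors, noisy):
    print(f"{move}: prior {p:.2f} -> noisy {q:.3f}")
```

Because the noise is resampled every game, the search is nudged toward different moves early on, which would produce varied openings even without a book.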
