Knowledge distillation means you use one network to train another (usually smaller) network, rather than training it directly on self-play or supervised learning data.
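In outline, the student network is trained to match the teacher's full output distribution rather than hard labels. A minimal sketch in plain Python of the usual temperature-softened cross-entropy loss (the logits and temperature here are invented for illustration; this is not the lc0 training code):

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T; higher T gives softer targets."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between the teacher's soft targets and the
    student's predictions, both softened by the same temperature."""
    teacher = softmax(teacher_logits, T)
    student = softmax(student_logits, T)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))

# The student is pushed toward the teacher's whole distribution,
# not just its top choice -- that is where the extra signal comes from.
big = [2.0, 1.0, 0.1]    # hypothetical teacher policy logits
small = [1.8, 1.1, 0.0]  # hypothetical student policy logits
loss = distillation_loss(small, big)
```

Minimizing this loss drives the student's softened distribution toward the teacher's; the loss bottoms out at the teacher's own entropy when the two match.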
Distilled Networks for Lc0
Moderators: hgm, Rebel, chrisw
-
- Posts: 1631
- Joined: Tue Aug 21, 2018 7:52 pm
- Full name: Dietrich Kappe
Re: Distilled Networks for Lc0
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
-
- Posts: 1631
- Joined: Tue Aug 21, 2018 7:52 pm
- Full name: Dietrich Kappe
Re: Distilled Networks for Lc0
Everything runs on 1 CPU.
Although on CCRL, Crafty 25.2 is rated 3057.
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: Distilled Networks for Lc0
When you "distill" a network to get a smaller and faster NN, it may lose some information.
What is your experience with this?
Have you run tests comparing the original and the "distilled" NNs?
How much faster are the "distilled" NNs?
-
- Posts: 6340
- Joined: Mon Mar 13, 2006 2:34 pm
- Location: Acworth, GA
Re: Distilled Networks for Lc0
A lot of his test results are posted on Discord (https://discordapp.com/), but there is a sample posted here: https://github.com/dkappe/leela-chess-w ... d-Networks
"Good decisions come from experience, and experience comes from bad decisions."
__________________________________________________________________
Ted Summers
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: Distilled Networks for Lc0
AdminX wrote: ↑Sun Jan 27, 2019 11:05 am A lot of his test results are posted on Discord (https://discordapp.com/), but there is a sample posted here: https://github.com/dkappe/leela-chess-w ... d-Networks
Thanks, but I would like to read Mr. Kappe's personal opinion.
-
- Posts: 1631
- Joined: Tue Aug 21, 2018 7:52 pm
- Full name: Dietrich Kappe
Re: Distilled Networks for Lc0
See the distilled network page for an extensive tournament testing various sizes. On GPU, the bigger the network, the stronger it plays. On CPU, it's a trade-off between speed and smarts.
https://github.com/dkappe/leela-chess-w ... d-Networks
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: Distilled Networks for Lc0
dkappe wrote: ↑Sun Jan 27, 2019 6:58 pm See the distilled network page for an extensive tournament testing various sizes. On gpu, the bigger the network, the stronger it plays. On cpu, it's a trade off between speed and smarts.
https://github.com/dkappe/leela-chess-w ... d-Networks
OK, thanks.
But there is no data or opinion about the information lost during the process.
I think that on GPUs, too, playing strength is a trade-off between speed and knowledge.
The recent 20x256 net size is not enough to saturate an RTX 2080 Ti.
Maybe a 40x256 net can do it.
I think you know the structure of the Leela NN well.
A question: does the downloadable Leela NN contain the policy head/net too?
-
- Posts: 1631
- Joined: Tue Aug 21, 2018 7:52 pm
- Full name: Dietrich Kappe
Re: Distilled Networks for Lc0
corres wrote: ↑Mon Jan 28, 2019 12:04 am
OK, thanks.
But there is no data or opinion about the information lost during the process.
I think that on GPUs, too, playing strength is a trade-off between speed and knowledge.
The recent 20x256 net size is not enough to saturate an RTX 2080 Ti.
Maybe a 40x256 net can do it.
I think you know the structure of the Leela NN well.
A question: does the downloadable Leela NN contain the policy head/net too?
There's lots of opinion on knowledge distillation, and academic papers to boot. Here's a layperson-appropriate article for starters: https://medium.com/neural-machines/know ... 241d7c2322
You’re right. There isn’t unlimited headroom on a gpu, but no one has trained a network big enough to find that spot yet.
All networks have both value and policy heads; the lc0 search requires and expects them.
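To make the two-headed layout concrete, here is a toy sketch of a shared trunk feeding separate policy and value heads, in plain Python. All the sizes and random weights are made up for illustration; real lc0 nets are convolutional residual towers, not tiny dense layers like this:

```python
import math
import random

random.seed(0)

# Shared trunk plus two heads -- the general shape lc0-style nets use.
# IN, HIDDEN, MOVES are invented toy dimensions, not lc0's.
IN, HIDDEN, MOVES = 6, 4, 3
W_trunk = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(IN)]
W_policy = [[random.gauss(0, 1) for _ in range(MOVES)] for _ in range(HIDDEN)]
W_value = [random.gauss(0, 1) for _ in range(HIDDEN)]

def matvec(x, W):
    """Multiply row vector x by matrix W (list of rows)."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x))
            for j in range(len(W[0]))]

def forward(features):
    trunk = [math.tanh(t) for t in matvec(features, W_trunk)]  # shared body
    policy_logits = matvec(trunk, W_policy)   # policy head: one logit per move
    value = math.tanh(sum(t * w for t, w in zip(trunk, W_value)))  # value head
    return policy_logits, value

p, v = forward([0.5] * IN)  # p: move logits, v: scalar eval in [-1, 1]
```

The point is simply that both heads read the same trunk output, so a single forward pass yields both the move priors and the position evaluation that the MCTS search consumes.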
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: Distilled Networks for Lc0
dkappe wrote: ↑Mon Jan 28, 2019 2:11 am There's lots of opinion on knowledge distillation and academic papers to boot. Here's a layperson appropriate article for starters: https://medium.com/neural-machines/know ... 241d7c2322
OK, but these papers do not describe your own work, and the results depend on what you actually did. So I think an explanation of your work would be needed.
Naturally LC0 has both heads. But there are NNs with separate value and policy heads, and there are NNs in which both share the same structure.
It is a pity that although LC0 is an open project, its developers give us only a desultory and deficient write-up of it. Maybe they are following the precedent of the Google team?
If you only "expect" something about the LC0 network, who is the one who knows the truth?
-
- Posts: 1631
- Joined: Tue Aug 21, 2018 7:52 pm
- Full name: Dietrich Kappe
Re: Distilled Networks for Lc0
corres wrote: ↑Mon Jan 28, 2019 9:00 am
OK, but these papers do not describe your own work, and the results depend on what you actually did. So I think an explanation of your work would be needed.
Naturally LC0 has both heads. But there are NNs with separate value and policy heads, and there are NNs in which both share the same structure.
It is a pity that although LC0 is an open project, its developers give us only a desultory and deficient write-up of it. Maybe they are following the precedent of the Google team?
If you only "expect" something about the LC0 network, who is the one who knows the truth?
I used the following branch of the lczero training code:
https://github.com/Ttl/lczero-training/tree/distill
As far as explaining the code or the network architecture (beyond what's already been written by the developers) goes, I'm not the man.
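For readers curious what the distillation training in that branch does in outline: each gradient step nudges the student's outputs toward the teacher's distribution. A toy single-position sketch in plain Python (the teacher logits and learning rate are invented; this is not the lczero-training code):

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

# Hypothetical teacher policy for one position; the student's logits are
# pushed toward it by gradient descent on the cross-entropy loss.
teacher = softmax([2.0, 0.5, -1.0])
student = [0.0, 0.0, 0.0]  # student starts with a uniform policy

for _ in range(500):
    probs = softmax(student)
    # gradient of cross-entropy w.r.t. logits is (student_probs - teacher)
    student = [z - 0.5 * (p - t) for z, p, t in zip(student, probs, teacher)]
```

After the loop, the student's distribution closely matches the teacher's for this position; real distillation does the same over millions of positions, with the smaller net's weights as the trainable parameters.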