Multigather is now the default (and has also been improved). Some search settings have changed meaning, so if you have modified their values, please discard them. Specifically, max-collision-events, max-collision-visits and max-out-of-order-evals-factor have new default values, but other options also affect the search. Also check that your GUI is not caching the old values.
Updated several other default parameter values, including the MLH ones.
Performance improvements for the cuda/cudnn backends. This includes the multi_stream cuda backend option that is off by default. You should test adding multi_stream=true to backend-opts (command line) or BackendOptions (UCI) if you have a recent GPU with a lot of VRAM.
Support for policy focus during training.
Larger/stronger 15b default net for all packages except android, blas and dnnl, which get a new 10b network.
The distributed binaries come with the mimalloc memory allocator for better performance when a large tree has to be destroyed (e.g. after an unexpected move).
The legacy time manager is again the default and will use more time for the first move after a long book line.
The --preload command line flag will initialize the backend and load the network during startup. This may help in cases where the GUI is confused by long start times, but only if backend and network are not changed via UCI options.
A 'fen' command was added as a UCI extension to print the current position.
Experimental onednn backend for recent intel CPUs and GPUs.
Added support for ONNX network files and runtime with the onnx backend.
Several bug and stability fixes.
Note: Some small third-party nets seem to play really badly with the dx12 backend and certain GPU drivers; setting the enable-gemm-metacommand=false backend option is reported to work around this issue.
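To make the backend options above concrete, here is a sketch of how they might be passed, using the flag spellings mentioned in these notes (backend-opts on the command line, BackendOptions over UCI); treat the exact invocations as illustrative:

```
# Command line: enable multi-streaming on the cuda backend
lc0 --backend=cuda --backend-opts=multi_stream=true

# Command line: work around the dx12 metacommand issue from the note above
lc0 --backend=dx12 --backend-opts=enable-gemm-metacommand=false

# Command line: initialize the backend and load the network at startup
lc0 --preload

# Over UCI, the same backend options go into BackendOptions:
setoption name BackendOptions value multi_stream=true
```

Remember that per the notes, --preload only helps if the backend and network are not changed afterwards via UCI options.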
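As a sketch of the new 'fen' UCI extension, a session might look like the following (the FEN shown is the correct one for this position, but the exact output formatting of the command is illustrative):

```
position startpos moves e2e4
fen
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1
```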
Nice!
The multi_stream option gave me a +21% speed-up on my 3080 using the large J94-100 network. The speed-up was smaller with the newer 15b default net (around 6-7%, but I didn't save the results).
For the already slow GTX 1650 card, 0.28 is 20% slower than 0.27. BTW, what's the slowest reasonable time control for Lc0? The default time buffer is 200 ms (SF's is 10 ms), so 60+0.6 doesn't make any sense! Even a 1 s increment is short compared to 200 ms. Can I change the time buffer to 10 ms?
Giancarlo, can I assign the Belgian flag to LC0 on my SuperBlitz, or should I set an Earth flag? Thank you, I just started testing the latest version.
Jouni wrote: ↑Wed Sep 01, 2021 9:57 pm
For the already slow GTX 1650 card, 0.28 is 20% slower than 0.27. BTW, what's the slowest reasonable time control for Lc0? The default time buffer is 200 ms (SF's is 10 ms), so 60+0.6 doesn't make any sense! Even a 1 s increment is short compared to 200 ms. Can I change the time buffer to 10 ms?
Are you using the cuda package instead of the cudnn one? For that card cudnn will probably be faster.
Also, MoveOverheadMs is the "amount of time, in milliseconds, that the engine subtracts from its total available time (to compensate for slow connection, interprocess communication, etc.)", so it is unrelated to the increment. You can find descriptions for most of the parameters here:
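To show why the overhead buffer matters less than the post above fears, here is a minimal sketch of a time budget under a naive "even share of the clock plus increment" allocation. This is not Lc0's actual time manager; the moves_to_go heuristic and the function itself are illustrative assumptions. Only the role of the overhead, subtracted from the total available time per the quoted description, follows the source.

```python
# Illustrative sketch, NOT Lc0's real time manager: how an overhead buffer
# (like MoveOverheadMs) reduces the time available for a move.

def move_budget_ms(remaining_ms, increment_ms, overhead_ms, moves_to_go=20):
    """Budget one move: subtract the safety buffer from the total available
    time, take an even share of it, and add the increment.
    moves_to_go=20 is an arbitrary illustrative assumption."""
    usable = max(remaining_ms - overhead_ms, 0.0)
    return usable / moves_to_go + increment_ms

# 60s + 0.6s with the default 200 ms buffer vs. a 10 ms buffer:
print(move_budget_ms(60_000, 600, 200))  # 3590.0
print(move_budget_ms(60_000, 600, 10))   # 3599.5
```

Under this simple model the 200 ms buffer costs only about 10 ms per move relative to a 10 ms buffer, since it comes out of the whole clock rather than out of each increment.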
For better or for worse, there are a great many configuration options for Lc0.
Which options are best depends on the GPU (or CPU) hardware (and drivers and libraries), the net size, time controls and Lc0 version.
The changes in v0.28 may make nps slower or faster depending on the particular configuration.
However, there are search changes, so nps speed is not as important as actual playing strength.
That said, overall the 1650 is a pretty weak GPU for Lc0.