Multigather is now the default (and has also been improved). Some search settings have changed meaning, so if you have modified their values, please discard them. Specifically, max-collision-events, max-collision-visits and max-out-of-order-evals-factor have new default values, but other options also affect the search. Also check that your GUI is not caching the old values.
Updated several other default parameter values, including the MLH ones.
Performance improvements for the cuda/cudnn backends. This includes the multi_stream cuda backend option that is off by default. You should test adding multi_stream=true to backend-opts (command line) or BackendOptions (UCI) if you have a recent GPU with a lot of VRAM.
Support for policy focus during training.
Larger/stronger 15b default net for all packages except android, blas and dnnl, which get a new 10b network.
The distributed binaries come with the mimalloc memory allocator for better performance when a large tree has to be destroyed (e.g. after an unexpected move).
The legacy time manager is again the default and will use more time for the first move after a long book line.
The --preload command line flag will initialize the backend and load the network during startup. This may help in cases where the GUI is confused by long start times, but only if backend and network are not changed via UCI options.
A 'fen' command was added as a UCI extension to print the current position.
Experimental onednn backend for recent intel CPUs and GPUs.
Added support for ONNX network files and runtime with the onnx backend.
Several bug and stability fixes.
Note: Some small third-party nets seem to play really badly with the dx12 backend and certain GPU drivers; setting the enable-gemm-metacommand=false backend option is reported to work around this issue.
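To make the backend options above concrete, here is a sketch of how they might be passed, using the flag spellings mentioned in these notes (backend-opts on the command line, BackendOptions over UCI); treat the exact invocations as illustrative:

```
# Command line: enable multi-streaming on the cuda backend
lc0 --backend=cuda --backend-opts=multi_stream=true

# Command line: work around the dx12 metacommand issue from the note above
lc0 --backend=dx12 --backend-opts=enable-gemm-metacommand=false

# Command line: initialize the backend and load the network at startup
lc0 --preload

# Over UCI, the same backend options go into BackendOptions:
setoption name BackendOptions value multi_stream=true
```

Remember that per the notes, --preload only helps if the backend and network are not changed afterwards via UCI options.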
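As a sketch of the new 'fen' UCI extension, a session might look like the following (the FEN shown is the correct one for this position, but the exact output formatting of the command is illustrative):

```
position startpos moves e2e4
fen
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1
```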
Nice!
The multi_stream option gave me a +21% speed-up on my 3080 using the large J94-100 network. The speed-up was smaller with the newer 15b default net (around 6-7%, but I didn't save the results).
For the already slow GTX 1650 card, 0.28 is 20% slower than 0.27. BTW, what's the slowest reasonable time control for Lc0? The default time buffer is 200 ms (SF's is 10 ms), so 60+0.6 doesn't make any sense! Even a 1 s increment is short compared to 200 ms. Can I change the time buffer to 10 ms?
Giancarlo, can I assign the Belgian flag to LC0 on my SuperBlitz, or should I set an Earth flag? Thank you, I just started testing the latest version.
Jouni wrote: ↑Wed Sep 01, 2021 9:57 pm
For the already slow GTX 1650 card, 0.28 is 20% slower than 0.27. BTW, what's the slowest reasonable time control for Lc0? The default time buffer is 200 ms (SF's is 10 ms), so 60+0.6 doesn't make any sense! Even a 1 s increment is short compared to 200 ms. Can I change the time buffer to 10 ms?
Are you using the cuda package instead of the cudnn one? For that card cudnn will probably be faster.
Also, MoveOverheadMs is the "amount of time, in milliseconds, that the engine subtracts from its total available time (to compensate for slow connection, interprocess communication, etc.)", so it is unrelated to the increment. You can find descriptions for most of the parameters here:
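To show why the overhead buffer matters less than the post above fears, here is a minimal sketch of a time budget under a naive "even share of the clock plus increment" allocation. This is not Lc0's actual time manager; the moves_to_go heuristic and the function itself are illustrative assumptions. Only the role of the overhead, subtracted from the total available time per the quoted description, follows the source.

```python
# Illustrative sketch, NOT Lc0's real time manager: how an overhead buffer
# (like MoveOverheadMs) reduces the time available for a move.

def move_budget_ms(remaining_ms, increment_ms, overhead_ms, moves_to_go=20):
    """Budget one move: subtract the safety buffer from the total available
    time, take an even share of it, and add the increment.
    moves_to_go=20 is an arbitrary illustrative assumption."""
    usable = max(remaining_ms - overhead_ms, 0.0)
    return usable / moves_to_go + increment_ms

# 60s + 0.6s with the default 200 ms buffer vs. a 10 ms buffer:
print(move_budget_ms(60_000, 600, 200))  # 3590.0
print(move_budget_ms(60_000, 600, 10))   # 3599.5
```

Under this simple model the 200 ms buffer costs only about 10 ms per move relative to a 10 ms buffer, since it comes out of the whole clock rather than out of each increment.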
For better or for worse, there are a great many configuration options for Lc0.
Which options are best depends on the GPU (or CPU) hardware (and drivers and libraries), the net size, time controls and Lc0 version.
The changes in v0.28 may make nps slower or faster depending on the particular configuration.
However, there are search changes, so nps speed is not as important as actual playing strength.
That said, overall the 1650 is a pretty weak GPU for Lc0.