xr_a_y wrote: ↑
Mon Nov 16, 2020 3:08 pm
Where does this speed come from. 256M sfens in 10 min seems very fast. On my cpu, I need 1day for 100M !
Is that only coming from gpu usage ?
Hmm... With Seer's current training code, when I tested it a while ago, performance on an RTX 2080 Ti was somewhere around 80,000-100,000 packed fens per second, though the CPU proved to be a bit of a bottleneck even with 24 threads allocated to data loading. On my home system with a GTX 950, I get more like ~15,000 packed fens per second. So even with a fairly weak GPU, you should be able to significantly outperform your past results with Seer's current training code.
In any case, my training code is currently not competitive with Gary's pytorch-nnue training code in terms of performance. This is both because I'm not using sparse tensors and because my data loading code is still written in Python.
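To illustrate why sparsity matters here: an NNUE input vector has tens of thousands of binary features, but only a few dozen are active for any given position, so the first-layer output reduces to summing the weight rows for the active indices. The sketch below is illustrative only (toy sizes, not Seer's or pytorch-nnue's actual code), but it shows the equivalence between the sparse accumulation and a dense matmul over a 0/1 input:

```python
def accumulate(active_indices, weights, bias):
    """First-layer forward pass exploiting sparsity: sum the rows of
    `weights` selected by the active feature indices, plus bias.
    Equivalent to a dense matmul with a 0/1 input vector, but costs
    O(active features) rather than O(total features)."""
    acc = list(bias)
    for f in active_indices:
        for h, w in enumerate(weights[f]):
            acc[h] += w
    return acc

# Toy check: 6 input features, hidden size 3 (real NNUE nets are ~40k x 256).
weights = [[1, 0, 0], [0, 1, 0], [0, 0, 1],
           [1, 1, 0], [0, 1, 1], [1, 0, 1]]
bias = [0, 0, 0]
dense_input = [1, 0, 0, 1, 0, 0]            # features 0 and 3 active
active = [i for i, x in enumerate(dense_input) if x]

sparse_out = accumulate(active, weights, bias)
dense_out = [sum(dense_input[f] * weights[f][h] for f in range(6)) + bias[h]
             for h in range(3)]
assert sparse_out == dense_out  # both [2, 1, 0]
```

The same idea is what sparse tensors (or embedding-sum layers) buy you on the GPU side during training.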
I'm currently working on a new version of my training code that integrates more tightly with my engine by exposing a good chunk of the engine to Python as a module through PyBind11. This lets me load data efficiently in C++ and hand it to Python as sparse tensors, use the qsearch leaf mapping approach, and control RL data generation, all from within my Python training scripts. Additionally, I'm working towards making my incremental updates and board state -> input feature mapping completely generic, such that any subset of the set of quadratic board features (piece*piece relations) is supported (I'm inclined to doubt king*piece relations are close to the optimal subset of quadratic features). As I explore more exotic input features, this will guarantee consistency between Python and my C++ implementation.
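For readers unfamiliar with the terminology: a "quadratic" feature here is an ordered pair of (piece, square) atoms, and halfKP-style king*piece features are the subset where the first atom is restricted to the friendly king. A minimal, hypothetical indexing sketch (toy naming, not Seer's actual encoding) of the full piece*piece space looks like this:

```python
NUM_SQUARES = 64
NUM_PIECE_TYPES = 12  # 6 piece types * 2 colors

def atom_index(piece_type, square):
    """Flat index of a single (piece_type, square) atom."""
    return piece_type * NUM_SQUARES + square

NUM_ATOMS = NUM_PIECE_TYPES * NUM_SQUARES  # 768 atoms

def quadratic_index(first, second):
    """Index of the ordered pair (first atom, second atom) in the full
    quadratic feature space of NUM_ATOMS**2 features. A king*piece
    (halfKP-like) scheme keeps only the indices where `first` is the
    friendly king; a generic mapping can select any subset."""
    return atom_index(*first) * NUM_ATOMS + atom_index(*second)
```

With a generic mapping like this, choosing a feature set becomes choosing a subset of the NUM_ATOMS**2 pair indices, which is what makes experimenting with alternatives to king*piece cheap.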
This new version, being less standalone, will unfortunately likely be far more difficult to introduce into Minic. My recommendation would be to fork Gary's pytorch-nnue code and get it to export Seer-style networks if you're interested in getting better performance. This shouldn't be too difficult, as Gary's code was originally roughly based around my training code (though it's a bit like the Ship of Theseus at this point with all the improvements and general cleanup).