Egbb dll neural network support

Daniel Shawul · Post by **Daniel Shawul** » Wed May 30, 2018 12:01 am

Hello,

I have just added neural network evaluation support via the egbb dlls. This is not about training a neural network on endgame tablebases
(although nothing prevents us from doing that), but about providing an evaluation for any position with a neural network through the existing
dll/so interface of the Scorpio egbbs. I am posting to get feedback on the interface and approach before releasing. It works much like
egbb probing, using two new functions:

typedef int (CDECL *PPROBE_NN) (int player, int* piece,int* square);
typedef void (CDECL *PLOAD_NN) (char* path);

These functions are loaded at the time egbbs are loaded.
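
For illustration, here is a minimal sketch of how an engine might call through these two pointers. The stub functions only stand in for the symbols that would normally be resolved from the shared library. The piece codes, the zero-terminated array layout, the dummy score and the path are all assumptions made for this example, not the documented egbbdll encoding.

```cpp
#include <cassert>

// CDECL only matters on 32-bit Windows; define it away elsewhere.
#ifndef CDECL
#  if defined(_WIN32) && !defined(_WIN64)
#    define CDECL __cdecl
#  else
#    define CDECL
#  endif
#endif

typedef int  (CDECL *PPROBE_NN)(int player, int* piece, int* square);
typedef void (CDECL *PLOAD_NN)(char* path);

// Hypothetical stand-ins for the functions exported by the dll/so.
static bool g_loaded = false;
static void CDECL stub_load_nn(char* path) { g_loaded = (path != nullptr); }
static int CDECL stub_probe_nn(int player, int* piece, int* square) {
    (void)square;
    int n = 0;
    while (piece[n] != 0) ++n;       // assumption: 0 terminates the piece list
    return player == 0 ? n : -n;     // dummy "score": just the piece count
}

// Small wrapper so the search code never touches the raw pointers.
struct NNProbe {
    PLOAD_NN  load  = nullptr;
    PPROBE_NN probe = nullptr;

    // Returns false if the probe pointer was never resolved.
    bool evaluate(int player, int* piece, int* square, int& score) const {
        if (!probe) return false;
        score = probe(player, piece, square);
        return true;
    }
};

// Evaluates a bare K vs K position using hypothetical piece codes 1 and 7.
inline int demo_probe() {
    NNProbe nn;
    nn.load  = stub_load_nn;         // a real engine would resolve these
    nn.probe = stub_probe_nn;        // from the shared library instead
    char path[] = "nets/example.pb"; // hypothetical network path
    nn.load(path);
    int piece[]  = {1, 7, 0};
    int square[] = {4, 60, 0};
    int score = 0;
    bool ok = nn.evaluate(0, piece, square, score);
    assert(ok);
    (void)ok;
    return score;
}
```

In a real engine the two pointers would be filled in with `dlsym` on linux or `GetProcAddress` on windows, at the same place where the egbb probe function is resolved.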

The egbbdll is compiled with a tensorflow backend that also allows you to run the NN evaluation on a GPU or CPU without your engine code
knowing anything about it. The work required to get your engine using neural networks is minimal, which is why I like this approach. The programmer can focus on experimenting with search approaches (alpha-beta, MCTS, hybrid etc) without worrying about the NN aspect of it.

I have a working implementation now. I have generated four networks (1x32, 3x64, 6x64, 12x128 modified resnets) from a set of 2 million epd positions.
My alpha-beta and MCTS searches both work well with it.

Thoughts:
a) Do we need to have the policy values as well, or is the eval network enough? I only use the value network in my engine but
some people may want to have both for an MCTS implementation. It seems redundant to have both value + policy, but apparently
having both is more efficient in an actor-critic setting, though I don't fully understand why.

b) One can train a network of one's own using PGN files (both policy+eval) or labeled EPD files (value network only) from any set of games,
e.g. Leela's 15 million games. The format is Google's protobuf (.pb files) and these can be shared. One can experiment with any kind of
neural network and the interface remains unchanged, since the network definition is part of the .pb file.

c) We can train neural networks on specific endgames, waiting for 7-men syzygy to finish.

d) The tensorflow interface currently works only on linux (I don't know how to compile it on windows). The libtensorflow_cc library is not
statically linked with egbbdll.so, which means you can download and use a libtensorflow_cc that works on the GPU, is optimized for a specific
cpu architecture etc. I could of course link tensorflow statically and ship only egbbdll.so, but dynamic linking gives the user more options,
though probably more headaches too.

e) We will rely totally on the tensorflow backend. The good: automatic support for cuDNN and optimized subroutines such as the winograd transform
on 3x3 kernels. The bad: no OpenCL support yet.

... and probably many other things

regards,
Daniel

brianr · Post by **brianr** » Wed May 30, 2018 6:35 pm

Daniel Shawul wrote: ↑Wed May 30, 2018 12:01 am
I have a working implementation now. I have generated four networks (1x32, 3x64, 6x64, 12x128 modified resnets) on a set of 2 million epd positions.

This looks quite interesting. I have been, ahem, tinkering with NNs for a while. These are just informal type comments based on my stumbling around.

First, 2MM epd positions seems like a very small number for chess training. I think Alpha Zero used 4,096 as the batch size; 1,024 with Leela Zero Chess. They are described as "mini-batches", and seem to be used more to count mini-batch "steps", and are not necessarily the gpu batch sizes. With 1,024 epd positions (samples) per batch, 2MM is less than 2K steps. Most training runs are for around 50,000 steps. However, only one pass thru all of the steps is generally done, or 1 epoch. I think I saw 20 epochs in your code. I gather that the idea with chess is that many more samples are better than many epochs with fewer samples.
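
To make the step arithmetic concrete, here is a small sketch using only the figures quoted in this post (2 million samples, a batch size of 1,024, 20 epochs, and a target of about 50,000 steps). The function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Mini-batch steps in one pass over the data (a partial final batch is dropped).
inline std::int64_t steps_per_epoch(std::int64_t samples, std::int64_t batch_size) {
    assert(batch_size > 0);
    return samples / batch_size;
}

// Total steps when the same data is reused for several epochs.
inline std::int64_t total_steps(std::int64_t samples, std::int64_t batch_size,
                                std::int64_t epochs) {
    return steps_per_epoch(samples, batch_size) * epochs;
}

// Distinct samples needed to reach a given step count in a single epoch.
inline std::int64_t samples_for_steps(std::int64_t steps, std::int64_t batch_size) {
    return steps * batch_size;
}
```

With these figures, 2,000,000 / 1,024 gives 1,953 steps per epoch, 20 epochs give 39,060 steps, and reaching 50,000 steps in a single pass would take 51,200,000 positions.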

Next, the smallest NN with which I have seen even minimal chess learning is 10x128. 7x256 is quite a bit better. I have not trained any larger NNs long enough yet.

And, although I have not taken an in depth look at the code as yet, the size of the value head (in resnet.py) should be large enough to encompass the granularity of the scores that you want.

Naturally, these comments relate to the NN training, and not to their use in your framework. I mention them thinking that the framework might not be an effective thing to test unless the NNs used are "pretty good" to begin with. Lastly, I am not a "zero" purist, and think using endgame tablebases is a good idea with NNs. I don't think of egtbs as chess strategy, and having them would seem to shorten the NN training time as it does not have to learn about those positions.

Thank you for sharing all of your work for so many years.

Daniel Shawul · Post by **Daniel Shawul** » Wed May 30, 2018 11:58 pm

brianr wrote: ↑Wed May 30, 2018 6:35 pm
Daniel Shawul wrote: ↑Wed May 30, 2018 12:01 am
I have a working implementation now. I have generated four networks (1x32, 3x64, 6x64, 12x128 modified resnets) on a set of 2 million epd positions.
This looks quite interesting. I have been, ahem, tinkering with NNs for a while. These are just informal type comments based on my stumbling around.

First, 2MM epd positions seems like a very small number for chess training. I think Alpha Zero used 4,096 as the batch size; 1,024 with Leela Zero Chess. They are described as "mini-batches", and seem to be used more to count mini-batch "steps", and are not necessarily the gpu batch sizes. With 1,024 epd positions (samples) per batch, 2MM is less than 2K steps. Most training runs are for around 50,000 steps. However, only one pass thru all of the steps is generally done, or 1 epoch. I think I saw 20 epochs in your code. I gather that the idea with chess is that many more samples are better than many epochs with fewer samples.

Thanks for the feedback Brian! Those 2 million positions were what I used for tuning standard eval of my engine, but you are right that it is too small for training any decent sized resnet. It is hard to find labeled epd positions so I think I am going to focus on training from PGN files instead. As you also pointed out a couple of passes should be enough once you have a lot of data to train your network with.

Next, the smallest NN with which I have seen even minimal chess learning is 10x128. 7x256 is quite a bit better. I have not trained any larger NNs long enough yet.

I did a 1x32 just to see by how much it slows things down (about 400x). The tensorflow overhead alone already makes it something like 50-100x slower, even before evaluating any neural network.

And, although I have not taken an in depth look at the code as yet, the size of the value head (in resnet.py) should be large enough to encompass the granularity of the scores that you want.

Note that what I am doing is not exactly a resnet in that it has some auxiliary inputs, currently the material differences for Q,R,B,N. This is to kind of "shortcut" the determination of piece values, and also to avoid wasting a whole 8x8 plane on a scalar feature like a material difference or side to move.
My 12 input planes are attack maps instead of piece placements, so that the network has an easier time figuring out mobility, king safety etc.
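
As a rough sketch of the auxiliary inputs, the four material differences could be computed as below. The white-minus-black sign convention, the Q,R,B,N index order and the function name are assumptions made for this example only.

```cpp
#include <array>
#include <cassert>

// Index order of the auxiliary inputs in this sketch (hypothetical).
enum AuxInput { QUEEN = 0, ROOK = 1, BISHOP = 2, KNIGHT = 3, NUM_AUX = 4 };

// Returns white-minus-black piece counts for Q, R, B, N as scalar inputs,
// so that no 8x8 plane has to be spent on a single number.
inline std::array<float, NUM_AUX> material_diff(const std::array<int, NUM_AUX>& white,
                                                const std::array<int, NUM_AUX>& black) {
    std::array<float, NUM_AUX> diff{};
    for (int i = 0; i < NUM_AUX; ++i) {
        assert(white[i] >= 0 && black[i] >= 0);   // counts cannot be negative
        diff[i] = static_cast<float>(white[i] - black[i]);
    }
    return diff;
}
```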

I also use symmetry, such that the neural network evaluates positions only with white to move, and horizontal symmetry as well (I don't consider castling). Without that the NN was giving me different results for mirrored positions, and I think it is better to remove the symmetry during training. Although not worth the effort, pawnless positions have 8-fold symmetry just like in the tbs.
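
A minimal sketch of that kind of canonicalization follows. It assumes squares numbered a1 = 0 to h8 = 63, a hypothetical piece code for the king, and one possible rule for the horizontal mirror (put the white king on files a-d). The rule actually used is not specified here, so treat this as one way it could be done.

```cpp
#include <cassert>
#include <vector>

// Assumed encoding: color 0 = white, 1 = black; squares a1 = 0 ... h8 = 63.
struct Man { int color; int type; int square; };
struct Position { int side; std::vector<Man> men; };

constexpr int KING = 1;  // hypothetical piece-type code

// Mirror a square across the middle rank (a1 <-> a8).
inline int flip_ranks(int sq)   { assert(sq >= 0 && sq < 64); return sq ^ 56; }
// Mirror a square across the middle file (a1 <-> h1).
inline int mirror_files(int sq) { assert(sq >= 0 && sq < 64); return sq ^ 7; }

// Returns an equivalent position with white to move and the white king on
// files a-d. Castling rights are ignored, which is what makes the
// left-right mirror legitimate.
inline Position canonicalize(Position p) {
    if (p.side == 1) {                       // black to move: swap the colors
        for (Man& m : p.men) {
            m.color ^= 1;
            m.square = flip_ranks(m.square);
        }
        p.side = 0;
    }
    bool king_on_right = false;
    for (const Man& m : p.men)
        if (m.color == 0 && m.type == KING && (m.square & 7) >= 4)
            king_on_right = true;
    if (king_on_right)
        for (Man& m : p.men) m.square = mirror_files(m.square);
    return p;
}
```

With this in place, a position and its color-swapped or left-right mirrored twin map to the same network input, so they cannot receive different evaluations.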

Naturally, these comments relate to the NN training, and not to their use in your framework. I mention them thinking that the framework might not be an effective thing to test unless the NNs used are "pretty good" to begin with. Lastly, I am not a "zero" purist, and think using endgame tablebases is a good idea with NNs. I don't think of egtbs as chess strategy, and having them would seem to shorten the NN training time as it does not have to learn about those positions.

Neither am I a "zero" purist, as you can tell from my NN design choices.

Thank you for sharing all of your work for so many years.

Thanks for the feedback!

brianr · Post by **brianr** » Thu May 31, 2018 4:12 am

I use pgn files as input since I don't have the resources myself for millions of self-play games. Once I understand things better and can get reproducible results, then I want to see how important the prior move "history" inputs are.

After the initial slow-down for TF, the size of the NN might not matter quite as much if you are running on a fairly robust GPU. Changing the batch size is yet another parameter to try.

The value head size I mentioned is for the output tensor shape just before applying the tanh. There should be room for many features.

I think most use white-to-move and simplify the input quite a lot. Of course, castle status, ep, and rep (3-fold and 50) are always problematic (to normalize or not, one full plane or just scalars, etc). That said, generally, the number of inputs is a relatively small factor relative to the size of the NN. To minimize CPU RAM I have tried ints for large sample shuffle spaces and convert to floats within the NN. I think Leela Zero chess primarily uses bits to make things even smaller. Right now, I am opting for raw pgn input over more pre-processing for flexibility.

Daniel Shawul · Post by **Daniel Shawul** » Thu May 31, 2018 4:41 pm

brianr wrote: ↑Thu May 31, 2018 4:12 am
I use pgn files as input since I don't have the resources myself for millions of self-play games. Once I understand things better and can get reproducible results, then I want to see how important the prior move "history" inputs are.

After the initial slow-down for TF, the size of the NN might not matter quite as much if you are running on a fairly robust GPU. Changing the batch size is yet another parameter to try.

The value head size I mentioned is for the output tensor shape just before applying the tanh. There should be room for many features.

I think most use white-to-move and simplify the input quite a lot. Of course, castle status, ep, and rep (3-fold and 50) are always problematic (to normalize or not, one full plane or just scalars, etc). That said, generally, the number of inputs is a relatively small factor relative to the size of the NN. To minimize CPU RAM I have tried ints for large sample shuffle spaces and convert to floats within the NN. I think Leela Zero chess primarily uses bits to make things even smaller. Right now, I am opting for raw pgn input over more pre-processing for flexibility.

The reason why they have history planes is probably mostly for the sake of Go, where move patterns are common. Direct policy gradient reinforcement learning works well in that domain and history planes are quite important (the policy network alone is over 3000 Elo there). I don't think they bring much benefit for chess, but as you said, additional input planes do not affect the size as they go away after the first convolution.

I agree about the output tensor shape being small; I will expand it to 128 or 256.

Albert Silver · Post by **Albert Silver** » Fri Jun 01, 2018 6:36 pm

Daniel Shawul wrote: ↑Wed May 30, 2018 12:01 am
I have just added neural network evaluation support via egbb dlls. [...]

I think this is quite brilliant. I would ask, if at all possible, two things:

1) Easy-ish to use tool to build NN from material (games, etc)
2) A final NN format that is either compatible with LCZero (meaning would be a standard that was a two-way street), or a tool that could convert this (I suspect impossible). If your format is somehow better for some reason, then provide instructions so LCZero devs could add support.

Daniel Shawul · Post by **Daniel Shawul** » Sat Jun 02, 2018 10:46 am

Albert Silver wrote: ↑Fri Jun 01, 2018 6:36 pm
I think this is quite brilliant. I would ask, if at all possible, two things:

1) Easy-ish to use tool to build NN from material (games, etc)
2) A final NN format that is either compatible with LCZero (meaning would be a standard that was a two-way street), or a tool that could convert this (I suspect impossible). If your format is somehow better for some reason, then provide instructions so LCZero devs could add support.

1) Currently you can build a NN from a set of games or EPD files. It is about 400 lines of Python code that takes advantage of the python-chess library.

2) I think so too -- though note that you can also just take the 15 million or so games and train your own network.
It would be great to have the lc0 backend (not lczero) in egbbso, as it supports lots of other backends. Currently I only have the tensorflow backend (just one backend among many) implemented in egbbdll. If someone from the lc0 group is willing to work with me on this, it should be done quickly.

I really hate the dependency hell with deep learning libraries like lczero's, and now totally understand why GCP decided to go with hand-written OpenCL kernels.

Daniel Shawul · Post by **Daniel Shawul** » Tue Jan 15, 2019 7:48 pm

Egbbdll now supports leela networks, which have a policy head in addition to the value head. To use the lc0 networks, they must
be converted to a format readable by egbbdll (i.e. either Protobuf (pb) format for the Tensorflow backend or UFF format for TensorRT support).
The cpu version does not work yet, as it seems the NCHW input format is embedded in the networks.
I have tested that this works with the 9xxx networks, but lcztools seems to have a problem with other versions of the networks (10, 20, 30 etc.).
I am not sure if I got everything right, especially the network input format, so let me know if you find mistakes.
It seems to play decently but is not yet able to beat standard scorpio on 16 cores with the 9xxx networks -- maybe that will change with the bigger nets.

In a sample run from the initial position with network ID 9154 and TensorRT, I get 64k nps:

# [st = 11112ms, mt = 29249ms , hply = 0 , moves_left 10]
63 5 111 70646  d2-d4 d7-d5 c2-c4 e7-e6 Nb1-c3 Ng8-f6 Bc1-g5 Bf8-e7 e2-e3 h7-h6 Bg5-h4 Ke8-g8 Ng1-f3 Nb8-d7 c4xd5 e6xd5 Bf1-d3 c7-c6 Ke1-g1 Rf8-e8
64 5 222 143728  d2-d4 d7-d5 c2-c4 e7-e6 Nb1-c3 Ng8-f6 Bc1-g5 Bf8-e7 e2-e3 h7-h6 Bg5-h4 Ke8-g8 Ng1-f3 Nb8-d7 c4xd5 e6xd5 Bf1-d3 c7-c6 Ke1-g1 Rf8-e8 h2-h3
65 4 334 216311  d2-d4 d7-d5 c2-c4 e7-e6 Nb1-c3 Ng8-f6 Bc1-g5 Bf8-e7 e2-e3 h7-h6 Bg5-h4 Ke8-g8 Ng1-f3 Nb8-d7 c4xd5 e6xd5 Bf1-d3 c7-c6 Ke1-g1 Rf8-e8 h2-h3 Nd7-f8
66 4 445 287279  d2-d4 d7-d5 c2-c4 e7-e6 Nb1-c3 Ng8-f6 Bc1-g5 Bf8-e7 e2-e3 h7-h6 Bg5-h4 Ke8-g8 Ng1-f3 Nb8-d7 c4xd5 e6xd5 Bf1-d3 c7-c6 Ke1-g1 Rf8-e8 h2-h3 Nd7-f8 Qd1-c2
67 4 557 358522  d2-d4 d7-d5 c2-c4 e7-e6 Nb1-c3 Ng8-f6 Bc1-g5 Bf8-e7 e2-e3 h7-h6 Bg5-h4 Ke8-g8 Ng1-f3 Nb8-d7 c4xd5 e6xd5 Bf1-d3 c7-c6 Ke1-g1 Rf8-e8 h2-h3 Nd7-f8 Qd1-c2
68 4 668 428926  d2-d4 d7-d5 c2-c4 e7-e6 Nb1-c3 Ng8-f6 Bc1-g5 Bf8-e7 e2-e3 h7-h6 Bg5-h4 Ke8-g8 Ng1-f3 Nb8-d7 c4xd5 e6xd5 Bf1-d3 c7-c6 Ke1-g1 Rf8-e8 h2-h3 Nd7-f8 Qd1-c2
69 3 779 500345  d2-d4 d7-d5 c2-c4 e7-e6 Nb1-c3 Ng8-f6 Bc1-g5 Bf8-e7 e2-e3 h7-h6 Bg5-h4 Ke8-g8 Ng1-f3 Nb8-d7 c4xd5 e6xd5 Bf1-d3 c7-c6 Ke1-g1 Rf8-e8 h2-h3 Nd7-f8 Qd1-c2
70 3 891 570794  d2-d4 d7-d5 c2-c4 e7-e6 Nb1-c3 Ng8-f6 Bc1-g5 Bf8-e7 e2-e3 h7-h6 Bg5-h4 Ke8-g8 Ng1-f3 Nb8-d7 c4xd5 e6xd5 Bf1-d3 c7-c6 Ke1-g1 Rf8-e8 h2-h3 Nd7-f8 Qd1-c2
71 3 1002 642176  d2-d4 d7-d5 c2-c4 e7-e6 Nb1-c3 Ng8-f6 Bc1-g5 Bf8-e7 e2-e3 h7-h6 Bg5-h4 Ke8-g8 Ng1-f3 Nb8-d7 c4xd5 e6xd5 Bf1-d3 c7-c6 Ke1-g1 Rf8-e8 h2-h3 Nd7-f8 Qd1-c2

# Move   Value=(V,P,V+P)   Policy  Visits                  PV
#----------------------------------------------------------------------------------
#  1   (0.508,0.000,0.508)  17.25  262209   d2-d4 d7-d5 c2-c4 e7-e6 Nb1-c3 Ng8-f6 Bc1-g5 Bf8-e7 e2-e3 h7-h6 Bg5-h4 Ke8-g8 Ng1-f3 Nb8-d7 c4xd5 e6xd5 Bf1-d3 c7-c6 Ke1-g1 Rf8-e8 h2-h3 Nd7-f8
#  2   (0.511,0.000,0.511)  12.52  138945   Ng1-f3 e7-e6 c2-c4 Ng8-f6 d2-d4 d7-d5 Nb1-c3 d5xc4 e2-e3 a7-a6 a2-a4 c7-c5 Bf1xc4 Nb8-c6 Ke1-g1 Bf8-e7 d4xc5 Qd8xd1 Rf1xd1 Be7xc5 Bc4-f1 Ke8-e7 Bc1-d2 Rh8-d8 Ra1-c1 Bc5-b4 Bd2-e1 Rd8xd1
#  3   (0.510,0.000,0.510)   9.08   72852   e2-e4 e7-e6 d2-d4 d7-d5 Nb1-c3 Ng8-f6 e4-e5 Nf6-d7 Nc3-e2 c7-c5 c2-c3 Nb8-c6 Ng1-f3 Bf8-e7 g2-g3 Qd8-a5 a2-a3 c5xd4 b2-b4 Qa5-b6 c3xd4 a7-a5 b4-b5 Qb6xb5 Ne2-c3 Qb5-b6 Ra1-b1 Qb6-d8
#  4   (0.513,0.000,0.513)   8.27   60907   c2-c4 c7-c5 Ng1-f3 Ng8-f6 Nb1-c3 g7-g6 g2-g3 Bf8-g7 Bf1-g2 d7-d5 c4xd5 Nf6xd5 Ke1-g1 Ke8-g8 d2-d4 c5xd4 Nf3xd4 Nd5xc3 b2xc3 Qd8-c7 Bc1-e3
#  5   (0.511,0.000,0.511)   7.36   48060   e2-e3 e7-e6 d2-d4 Ng8-f6 Ng1-f3 d7-d5 c2-c4 Bf8-e7 Nb1-c3 Ke8-g8 a2-a3 Nb8-d7 c4xd5 e6xd5 Bf1-d3 c7-c6
#  6   (0.506,0.000,0.506)   5.06   22415   h2-h3 d7-d5 d2-d4 c7-c5 e2-e3 Nb8-c6 Ng1-f3 Ng8-f6 d4xc5 e7-e6 a2-a3 a7-a5 c2-c4 Bf8xc5 Nb1-c3 Ke8-g8 Bf1-e2 d5xc4 Qd1xd8 Rf8xd8 Be2xc4 Bc5-f8 Ke1-e2 Bc8-d7 Rh1-d1 Ra8-c8 Bc1-d2
#  7   (0.504,0.000,0.504)   4.46   17370   Nb1-c3 d7-d5 e2-e4 d5-d4 Nc3-e2 c7-c5 Ne2-g3 Nb8-c6 Bf1-c4 Ng8-f6 d2-d3 h7-h5 h2-h3 h5-h4 Ng3-e2 e7-e6 Ng1-f3 a7-a6 a2-a4
#  8   (0.496,0.000,0.496)   4.20   15113   b2-b3 e7-e6 Bc1-b2 Ng8-f6 Ng1-f3 d7-d5 e2-e3 c7-c5 Bf1-b5 Nb8-d7 Ke1-g1 Bf8-e7 c2-c4 Ke8-g8
#  9   (0.503,0.000,0.503)   3.61   11331   a2-a3 c7-c5 e2-e3 d7-d5 Ng1-f3 Ng8-f6 d2-d4 c5xd4 e3xd4 Nb8-c6 c2-c3 Bc8-f5 Bc1-f4 h7-h6 Qd1-b3 Qd8-c8
# 10   (0.500,0.000,0.500)   3.48   10474   c2-c3 e7-e5 d2-d4 Nb8-c6 d4-d5 Nc6-e7 e2-e4 Ne7-g6 Ng1-f3 Ng8-f6 h2-h4 h7-h5 Bc1-g5 Bf8-e7 Nb1-d2 Nf6-g4 Bg5xe7 Qd8xe7 g2-g3 d7-d6
# 11   (0.497,0.000,0.497)   3.26    9111   g2-g3 d7-d5 Ng1-f3 c7-c5 Bf1-g2 Ng8-f6 Ke1-g1 Nb8-c6 d2-d4 e7-e6 c2-c4 d5xc4 d4xc5 Qd8xd1 Rf1xd1 Bf8xc5 Nb1-d2 c4-c3 b2xc3
# 12   (0.490,0.000,0.490)   3.17    8493   d2-d3 d7-d5 Ng1-f3 Ng8-f6 d3-d4 c7-c5 e2-e3 Nb8-c6 d4xc5 e7-e6 a2-a3 a7-a5 c2-c4 Bf8xc5 Nb1-c3 Ke8-g8 Bf1-e2 d5xc4 Qd1xd8 Rf8xd8 Be2xc4 Bc5-f8 Ke1-e2 Bc8-d7 Rh1-d1 Ra8-c8 Bc4-b5 Bd7-e8
# 13   (0.492,0.000,0.492)   3.07    7989   b2-b4 e7-e6 Bc1-b2 Ng8-f6 a2-a3 d7-d5 Ng1-f3 c7-c5 b4xc5 Bf8xc5 e2-e3 Ke8-g8 c2-c4 Nb8-c6 d2-d4 Bc5-e7 Nb1-d2 b7-b6 Bf1-d3 Nc6-a5 c4xd5
# 14   (0.491,0.000,0.491)   2.99    7572   a2-a4 c7-c5 e2-e3 d7-d5 Ng1-f3 Ng8-f6 d2-d4 Nb8-c6 Bf1-b5 e7-e6 Ke1-g1 a7-a6 Bb5xc6 b7xc6 Nb1-d2 c5xd4
# 15   (0.477,0.000,0.477)   2.17    3858   Nb1-a3 e7-e5 c2-c4 Ng8-f6 Na3-c2 d7-d5 c4xd5 Nf6xd5 d2-d3 c7-c5 Ng1-f3 Nb8-c6 e2-e4 Nd5-f6
# 16   (0.475,0.000,0.475)   2.15    3771   h2-h4 d7-d5 d2-d4 c7-c5 e2-e3 Nb8-c6 Ng1-f3 Bc8-g4 Bf1-e2 Ng8-f6 Ke1-g1 e7-e6 Nb1-d2 c5xd4 e3xd4 Bf8-d6 c2-c3
# 17   (0.475,0.000,0.475)   2.12    3669   f2-f4 Ng8-f6 g2-g3 d7-d5 Ng1-f3 c7-c5 Bf1-g2 g7-g6 c2-c3 Bf8-g7 d2-d4 Ke8-g8 d4xc5
# 18   (0.469,0.000,0.469)   2.12    3622   Ng1-h3 d7-d5 d2-d4 c7-c5 e2-e3 Bc8xh3 g2xh3 e7-e6 Bf1-g2 Nb8-c6 Ke1-g1 Ng8-f6 c2-c4 c5xd4 c4xd5 Nf6xd5 e3xd4 Bf8-e7
# 19   (0.470,0.000,0.470)   1.97    3131   f2-f3 e7-e5 e2-e4 d7-d5 e4xd5 Qd8xd5 Nb1-c3 Qd5-d8 Qd1-e2 Nb8-c6 f3-f4 Nc6-d4 Qe2xe5 Ng8-e7 Bf1-d3 f7-f6 Qe5-e4 Bc8-f5
# 20   (0.432,0.000,0.432)   1.59    1878   g2-g4 d7-d5 Bf1-g2 e7-e5 d2-d3 Bc8xg4 c2-c4 c7-c6 Qd1-b3 Qd8-b6 c4xd5 Qb6xb3 a2xb3 Ng8-f6 d5xc6 Nb8xc6 Bg2xc6

# nodes = 7964351 <0% qnodes> time = 11116ms nps = 716476 eps = 0 nneps = 61014
# Tree: nodes = 22298165 depth = 33 pps = 64121 visits = 712771 
#       qsearch_calls = 0 search_calls = 0

Daniel

chrisw · Post by **chrisw** » Wed Jan 16, 2019 10:30 am

Daniel Shawul wrote: ↑Tue Jan 15, 2019 7:48 pm
Egbbdll now supports leela networks, which have a policy head in addition to the value head. To use the lc0 networks, they must
be converted to a format readable by egbbdll (i.e. either Protobuf (pb) format for the Tensorflow backend or UFF format for TensorRT support).
The cpu version does not work yet, as it seems the NCHW input format is embedded in the networks.
I have tested that this works with the 9xxx networks, but lcztools seems to have a problem with other versions of the networks (10, 20, 30 etc.).
I am not sure if I got everything right, especially the network input format, so let me know if you find mistakes.
It seems to play decently but is not yet able to beat standard scorpio on 16 cores with the 9xxx networks -- maybe that will change with the bigger nets.

In a sample run from the initial position with network ID 9154 and TensorRT, I get 64k nps


64knps? Fast! Are you batching positions? c or python?

Daniel Shawul · Post by **Daniel Shawul** » Wed Jan 16, 2019 6:49 pm

It is not really fast, because it is a 6x64 net, for which I believe lczero gets 100knps.
I also get 100knps with my test version of the 6x64 networks, which now have policy networks.
I wanted to try out policy networks because I may have misunderstood their purpose.
Sure, they are weaker than qsearch() tactically, but when, and only when, traps are absent (most cases), they
can drive your search to positionally good alternatives.

The probing code is c++11 for both Tensorflow and TensorRT support. Most optimized inference libraries are written
in c++ while training is done with python.

The good thing about my approach is that the format of the network is stored with the nets themselves (unlike leela's)
so you can experiment with lots of networks as long as the input/output nodes remain the same.
