NN Eval functions/Embedding Leela/NNUE prebuilt models into Engine

Discussion of chess software programming and technical issues.

Moderators: hgm, chrisw, Rebel

MichaelL
Posts: 4
Joined: Sun Sep 22, 2024 9:51 pm
Full name: Michael Lewis

NN Eval functions/Embedding Leela/NNUE prebuilt models into Engine

Post by MichaelL »

Hi,

I've just started working on a Chess engine project (ANSI C - just for personal fun, nothing serious).

I've completed the first step - board representation, move generation and perft tests ('Kiwi Pete' and some curated set I found online).

I pass all the perft tests and average around 2.6s for perft of the start position to depth 6. I think this is probably fine - optimising it any further may not be worth it, as I'm assuming the search will matter most for performance.
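For reference, the core of the perft is just the usual recursion plus bulk counting at the leaves; the sketch below uses placeholder types and helper names rather than my actual code:

#include <stdint.h>

typedef struct Board Board;   /* placeholder: the engine's board struct */
typedef unsigned int Move;    /* placeholder: the engine's move encoding */

int  gen_legal_moves(Board *b, Move *moves);   /* placeholder generator */
void make_move(Board *b, Move m);
void unmake_move(Board *b, Move m);

uint64_t perft(Board *b, int depth)
{
    Move moves[256];
    int i, n;
    uint64_t nodes = 0;

    if (depth == 0)
        return 1;

    n = gen_legal_moves(b, moves);
    if (depth == 1)
        return (uint64_t)n;    /* bulk counting: no make/unmake at the leaves */

    for (i = 0; i < n; i++) {
        make_move(b, moves[i]);
        nodes += perft(b, depth - 1);
        unmake_move(b, moves[i]);
    }
    return nodes;
}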

I've been thinking about the evaluation function next. It's something I can build and test before getting onto the trickier elements of search and optimising that.

So, it looks like I have some options:

Firstly, build a hand-crafted evaluation function. I've been using the Chess Programming Wiki, and it looks pretty straightforward.
However, most engines now seem to use neural nets, and I have a lot to learn to even get started there.

Another option would be to utilise the prebuilt model from Leela or NNUE: looking at the source from those projects, it doesn't appear simple to untangle into an API that I can embed in my fledgling engine.

Does anyone know if there are libraries/code snippets that already do this?

My other question: obviously, training a NN to a good standard may require significant resources, but has anyone managed to get a simple example NN playing to a reasonable standard by training on one machine?
Any pointers (no pun intended) or tips on how to get started in NN for chess would be welcome.

Thanks in advance,
Mike Lewis
smatovic
Posts: 3013
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: NN Eval functions/Embedding Leela/NNUE prebuilt models into Engine

Post by smatovic »

If fun is the metric, I suggest coding your own handcrafted eval first ("the journey is the reward"); then, one step further, you can implement neural networks.
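As a rough idea of what a first handcrafted eval can look like, here is a minimal material-plus-piece-square-table sketch; the board layout, piece codes and table contents are placeholders, not tuned values:

enum { WHITE, BLACK };
enum { EMPTY = -1, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, NPIECES };

typedef struct {                /* placeholder board layout */
    int piece_on[64];           /* EMPTY or a piece code */
    int color_on[64];           /* WHITE or BLACK where a piece stands */
    int side_to_move;
} Board;

static const int piece_value[NPIECES] = { 100, 320, 330, 500, 900, 0 };
static const int pst[NPIECES][64];   /* all zero here; fill from the wiki or by tuning */

int evaluate(const Board *b)         /* score from the side-to-move's point of view */
{
    int sq, score = 0;

    for (sq = 0; sq < 64; sq++) {
        int pc = b->piece_on[sq];
        if (pc == EMPTY)
            continue;
        if (b->color_on[sq] == WHITE)
            score += piece_value[pc] + pst[pc][sq];
        else
            score -= piece_value[pc] + pst[pc][sq ^ 56];   /* mirror ranks for black */
    }
    return b->side_to_move == WHITE ? score : -score;
}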

Re: How to get started with NNUE
viewtopic.php?p=957108#p957108

--
Srdja
op12no2
Posts: 527
Joined: Tue Feb 04, 2014 12:25 pm
Location: Gower, Wales
Full name: Colin Jenkins

Re: NN Eval functions/Embedding Leela/NNUE prebuilt models into Engine

Post by op12no2 »

Another advantage of doing a HCE first is that you can then use it to add score and win/draw/loss labels to positions generated from self play and use those to train your first net.
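Something as simple as one record per self-play position is enough to start with; the format below is only an illustration, every trainer differs:

#include <stdio.h>

typedef struct {
    char  fen[128];    /* position as a FEN string */
    int   score_cp;    /* HCE (search) score in centipawns, white's view */
    float result;      /* game outcome: 1.0 = white win, 0.5 = draw, 0.0 = white loss */
} TrainEntry;

static void write_entry(FILE *out, const TrainEntry *e)
{
    fprintf(out, "%s | %d | %.1f\n", e->fen, e->score_cp, e->result);
}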
JacquesRW
Posts: 114
Joined: Sat Jul 30, 2022 12:12 pm
Full name: Jamie Whiting

Re: NN Eval functions/Embedding Leela/NNUE prebuilt models into Engine

Post by JacquesRW »

MichaelL wrote: Tue Sep 24, 2024 2:20 pm My other question: obviously, training a NN to a good standard may require significant resources, but has anyone managed to get a simple example NN playing to a reasonable standard by training on one machine?
Training a NN that will beat any HCE you could produce takes incredibly little time on a single consumer machine. We're talking single-digit hours of datagen (if you start from your HCE) and less than an hour of total training time (with or without a GPU).

Almost every engine in the top 50 trains its networks on the author's own single machine (although not everyone does their own datagen, which is the most time consuming part).
MichaelL wrote: Tue Sep 24, 2024 2:20 pm Another option would be to utilise the prebuilt model from Leela or NNUE: looking at the source from those projects, it doesn't appear simple to untangle into an API that I can embed in my fledgling engine.
Using other people's networks (other than to test that your inference works) is strongly, strongly discouraged. And if you want to use one to test inference, you should use a simpler network architecture - even the earliest SF NNUE networks are a terrible reference point for beginners.
op12no2 wrote: Thu Sep 26, 2024 11:24 am Another advantage of doing a HCE first is that you can then use it to add score and win/draw/loss labels to positions generated from self play and use those to train your first net.
This will just save a very minor amount of time over starting from a random network.
Ciekce
Posts: 169
Joined: Sun Oct 30, 2022 5:26 pm
Full name: Conor Anstey

Re: NN Eval functions/Embedding Leela/NNUE prebuilt models into Engine

Post by Ciekce »

^^^ in full agreement with JW's post above
MichaelL wrote: Tue Sep 24, 2024 2:20 pm prebuilt model from Leela or NNUE
I'd also add that "NNUE" refers to the entire technique, not specifically to Stockfish networks, and that Leela nets themselves are not going to work in an a/b engine.
MichaelL
Posts: 4
Joined: Sun Sep 22, 2024 9:51 pm
Full name: Michael Lewis

Re: NN Eval functions/Embedding Leela/NNUE prebuilt models into Engine

Post by MichaelL »

Thanks for the advice, I'll start with a handcrafted eval. Mainly by peppering the code with 'inline __attribute__((always_inline))', I've got my perft 6 from startpos down to an average of 2.117s, which could probably go a bit quicker... I've assumed that make/undo is going to be faster than copy-make.
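For anyone curious, a small macro keeps the hint in one place rather than scattered over every declaration; the attribute is a GCC/Clang extension, and the MSVC spelling below is included only for completeness:

#if defined(__GNUC__) || defined(__clang__)
#  define ALWAYS_INLINE static inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define ALWAYS_INLINE static __forceinline
#else
#  define ALWAYS_INLINE static
#endif

ALWAYS_INLINE int popcount64(unsigned long long bb)   /* example hot helper */
{
    int n = 0;
    while (bb) {
        bb &= bb - 1;    /* clear lowest set bit (Kernighan) */
        n++;
    }
    return n;
}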
JacquesRW
Posts: 114
Joined: Sat Jul 30, 2022 12:12 pm
Full name: Jamie Whiting

Re: NN Eval functions/Embedding Leela/NNUE prebuilt models into Engine

Post by JacquesRW »

MichaelL wrote: Fri Sep 27, 2024 2:26 am I've assumed that make/undo is going to be faster than copy-make.
What makes you think that is a valid assumption to make? Copying is extremely fast. In perft it probably depends on the size of your board - for me copy/make has always been faster in perft (and it hardly matters in the actual search).
MichaelL
Posts: 4
Joined: Sun Sep 22, 2024 9:51 pm
Full name: Michael Lewis

Re: NN Eval functions/Embedding Leela/NNUE prebuilt models into Engine

Post by MichaelL »

JacquesRW wrote: Fri Sep 27, 2024 10:53 am What makes you think that is a valid assumption to make? Copying is extremely fast. In perft it probably depends on the size of your board - for me copy/make has always been faster in perft (and it hardly matters in the actual search).
That's a fair point. I'd be amazed if a memcpy of a set of bitboards were quicker than an in-situ undo, but you're right: I probably shouldn't assume but test!
JacquesRW
Posts: 114
Joined: Sat Jul 30, 2022 12:12 pm
Full name: Jamie Whiting

Re: NN Eval functions/Embedding Leela/NNUE prebuilt models into Engine

Post by JacquesRW »

MichaelL wrote: Fri Sep 27, 2024 8:41 pm
JacquesRW wrote: Fri Sep 27, 2024 10:53 am What makes you think that is a valid assumption to make? Copying is extremely fast. In perft it probably depends on the size of your board - for me copy/make has always been faster in perft (and it hardly matters in the actual search).
That's a fair point. I'd be amazed if a memcpy of a set of bitboards were quicker than an in-situ undo, but you're right: I probably shouldn't assume but test!
Even if we allow for your board state to be rather large, e.g. almost 200 bytes, look at what it compiles to on modern CPUs:
https://godbolt.org/z/8q698T4je
But yes, just test.
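For comparison, a copy-make perft loop looks roughly like the sketch below; the board layout, Move type and helpers are placeholders:

#include <stdint.h>

typedef struct {                 /* placeholder layout, roughly 130 bytes */
    uint64_t bitboards[12];
    uint64_t occupied[2];
    int      side_to_move, castling, ep_square, halfmove;
} Board;

typedef unsigned int Move;       /* placeholder move encoding */

int  gen_legal_moves(const Board *b, Move *moves);   /* placeholders */
void make_move(Board *b, Move m);

uint64_t perft_copy(const Board *b, int depth)
{
    Move moves[256];
    int i, n;
    uint64_t nodes = 0;

    if (depth == 0)
        return 1;

    n = gen_legal_moves(b, moves);
    for (i = 0; i < n; i++) {
        Board child = *b;                /* struct copy: a handful of wide loads/stores */
        make_move(&child, moves[i]);
        nodes += perft_copy(&child, depth - 1);
        /* nothing to undo - the parent board was never touched */
    }
    return nodes;
}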
chesskobra
Posts: 304
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: NN Eval functions/Embedding Leela/NNUE prebuilt models into Engine

Post by chesskobra »

Sorry if I am asking a question that was answered in this or another similar thread, but it was suggested that there are engines that use a very simple architecture, like an array of 768 bits for the input position and a small hidden layer of 16 neurons. It was explained by lithander in some detail here: viewtopic.php?p=957361#p957361

What are some engines that implement a simple network like that? I would like to look at some small engines, preferably for CPU, with easy-to-understand C or C++ code. I am not interested in creating my own engine, but I would like to create a simple network, plug it into an existing engine and see what happens. Are there any scripts that I can use for data generation and for training such a network?
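For context, my understanding of inference for such a tiny 768 -> 16 -> 1 net is roughly the sketch below; the weight arrays are placeholders for a trained net, and real NNUE engines additionally update the hidden layer incrementally and quantise to integers:

#define N_INPUTS 768    /* 12 piece types x 64 squares, each feature 0 or 1 */
#define N_HIDDEN 16

static float w1[N_INPUTS][N_HIDDEN];  /* input-to-hidden weights */
static float b1[N_HIDDEN];            /* hidden biases */
static float w2[N_HIDDEN];            /* hidden-to-output weights */
static float b2;                      /* output bias */

static float relu(float x) { return x > 0.0f ? x : 0.0f; }

/* active[] holds the indices of the set features (one per piece on the
   board), nactive is how many there are */
float nn_eval(const int *active, int nactive)
{
    float hidden[N_HIDDEN];
    float out = b2;
    int i, j;

    for (j = 0; j < N_HIDDEN; j++)
        hidden[j] = b1[j];
    for (i = 0; i < nactive; i++)          /* sparse accumulation */
        for (j = 0; j < N_HIDDEN; j++)
            hidden[j] += w1[active[i]][j];
    for (j = 0; j < N_HIDDEN; j++)
        out += relu(hidden[j]) * w2[j];
    return out;                            /* scale to centipawns as needed */
}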