Hacking around CFish NNUE

Discussion of chess software programming and technical issues.

Moderators: Harvey Williamson, bob, hgm

maksimKorzh
Posts: 334
Joined: Sat Sep 08, 2018 3:37 pm
Location: Ukraine
Full name: Maksim Korzh

Hacking around CFish NNUE

Post by maksimKorzh » Thu Oct 15, 2020 11:00 am

Hi guys

I'm trying to embed the NNUE code from Cfish by Ronald de Man into my engine BBC.
Please don't hate me for that.
Given what a noob I am, I can hardly believe I will ever succeed at this.
Andy Grant once said that embedding NNUE into your engine is a matter of several hours;
well, for me it's probably a matter of several lifetimes...

Anyway, even if I don't succeed in embedding it, I at least want to learn how to apply it.
I'm staring at oldnnue.c https://github.com/syzygy1/Cfish/blob/m ... /oldnnue.c

What I've understood so far (sorry for my basic level of understanding):
1. It uses the current position to look up the appropriate weights in the weights table.
2. It uses processor-specific instructions to optimize performance where possible, and plain (unoptimized) calculations otherwise.
3. It uses Cfish-specific types for pieces, colors, etc.

The current implementation is TOO COMPLICATED for me to understand.
I would like to simplify it as follows:

1. Load the weights.
2. Return the eval for the current position (I have a global array of bitboards representing the board position).

I want to do this without fancy processor-specific optimizations, literally dropping everything it can work without, to get a bare-minimum implementation, no matter how slow it is. Then I just want to test it by setting up a position, calling evaluate() and retrieving the score, just like with a handcrafted eval.

Can I achieve this some other way than being born again in a new body with a new consciousness and spending years studying math at university (you can't even imagine how bad I am at math)?

Maybe some simplified implementation of NNUE exists?
Or at least some implementation that is engine-agnostic?

I mean a standalone NNUE implementation, so that the user can send a position as input and get a score as output.
Thanks in advance.


Re: Hacking around CFish NNUE

Post by maksimKorzh » Thu Oct 15, 2020 2:38 pm

Well, OK, no reply is also an answer...

What I've managed to achieve so far:
1. Compiled nnue.c separately
2. Initialized the weights from a file (well, at least I think so...)

Now, in order to call nnue_evaluate(Position *pos), the only thing I'm missing is a Position object.
First I tried to initialize it from a FEN, but I kept getting segmentation faults...
But then I realized that there might be no need to place pieces on the board at all,
because the only fields of the Position object used in nnue.c are dirtyPiece and accumulator.

So let me narrow my question down from "why is life unfair to code monkeys" to the following:
1. Does anyone know what dirtyPiece and accumulator are?
2. HOW can I initialize them?

hgm
Posts: 25049
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hacking around CFish NNUE

Post by hgm » Thu Oct 15, 2020 4:04 pm

Why are you bothering with code written by others? Without fancy CPU optimizations NNUE is pretty trivial, right? You just need 2*64*256 piece-square tables, 256 for each location of the white King, and 256 for each location of the black King. The 2*256 PST sums for the current King position are recalculated from scratch when you move a King, or incrementally updated when you move another piece. You then multiply each of these 512 values by a weight, add them and set the result to zero if it was negative, and do that 32 times (each time with a different set of weights). With the 32 results you repeat the multiply - sum - clip 32 times, to get again 32 results. These you just multiply and add (no clipping), to get the evaluation score.

I am sure writing code like

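Code:

for(i=0; i<32; i++) {            // one output per set of weights (32 sets)
  int sum = 0;
  for(j=0; j<512; j++) {         // 512 inputs: the 2 x 256 PST sums
    sum += weights1[i][j] * layer1[j];
  }
  layer2[i] = sum > 0 ? sum : 0; // max(0, sum): clip negatives to zero
}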
is not really a challenge for anyone.

See https://www.chessprogramming.org/Stockfish_NNUE .
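Putting the whole description above into plain C might look like the sketch below. It is a minimal, unoptimized sketch of the three dense layers only (512 -> 32 -> 32 -> 1) with hypothetical names, and it takes the 512 accumulator values as already computed. Real Stockfish-style nets also add a bias to every neuron and work with quantized integers and a clipped ReLU, which this sketch deliberately leaves out.

```c
#include <stdint.h>

/* Layer sizes as described in the post above (assumed; check your net). */
#define L1_SIZE 512  /* 2 x 256 accumulator values */
#define L2_SIZE 32
#define L3_SIZE 32

/* ReLU: negative sums become zero. Real nets clip to a range instead. */
static int32_t relu(int32_t x) { return x > 0 ? x : 0; }

/* Hypothetical forward pass over the dense layers: no biases, no SIMD. */
int32_t forward(const int32_t layer1[L1_SIZE],
                int32_t weights1[L2_SIZE][L1_SIZE],
                int32_t weights2[L3_SIZE][L2_SIZE],
                const int32_t weights3[L3_SIZE])
{
    int32_t layer2[L2_SIZE], layer3[L3_SIZE];

    /* 512 inputs -> 32 outputs: multiply, sum, clip (done 32 times). */
    for (int i = 0; i < L2_SIZE; i++) {
        int32_t sum = 0;
        for (int j = 0; j < L1_SIZE; j++)
            sum += weights1[i][j] * layer1[j];
        layer2[i] = relu(sum);
    }

    /* 32 -> 32: the same multiply - sum - clip step again. */
    for (int i = 0; i < L3_SIZE; i++) {
        int32_t sum = 0;
        for (int j = 0; j < L2_SIZE; j++)
            sum += weights2[i][j] * layer2[j];
        layer3[i] = relu(sum);
    }

    /* 32 -> 1: multiply and add, no clipping; this is the raw score. */
    int32_t score = 0;
    for (int j = 0; j < L3_SIZE; j++)
        score += weights3[j] * layer3[j];
    return score;
}
```

In this layout weights1 is [32][512], weights2 is [32][32] and weights3 is [32]. Stockfish-style nets put the side to move's 256 sums first in layer1 and the other side's 256 after them, but treat that ordering as an assumption to check against the code you are porting.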

Daniel Shawul
Posts: 4053
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia

Re: Hacking around CFish NNUE

Post by Daniel Shawul » Thu Oct 15, 2020 4:56 pm

maksimKorzh wrote:
Thu Oct 15, 2020 2:38 pm
So let me narrow my question from why life is unfair to code monkeys to the following:
1. Is anyone aware of what are dirtyPiece and accumulator
2. HOW can I initialize them?
Accumulator is something NNUE takes care of for you. You just have to allocate space for it on the stack.
DirtyPiece is only needed for incremental updating.
Maybe you can try first to do without incremental evaluation (i.e. no need to update dirtyPiece or accumulator in make).
It is only 14-18% slower. To disable incremental update comment out this line:
https://github.com/syzygy1/Cfish/blob/m ... ue.c#L1025
Then you don't have to worry about updating accumulator or dirtyPiece in make move.
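To make the accumulator less mysterious: per perspective it is just a bias vector plus one weight row for every active input feature. Below is a rough sketch with hypothetical names (Accumulator, ft_weights, ft_biases) and 256 values per perspective; it is not Cfish's actual code. refresh_accumulator rebuilds it from scratch, which is all you need with incremental updates disabled (and what hgm says happens after a King move); update_accumulator is the incremental path that a dirtyPiece-style record of removed and added features makes possible.

```c
#include <stdint.h>
#include <string.h>

#define ACC_SIZE 256  /* values per perspective (assumed) */

typedef struct {
    int16_t v[ACC_SIZE];
} Accumulator;  /* hypothetical: one perspective only */

/* Rebuild from scratch: bias + the weight row of every active feature.
   ft_weights is a flat [num_features][ACC_SIZE] table. */
void refresh_accumulator(Accumulator *acc, const int16_t *ft_biases,
                         const int16_t *ft_weights,
                         const int *active, int n_active)
{
    memcpy(acc->v, ft_biases, sizeof acc->v);
    for (int f = 0; f < n_active; f++) {
        const int16_t *row = ft_weights + (size_t)active[f] * ACC_SIZE;
        for (int k = 0; k < ACC_SIZE; k++)
            acc->v[k] += row[k];
    }
}

/* Incremental update for a non-king move: subtract the rows of features
   that disappeared, add the rows of features that appeared. */
void update_accumulator(Accumulator *acc, const int16_t *ft_weights,
                        const int *removed, int n_removed,
                        const int *added, int n_added)
{
    for (int f = 0; f < n_removed; f++) {
        const int16_t *row = ft_weights + (size_t)removed[f] * ACC_SIZE;
        for (int k = 0; k < ACC_SIZE; k++)
            acc->v[k] -= row[k];
    }
    for (int f = 0; f < n_added; f++) {
        const int16_t *row = ft_weights + (size_t)added[f] * ACC_SIZE;
        for (int k = 0; k < ACC_SIZE; k++)
            acc->v[k] += row[k];
    }
}
```

Both paths give the same result; the incremental one only saves work, which is why skipping it costs speed but not correctness.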


Re: Hacking around CFish NNUE

Post by maksimKorzh » Thu Oct 15, 2020 4:59 pm

re: "Without fancy CPU optimizations NNUE is pretty trivial, right?"
Not to me, unfortunately. I feel your explanation is brilliant, but it's still rocket science to me.

Can you please clarify the code:

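Code:

for(i=0; i<32; i++) {            // one output per set of weights (32 sets)
  int sum = 0;
  for(j=0; j<512; j++) {         // 512 inputs: the 2 x 256 PST sums
    sum += weights1[i][j] * layer1[j];
  }
  layer2[i] = sum > 0 ? sum : 0; // max(0, sum): clip negatives to zero
}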
Questions:
1. How can I initialize weights1?
2. weights1 is a 2-dimensional array here; what sizes do I need for the 1st and 2nd dimensions when I define the array? (e.g. weights1[?][?])
3. Same question for layer1 and layer2 (I only understand that NNUE has 4 layers, but that's rocket science to me).

Could you please provide code that does the following (or give a link to an implementation):
1. Initialize everything needed from the "*.nnue" weights file
2. Then, I guess, the code you've already provided
3. And then somehow magically obtain a score
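For step 1, the core of loading is reading arrays of little-endian integers from the file in the order the layers are stored. A minimal sketch using only standard C I/O; read_le_i16 and read_le_i32 are hypothetical helper names. The actual .nnue layout (header, version/hash fields, the order and integer width of each layer) is not described in this thread, so take those details from the loader in Cfish's nnue.c.

```c
#include <stdint.h>
#include <stdio.h>

/* Read n little-endian signed 16-bit values from f into out.
   Returns 1 on success, 0 if the file ended early.
   Decoding byte by byte works the same on any host endianness. */
int read_le_i16(FILE *f, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char b[2];
        if (fread(b, 1, 2, f) != 2)
            return 0;
        out[i] = (int16_t)(uint16_t)(b[0] | (b[1] << 8));
    }
    return 1;
}

/* Same for 32-bit values, in case some parameters are stored wider. */
int read_le_i32(FILE *f, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char b[4];
        if (fread(b, 1, 4, f) != 4)
            return 0;
        uint32_t u = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                     ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
        out[i] = (int32_t)u;
    }
    return 1;
}
```

A loader would open the file with fopen(path, "rb"), skip or check the header, and then call these helpers once per layer with the right array and element count.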

I would greatly appreciate line-by-line comments like the ones in your microMax.

P.S. I really mean it: I'm too dumb and my mind is collapsing. I swear that understanding the microMax move generator and implementing it on my own (I followed your website tutorial) was a piece of cake compared to this rocket science. HGM, if you can, PLEASE just give me commented code so I can see WHAT to input (how on earth do I input the board position?) and get, say, a score of 0.20 after d4 in the initial position. Sorry, but I can't learn from explanations, I'm literally going insane; I do understand when every line of code is commented like in your microMax. By the way, this is the reason I dig into other people's code: it's the only way I can learn. All these rocket-science CPW explanations are not an option for idiots.


Re: Hacking around CFish NNUE

Post by maksimKorzh » Thu Oct 15, 2020 5:01 pm

Daniel Shawul wrote:
Thu Oct 15, 2020 4:56 pm
Accumulator is something NNUE takes care of for you. You just have to allocate space for it on the stack.
DirtyPiece is only needed for incremental updating.
Maybe you can try first to do without incremental evaluation (i.e. no need to update dirtyPiece or accumulator in make).
It is only 14-18% slower. To disable incremental update comment out this line:
https://github.com/syzygy1/Cfish/blob/m ... ue.c#L1025
Then you don't have to worry about updating accumulator or dirtyPiece in make move.
Thanks for your advice. I would sacrifice anything just to get a score output for a given FEN...


Re: Hacking around CFish NNUE

Post by Daniel Shawul » Thu Oct 15, 2020 5:07 pm

hgm wrote:
Thu Oct 15, 2020 4:04 pm
I am sure writing code like

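Code:

for(i=0; i<32; i++) {            // one output per set of weights (32 sets)
  int sum = 0;
  for(j=0; j<512; j++) {         // 512 inputs: the 2 x 256 PST sums
    sum += weights1[i][j] * layer1[j];
  }
  layer2[i] = sum > 0 ? sum : 0; // max(0, sum): clip negatives to zero
}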
is not really a challenge for anyone.

See https://www.chessprogramming.org/Stockfish_NNUE .
I wonder why auto-vectorization is not used instead of the manual SIMD code that NNUE currently has. There is separate code for AVX2, SSE3, SSE2, SSE etc., which is kind of ugly. Your code above can easily be auto-vectorized by the compiler, so I wonder why this approach is not taken. I don't see any operation preventing auto-vectorization in a simple dense network. Either the NNUE code doesn't have easily vectorizable "default code", or compilers do a really bad job with it, since it seems to be 3x slower without vectorization.
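For reference, this is the kind of plain loop that is a good candidate for auto-vectorization: fixed sizes, int16 inputs and weights widened to int32 for accumulation, no branches in the inner loop, and restrict-qualified pointers so the compiler may assume no aliasing. Whether it actually gets vectorized, and how it compares to hand-written intrinsics, depends on the compiler and flags (for example gcc -O3 -march=native, with -fopt-info-vec to report which loops were vectorized); this is a sketch, not a measurement, and dense_relu is a hypothetical name.

```c
#include <stdint.h>

#define IN_DIM 512
#define OUT_DIM 32

/* out[i] = max(0, bias[i] + sum_j w[i*IN_DIM + j] * in[j])
   The inner loop is a branch-free dot product, easy to vectorize. */
void dense_relu(const int16_t *restrict in,
                const int16_t *restrict w,   /* [OUT_DIM][IN_DIM], row-major */
                const int32_t *restrict bias,
                int32_t *restrict out)
{
    for (int i = 0; i < OUT_DIM; i++) {
        int32_t sum = bias[i];
        for (int j = 0; j < IN_DIM; j++)
            sum += (int32_t)w[i * IN_DIM + j] * in[j];
        out[i] = sum > 0 ? sum : 0;
    }
}
```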


Re: Hacking around CFish NNUE

Post by maksimKorzh » Thu Oct 15, 2020 5:18 pm

Daniel Shawul wrote:
Thu Oct 15, 2020 4:56 pm
Accumulator is something NNUE takes care of for you. You just have to allocate space for it on the stack.
DirtyPiece is only needed for incremental updating.
Maybe you can try first to do without incremental evaluation (i.e. no need to update dirtyPiece or accumulator in make).
It is only 14-18% slower. To disable incremental update comment out this line:
https://github.com/syzygy1/Cfish/blob/m ... ue.c#L1025
Then you don't have to worry about updating accumulator or dirtyPiece in make move.
Hold on a sec...
If I need neither dirtyPiece nor accumulator, then I don't need Position *pos at all? Is that correct?
But then I feel completely lost trying to understand HOW the board position is used as the input to get a score from NNUE.

OMG, why is this so complicated (rhetorical question)?
Why doesn't somebody smarter than me create a standalone NNUE program that takes a FEN as input and gives a score as output?
Is that possible? Maybe someone has done it already?
That would be the best learning resource for me.


Re: Hacking around CFish NNUE

Post by maksimKorzh » Thu Oct 15, 2020 5:21 pm

Yeah guys, just to avoid torturing you any further with my dumbness, I'll ask the question a bit differently.

In a perfect world I would like to have the following program:

1. Take a FEN string as input
2. Return the NNUE score as output

That's it.

Please don't tell me this is SLOW and doesn't make sense.
Just tell me: is that possible?
If so, what steps should I take to create that program?
Or maybe someone has done it before?
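Step 1 of such a program is the easy part. A minimal sketch that reads only the piece-placement field of a FEN into a 64-entry board (a1 = 0, h8 = 63, '.' for empty squares); fen_to_board is a hypothetical helper, and side to move, castling rights etc. are simply ignored here. The resulting board would then still have to be turned into NNUE input features.

```c
#include <ctype.h>
#include <string.h>

/* Fill board[64] from the first FEN field. Index = rank * 8 + file,
   so a1 = 0 and h8 = 63. Returns 1 on success, 0 on malformed input. */
int fen_to_board(const char *fen, char board[64])
{
    memset(board, '.', 64);
    int rank = 7, file = 0;  /* FEN starts at a8 */
    for (const char *p = fen; *p && *p != ' '; p++) {
        if (*p == '/') {                      /* next rank down */
            if (file != 8) return 0;
            rank--;
            file = 0;
        } else if (isdigit((unsigned char)*p)) {
            file += *p - '0';                 /* run of empty squares */
        } else if (strchr("PNBRQKpnbrqk", *p)) {
            if (file > 7 || rank < 0) return 0;
            board[rank * 8 + file++] = *p;    /* piece letter as-is */
        } else {
            return 0;
        }
        if (file > 8) return 0;
    }
    return rank == 0 && file == 8;
}
```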


Re: Hacking around CFish NNUE

Post by Daniel Shawul » Thu Oct 15, 2020 5:27 pm

Think of Position* as containing your FEN plus the Accumulator and DirtyPiece structures.
NNUE populates these structures using the function

Code:

void half_kp_append_active_indices
Modify that to work from your FEN rather than from the bitboard code, which assumes your engine represents the board exactly like Stockfish does.
Also don't forget to comment out the incremental update as I mentioned; otherwise it will touch parts of the code that are corrupt.

If you are frustrated, you can wait for me to add NNUE to my library that already does EGTB and NN probing :)
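For reference, a sketch of the HalfKP feature indices that a function like this produces, as I understand the original Stockfish NNUE layout: for each perspective, every non-king piece gives one active feature, index = orient(sq) + piece_offset + 641 * orient(own king square), where orient rotates the board for Black (sq ^ 63) and 641 = 10 * 64 + 1. The constants and the piece-offset order are assumptions to verify against your copy of nnue.c. The board here is a hypothetical char[64] (a1 = 0, '.' for empty, FEN letters for pieces) instead of bitboards.

```c
#include <ctype.h>

enum { WHITE_SIDE = 0, BLACK_SIDE = 1 };
#define PS_END (10 * 64 + 1)  /* 641 slots per king square (assumed) */

/* Rotate the board 180 degrees for Black's perspective. */
static int orient(int persp, int sq) { return persp == BLACK_SIDE ? sq ^ 63 : sq; }

/* Offset of a piece as seen from persp: pawn, knight, bishop, rook,
   queen, each with "own" before "opponent's". Kings are not features. */
static int piece_offset(int persp, char pc)
{
    static const char order[] = "pnbrq";
    int type = -1;
    for (int t = 0; t < 5; t++)
        if (tolower((unsigned char)pc) == order[t]) type = t;
    if (type < 0) return -1;  /* king or empty square */
    int own = (isupper((unsigned char)pc) != 0) == (persp == WHITE_SIDE);
    return 1 + 64 * (2 * type + (own ? 0 : 1));
}

/* Write the active feature indices for one perspective; returns count. */
int halfkp_active(const char board[64], int persp, int out[32])
{
    char king = persp == WHITE_SIDE ? 'K' : 'k';
    int ksq = -1, n = 0;
    for (int sq = 0; sq < 64; sq++)
        if (board[sq] == king) ksq = orient(persp, sq);
    if (ksq < 0) return 0;  /* no king for this side */
    for (int sq = 0; sq < 64; sq++) {
        int off = piece_offset(persp, board[sq]);
        if (off >= 0 && n < 32)
            out[n++] = orient(persp, sq) + off + PS_END * ksq;
    }
    return n;
}
```

These indices are exactly what an accumulator refresh needs: for each perspective, sum the weight rows of its active features.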
