I've recently found success in move ordering with a neural network. It takes much inspiration from NNUE. So much so that I decided to call it NNOM (Move Ordering Neural Network).
The architecture is 98304x512x384.
The 98304 inputs are the HalfKA feature set that the network uses (12*64*64*2: 12 piece types x 64 piece squares x 64 king squares, for each of the 2 perspectives).
The 512 is an arbitrary size for the hidden layer.
The 384 outputs are all the piece-square combos for a single side (6*64).
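To make the input layout concrete, here is a minimal C sketch of how a 98304-wide feature index could be computed. The ordering of the factors and the helper name are my assumptions, not the actual layout in nnom.c:

```c
#include <assert.h>

/* Hypothetical feature indexing for the 98304-input layer:
 * 2 perspectives x 64 king squares x 12 piece types x 64 piece squares.
 * The factor ordering here is an assumption for illustration. */
enum { PIECE_TYPES = 12, SQUARES = 64, PERSPECTIVES = 2 };
enum { FEATURES = PERSPECTIVES * SQUARES * PIECE_TYPES * SQUARES }; /* 98304 */

static int feature_index(int perspective, int king_sq, int piece, int piece_sq) {
    return ((perspective * SQUARES + king_sq) * PIECE_TYPES + piece) * SQUARES
           + piece_sq;
}
```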
I've found a very fast way to implement it in the engine. I don't know if this technique is already out there somewhere, so let me know if it is.
The hidden layer (512) is updated incrementally and efficiently, just like an NNUE accumulator. This takes virtually no time.
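As a rough illustration of that incremental update, here is a C sketch of the NNUE-style accumulator bookkeeping for a quiet move. The function names and the [feature][hidden] weight layout are assumptions for the sketch, not the actual code in nnom.c:

```c
#include <stdint.h>
#include <stddef.h>

enum { HIDDEN = 512 };

/* Subtract/add one feature's weight column from the 512-wide accumulator.
 * weights is assumed to be laid out [feature][hidden] in int16. */
static void acc_sub(int16_t acc[HIDDEN], const int16_t *col) {
    for (int i = 0; i < HIDDEN; i++) acc[i] -= col[i];
}
static void acc_add(int16_t acc[HIDDEN], const int16_t *col) {
    for (int i = 0; i < HIDDEN; i++) acc[i] += col[i];
}

/* A quiet move toggles one feature off and one on, so the full
 * 98304x512 matrix product never has to be recomputed from scratch. */
static void update_for_move(int16_t acc[HIDDEN], const int16_t *weights,
                            int from_feature, int to_feature) {
    acc_sub(acc, weights + (size_t)from_feature * HIDDEN);
    acc_add(acc, weights + (size_t)to_feature * HIDDEN);
}
```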
In the output layer, only the neurons that correspond to legal moves are evaluated. This also takes virtually no time.
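The lazy output evaluation can be sketched like this: for each legal move, dot the accumulator with just that move's column of output weights. The indexing scheme (piece*64 + to_square, following the 6*64 output layout) and the omission of the hidden-layer activation are assumptions for brevity, not the actual nnom.c code:

```c
#include <stdint.h>
#include <stddef.h>

enum { N_HIDDEN = 512 };

/* Score one legal move by evaluating only its output neuron.
 * out_weights is assumed [neuron][hidden] in int16; the real code
 * would apply an activation (e.g. clipped ReLU) to acc first. */
static int32_t score_move(const int16_t acc[N_HIDDEN],
                          const int16_t *out_weights,
                          int piece, int to_square) {
    const int16_t *col =
        out_weights + (size_t)(piece * 64 + to_square) * N_HIDDEN;
    int32_t sum = 0;
    for (int i = 0; i < N_HIDDEN; i++)
        sum += (int32_t)acc[i] * col[i];
    return sum;
}
```

Since a typical position has only a few dozen legal moves, far fewer than 384, most of the output layer is simply never touched.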
The network is also quantized to int16, much like NNUE.
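For reference, a minimal int16 quantization helper might look like the following. The single fixed scale factor and the rounding scheme are assumptions; the real scheme in nnom.c may scale each layer differently:

```c
#include <stdint.h>

/* Quantize a float weight to int16 with a fixed scale, rounding
 * half away from zero and clamping to the int16 range.
 * (Illustrative only; not the actual nnom.c scheme.) */
static int16_t quantize_i16(float w, float scale) {
    float q = w * scale + (w >= 0.0f ? 0.5f : -0.5f);
    if (q >  32767.0f) q =  32767.0f;
    if (q < -32768.0f) q = -32768.0f;
    return (int16_t)q;
}
```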
Do take a look at the implementation (C):
https://github.com/LeviGibson/eggnog-ch ... ain/nnom.c
as well as the training (Tensorflow):
https://github.com/LeviGibson/policy-network
Let me know if you have any suggestions!