Didactical neural nets
-
- Posts: 50
- Joined: Sun Apr 07, 2019 1:10 am
- Full name: Giovanni Lavorgna
Didactical neural nets
I would like to learn how neural nets have replaced the hand-tuned values in engines' evaluation functions. However, present-day neural nets have grown too big, and I am afraid that the complexity of generating them would be too much for me (and for my hardware). So I thought I could start from a simpler position, say rook + king vs. lone king. Has anyone already done something similar for didactical purposes? And if I should build it myself, where should I start?
-
- Posts: 27986
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Didactical neural nets
I don't think neural nets could do KRK, at least not with the standard training method, because the latter only tries to teach the net how large the probability of winning is, which in KRK is always 100% (with White to move). The important thing you need to know for playing KRK is how long it will take to win from a given position, not what your winning chances are.
Of course an engine that searches deep enough might be able to find the checkmate in KRK even without any evaluation.
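To make the point about training targets concrete, here is a minimal sketch (not any engine's actual trainer) contrasting a pure win-probability target with a distance-aware one. The function names, the `decay` constant and the distances are assumptions chosen for illustration only.

```python
def win_probability_target(dtm_plies):
    """Win-probability target: 1.0 for every won position, whatever the distance."""
    return 1.0


def distance_aware_target(dtm_plies, decay=0.99):
    """Hypothetical alternative: the target shrinks as the mate gets further away."""
    return decay ** dtm_plies


# Three won KRK positions with made-up distances to mate (in plies).
for dtm in (3, 15, 31):
    print(dtm, win_probability_target(dtm), round(distance_aware_target(dtm), 3))
```

With the first target every KRK position gets the same label, so the net has nothing to tell a mate in 2 from a mate in 16; the second target at least orders them.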
-
- Posts: 50
- Joined: Sun Apr 07, 2019 1:10 am
- Full name: Giovanni Lavorgna
Re: Didactical neural nets
Thanks for your answer. Then how about the following ending (Fischer-Taimanov, 1970)? It is a somewhat more complex position, where commentators saw a White advantage, mainly because the Bishop is better than the Knight. Would it be possible (and simpler) to build a neural net from this position?
[d]7r/3k1pp1/pp3n1p/2p1R3/5P2/6P1/PPP3BP/6K1 w - - 0 25
-
- Posts: 762
- Joined: Sun Jul 25, 2010 10:07 pm
- Location: the Netherlands
- Full name: Jef Kaan
Re: Didactical neural nets
some decades ago the Dutchman Alex van Tiggelen did research on such endgame nets and did a PhD
about the topic (at that time much simpler than nowadays, but apparently he was heading in the right direction):
https://www.chessprogramming.org/Alex_van_Tiggelen
note there was some discussion about the complete program (Alexs) having copied
some stuff from another engine (Nightmare), but that's beyond your topic
-
- Posts: 539
- Joined: Thu Mar 09, 2006 3:01 pm
Re: Didactical neural nets
Older CNN format NNs with Lc0 as small as 8x1 can easily learn KRvK.
At least the nets can mate with Lc0 doing some searching, FWIW.
Note that this type of net is not the same as the Stockfish NNs.
I could not train nets to mate with KBNvK however.
-
- Posts: 27986
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Didactical neural nets
A conventional evaluation with only piece-square tables is also a neural net, a very simple one. And penalizing the bare King for approaching the corner of the shade of the Bishop should be enough to make an engine win KBNK even with very shallow thinking. So it is strange that you could not train it. It must be due to the training method.
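The corner term described above can be sketched as a hand-written function. This is only an illustration under assumed conventions (squares numbered a1 = 0 to h8 = 63, king-move distance, score between -7 and 0); the helper names are hypothetical and not taken from any engine.

```python
def square(file, rank):
    """Map 0-based file/rank to a 0..63 square index (a1 = 0, h8 = 63)."""
    return rank * 8 + file


def is_light(sq):
    """True for light squares; a1 (index 0) is dark."""
    return (sq % 8 + sq // 8) % 2 == 1


def chebyshev(a, b):
    """King-move distance between two squares."""
    return max(abs(a % 8 - b % 8), abs(a // 8 - b // 8))


def kbnk_bare_king_term(bare_king_sq, bishop_sq):
    """Score for the bare king: lower the closer it is to a corner of the bishop's shade."""
    a1, h1, a8, h8 = square(0, 0), square(7, 0), square(0, 7), square(7, 7)
    corners = (h1, a8) if is_light(bishop_sq) else (a1, h8)
    return min(chebyshev(bare_king_sq, c) for c in corners) - 7


# Light-squared bishop on f1: h1 and a8 are the mating corners.
print(kbnk_bare_king_term(square(7, 0), square(5, 0)))  # bare king on h1
print(kbnk_bare_king_term(square(0, 0), square(5, 0)))  # bare king on a1
```

Because the term depends on the bishop's shade, a single fixed piece-square table cannot express it; an engine would need one king table per bishop colour, or an equivalent input to the net.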
-
- Posts: 1473
- Joined: Mon Apr 23, 2018 7:54 am
Re: Didactical neural nets
Older posts:
brianr wrote: ↑Fri May 20, 2022 3:43 pm Endgames with a few pieces do not work, at least for me.
I was able to quickly train small nets to mate with KQvK or KRvK.
However, I could not train a net to mate with KBNvK.
I suspect it is because there a[re] too FEW positions available.
When training those test nets I generated every possible endgame position as input.
It could also be related to the fact that NN search is generally quite poor at tactics compared to A/B engines which excel at tactics.
With "early" endgame positions with more pieces there are enough positions to train.
However, then the problem becomes that with table bases available, it is a "waste" to train nets to learn what the TBs already contain as "perfect" information. In practice those endgame nets will fare poorly vs "standard" nets with TB access, so why bother.
Others have trained endgame-specific nets with around 98% accuracy (IIRC), but of course that is useless in actual play when TBs are 100%.
Are you saying Lc0 without TBs can mate with KBNvK? IIRC, that was not the case for a long time in its development. If it can, but only with deep search, I would question how big an NN achievement that is...
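The "generated every possible endgame position" step mentioned in the quoted post can be sketched for KRvK as below. This is a rough illustration, not brianr's code: it only enforces distinct squares and non-adjacent kings, and it does not filter out positions that are illegal for a given side to move (for instance the black king standing in check with White to move). The function names are assumptions.

```python
from itertools import permutations


def kings_adjacent(a, b):
    """True if two squares (0..63) are at most one king step apart."""
    return max(abs(a % 8 - b % 8), abs(a // 8 - b // 8)) <= 1


def krk_placements():
    """Yield (white_king, white_rook, black_king) triples on distinct squares."""
    for wk, wr, bk in permutations(range(64), 3):
        if not kings_adjacent(wk, bk):
            yield wk, wr, bk


positions = list(krk_placements())
print(len(positions))
```

Each placement would then be labelled (for example with a tablebase result) before being fed to a trainer; that part is not shown here.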
-
- Posts: 244
- Joined: Thu Jul 21, 2022 12:30 am
- Full name: Chesskobra
Re: Didactical neural nets
Since we are discussing Patricia 3 in another thread, I thought the following game would be of interest to this discussion as well as to the one about Patricia 3. Today I ran a few games between Patricia 3 and Lc0 (net 792013, available on the Lc0 best nets page), at time control 10m+6s, on a CPU, without tablebases. I loved the game for the very nice technique displayed by Lc0 (792013). TL;DR: at some point it was Patricia's KQ vs Lc0's KRBNP. Leela displayed nice skill in gradually pushing the pawn while protecting her king from checks, but I was not sure if she would convert. Then Patricia rightly gave up the queen for rook and pawn, leaving a KBN vs K endgame. Great practical decision, I thought. I was again not sure if Lc0 would convert. At some point Patricia's king went to the wrong corner. But finally Lc0 did convert. At least to my amateur eyes, the endgame technique displayed here is a pleasure to watch, although I am not claiming that it was network magic as opposed to search. (On the other hand, observe that Patricia sees mate but Leela is at least not declaring mate.)
My apologies if it is not good etiquette to post long games; if so next time I will upload it somewhere like lichess study and post a link.
[pgn]
[Event "My Tournament"]
[Site "?"]
[Date "2024.07.16"]
[Round "2"]
[White "patricia-3-avx2"]
[Black "lc0-30 (792013)"]
[Result "0-1"]
[ECO "A43"]
[GameDuration "00:50:43"]
[GameEndTime "2024-07-16T11:39:24.337 -03"]
[GameStartTime "2024-07-16T10:48:40.567 -03"]
[Opening "Old Benoni defense"]
[PlyCount "324"]
[TimeControl "600+6"]
1. d4 {book} c5 {book} 2. d5 {book} e6 {book} 3. c4 {book} exd5 {book}
4. cxd5 {+0.65/27 24s} Nf6 {-0.76/10 52s} 5. Nf3 {+0.64/27 24s}
d6 {-0.71/11 34s} 6. e4 {+0.74/27 26s} g6 {-0.65/10 13s} 7. Bf4 {+0.73/27 46s}
Bg7 {-0.54/9 20s} 8. Bd3 {+0.62/27 26s} Nh5 {-0.35/9 26s} 9. Bg5 {+0.75/28 19s}
Bf6 {-0.30/9 7.0s} 10. Bh6 {+0.66/26 22s} Bxb2 {-0.04/10 16s}
11. Nbd2 {+0.55/27 23s} Bg7 {-0.02/10 10s} 12. Bxg7 {+0.58/26 19s}
Nxg7 {-0.03/9 7.3s} 13. a4 {+0.49/26 22s} O-O {+0.06/8 24s}
14. O-O {+0.63/26 20s} b6 {+0.10/8 21s} 15. a5 {+0.73/24 15s} Na6 {0.00/10 20s}
16. Be2 {+0.78/24 22s} Bb7 {+0.22/7 24s} 17. Re1 {+0.63/23 17s}
Nb4 {+0.13/7 13s} 18. Nb1 {+0.81/23 15s} bxa5 {+0.21/7 29s}
19. Nc3 {+0.82/22 16s} a6 {+0.20/6 20s} 20. Nd2 {+0.87/22 14s} f5 {+0.42/8 11s}
21. Bc4 {+0.65/23 14s} fxe4 {+0.53/8 13s} 22. Ndxe4 {+0.54/22 20s}
Nf5 {+0.57/10 27s} 23. h3 {+0.27/22 15s} Bc8 {+0.94/8 16s}
24. Qd2 {+0.21/23 19s} Nd4 {+1.26/10 11s} 25. Rac1 {+0.09/22 12s}
Bf5 {+1.02/9 16s} 26. Ne2 {0.00/24 13s} Ndc2 {+0.88/11 13s}
27. Rxc2 {0.00/28 16s} Bxe4 {+0.68/12 9.6s} 28. Rc3 {-0.09/29 41s}
Bf5 {+0.87/13 13s} 29. Nf4 {+0.06/22 12s} Qg5 {+1.19/9 19s}
30. Rg3 {+0.01/25 16s} Qh6 {+1.15/9 21s} 31. Re2 {+0.12/24 13s}
Rf7 {+1.28/10 12s} 32. Ree3 {-0.14/24 13s} Nc2 {+0.90/9 40s}
33. Re6 {-0.61/27 98s} Rd8 {+0.85/14 3.6s} 34. Re2 {-0.78/26 7.4s}
Nd4 {+2.54/8 18s} 35. Ree3 {-0.42/26 8.1s} Qxf4 {+2.27/13 14s}
36. Re8+ {-0.41/26 7.5s} Rxe8 {+2.00/13 9.1s} 37. Qxf4 {-0.60/28 13s}
Bd3 {+1.92/12 8.7s} 38. Qxd6 {-0.51/29 6.4s} Bxc4 {+1.82/11 8.4s}
39. Rg4 {-0.53/25 7.7s} Bb5 {+1.72/10 36s} 40. Qxc5 {-0.67/27 11s}
Nf5 {+1.66/9 26s} 41. h4 {-0.62/26 13s} Ng7 {+1.65/8 4.7s}
42. f3 {-0.74/26 7.9s} a4 {+1.54/7 38s} 43. d6 {-0.45/25 10s} Rf5 {+1.54/6 10s}
44. Qd4 {-0.81/25 9.4s} Rd8 {+1.58/7 16s} 45. Re4 {-1.07/26 17s}
Rd7 {+1.67/8 5.5s} 46. Re3 {-1.13/26 9.0s} h5 {+1.63/7 14s}
47. Rc3 {-1.19/27 8.7s} Rf8 {+1.63/7 21s} 48. Qe5 {-1.28/26 8.7s}
Re8 {+2.01/7 14s} 49. Qf6 {-1.33/31 5.2s} a3 {+2.59/8 9.1s}
50. Rxa3 {-1.33/27 4.7s} Re6 {+2.25/7 9.9s} 51. Qf4 {-1.33/33 7.5s}
Rdxd6 {+2.14/7 13s} 52. Re3 {-1.47/33 11s} Be8 {+2.30/6 19s}
53. Re5 {-1.47/29 6.4s} Kh7 {+2.45/5 10s} 54. Rxe6 {-1.57/29 9.4s}
Rxe6 {+2.14/6 5.2s} 55. Qc7 {-1.61/30 5.4s} Rf6 {+2.17/5 11s}
56. Qa7 {-1.68/28 4.6s} Bb5 {+1.94/5 13s} 57. Qb8 {-1.78/31 14s}
Rf7 {+2.14/5 8.7s} 58. Qe5 {-1.78/31 4.3s} Rf5 {+2.44/5 7.6s}
59. Qe4 {-1.78/28 4.7s} Be8 {+4.42/5 6.3s} 60. Qe7 {-1.88/28 4.4s}
a5 {+5.53/5 6.4s} 61. Qe4 {-2.11/33 11s} a4 {+6.02/5 7.6s}
62. Kh2 {-2.11/36 4.6s} Ra5 {+5.03/5 12s} 63. Qb4 {-2.11/37 4.6s}
Ra7 {+4.67/5 7.9s} 64. Qa3 {-2.11/36 4.7s} Bf7 {+5.01/5 6.8s}
65. g4 {-2.00/27 8.3s} Bb3 {+6.64/5 6.4s} 66. Qb4 {-1.97/29 4.9s}
Rf7 {+7.00/5 6.5s} 67. Kg2 {-1.97/30 5.3s} Bd5 {+10.24/6 8.0s}
68. gxh5 {-1.91/31 4.9s} Nxh5 {+8.63/6 5.5s} 69. Qxa4 {-2.01/33 7.0s}
Rxf3 {+7.75/5 6.6s} 70. Kg1 {-2.10/34 6.8s} Nf6 {+8.05/5 6.3s}
71. Qd4 {-2.00/34 9.3s} Rf5 {+6.12/5 7.1s} 72. Qd2 {-2.00/30 6.7s}
Ng4 {+5.51/4 7.0s} 73. Qa5 {-2.11/31 4.9s} Kh6 {+5.13/5 6.3s}
74. Qd8 {-2.00/36 4.8s} Kh5 {+3.84/6 5.6s} 75. Qh8+ {-2.12/29 5.6s}
Nh6 {+3.27/1 0s} 76. Qg7 {-2.19/31 4.7s} Bc4 {+3.17/5 13s}
77. Kg2 {-2.11/30 4.3s} Be2 {+3.30/5 5.8s} 78. Qh8 {-2.19/35 10.0s}
Ba6 {+3.15/5 6.5s} 79. Qg7 {-2.19/35 8.3s} Bc4 {+3.14/5 5.7s}
80. Qh8 {-2.19/33 9.0s} Bd5+ {+2.27/5 6.1s} 81. Kg1 {-2.19/35 4.4s}
Bf3 {+2.35/5 6.3s} 82. Kf2 {-2.13/32 5.6s} Bh1+ {+2.61/5 6.4s}
83. Ke2 {-2.13/32 4.5s} Bg2 {+2.85/4 5.6s} 84. Kd2 {-2.21/32 8.1s}
Bh3 {+2.76/4 5.5s} 85. Kc3 {-2.21/28 5.1s} Rf3+ {+3.49/5 5.6s}
86. Kd2 {-2.22/32 4.8s} Rf4 {+3.60/5 6.5s} 87. Qe5+ {-2.22/32 5.2s}
Rf5 {+2.80/6 2.8s} 88. Qh8 {-2.23/34 11s} Rf3 {+2.82/5 9.6s}
89. Qd8 {-2.23/32 4.4s} Nf7 {+3.58/6 4.2s} 90. Qg8 {-2.23/32 7.5s}
Bg4 {+2.91/6 8.6s} 91. Ke1 {-2.23/34 4.4s} Rf4 {+2.84/6 4.4s}
92. Qg7 {-2.23/35 5.1s} Nh6 {+3.04/6 7.1s} 93. Qe5+ {-2.18/35 4.8s}
Rf5 {+2.35/6 2.9s} 94. Qh8 {-2.18/33 5.1s} Rb5 {+2.01/5 9.5s}
95. Kd2 {-2.23/33 4.4s} Rc5 {+1.99/4 6.3s} 96. Kd3 {-2.23/36 5.1s}
Bf5+ {+2.86/5 4.6s} 97. Kd4 {-2.23/31 8.3s} Ra5 {+3.32/6 5.6s}
98. Qe8 {-2.23/33 21s} Ra2 {+4.14/6 4.7s} 99. Qh8 {-2.28/26 3.7s}
Ra4+ {+6.20/6 8.7s} 100. Kc5 {-2.31/30 5.4s} Rxh4 {+9.91/5 3.6s}
101. Qe5 {-2.34/30 4.4s} Rh3 {+9.01/5 6.4s} 102. Qe8 {-2.34/27 6.9s}
Rd3 {+9.46/5 8.4s} 103. Qe2+ {-2.40/29 9.5s} Ng4 {+16.21/5 3.7s}
104. Qe8 {-2.40/28 14s} Ne3 {+19.27/5 7.2s} 105. Qh8+ {-2.50/26 4.0s}
Kg4 {+16.46/5 5.4s} 106. Qh2 {-2.63/30 7.3s} Kf3 {+11.06/5 6.4s}
107. Qh1+ {-2.86/33 8.9s} Ng2 {+14.51/5 4.4s} 108. Qa1 {-2.62/22 3.4s}
Nf4 {+13.92/5 8.8s} 109. Qf1+ {-2.86/30 10.0s} Kg4 {+13.02/5 4.5s}
110. Qg1+ {-2.97/29 7.3s} Kh5 {+12.12/5 6.4s} 111. Qh2+ {-3.06/30 3.7s}
Nh3 {+14.29/5 6.1s} 112. Qe5 {-3.06/29 8.5s} Rf3 {+18.19/5 7.0s}
113. Qe2 {-3.42/30 5.5s} Kg4 {+9.31/5 5.7s} 114. Qg2+ {-3.42/36 3.4s}
Rg3 {+9.06/6 5.4s} 115. Qe2+ {-3.42/36 8.8s} Kh4 {+11.83/5 3.9s}
116. Kd6 {-3.38/25 6.2s} g5 {+17.34/5 8.0s} 117. Ke5 {-3.58/26 4.9s}
Bg4 {+20.14/5 6.8s} 118. Qe1 {-2.78/25 6.2s} Nf4 {+12.00/5 5.1s}
119. Kf6 {-2.37/27 3.7s} Bf3 {+11.99/4 6.2s} 120. Qxg3+ {-0.91/23 3.8s}
Kxg3 {+39.56/6 4.0s} 121. Kxg5 {-1.28/26 3.5s} Nd5 {+43.81/5 8.7s}
122. Kf5 {-1.33/25 3.8s} Kf2 {+18.73/5 5.9s} 123. Ke6 {-1.39/26 3.8s}
Ke3 {+35.66/4 5.6s} 124. Ke5 {-1.40/27 3.7s} Be4 {+11.81/5 6.4s}
125. Kd6 {-1.62/27 5.0s} Kd4 {+16.37/5 4.9s} 126. Kd7 {-1.66/27 15s}
Ke5 {+26.53/4 7.0s} 127. Kc6 {-9996.64/31 11s} Ne3+ {+22.75/4 6.5s}
128. Kd7 {-M88/33 5.2s} Nf5 {+63.37/4 6.0s} 129. Kc7 {-M96/33 4.6s}
Nd4 {+25.19/4 5.9s} 130. Kd8 {-M100/33 3.5s} Kd6 {+19.77/4 6.0s}
131. Ke8 {-M88/33 3.6s} Bg6+ {+28.93/5 6.1s} 132. Kf8 {-9984.58/29 4.0s}
Ne6+ {+10.42/5 6.0s} 133. Kg8 {-9996.42/32 7.5s} Be8 {+20.65/5 5.7s}
134. Kh8 {-9996.12/33 3.7s} Nf4 {+16.38/5 6.2s} 135. Kg7 {-9996.06/31 3.8s}
Ke7 {+11.61/4 5.4s} 136. Kg8 {-M174/34 6.9s} Kf6 {+10.64/5 6.7s}
137. Kh7 {-9997.78/32 4.6s} Bd7 {+9.82/5 6.0s} 138. Kh6 {-M128/35 11s}
Ng6 {+11.34/5 5.9s} 139. Kh7 {-M148/34 3.7s} Bb5 {+12.56/5 6.1s}
140. Kh6 {-M100/35 6.7s} Ne5 {+12.65/6 6.0s} 141. Kh7 {-M86/36 4.9s}
Bd3+ {+9.60/5 5.9s} 142. Kh8 {-M108/35 4.5s} Ng6+ {+7.27/6 5.6s}
143. Kh7 {-M122/33 7.6s} Bc4 {+5.14/6 6.0s} 144. Kh6 {-M62/34 4.0s}
Bg8 {+5.60/6 5.4s} 145. Kh5 {-M76/35 4.1s} Ne5 {+7.16/7 3.4s}
146. Kh4 {-M48/35 4.3s} Kf5 {+7.21/7 5.8s} 147. Kg3 {-M38/34 4.6s}
Ng4 {+8.12/8 4.5s} 148. Kg2 {-M30/35 4.7s} Bc4 {+9.24/7 8.4s}
149. Kf3 {-M26/36 6.2s} Bd3 {+7.85/8 8.2s} 150. Kg3 {-M24/37 7.7s}
Be4 {+7.90/8 4.8s} 151. Kh4 {-M24/37 6.0s} Kf4 {+4.85/8 5.8s}
152. Kh5 {-M22/38 6.1s} Bf5 {+5.73/9 7.7s} 153. Kh4 {-M20/38 5.2s}
Bg6 {+6.87/9 3.9s} 154. Kh3 {-M18/39 9.0s} Ne3 {+8.35/9 3.4s}
155. Kh4 {-M16/39 5.0s} Ng2+ {+M15/6 2.6s} 156. Kh3 {-M14/40 5.3s}
Kf3 {+M13/1 0s} 157. Kh2 {-M12/40 5.1s} Nf4 {+M11/1 0s} 158. Kg1 {-M10/39 6.2s}
Bd3 {+M9/1 0s} 159. Kh2 {-M8/40 10s} Kf2 {+M7/1 0s} 160. Kh1 {-M6/40 6.0s}
Kg3 {+M5/1 0s} 161. Kg1 {-M4/41 13s} Nh3+ {+M3/1 0s} 162. Kh1 {-M2/42 8.1s}
Be4# {+M1/1 0s, Black mates} 0-1
[/pgn]
-
- Posts: 1473
- Joined: Mon Apr 23, 2018 7:54 am
Re: Didactical neural nets
chesskobra wrote: ↑Tue Jul 16, 2024 10:52 pm ... Then Patricia rightly gave up the queen for rook and pawn, leaving with KBN vs K endgame. Great practical decision I thought. I was again not sure if Lc0 would convert. At some point Patricia's king went to the wrong corner. But finally Lc0 did convert. At least to my amateur eyes, the endgame technique displayed here is pleasure to watch, although I am not claiming that it was the network magic as opposed to search. (On the other hand observe that Patricia sees mate but Leela is at least not declaring mate.) ...
Thanks for posting this. So it shows Lc0 can convert (at least in this instance!).
I started comparing the moves with TBs from move 121 onward, looking for moves with big discrepancies (i.e. where DTM goes up by more than a few plies). There were a couple of instances (DTM going up by maybe 8 or so) in the first 20-ish moves, but I don't remember for certain which ones. Perhaps I'll redo and complete the comparison...
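The kind of comparison described here could be automated along the following lines. This is only a sketch: the DTM values would have to come from a tablebase probe, which is not shown, and the numbers in the example are made up rather than taken from the game. The function name and the `tolerance` parameter are assumptions.

```python
def flag_dtm_slips(dtm_after_each_move, tolerance=2):
    """Return (index, increase) for moves after which distance to mate rose by more than `tolerance` plies.

    dtm_after_each_move: distance to mate in plies, sampled after each move
    of the winning side, so perfect play by both sides lowers it by 2 each time.
    """
    slips = []
    for i in range(1, len(dtm_after_each_move)):
        increase = dtm_after_each_move[i] - dtm_after_each_move[i - 1]
        if increase > tolerance:
            slips.append((i, increase))
    return slips


# Made-up DTM values, not taken from the game above.
example = [40, 38, 46, 44, 42, 43, 41]
print(flag_dtm_slips(example))
```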
-
- Posts: 514
- Joined: Tue Feb 04, 2014 12:25 pm
- Full name: Colin Jenkins
Re: Didactical neural nets
Focusing on the didactic nature of the OP's query, this is the sort of thing I am doing...
Create a material (or PST) only version of your current eval.
Implement and artificially construct a 768x1x1 network with the material/PST values, i.e. generalise the size of the hidden layer, which here happens to be 1.
Write an eval to use the network, calculating the result from scratch and check it against the material/PST eval. (Use linear activation).
Implement an incremental update version of the network eval and test against the 'from scratch' version to make sure the results are the same. You now have NNUE!
Train a net with a slightly bigger (relu) hidden layer (say 8) using self-play didactic-net games - adding some randomness to avoid duplicate games. I found Claude and ChatGPT are very good at helping to write trainers.
Now using relu activation, make sure the new net beats the didactic net.
Repeat N times, increasing the number of games and size of the hidden layer.
Filter positions where the bm is a capture, promotion or the stm is in check and see if that helps training. Or filter based on qsearch.
Make beating your HCE a goal. I found a small 768x64x1 net trained on around 8M positions could easily beat my 2700 (CCRL) HCE.
At some point add perspective and quantization. Also try king based input layers.
Keep experimenting...
A wonderful resource:-
https://github.com/official-stockfish/n ... cs/nnue.md
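The first four steps above (material eval, hand-built 768x1x1 net, from-scratch evaluation, incremental update) can be sketched as follows. This is a minimal illustration, not the poster's code: the piece values, the feature layout `(colour * 6 + piece) * 64 + sq`, and all function names are assumptions, and the hidden layer is kept as a single scalar with linear activation.

```python
WHITE, BLACK = 0, 1
PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING = range(6)
PIECE_VALUES = [100, 320, 330, 500, 900, 0]  # illustrative centipawn values


def feature_index(colour, piece, sq):
    """Index into the 768 inputs: 2 colours x 6 piece types x 64 squares."""
    return (colour * 6 + piece) * 64 + sq


def build_material_net():
    """Hand-construct a 768x1x1 net whose single hidden neuron sums material."""
    w_in = [0.0] * 768
    for colour in (WHITE, BLACK):
        sign = 1.0 if colour == WHITE else -1.0
        for piece, value in enumerate(PIECE_VALUES):
            for sq in range(64):
                w_in[feature_index(colour, piece, sq)] = sign * value
    return {"w_in": w_in, "b_in": 0.0, "w_out": 1.0, "b_out": 0.0}


def material_eval(pieces):
    """Reference hand-coded eval; pieces is a set of (colour, piece, sq)."""
    return sum((1 if c == WHITE else -1) * PIECE_VALUES[p] for c, p, _ in pieces)


def accumulate(net, pieces):
    """From-scratch hidden-layer value (linear activation)."""
    return net["b_in"] + sum(net["w_in"][feature_index(*f)] for f in pieces)


def output(net, acc):
    """Output layer applied to the hidden-layer value."""
    return net["w_out"] * acc + net["b_out"]


def update(net, acc, removed=(), added=()):
    """Incremental update: touch only the features that changed."""
    for f in removed:
        acc -= net["w_in"][feature_index(*f)]
    for f in added:
        acc += net["w_in"][feature_index(*f)]
    return acc


net = build_material_net()
before = {(WHITE, KING, 6), (WHITE, ROOK, 36), (BLACK, KING, 51), (BLACK, KNIGHT, 44)}
acc = accumulate(net, before)
assert output(net, acc) == material_eval(before)

# White rook on e5 (36) captures the knight on e6 (44).
after = {(WHITE, KING, 6), (WHITE, ROOK, 44), (BLACK, KING, 51)}
acc = update(net, acc,
             removed=[(WHITE, ROOK, 36), (BLACK, KNIGHT, 44)],
             added=[(WHITE, ROOK, 44)])
assert acc == accumulate(net, after)
```

Replacing the per-piece constant with a per-square value turns the same weights into piece-square tables, and widening the hidden layer turns `acc` into a vector that is updated the same way.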