NNUE and game phase

Dann Corbit · Post by **Dann Corbit** » Mon Jan 18, 2021 1:54 pm

Strong chess engines have information to help them handle changes in game state from opening to middle to endgame (some schemes more complicated than others).

Why not do this with NNUE engines?

Analyze 100 million opening positions to make an opening NNUE clump.
Analyze 100 million midgame positions to make a midgame NNUE clump.
Analyze 100 million endgame positions to make an ending NNUE clump.
Then smoothly interpolate as we go from one phase to the next.

IOW an edge A or H pawn is worth less than a center pawn in the opening and more in the endgame. One size shoe does not fit all.
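The interpolation step proposed above could be sketched as follows. This is only an illustration under stated assumptions: the three scores stand in for the outputs of the hypothetical opening, midgame and endgame nets, the phase value `t` is derived from remaining material using a conventional hand-crafted weighting (not something specified in this post), and every name here is made up for the example.

```python
# Assumed phase weights per piece type (a common hand-crafted convention).
PHASE_WEIGHT = {"N": 1, "B": 1, "R": 2, "Q": 4}
MAX_PHASE = 24  # 4 knights + 4 bishops + 4 rooks * 2 + 2 queens * 4


def game_phase(piece_counts):
    """Return 1.0 with full material and 0.0 with only kings and pawns left."""
    total = sum(PHASE_WEIGHT[p] * n
                for p, n in piece_counts.items() if p in PHASE_WEIGHT)
    return min(total, MAX_PHASE) / MAX_PHASE


def blend_three(opening, midgame, endgame, t):
    """Piecewise-linear blend: t=1.0 opening, t=0.5 midgame, t=0.0 endgame."""
    t = max(0.0, min(1.0, t))
    if t >= 0.5:
        w = (t - 0.5) * 2.0          # weight of the opening score
        return w * opening + (1.0 - w) * midgame
    w = t * 2.0                      # weight of the midgame score
    return w * midgame + (1.0 - w) * endgame
```

With this scheme the evaluation changes continuously as pieces come off the board, so there is no jump when the position crosses from one phase to the next.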

Dann Corbit · Post by **Dann Corbit** » Mon Jan 18, 2021 1:57 pm

The reason that this occurs to me is that LC0 is a tiger in the opening and a pussy-cat in the endgame.

hgm · Post by **hgm** » Mon Jan 18, 2021 2:05 pm

A single NNUE already does this automatically. The KPST that determine the inputs of the NN (as SUM_over_sqr KPST[n][pieceType[sqr]][sqr][kingSqr]) can also be used to calculate game phase. (By making the KPST independent of sqr and kingSqr, and using the same sign for both colors.) And when it is useful, it will certainly learn how to do that.
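To make the point above concrete, here is a minimal sketch of one first-layer input computed as a KPST sum, and of a table that ignores both the square and the king square so that the sum reduces to a material count (a game-phase signal). The table layout, the piece indexing and the per-type weights are assumptions chosen for illustration, not the layout of any particular engine.

```python
# Assumed piece-type indices: 0=P, 1=N, 2=B, 3=R, 4=Q.


def first_layer_input(kpst_n, pieces, king_sq):
    """Sum kpst_n[type][sqr][kingSqr] over all pieces.

    pieces is a list of (piece_type_index, square) for BOTH colors,
    i.e. the same sign is used for white and black pieces.
    """
    return sum(kpst_n[pt][sq][king_sq] for pt, sq in pieces)


def constant_kpst(per_type_weight):
    """Build a table whose entries depend only on the piece type."""
    return [[[w] * 64 for _ in range(64)] for w in per_type_weight]


# Hypothetical weights: pawns 0, minors 1, rooks 2, queens 4.
phase_table = constant_kpst([0, 1, 1, 2, 4])
pieces = [(1, 1), (3, 0), (4, 3), (0, 8)]      # a knight, a rook, a queen, a pawn
phase_signal = first_layer_input(phase_table, pieces, king_sq=4)
```

Because the table is constant in `sqr` and `kingSqr`, `phase_signal` is the same wherever the pieces and the king stand; a trained net is free to settle on weights like these for one of its hidden units if that helps.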

Look · Post by **Look** » Tue Jan 19, 2021 8:51 am

Hi,

As I mentioned in another thread, there could be different types of NNUEs. The one related to game phase could be called King shelter. That is, a bonus for pawns, usually in front of the king. This should work well in the middlegame, securing the king from the opponent's attacks. In the endgame, however, this is not much of an issue, since the king should become active like a knight or a bishop.

Sesse · Post by **Sesse** » Tue Jan 19, 2021 11:05 am

Look wrote: ↑Tue Jan 19, 2021 8:51 am Hi,

As I mentioned in another thread, there could be different types of NNUEs. The one related to game phase could be called King shelter. That is, a bonus for pawns, usually in front of the king. This should work well in the middlegame, securing the king from the opponent's attacks. In the endgame, however, this is not much of an issue, since the king should become active like a knight or a bishop.

As was pointed out in both that thread and this thread, the network does this itself automatically if it finds it useful. It will find its own king shelter feature, its own consideration of game phase, and how to weight it depending on game phase. Except you can't really untangle it from everything else that's going on.

tomitank · Post by **tomitank** » Tue Jan 19, 2021 8:48 pm

Dann Corbit wrote: ↑Mon Jan 18, 2021 1:54 pm Strong chess engines have information to help them handle changes in game state from opening to middle to endgame (some more complicated than others)

Why not do this with NNUE engines?

Analyze 100 million opening positions to make an opening NNUE clump.
Analyze 100 million midgame positions to make a midgame NNUE clump.
Analyze 100 million endgame positions to make an ending NNUE clump.
Then smoothly interpolate as we go from one phase to the next.

IOW an edge A or H pawn is worth less than a center pawn in the opening and more in the endgame. One size shoe does not fit all.

Because it's futile. It depends on the inputs. The NN learns this.
Please read more about neural networks.

mmt · Post by **mmt** » Thu Jan 21, 2021 2:03 am

tomitank wrote: ↑Tue Jan 19, 2021 8:48 pm Because it's futile. It depends on the inputs. The NN learns this.
Please read more about neural networks.

Smug and wrong, not a good combo. It might easily turn out that a different board representation as the input to the NN or a different NN architecture or a different number of parameters is superior in the endgame as opposed to the opening because you might get a smaller or more parallelizable net, allowing deeper search, or faster training or lower resulting loss. E.g. if you're in an endgame without bishops and queens, all knowledge the NN has about bishops and queens could slow down inference and lead to a shallower search.
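The bishop-and-queen example above amounts to choosing an evaluator from the material left on the board. A minimal sketch, assuming a hypothetical registry of evaluators keyed by name (the keys, the `piece_counts` format and the placeholder evaluators are all invented for illustration):

```python
def select_net(piece_counts):
    """Pick a net name from combined piece counts for both sides."""
    no_bishops = piece_counts.get("B", 0) == 0
    no_queens = piece_counts.get("Q", 0) == 0
    if no_bishops and no_queens:
        return "no_bishops_no_queens"   # smaller, specialized net
    return "general"


# Placeholder evaluators standing in for trained nets.
NETS = {
    "general": lambda position: 0.0,
    "no_bishops_no_queens": lambda position: 0.0,
}


def evaluate(position, piece_counts):
    """Dispatch to the net chosen for this material configuration."""
    return NETS[select_net(piece_counts)](position)
```

Whether the faster specialized net pays for the extra training and the switching logic is exactly the open question raised in this thread.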

tomitank · Post by **tomitank** » Thu Jan 21, 2021 7:24 am

mmt wrote: ↑Thu Jan 21, 2021 2:03 am
tomitank wrote: ↑Tue Jan 19, 2021 8:48 pm Because it's futile. It depends on the inputs. The NN learns this.
Please read more about neural networks.
Smug and wrong, not a good combo. It might easily turn out that a different board representation as the input to the NN or a different NN architecture or a different number of parameters is superior in the endgame as opposed to the opening because you might get a smaller or more parallelizable net, allowing deeper search, or faster training or lower resulting loss. E.g. if you're in an endgame without bishops and queens, all knowledge the NN has about bishops and queens could slow down inference and lead to a shallower search.

I add the NN evaluation to the HCE, so I had to train an extremely small net (768x16x1), and I used only 2.7M examples.
This is not the norm either, but I don’t see the point in splitting it into multiple NNs.
If you have enough training examples, this is not necessary (IMO).
If there is one that works better, please let me know.
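For readers unfamiliar with the notation, a 768x16x1 net could look roughly like the sketch below. It assumes a one-hot input of 12 piece kinds times 64 squares and a ReLU hidden layer; neither detail is stated in the post, and all names are hypothetical.

```python
N_INPUTS, N_HIDDEN = 768, 16   # 12 piece kinds x 64 squares -> 16 hidden -> 1 output


def feature_index(piece_kind, square):
    """Map piece_kind in 0..11 (6 types x 2 colors) and square in 0..63 to an input."""
    return piece_kind * 64 + square


def forward(active, w1, b1, w2, b2):
    """Evaluate the net for a sparse one-hot input.

    active: indices of the inputs that are 1 (one per piece on the board).
    w1: N_INPUTS rows of N_HIDDEN weights, b1: N_HIDDEN biases,
    w2: N_HIDDEN output weights, b2: output bias.
    """
    hidden = list(b1)
    for i in active:                     # only active rows contribute
        row = w1[i]
        for j in range(N_HIDDEN):
            hidden[j] += row[j]
    hidden = [max(0.0, h) for h in hidden]   # ReLU (assumed activation)
    return b2 + sum(h * w for h, w in zip(hidden, w2))
```

With at most 32 active inputs, the first layer costs about 32 x 16 additions per evaluation, which is why a net of this size can sit on top of a hand-crafted evaluation without slowing the search much.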

mmt · Post by **mmt** » Sat Jan 23, 2021 3:46 am

I don't know what would work well in your case. I did an experiment with a specific ending to check how well a custom-trained net can predict whether a position is a mate or not and the net could get 98% right vs 90% for SF NNUE. Based on this, I am sure that it's possible to improve SF NNUE by some Elo points by having multiple different nets, as this pretty much proves it. But is it worth the additional complexity and training costs for what could be a small gain? That's unclear. But the main idea does not deserve to be dismissed out of hand just because a general net can calculate the game state and "choose" different evals itself.

There is an important difference between NNs for chess and other board games and most other neural nets: we can generate huge amounts of training data very cheaply. This opens up some ways of doing things that don't generally apply to all NNs. Having multiple specialized nets with various architectures and inputs is one of the possibilities thanks to this, as you'll never run out of training data for your specialized net.

hgm · Post by **hgm** » Sat Jan 23, 2021 10:25 am

It is bound to be worse, because what you are in fact doing is forging a large net from multiple smaller nets, but then forcing a way to combine their results (like linear tapering) without giving the net the opportunity to optimize on that. If you had started with a single bigger net, trained on the combined training sets, it would have had this opportunity, and would have used it to do better.
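A toy illustration of this argument: a fixed linear taper is one particular choice of combiner, so a combiner with even a single trainable parameter that contains the taper as a special case cannot fit the training data worse. The numbers below are made up purely for the demonstration, and the one-parameter scale is a deliberately crude stand-in for what a jointly trained larger net could adjust.

```python
def taper(a, b, p):
    """Fixed linear taper between two sub-net outputs, p in [0, 1]."""
    return p * a + (1.0 - p) * b


def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(pred, target)) / len(target)


def best_scale(pred, target):
    """Least-squares scale s that minimizes mse(s * pred, target)."""
    return sum(x * y for x, y in zip(pred, target)) / sum(x * x for x in pred)


# Made-up toy data: outputs of two sub-nets, phase values, and targets.
a = [1.0, 0.5, -0.2, 0.8]
b = [0.2, 0.1, -0.6, 1.5]
p = [1.0, 0.7, 0.3, 0.0]
y = [1.2, 0.5, -0.6, 1.6]

fixed = [taper(ai, bi, pi) for ai, bi, pi in zip(a, b, p)]
s = best_scale(fixed, y)
tuned = [s * f for f in fixed]

fixed_loss = mse(fixed, y)
tuned_loss = mse(tuned, y)   # never larger than fixed_loss, since s = 1 is allowed
```

This only shows the training-loss side of the claim; it says nothing about speed, which is the trade-off raised in the next paragraph.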

Of course a larger net can do better than a small net; no surprise in that. It comes at a price, though, in terms of the nodes per second you can evaluate.
