NNUE Question - King Placements

hgm · Post by **hgm** » Sat Oct 24, 2020 5:56 pm

syzygy wrote: ↑Sat Oct 24, 2020 5:20 pm The flip takes place when converting the incrementally updated accumulator to the input values for the first hidden layer. If white is to move, the white half of the accumulator gives coefficients 0-255. If black is to move, the black half of the accumulator gives coefficients 0-255.

You mean the King location of the side not to move is not taken into account at all? Are you sure then that it is not the other way around? In Shogi it makes eminent sense to ignore the King of the side to move. In Shogi ('Tsume') problems they don't even put such a King on the board. The opponent is the one you have to checkmate if you have the initiative, so how your pieces constrict the net around his King is of paramount importance. The side that doesn't have the initiative will be in check all the time, and will not evaluate.

BTW, the white and black accumulators are still doing what I called leap-frogging: they are used only every alternate ply. I had just expected that there would be similar tables (with different weights, of course) for the other player, rather than effectively zeroing all weights there by not using the outputs in the next layer. That seems much more natural for Chess, where the defending side also evaluates.

How many "big mistakes" have you made that work so incredibly well as NNUE in its current implementation?

Well, everything is relative. Perhaps it would have been 600 Elo stronger with a better net topology.

I've seen just one post that claims that, and that seemed to have been based (as usual) on a single position and probably on a single pair of play-outs from that position.

Even then, bungling an obviously won position even one time doesn't qualify as 'incredibly well'. But I always cared more for worst-case behavior than for average behavior.

I've seen another post reporting that NNUE improves endplay considerably more than TBs do. (And then NNUE+TBs turned out to do even better still, which is not surprising as they are orthogonal concepts.)

I've also seen SF-NNUE play games and from what I have seen I can only conclude that it knows very well which endgames are won and which are drawn or lost.

OK, so the posting I had seen might be a bit alarmist.

It is clear that the input layer fails to capture a lot of obviously important chess knowledge, but apparently the two hidden layers make up for that very well.

Indeed, large NN can do miraculous things. But the computational costs are significant. Perhaps you could do with half the number of neurons if you used slightly more relevant inputs, and I understand that this would give a significant speedup.

It would be surprising if the current implementation could not be improved.

It seems adding Kk inputs basically means making the 256-element biases vector dependent on the positions of the two kings. (But from a learning perspective this might not be a useful way to look at things.)
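
A minimal sketch of why this holds, under stated assumptions: exactly one Kk input is active in any position, so its weight column is always added to the accumulator, just as a bias would be. The example uses a 4-element accumulator as a stand-in for the 256 elements and random made-up weights; `kk_index`, `accumulate`, and `king_dependent_bias` are hypothetical names, not taken from any engine.

```python
import random

N = 4                      # stand-in for the 256-element accumulator
random.seed(0)

def kk_index(wk, bk):
    """Index of the one-hot Kk input for king squares wk, bk (0..63)."""
    return wk * 64 + bk

bias = [random.randint(-5, 5) for _ in range(N)]
# One weight column per (white king, black king) pair.
kk_weights = [[random.randint(-5, 5) for _ in range(N)] for _ in range(64 * 64)]

def accumulate(base, columns):
    """Sum a base vector and the weight columns of the active inputs."""
    out = list(base)
    for col in columns:
        out = [a + b for a, b in zip(out, col)]
    return out

def king_dependent_bias(wk, bk):
    """Fold the single active Kk column into the bias."""
    return accumulate(bias, [kk_weights[kk_index(wk, bk)]])

other_columns = [[1, 0, -2, 3], [0, 4, 1, -1]]   # columns of other active inputs
wk, bk = 6, 62                                   # Kg1 vs Kg8, counting a1 = 0
with_kk_input = accumulate(bias, [kk_weights[kk_index(wk, bk)]] + other_columns)
with_kk_bias = accumulate(king_dependent_bias(wk, bk), other_columns)
print(with_kk_input == with_kk_bias)             # True
```

Both routes give the same accumulator, so a Kk input amounts to a table of 64 x 64 bias vectors. Whether training treats the two formulations alike is a separate question, as noted above.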

One of the other obvious things to try is not alternately use the full white and the full black accumulator to feed the next layer, but alternately use half-white & half-black and then the other half-white and half-black.

syzygy · Post by **syzygy** » Sat Oct 24, 2020 6:51 pm

hgm wrote: ↑Sat Oct 24, 2020 5:56 pm
syzygy wrote: ↑Sat Oct 24, 2020 5:20 pm The flip takes place when converting the incrementally updated accumulator to the input values for the first hidden layer. If white is to move, the white half of the accumulator gives coefficients 0-255. If black is to move, the black half of the accumulator gives coefficients 0-255.
You mean the King location of the side not to move is not taken into account at all?

No, the other side goes to coefficients 256-511. I had already explained this in detail in this thread so I thought it was not necessary to repeat everything (and that mentioning "0-255" at all would be enough of a hint that there was more).
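
The ordering described here can be sketched as follows. This is only an illustration, not Stockfish's actual code: the function name `first_layer_input` and the use of plain Python lists are assumptions, and any activation applied during the conversion is left out.

```python
HALF = 256  # accumulator size per perspective, as stated in the thread

def first_layer_input(white_acc, black_acc, white_to_move):
    """Concatenate the two accumulator halves, side to move first."""
    assert len(white_acc) == HALF and len(black_acc) == HALF
    if white_to_move:
        return list(white_acc) + list(black_acc)   # white -> 0-255, black -> 256-511
    return list(black_acc) + list(white_acc)       # black -> 0-255, white -> 256-511

white = [1] * HALF   # dummy values marking the white half
black = [2] * HALF   # dummy values marking the black half
x = first_layer_input(white, black, white_to_move=False)
print(len(x), x[0], x[256])   # 512 2 1
```

Both halves always reach the first hidden layer; only their order depends on the side to move.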

I've seen just one post that claims that, and that seemed to have been based (as usual) on a single position and probably on a single pair of play-outs from that position.
Even then, bungling an obviously won position even one time doesn't qualify as 'incredibly well'. But I always cared more for worst-case behavior than for average behavior.

If you care about worst-case behaviour then all engines suck without exception, and all humans too.

hgm · Post by **hgm** » Sat Oct 24, 2020 7:07 pm

syzygy wrote: ↑Sat Oct 24, 2020 6:51 pm No, the other side goes to coefficients 256-511. I had already explained this in detail in this thread so I thought it was not necessary to repeat everything (and that mentioning "0-255" at all would be enough of a hint that there was more).

Oh, sorry. I falsely remembered the number of cells as only 256. OK, then it exactly behaves as I would expect.

If you care about worst-case behaviour then all engines suck without exception, and all humans too.

Indeed, and this has always worried me. No one seems to care; they sacrifice worst-case for a better average (= Elo) all the time.

I wonder what the Xiangqi / Janggi equivalent of NNUE would be. Making the PST dependent on King position there is much less helpful, because you know the King must be in the Palace, which is pretty small (9 King locations). So indeed ordinary PST already tend to draw the pieces towards the Palace very strongly. Of course you could take the entire Palace into account (King + 2 Advisers); together these have 119 constellations in Xiangqi (but 333 in Janggi, where the Advisers can get everywhere too). In Xiangqi you could double the number of Palace states by also encoding whether there is an Elephant in the Palace (which can only be on one square). In Janggi an Elephant can reach every square, so it is not so obvious you would like to use it as a defender at all, and if so, where.
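
The counts 119 and 333 can be reproduced by brute-force enumeration. The sketch below assumes that a constellation is one King plus zero, one, or two indistinguishable Advisers (some may have been captured), that Xiangqi Advisers are limited to the Palace centre and its four corners, and that Janggi Advisers can stand on any Palace square. The function and constant names are made up for illustration.

```python
from itertools import combinations

PALACE = [(f, r) for f in range(3) for r in range(3)]   # 3x3 Palace

# Xiangqi: Advisers are confined to the Palace centre and its four corners.
XIANGQI_ADVISER_SQUARES = [(0, 0), (2, 0), (1, 1), (0, 2), (2, 2)]
# Janggi: Advisers can reach every Palace square.
JANGGI_ADVISER_SQUARES = PALACE

def count_constellations(adviser_squares):
    """Count placements of one King plus 0, 1 or 2 Advisers in the Palace."""
    total = 0
    for n_advisers in range(3):
        for advisers in combinations(adviser_squares, n_advisers):
            # The King may stand on any Palace square not taken by an Adviser.
            total += sum(1 for sq in PALACE if sq not in advisers)
    return total

print(count_constellations(XIANGQI_ADVISER_SQUARES))  # 119
print(count_constellations(JANGGI_ADVISER_SQUARES))   # 333
```

In closed form this is 9 + 5·8 + 10·7 = 119 for Xiangqi and 9 + 9·8 + 36·7 = 333 for Janggi. Adding the Elephant-in-Palace flag for Xiangqi doubles the first figure to 238.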

Gerd Isenberg · Post by **Gerd Isenberg** » Sat Oct 24, 2020 8:39 pm

hgm wrote: ↑Sat Oct 24, 2020 2:56 pm So the images are still wrong: they state there are 41K inputs, while in fact there are 2 x 41K inputs.

Sorry, had a wonderful bike tour today

syzygy · Post by **syzygy** » Sat Oct 24, 2020 8:59 pm

hgm wrote: ↑Sat Oct 24, 2020 7:07 pm
If you care about worst-case behaviour then all engines suck without exception, and all humans too.
Indeed, and this has always worried me. No one seems to care; they sacrifice worst-case for a better average (= Elo) all the time.

I think in most areas it is more fruitful to worry about the average case than the worst case. In the field of error-correcting codes, people used to worry about the worst case, i.e. the minimum distance between any two code words. But nowadays engineers only care about the average case. It turns out that randomly picked error-correcting codes have great average-case performance when the code words are long enough (close to the Shannon limit) and the search for better codes is now a search for codes that behave like random codes but have enough structure that they can be implemented efficiently and with few hardware resources.

I wonder what the Xiangqi / Janggi equivalent of NNUE would be. Making the PST dependent on King position there is much less helpful, because you know the King must be in the Palace, which is pretty small (9 King locations). So indeed ordinary PST already tend to draw the pieces towards the Palace very strongly. Of course you could take the entire Palace into account (King + 2 Advisers); together these have 119 constellations in Xiangqi (but 333 in Janggi, where the Advisers can get everywhere too). In Xiangqi you could double the number of Palace states by also encoding whether there is an Elephant in the Palace (which can only be on one square). In Janggi an Elephant can reach every square, so it is not so obvious you would like to use it as a defender at all, and if so, where.

I don't know much about Xiangqi, but if there are only 9 king locations, that would seem to just make it computationally easier to train the network. Is there a clear reason why the current SF-NNUE approach would not do as well in Xiangqi?

hgm · Post by **hgm** » Sat Oct 24, 2020 10:13 pm

It would probably do well, but what I am afraid of is that in comparison it would not do very much better than ignoring the King location altogether (i.e. an NNUE network built on PST rather than KPST). Because it is always implied reasonably accurately where the Kings can be found. For Chess and Shogi it matters much more where the King is, as it can be anywhere, and typically moves to one of the corners. The PST would be completely different for Kings in such different locations. So I was thinking in what way the computational effort could be used in a more relevant way instead.

An NNUE net based on pure PST would have far fewer inputs and weights, but would not give spectacular computational savings: the only thing you save is the complete recalculation on the occasional King move. And with 9 King positions you would have to recalculate on a King move anyway. By making the inputs dependent on both King and Adviser location, you also would have to do such recalculations on Adviser moves, which would of course drive up the computational cost. But it seems important to know where the advisers are, in particular whether they shield the King from the left or right, so that you know from which wing you should attack the Palace.
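
To put rough numbers on "far fewer inputs and weights", here is a back-of-the-envelope count. It assumes a simple encoding with one input per (king state, non-King piece kind, square) per perspective and a 256-wide accumulator. The Xiangqi figures are upper bounds for a hypothetical encoding, since they ignore that many pieces cannot reach many points and that Advisers would be counted twice in the Palace variant.

```python
ACC = 256  # accumulator width per perspective, as in the thread

def input_count(king_states, piece_kinds, squares):
    """Number of (king state, piece kind, square) inputs per perspective."""
    return king_states * piece_kinds * squares

# Chess: 10 non-King piece kinds (5 types x 2 colours) on 64 squares.
chess_kpst = input_count(64, 10, 64)
# Xiangqi (assumed encoding): 12 non-King piece kinds on 90 points.
xiangqi_pst = input_count(1, 12, 90)
xiangqi_kpst = input_count(9, 12, 90)
xiangqi_palace = input_count(119, 12, 90)   # King + Advisers as the "king state"

for name, n in [("chess KPST", chess_kpst), ("xiangqi PST", xiangqi_pst),
                ("xiangqi KPST", xiangqi_kpst), ("xiangqi Palace-PST", xiangqi_palace)]:
    print(f"{name}: {n} inputs, {n * ACC} first-layer weights")
```

The chess figure of 40960 is roughly the "41K inputs" mentioned elsewhere in the thread. The per-position cost depends mostly on how many inputs change per move, not on these totals, which is why the savings at run time stay modest.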

An alternative could be to divide all possible constellations into classes, grouping the somewhat similar ones together (e.g. all those that expose your King from the left in a difficult-to-solve way) and have those use the same PST set. That way you could involve more defensive pieces without getting too many weights. And moves of the defensive pieces that stay in the same class (like moving an Elephant from one square where it is useless to another) would not require recalculation.
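
A toy sketch of the class idea, purely as an assumption about how it might look: `constellation_class` below is a made-up classifier that only checks which wing the Advisers cover, whereas a real one would also consider the King and possibly the Elephants. The `Accumulator` class just counts how often a full recalculation would be needed.

```python
def constellation_class(adviser_sqs):
    """Hypothetical classifier: which wing of the Palace is left open.

    Squares are (file, rank) inside the 3x3 Palace, files 0..2.
    """
    left = any(f == 0 for f, _ in adviser_sqs)
    right = any(f == 2 for f, _ in adviser_sqs)
    if left and right:
        return "covered"
    if left:
        return "open_right"
    if right:
        return "open_left"
    return "open_both"

class Accumulator:
    def __init__(self):
        self.cls = None
        self.full_refreshes = 0
        self.incremental_updates = 0

    def on_defender_move(self, adviser_sqs):
        """Refresh fully only when the move changes the constellation class."""
        new_cls = constellation_class(adviser_sqs)
        if new_cls != self.cls:
            self.cls = new_cls
            self.full_refreshes += 1       # a different PST set applies: recompute
        else:
            self.incremental_updates += 1  # same PST set: cheap update suffices

acc = Accumulator()
acc.on_defender_move([(0, 0), (2, 0)])   # initial setup: both wings covered
acc.on_defender_move([(0, 0), (1, 1)])   # right Adviser to the centre: right wing open
acc.on_defender_move([(0, 0), (0, 2)])   # centre Adviser to the left: class unchanged
print(acc.full_refreshes, acc.incremental_updates)   # 2 1
```

Only the first two calls change the class and force a full refresh; the third stays within "open_right" and would need just the usual incremental update.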

syzygy · Post by **syzygy** » Sat Oct 24, 2020 10:44 pm

hgm wrote: ↑Sat Oct 24, 2020 10:13 pm An NNUE net based on pure PST would have far fewer inputs and weights, but would not give spectacular computational savings: the only thing you save is the complete recalculation on the occasional King move. And with 9 King positions you would have to recalculate on a King move anyway. By making the inputs dependent on both King and Adviser location, you also would have to do such recalculations on Adviser moves, which would of course drive up the computational cost. But it seems important to know where the advisers are, in particular whether they shield the King from the left or right, so that you know from which wing you should attack the Palace.

The computational saving would be in the training phase. You could obtain a trained network relatively quickly and see how well it does. And perhaps it would make sense to make other layers smaller as well.

Madeleine Birchfield · Post by **Madeleine Birchfield** » Sat Oct 24, 2020 11:57 pm

hgm wrote: ↑Sat Oct 24, 2020 10:13 pm It would probably do well, but what I am afraid of is that in comparison it would not do very much better than ignoring the King location altogether (i.e. an NNUE network built on PST rather than KPST). Because it is always implied reasonably accurately where the Kings can be found. For Chess and Shogi it matters much more where the King is, as it can be anywhere, and typically moves to one of the corners. The PST would be completely different for Kings in such different locations. So I was thinking in what way the computational effort could be used in a more relevant way instead.

An NNUE net based on pure PST would have far fewer inputs and weights, but would not give spectacular computational savings: the only thing you save is the complete recalculation on the occasional King move. And with 9 King positions you would have to recalculate on a King move anyway. By making the inputs dependent on both King and Adviser location, you also would have to do such recalculations on Adviser moves, which would of course drive up the computational cost. But it seems important to know where the advisers are, in particular whether they shield the King from the left or right, so that you know from which wing you should attack the Palace.

An alternative could be to divide all possible constellations into classes, grouping the somewhat similar ones together (e.g. all those that expose your King from the left in a difficult-to-solve way) and have those use the same PST set. That way you could involve more defensive pieces without getting too many weights. And moves of the defensive pieces that stay in the same class (like moving an Elephant from one square where it is useless to another) would not require recalculation.

In games where the king is like every other piece (antichess comes to mind) or where no king piece exists (draughts), would we just use a NNUE built out of PST and treat the king as a regular piece?

Raphexon · Post by **Raphexon** » Sun Oct 25, 2020 1:45 am

On the Lc0 discord somebody came along recently talking about GPL-defying clones and NNUE having brought a new revolution in computer Xiangqi too.

I had an extremely hard time confirming it since Xiangqi literally translates to Chess so whenever I search I get a lot of links to Stockfish...
But I've sharpened up my googlefu again and tadaa:
(everything translated with google translate...)

http://www.ccyclone.com/buy.html

"! ! ! The Tornado engine has been updated to the latest version 2020.9.19, with more powerful algorithms, more powerful neural networks, and unique chess skills. It is amazing! ! ! "

http://www.xqbase.com/xqwizard/bugchess.htm

　The bug engine has far exceeded the level of human chess champions on ordinary laptops (the rating is around 2900 points ), and the rating is expected to exceed 3200 points on high-performance multi-core computers (such as network servers, graphics workstations, etc.) .
　　In the 22nd International Computer Olympiad held in Macau in August 2019, Bugs Chess won the Chinese Chess Championship with a record of 7 wins and 1 draw;　　
　　In the 13th China Computer Game Championships held in Beijing in October 2019, Bugs Chess won the Chinese Chess Championship with 8 wins and 2 draws.

Bug Engine Update Log
　　20.9.19
　　1. Using the latest NNUE and AlphaZero neural network technology to achieve, the chess power is greatly improved;
　　2. Support AVX2 instruction set to improve computing speed;
　　3. Update the knowledge base.

So it seems like NNUE has entered the world of commercial Xiangqi software too.

Pedro · Post by **Pedro** » Wed Oct 28, 2020 12:23 pm

Guys, I'm a layman, but how is the Stockfish neural network trained? We know that Leela's network is trained from zero, but is NNUE training supervised? How is it done?
