syzygy wrote (Sat Oct 24, 2020, 3:20 pm):
> The flip takes place when converting the incrementally updated accumulator to the input values for the first hidden layer. If white is to move, the white half of the accumulator gives coefficients 0-255. If black is to move, the black half of the accumulator gives coefficients 0-255.

You mean the King location of the side not to move is not taken into account at all? Are you sure, then, that it is not the other way around? In Shogi it makes eminent sense to ignore the King of the side to move: in Shogi ('Tsume') problems they don't even put such a King on the board. The opponent is the one you have to checkmate if you have the initiative, so how your pieces constrict the net around his King is of paramount importance. The side that doesn't have the initiative will be in check all the time, and so will not evaluate.
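A minimal sketch of the flip as syzygy describes it: the two perspective accumulators are ordered by side to move before being clipped and passed on. The quote only mentions coefficients 0-255; the sketch assumes the other perspective's half fills the remaining slots, which is how Stockfish's HalfKP feature transformer orders them. The clamp to [0, 1] is a float stand-in for an engine's integer clipped ReLU, and the function names are hypothetical.

```python
from typing import List, Sequence


def clipped_relu(x: float) -> float:
    """Clamp to [0, 1]; a float stand-in for an engine's integer clamp."""
    return min(max(x, 0.0), 1.0)


def first_layer_inputs(white_acc: Sequence[float],
                       black_acc: Sequence[float],
                       white_to_move: bool) -> List[float]:
    """Order the two perspective accumulators by side to move.

    Slots 0..n-1 come from the side to move (0-255 in the net discussed);
    slots n..2n-1 come from the other side (assumption, see lead-in).
    """
    own, other = (white_acc, black_acc) if white_to_move else (black_acc, white_acc)
    return [clipped_relu(v) for v in own] + [clipped_relu(v) for v in other]
```

With toy 4-wide accumulators `w = [0.5, -1.0, 2.0, 0.25]` and `b = [0.1, 0.2, 0.3, 0.4]`, white to move yields `[0.5, 0.0, 1.0, 0.25, 0.1, 0.2, 0.3, 0.4]`, and black to move puts `b` first.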
BTW, the white and black accumulators are still doing what I called leap-frogging: each is used only on every other ply. I had just expected that there would be similar tables (with different weights, of course) for the other player, rather than effectively zeroing all weights there by not using its outputs in the next layer. That seems much more natural for Chess, where the defending side also evaluates.
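Whatever role each accumulator plays in the next layer, both have to be kept up to date on every ply. A minimal sketch of the incremental update, assuming a generic one-hot feature encoding with integer weights (the names and toy sizes are hypothetical): a move subtracts the weight rows of features that disappeared and adds those of new ones, which matches a full recomputation.

```python
from typing import List, Sequence


def refresh(bias: Sequence[int], weights: Sequence[Sequence[int]],
            active: Sequence[int]) -> List[int]:
    """Full recomputation: bias plus the weight row of every active feature."""
    acc = list(bias)
    for f in active:
        for i in range(len(acc)):
            acc[i] += weights[f][i]
    return acc


def update(acc: Sequence[int], weights: Sequence[Sequence[int]],
           removed: Sequence[int], added: Sequence[int]) -> List[int]:
    """Incremental update: subtract rows of removed features, add rows of added ones."""
    out = list(acc)
    for f in removed:
        for i in range(len(out)):
            out[i] -= weights[f][i]
    for f in added:
        for i in range(len(out)):
            out[i] += weights[f][i]
    return out


# Toy example: feature 1 (a piece on its from-square) becomes feature 2 (to-square).
W = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [-1, -1, -1, -1]]
bias = [0, 0, 0, 0]
before = refresh(bias, W, [0, 1])
after = update(before, W, removed=[1], added=[2])
```

Here `after` equals `refresh(bias, W, [0, 2])`, i.e. `[101, 202, 303, 404]`. In a HalfKP-style net a King move changes every feature of that King's perspective, so that accumulator would typically be refreshed rather than updated.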
> How many "big mistakes" have you made that work so incredibly well as NNUE in its current implementation?

Well, everything is relative. Perhaps it would have been 600 Elo stronger with a better net topology.
> I've seen just one post that claims that, and that seemed to have been based (as usual) on a single position and probably on a single pair of play-outs from that position.

Even then, bungling an obviously won position even once doesn't qualify as 'incredibly well'. But I have always cared more about worst-case behavior than about average behavior.
> I've seen another post reporting that NNUE improves endplay considerably more than TBs do. (And then NNUE+TBs turned out to do even better still, which is not surprising as they are orthogonal concepts.)
>
> I've also seen SF-NNUE play games and from what I have seen I can only conclude that it knows very well which endgames are won and which are drawn or lost.

OK, so the posting I had seen might be a bit alarmist.
> It is clear that the input layer fails to capture a lot of obviously important chess knowledge, but apparently the two hidden layers make up for that very well.

Indeed, large NNs can do miraculous things. But the computational cost is significant. Perhaps you could make do with half the number of neurons if you used slightly more relevant inputs, and I understand that this would give a significant speedup.
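To put a rough number on the speedup from halving the neurons, here is an operation count. It is a sketch assuming a 2×256 → 32 → 32 → 1 topology (the Stockfish HalfKP net, as we understand it) and a quiet move that changes two features per perspective. It counts arithmetic only, not memory traffic or SIMD effects, so it is not a benchmark.

```python
HIDDEN1 = 32  # assumed width of the first hidden layer
HIDDEN2 = 32  # assumed width of the second hidden layer


def dense_macs(half_width: int) -> int:
    """Multiply-accumulates in the dense layers after the accumulators.

    Assumes both perspective halves (2 * half_width values) feed the first
    hidden layer, followed by HIDDEN1 -> HIDDEN2 -> 1 output.
    """
    inputs = 2 * half_width
    return inputs * HIDDEN1 + HIDDEN1 * HIDDEN2 + HIDDEN2 * 1


def update_adds(half_width: int, changed_features: int = 2) -> int:
    """Adds/subtracts for an incremental update on a quiet move:
    each changed feature touches half_width values in each of two perspectives."""
    return 2 * changed_features * half_width


for width in (256, 128):
    print(width, dense_macs(width), update_adds(width))
```

Under these assumptions the dense layers drop from 17440 to 9248 multiply-accumulates and the accumulator update from 1024 to 512 additions, because the accumulator width dominates both counts.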
> It would be surprising if the current implementation could not be improved.

One of the other obvious things to try is not to alternately use the full white and the full black accumulator to feed the next layer, but to alternately use half of the white and half of the black accumulator, and then the other half of each.
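The half-and-half proposal admits more than one reading. This sketch assumes the model described in the post, where the next layer takes one accumulator's worth of inputs, and reads the proposal as follows: the side to move contributes the first half of its accumulator and the opponent the second half of theirs. On the next ply each accumulator therefore contributes its other half. The function name is hypothetical.

```python
from typing import List, Sequence


def alternating_halves(white_acc: Sequence[int],
                       black_acc: Sequence[int],
                       white_to_move: bool) -> List[int]:
    """Hypothetical variant: feed one half of each perspective's accumulator.

    Side to move contributes its first half, the other side its second half,
    so over two consecutive plies each accumulator supplies both of its halves
    (interpretation assumed, see lead-in).
    """
    h = len(white_acc) // 2
    own, other = (white_acc, black_acc) if white_to_move else (black_acc, white_acc)
    return list(own[:h]) + list(other[h:])
```

With `white = [1, 2, 3, 4]` and `black = [5, 6, 7, 8]`, white to move gives `[1, 2, 7, 8]` and black to move gives `[5, 6, 3, 4]`. The input count stays the same as in the full-accumulator scheme, so the next layer's cost is unchanged.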
It seems adding Kk inputs basically means making the 256-element bias vector dependent on the positions of the two kings. (But from a learning perspective this might not be a useful way to look at things.)
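The equivalence can be checked in the forward pass. Adding a one-hot input for each (white King, black King) square pair adds exactly one weight row to the accumulator. That is the same as starting from a bias vector selected by the two King squares. This is a minimal sketch with small random integer weights; the sizes and names are hypothetical stand-ins. Integers keep the comparison exact, since with floats the different summation order could change the last bits.

```python
import random
from typing import List, Sequence

WIDTH = 8                 # hypothetical; the net discussed uses 256
NUM_PIECE_FEATURES = 20   # hypothetical, tiny stand-in for the real feature space
NUM_SQUARES = 64

rng = random.Random(0)
bias = [rng.randint(-50, 50) for _ in range(WIDTH)]
piece_w = [[rng.randint(-50, 50) for _ in range(WIDTH)]
           for _ in range(NUM_PIECE_FEATURES)]
# One weight row per (white King square, black King square) pair: the "Kk" inputs.
kk_w = [[rng.randint(-50, 50) for _ in range(WIDTH)]
        for _ in range(NUM_SQUARES * NUM_SQUARES)]


def kk_index(wk: int, bk: int) -> int:
    """Index of the Kk input for the given King squares."""
    return wk * NUM_SQUARES + bk


def accumulate(start: Sequence[int], rows: Sequence[Sequence[int]]) -> List[int]:
    """Element-wise sum of a starting vector and a list of weight rows."""
    acc = list(start)
    for row in rows:
        for i, w in enumerate(row):
            acc[i] += w
    return acc


def with_kk_input(active: Sequence[int], wk: int, bk: int) -> List[int]:
    """Treat the King pair as one extra active input feature."""
    rows = [piece_w[f] for f in active] + [kk_w[kk_index(wk, bk)]]
    return accumulate(bias, rows)


def with_king_bias(active: Sequence[int], wk: int, bk: int) -> List[int]:
    """Fold the Kk row into a bias vector selected by the two King squares."""
    king_bias = [b + k for b, k in zip(bias, kk_w[kk_index(wk, bk)])]
    return accumulate(king_bias, [piece_w[f] for f in active])
```

For example, `with_kk_input([3, 7, 11], 4, 60) == with_king_bias([3, 7, 11], 4, 60)` holds. The identity covers only the forward computation; how training treats such inputs is a separate question, as noted above.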