jonkr wrote: ↑Sun Jun 13, 2021 12:01 am
It was helpful for me to remove positions with immediate tactics, I did this by just running the Qsearch and saving the position at the end of the PV. (I think actually I'm using depth 1 search now which includes Qsearch.) Then this was the position that would be scored. I used a combination of game result and search score. Using quiet positions measured a clear gain over not doing it, but wasn't a huge difference.
I tried to do the same yesterday. I made the PV stack extend all the way down into qsearch, and after searching I save the position's score, walk down the PV, and save the position at its end. Is that what you mean as well, or do you also search the quiet position at the end of the PV again?
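Concretely, something like this is what I do once the search has returned (a rough Python/python-chess sketch rather than my actual engine code; pv_moves is just the list of PV moves the search produced, including the part from qsearch):

import chess

def quiet_leaf_fen(root_fen, pv_moves):
    # Walk the PV (which now extends into qsearch) from the root and return
    # the position at its end; that position is what I store for training,
    # together with the root search score and the game result.
    board = chess.Board(root_fen)
    for move in pv_moves:
        board.push(move)
    return board.fen()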
jonkr wrote: ↑Sun Jun 13, 2021 12:01 am
Is your network giving away a queen even after searching? If so that sounds like there's a bug somewhere in your code. Either in the training process, the training data itself, or the network value calculation. (If it prefers a move leaving a queen hanging without searching I wouldn't worry, your network isn't big enough to detect tactics well.)
Yes, it usually just hangs a piece for one of the sides (sometimes it doesn't even see the recapture) in what seems like a rather random way. I don't think the problem is in either 1) my network forward propagation or 2) my training process. The reasons are: 1) when I train a network on pure static evaluations, it does not hang pieces in the same manner (it still does, but that is because it doesn't search as deep as the HCE version), and 2) if I use a really small dataset (10-100 positions), it quite easily overfits, so the training algorithm seems to do what it's meant to.
For the input to the first layer, I use incremental updates in make/unmake move, which introduces another possible source of bugs. However, I think it's unlikely to be the cause, since I tested the incremental update for both making and unmaking moves while running perft, and I didn't catch any bugs.
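For reference, the incremental update I mean is essentially this (a small numpy sketch with made-up names, not my engine code; the feature index matches the 12 * 64 input layout described below):

import numpy as np

HIDDEN = 64                                        # assumed first-layer size, just for the sketch
W1 = np.zeros((12 * 64, HIDDEN), dtype=np.float32) # first-layer weights

def feature_index(piece_plane, square):
    # piece_plane is 0..11 in the order WP, WN, WB, WR, WQ, WK, BP, ..., BK
    return piece_plane * 64 + square

def add_piece(acc, piece_plane, square):
    acc += W1[feature_index(piece_plane, square)]

def remove_piece(acc, piece_plane, square):
    acc -= W1[feature_index(piece_plane, square)]

# In make-move a quiet move is remove_piece(from) + add_piece(to), a capture
# additionally removes the captured piece, and unmake applies the exact
# inverse; that is the part I tested while running perft.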
Therefore, I think the problem is in the training data itself... I am using ~72M positions from the lichess database that I search to depth 2 (very low depth, but it should still give at least decent strength compared to HCE), and, as I said, I (now) use the position at the end of the PV.
BTW, the way I represent the board for the input is, as mentioned, 12 piece types * 64 squares. The order is: WP, WN, WB, WR, WQ, WK, BP, BN, BB, BR, BQ, BK, with a 1 for a piece present and a 0 otherwise. I don't swap these depending on the side to move, because I thought it would be good enough to have the network output a white-relative score for each evaluation and then just invert it when it is black's move. Does this sound okay?
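In code the encoding and the score flip look roughly like this (again a Python/python-chess sketch with made-up names, just to show the layout I mean):

import chess
import numpy as np

def encode(board):
    # 12 planes * 64 squares, ordered WP, WN, WB, WR, WQ, WK, BP, BN, BB, BR, BQ, BK,
    # always from White's point of view (no flipping for the side to move).
    x = np.zeros(12 * 64, dtype=np.float32)
    for square, piece in board.piece_map().items():
        plane = (0 if piece.color == chess.WHITE else 6) + piece.piece_type - 1
        x[plane * 64 + square] = 1.0
    return x

def stm_score(board, white_relative_score):
    # The net always outputs a white-relative score; negate it when it is
    # Black to move so the search gets a side-to-move score.
    return white_relative_score if board.turn == chess.WHITE else -white_relative_score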