I've gotten a couple more successful aggression patches in the last couple of days.
My first idea was to explicitly look for sacrifices in the search line, instead of just giving a bonus to positions where the static evaluation is much better than the material count. This means noting the material evaluation at the root and then comparing it against the material evaluation at each ply played in the search so far. If at some point we sacrificed material (i.e. we went down in material and didn't immediately take it back), the evaluation of the current position gets a bonus.
A couple notes on this:
It doesn't matter how far back the sacrifice occurred: as long as it occurred between the root and the position being evaluated, the position still gets the bonus. I don't personally like the idea of giving bonuses to positions 20 plies past the sacrifice and completely unrelated to it, but how is the engine supposed to be incentivized to sacrifice pieces if the score at the end of the search tree isn't any better?
In-between checks before recaptures count as sacrifices under my current detection method, while removing-the-defender sacrifices don't. I want to work on this some more later, but it seems hard to differentiate between actual sacrifices and in-between moves.
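To make the detection rule concrete, here is a minimal Python sketch of one way it could work. It is not Patricia's actual code. The names `has_sacrifice`, `evaluate_with_sacrifice_bonus`, and `SACRIFICE_BONUS` are hypothetical, the bonus value is invented for illustration, and the sketch assumes a list of material balances recorded from the root side's point of view, one entry per ply.

```python
SACRIFICE_BONUS = 30  # hypothetical value in centipawns, not taken from the engine


def has_sacrifice(material_by_ply):
    """Return True if the root side went down in material and did not
    win it back with its very next move.

    material_by_ply[0] is the material balance at the root (root side's
    point of view); material_by_ply[i] is the balance after i plies.
    """
    root = material_by_ply[0]
    # The opponent's moves produce the even indices 2, 4, ...;
    # the root side's reply is the index right after.
    for ply in range(2, len(material_by_ply) - 1, 2):
        down_after_capture = material_by_ply[ply] < root
        still_down_after_reply = material_by_ply[ply + 1] < root
        if down_after_capture and still_down_after_reply:
            return True
    return False


def evaluate_with_sacrifice_bonus(static_eval, material_by_ply):
    """Add the bonus to a static eval given from the root side's point of view."""
    if has_sacrifice(material_by_ply):
        return static_eval + SACRIFICE_BONUS
    return static_eval
```

Under this rule, a line like `[0, 0, -300, -300, -300, 0]` (piece lost, in-between check, then recapture) is flagged as a sacrifice, while `[0, 300, -200, 100]` (capture the defender, lose the capturing piece, then win material back) is not, which matches the two problem cases described above.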
Anyways, the results of this test were quite positive!
Rank  EAS-Score  sacs    shorts  draws   moves  Engine/player
-------------------------------------------------------------------
1     112165     10.89%  20.73%  15.58%  72     evalv2
2      87606     07.63%  19.43%  21.66%  73     base
The next patch was an attempt to tone down the number of bad draws. I give a penalty to all draw scores where (a) there is plenty of material on the board, and (b) the side responsible for the draw isn't down in material. The reasoning is that endgame draws are much more reasonable than 20-move draws by repetition in positions that still look like they have plenty of life, and that a draw is a reasonable result when you're down in material, since there's a good chance you lose otherwise. The penalty is currently set at 20 centipawns but could be changed later on.
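The two conditions can be sketched in Python roughly as follows. This is only an illustration under assumptions: `draw_score` and `MATERIAL_THRESHOLD` are hypothetical names, the threshold for "plenty of material" is invented, and the score is assumed to be returned from the point of view of the side taking the draw. Only the 20 centipawn penalty comes from the description above.

```python
DRAW_PENALTY = 20          # centipawns, the value quoted above
MATERIAL_THRESHOLD = 3000  # hypothetical cutoff for "plenty of material"


def draw_score(total_material, side_material_balance):
    """Score a drawn position for the side that is taking the draw.

    total_material: combined material of both sides, in centipawns.
    side_material_balance: that side's material minus the opponent's.
    """
    plenty_of_material = total_material >= MATERIAL_THRESHOLD
    not_behind = side_material_balance >= 0
    if plenty_of_material and not_behind:
        # Early draw that the side had no material reason to accept.
        return -DRAW_PENALTY
    # Endgame draws and draws taken while behind keep the neutral score.
    return 0
```

With these assumed values, `draw_score(6000, 0)` is penalized, while `draw_score(6000, -300)` and `draw_score(1000, 100)` both stay at 0.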
Results of that test:
Rank  EAS-Score  sacs    shorts  draws   moves  Engine/player
-------------------------------------------------------------------
1     102437     08.41%  21.64%  20.43%  68     searchv1
2      76148     10.17%  17.13%  22.88%  74     base
This version also gained 20 Elo because I changed TT cutoffs to be allowed only on non-PV nodes rather than on all non-root nodes.
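For readers unfamiliar with the TT change, the condition looks roughly like this in Python. It follows the common transposition-table convention of exact, lower, and upper bounds and is not taken from Patricia's source; `Bound` and `tt_cutoff_allowed` are hypothetical names.

```python
from enum import Enum


class Bound(Enum):
    EXACT = 0  # stored score is the true score
    LOWER = 1  # stored score is a lower bound (fail-high)
    UPPER = 2  # stored score is an upper bound (fail-low)


def tt_cutoff_allowed(is_pv_node, entry_depth, depth, bound, score, alpha, beta):
    """Decide whether a transposition-table entry may end the search here."""
    if is_pv_node:
        # The change described above: never cut off on PV nodes.
        return False
    if entry_depth < depth:
        # The stored result comes from a shallower search than required.
        return False
    if bound is Bound.EXACT:
        return True
    if bound is Bound.LOWER:
        return score >= beta
    return score <= alpha
```

The older behaviour would correspond to replacing the `is_pv_node` check with a check for the root node only.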
I had a few tests in between that looked like they might have increased the EAS score, but I wasn't sure whether the sample size was big enough for the difference to be significant. Currently I run 3k-game STC tests, and my "cutoff" for accepting a patch is a 20k increase in EAS score. However, I will definitely have to run a proper statistical test sooner or later. Most likely this will take the form of 20 matches of the same Patricia version, giving 40 data points (two EAS scores per match); discarding the highest and lowest values leaves 38 of 40, which gives me an approximate 95% interval for the EAS score of a 3000-game match. That's quite a few games, but it shouldn't take more than a week on my computer.
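The planned interval is easy to compute once the scores exist. A minimal Python sketch, assuming the EAS scores have been collected into a plain list (`trimmed_interval` is a hypothetical helper name, and no real scores are shown here):

```python
def trimmed_interval(eas_scores):
    """Drop the single highest and single lowest score and return
    (low, high, coverage) for the scores that remain.

    coverage is the fraction of data points kept; with 40 scores it
    is 38/40 = 0.95.
    """
    if len(eas_scores) < 3:
        raise ValueError("need at least 3 scores to trim both ends")
    ordered = sorted(eas_scores)
    kept = ordered[1:-1]  # discard the minimum and the maximum
    return kept[0], kept[-1], len(kept) / len(ordered)
```

A patch whose EAS score lands outside the returned range would then be unlikely to be noise. Note that this is an empirical range from a small sample rather than a formal confidence interval, so it should be read as a rough guide.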
After one more patch shows improvement in aggression I'll rerun the gauntlet from before to see what has changed. Hopefully Patricia will be more clearly in first this time!