patricia devlog

Discussion of chess software programming and technical issues.

Moderators: hgm, chrisw, Rebel

Whiskers
Posts: 227
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

I filtered about 12 million positions from the 400-million-position Willow dataset I have on hand, using the 6th filtering program I wrote for Patricia, which "saves positions where one side can't castle and has an open king". I then retrained Patricia's current network for 10 epochs on the new dataset at a much lower LR (to keep the whole network from overfitting), in the hope that the network would come to appreciate attacks that are "good but not necessarily better than other options" and choose them over quiet play.
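The idea of that filter, in simplified form (an illustrative sketch; the Position interface here is invented for the example, not Patricia's actual one):

Code: Select all

// Illustrative sketch only -- invented Position interface, not Patricia's code.
#include <cstdint>

struct Position {
    uint8_t  castling_rights;   // bitmask: 1=WK, 2=WQ, 4=BK, 8=BQ
    int      king_square[2];    // king square per side, 0..63
    uint64_t pawns[2];          // pawn bitboards per side
};

// A king counts as "open" here if the three files around it hold no friendly pawns.
static bool king_is_open(const Position& pos, int side) {
    int file = pos.king_square[side] % 8;
    uint64_t shield = 0;
    for (int f = file - 1; f <= file + 1; ++f) {
        if (f < 0 || f > 7) continue;
        shield |= 0x0101010101010101ULL << f;   // full file mask
    }
    return (pos.pawns[side] & shield) == 0;
}

// Keep the position if either side has lost castling rights AND has an
// exposed king -- a rough proxy for "attackable king" positions.
bool keep_position(const Position& pos) {
    bool white_cant_castle = (pos.castling_rights & 0x3) == 0;
    bool black_cant_castle = (pos.castling_rights & 0xC) == 0;
    return (white_cant_castle && king_is_open(pos, 0)) ||
           (black_cant_castle && king_is_open(pos, 1));
}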

The results were somewhat promising: here's the result of a 3k-game match between that version and the original Patricia.

Code: Select all

Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    117347  11.17%  23.08%  17.82%   70   patty-dev  
   2     76785  08.05%  16.32%  21.15%   74   patricia  
The increase in sacrifice rate is statistically significant, and the strength loss was only maybe 5-10 elo. Not bad!

I'm going to do this for all 8 filtering programs. I hope Linux has a good program that can run round-robin tournaments; otherwise, I'll just have them all play matches against the original Patricia and pick the one that does best on aggressiveness while not sacrificing too much elo. I have no problem giving up 10 elo per patch for a better playstyle; I have more of a problem giving up 100 elo.
User avatar
eboatwright
Posts: 41
Joined: Tue Jan 09, 2024 8:38 pm
Full name: E Boatwright

Re: patricia devlog

Post by eboatwright »

Whiskers wrote: Sat Feb 17, 2024 1:13 am I filtered about 12 million positions from the 400-million-position Willow dataset I have on hand... The increase in sacrifice rate is statistically significant, and the strength loss was only maybe 5-10 elo. Not bad!
First of all I just wanna say I've been following this project, and it's really cool!
Also: I'm on Ubuntu Linux, and I use CuteChess for matches & tournaments, and it works great for me :D
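A round-robin with cutechess-cli looks something like this (engine paths, time control, and book are placeholders):

Code: Select all

cutechess-cli -tournament round-robin -rounds 1 -games 2 \
    -engine cmd=./filter2 name=filter2 \
    -engine cmd=./filter6 name=filter6 \
    -engine cmd=./base name=base \
    -each proto=uci tc=8+0.08 \
    -openings file=book.epd format=epd order=random \
    -concurrency 4 -pgnout games.pgn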
Creator of Maxwell
Whiskers
Posts: 227
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

I did what I explained above for all eight of my filtering programs, then ran a round-robin tournament with all of them to compare their EAS ratings.
Results:

Code: Select all

Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    108452  10.67%  20.43%  18.41%   71   filter8  
   2    105823  09.49%  18.97%  17.33%   72   filter6  
   3    104599  10.40%  19.95%  17.65%   74   filter2  
   4     95155  10.55%  17.34%  16.75%   74   filter4  
   5     82626  09.70%  16.72%  19.16%   75   filter5  
   6     81288  06.91%  17.74%  20.92%   74   filter3  
   7     78811  06.44%  16.83%  20.22%   74   base  
   8     76218  07.89%  14.74%  19.12%   74   filter7  
-------------------------------------------------------------------
Some notes:
The filter1 net was a complete disaster; it lost every game. When I run a search from startpos, it gives -137 cp at depth 1 and +303 cp at depth 2. I definitely need to fix that one.

All of the nets were mostly similar (±10 elo) in strength, with the major exception of filter4, which was -40 elo. Filter4 was the set with the most data; is it possible the net overfit on too many "normal" positions?

Filter8 being at the top is odd, because it just filters for positions with very powerful minor pieces. I have to assume this is because setting up a powerhouse piece like that often requires a pawn or exchange sacrifice.


I'm going to combine the data from methods 2, 6, and 8 now (removing duplicates) and then train one more net on that to see how it fares in terms of aggressiveness.
Whiskers
Posts: 227
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

After a few more experiments, this is the best I could come up with as of right now:
Step 1. Filter the data using filters 2, 6, and 8, then combine the files while removing duplicate positions (see the dedup sketch after Step 2). This gave me about 10m positions in the combined filtered dataset.

Step 2. Train a new network, starting from the weights of the original network, on this dataset for 30 epochs at a very low LR (0.00001 seemed to be a happy medium for me) and with a lower WDL ratio than normal (I tried a couple of higher WDL ratios and they just led to the engine playing much worse).
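The duplicate removal in Step 1 can be as simple as hashing position strings. A minimal sketch of the idea, assuming one-FEN-per-line text files (illustrative only; not my actual tooling or data format):

Code: Select all

// Merge filtered files, dropping duplicate positions. Assumes one FEN per line.
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_set>

int main(int argc, char** argv) {
    std::unordered_set<std::string> seen;   // FEN strings already written
    std::ofstream out("combined.fen");
    for (int i = 1; i < argc; ++i) {        // each argument is a filtered file
        std::ifstream in(argv[i]);
        std::string fen;
        while (std::getline(in, fen))
            if (seen.insert(fen).second)    // insert() reports duplicates
                out << fen << '\n';
    }
    std::cout << seen.size() << " unique positions written\n";
}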

The result is about 25 elo worse but sacrifices almost twice as often as base Patricia; mind you, base Patricia is already noticeably more aggressive than Willow, so this should be fun.

Code: Select all

Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    119274  11.77%  22.12%  15.50%   71   filterv3  
   2     66863  06.42%  15.89%  22.09%   73   base  
-------------------------------------------------------------------
I'm going to run a match against Willow and a couple of other 2700-3000 level engines to gauge just how aggressive Patricia is and to see how I'm progressing. After that, the next thing I want to do is modify the flat bonus I give in search to "aggressive" positions. Currently it's based on a simple material count; there's plenty of room for improvement there.
Whiskers
Posts: 227
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

Patricia's in pretty clear first, but there's still work to be done (3k-game tournament). She wasn't that much more aggressive than Peacekeeper (take Peacekeeper out, though, and Patricia laps everyone). I want to see the EAS score delta be more like 50k-60k before I consider releasing a first version.

Code: Select all

Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    153324  20.47%  20.47%  13.00%   67   Patricia 0.1  
   2    122006  16.19%  25.96%  18.11%   65   Peacekeeper 1.50  
   3     98288  14.35%  12.91%  14.27%   70   Willow 2.8  
   4     83179  10.89%  15.41%  18.19%   70   Wahoo v3  
   5     72299  10.01%  16.00%  19.98%   71   Midnight v5  
   6     62558  10.83%  11.32%  21.01%   72   Polaris 1.6.1  
   7     62335  04.03%  17.39%  25.43%   68   Akimbo 0.4.1  
-------------------------------------------------------------------
Whiskers
Posts: 227
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

I've gotten a couple more successful aggression patches in the last couple of days.

My first idea was, instead of just giving a bonus to positions where the static evaluation is much better than the material count, to explicitly look for sacrifices in the search line. This means we make a note of the material evaluation at root, then compare it against the material evaluation at each ply played so far in the search. If we find a point where we sacrificed material (i.e. we went down in material and didn't immediately take it back), we give a bonus to the evaluation of the current position. (A sketch follows the notes below.)

A couple notes on this:
It doesn't matter how far back the sacrifice occurred: as long as it happened somewhere between the root and the current position being evaluated, the position still gets the evaluation bonus. I don't personally like the idea of giving bonuses to positions 20 plies past the sacrifice and completely unrelated to it, but how is the engine supposed to be incentivized to sacrifice pieces if the score at the end of the search tree isn't any better?

In-between checks before recaptures count as sacrifices under my current detection method, while removing-the-defender sacrifices don't. I want to work on this problem some more later, but it seems hard to differentiate between actual sacrifices and in-between moves.
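Conceptually, the detection looks something like this (a simplified sketch with invented names and thresholds, not the actual implementation):

Code: Select all

// Illustrative sketch -- invented names and thresholds, not Patricia's code.
// ss[ply].material_balance is the material count from the root side's view,
// filled in as moves are made.
constexpr int SACRIFICE_MARGIN = 150;  // assumed threshold, in centipawns
constexpr int SACRIFICE_BONUS  = 50;   // assumed flat bonus, in centipawns

struct SearchStack {
    int  material_balance;
    bool sacrificed;        // a sacrifice occurred earlier in this line
};

// Called once per ply, after the move is made.
void update_sacrifice_flag(SearchStack* ss, int root_material, int ply) {
    bool inherited = ply >= 1 && ss[ply - 1].sacrificed;   // flag persists
    // "Sacrificed" = material dropped below root level two plies ago and the
    // reply did not win it back (i.e. no immediate recapture restored it).
    bool dipped = ply >= 2 &&
        ss[ply - 2].material_balance <= root_material - SACRIFICE_MARGIN &&
        ss[ply].material_balance     <= root_material - SACRIFICE_MARGIN;
    ss[ply].sacrificed = inherited || dipped;
}

// The flat bonus applied to the evaluation anywhere below a sacrifice.
int adjusted_eval(int raw_eval, const SearchStack* ss, int ply) {
    return raw_eval + (ss[ply].sacrificed ? SACRIFICE_BONUS : 0);
}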

Anyways, the results of this test were quite positive!

Code: Select all

Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    112165  10.89%  20.73%  15.58%   72   evalv2  
   2     87606  07.63%  19.43%  21.66%   73   base  
The next patch was an attempt to tone down the number of bad draws. I give a penalty to all draw scores where (a) there is plenty of material on the board, and (b) the side responsible for the draw isn't down in material. Endgame draws are much more reasonable than 20-move draws by repetition in positions that still look full of life, and if you're down material, a draw is a reasonable result, because there's a good chance you'd lose otherwise. The penalty is currently set at 20 centipawns but could be changed later on.
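In sketch form (the 20 cp penalty and the two conditions are as described above; the names and the material cutoff are invented placeholders):

Code: Select all

// Illustrative sketch -- invented names and material cutoff.
constexpr int DRAW_PENALTY       = 20;    // centipawns, as described above
constexpr int MIN_TOTAL_MATERIAL = 2000;  // assumed "plenty of material" cutoff

// Score a draw from the side to move's point of view. total_material is the
// summed piece value on the board; material_balance is from the side to
// move's perspective.
int draw_score(int total_material, int material_balance) {
    bool plenty_of_material = total_material >= MIN_TOTAL_MATERIAL;
    bool not_down_material  = material_balance >= 0;
    if (plenty_of_material && not_down_material)
        return -DRAW_PENALTY;   // discourage "lifeless" early draws
    return 0;                   // ordinary draw score
}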

Results of that test:

Code: Select all

Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    102437  08.41%  21.64%  20.43%   68   searchv1  
   2     76148  10.17%  17.13%  22.88%   74   base  
It also gained 20 elo because I changed TT cutoffs to only be available on non-pv nodes rather than all nodes that are not root.


I had a few tests in between that looked like they might have increased the EAS score, but I wasn't sure the sample size was big enough for significance. Currently I run 3k-game STC tests, and my "cutoff" for accepting a patch is a 20k increase in EAS score. However, I will definitely have to run a proper statistical test sooner or later. Most likely this will take the form of 20 matches of the same Patricia version against itself, giving 40 data points; I'll discard the highest and lowest outliers, which will give me a 95% confidence interval for 3000-game matches. That's quite a few games, but it shouldn't take more than a week on my computer.
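In sketch form, the planned test (placeholder inputs; dropping the single highest and lowest of 40 points leaves 38/40 of the mass, i.e. an empirical ~95% interval):

Code: Select all

// Back-of-the-envelope sketch of the proposed significance test.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> eas = { /* 40 EAS scores from 20 self-play matches */ };
    if (eas.size() < 3) { std::puts("need more data"); return 1; }

    std::sort(eas.begin(), eas.end());
    double low  = eas[1];                  // second-lowest score
    double high = eas[eas.size() - 2];     // second-highest score
    std::printf("empirical ~95%% interval: [%.0f, %.0f]\n", low, high);
    // A patch whose EAS score lands above `high` would then be a
    // meaningful improvement at roughly this confidence level.
}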

After one more patch shows improvement in aggression I'll rerun the gauntlet from before to see what has changed. Hopefully Patricia will be more clearly in first this time!
User avatar
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: patricia devlog

Post by lithander »

Very interesting thread! :)
Whiskers wrote: Wed Feb 21, 2024 11:41 pm It also gained 20 elo because I changed TT cutoffs to only be available on non-pv nodes rather than all nodes that are not root.
I don't understand this. Afaik a TT cutoff is where you say "I've searched this position before to at least the required depth", and, based on the stored score, the stored bound, and the current alpha-beta window, there are three scenarios where I can use the TT score instead of re-searching the position. I was under the impression that this is always at least as good as searching the position, due to the tt_depth >= required_depth condition.

Is there an intuitive explanation why you would benefit from not considering a TT cutoff for pv nodes? And how do you define pv nodes in that scenario?
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
JoAnnP38
Posts: 253
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: patricia devlog

Post by JoAnnP38 »

lithander wrote: Thu Feb 22, 2024 12:12 pm Is there an intuitive explanation why you would benefit from not considering a TT cutoff for pv nodes? And how do you define pv nodes in that scenario?
Current Pedantic does TT cutoffs for PV nodes except when the entry returns a MATE score; then I do not do the TT cutoff. I found that this allows Pedantic to keep the current mate line in the PV more accurately. In my current refactor I'm going to try not doing TT cutoffs at PV nodes at all, to see if there is any elo advantage.
Whiskers
Posts: 227
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

lithander wrote: Thu Feb 22, 2024 12:12 pm Is there an intuitive explanation why you would benefit from not considering a TT cutoff for pv nodes? And how do you define pv nodes in that scenario?

A PV node is any node where beta != alpha + 1. The reason, I assume, that not cutting off at those nodes gains in Patricia is that I make quite a few changes to evaluation and search scores based on the "sacrifice history" of the search line. Those modified scores get saved in the TT, and when I next run into the position under different circumstances, the modified score gets used and can cause a cutoff even if the position no longer supports that score.
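In code terms, the condition would look something like this (invented names; the three bound checks are the standard scenarios lithander described):

Code: Select all

// Sketch with invented names -- not Patricia's actual code.
#include <cstdint>

enum Bound : uint8_t { EXACT, LOWER, UPPER };
struct TTEntry { int depth; int score; Bound bound; };

bool tt_cutoff_allowed(const TTEntry& tte, int depth, int alpha, int beta) {
    bool pv_node = (beta != alpha + 1);   // the definition given above
    if (pv_node || tte.depth < depth)
        return false;                     // never trust stored scores on the PV
    return tte.bound == EXACT ||
           (tte.bound == LOWER && tte.score >= beta) ||    // fail-high proof
           (tte.bound == UPPER && tte.score <= alpha);     // fail-low proof
}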

Honestly, I just changed it because I had used !root as my cutoff condition as a placeholder; I didn't expect it to actually gain :)
Whiskers
Posts: 227
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

Having a bit of trouble getting Patricia to play more aggressively by modifying search scores. I'm trying a bunch of strategies along the lines of "if you were the last side to play a sacrifice, you get an eval bonus; if you weren't, you get an eval penalty", but none of them are working any better than the first attempt I currently have in place...