patricia devlog

Discussion of chess software programming and technical issues.

Moderator: Ras

Whiskers
Posts: 231
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

Best results of a few net training experiments; Patricia's best aggressiveness occurred when net training was done with no WDL at all, just eval.

Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    191298  28.34%  21.07%  11.26%   60   Patricia dev  
   2     66323  12.74%  11.46%  20.22%   76   Princhess 0.17  
   3     36155  05.63%  12.12%  27.46%   68   Drofa 3.3  
   4     33275  07.41%  06.35%  26.76%   75   Velvet 3.1  
   5     32619  06.45%  10.60%  33.14%   68   Counter 4  
   6     31589  04.86%  08.50%  32.65%   67   Igel 2.6  
   7     24261  02.80%  10.75%  35.51%   68   Combusken 1.4  
-------------------------------------------------------------------
Unfortunately, her data is no better than Willow's data; in fact, it's somewhat worse. That's a shame, but considering the engine is 200 Elo weaker than Willow, it wasn't as bad as I expected.

I'm trying a few more datagen tricks on small nets for Patricia before I move on to generating a large dataset: adjudicating games whenever she sacrifices, duplicating positions where the search and eval scores differ greatly, using a combination of random moves and UHO positions as starting positions for games, etc.
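
For readers unfamiliar with the "no WDL at all, just eval" setting mentioned above: many NNUE trainers blend the search eval and the game result into a single training target. The sketch below shows that common convention only; it is not taken from Patricia's trainer. The function name, the `wdl_weight` parameter, and the 400-centipawn scale are assumptions for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def training_target(eval_cp: float, game_result: float,
                    wdl_weight: float, scale: float = 400.0) -> float:
    """Blend search eval and game result into one target in [0, 1].

    eval_cp     -- search score in centipawns, from the side to move
    game_result -- 1.0 win, 0.5 draw, 0.0 loss, from the side to move
    wdl_weight  -- 0.0 means "just eval", 1.0 means "just game result"
    scale       -- assumed centipawn scale for the sigmoid (hypothetical value)
    """
    eval_target = sigmoid(eval_cp / scale)
    return wdl_weight * game_result + (1.0 - wdl_weight) * eval_target


# With wdl_weight = 0.0 the game result is ignored entirely.
just_eval = training_target(eval_cp=100, game_result=0.0, wdl_weight=0.0)
```

With `wdl_weight=0.0`, a position the search liked still gets a target above 0.5 even if the game was eventually lost, which is one plausible reason such a net would value attacking positions more highly.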
Mike Sherwin
Posts: 930
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: patricia devlog

Post by Mike Sherwin »

Whiskers wrote: Sun Jul 21, 2024 4:21 am Best results of a few net training experiments; Patricia's best aggressiveness occurred when net training was done with no WDL at all, just eval.

If you do not want to use WDL, then consider using only positions from high-quality drawn games. Someone did that for RomiChess and it produced good results.
jefk
Posts: 864
Joined: Sun Jul 25, 2010 10:07 pm
Location: the Netherlands
Full name: Jef Kaan

Re: patricia devlog

Post by jefk »

Congrats on the higher rating (and in my opinion, better style) of P3. It's an interesting development. Although I'm not an engine programming expert, after observing some P3 games against two other engines (GM level or higher), I get the impression there are two kinds of 'aggression' in this engine.

The first type is the engine's ability to make good positional sacrifices, which seems to be a strong point in P3. This might be a result of using positions with sacrifices for the neural network training, an idea that has evidently worked out well. I think this is how chess should be played to achieve the best results (at least in human chess)—it's more about being assertive than truly 'ultra-aggressive'.

The second type of aggression appears to be a tendency to almost always want to attack, often bringing pieces toward the opponent's (king's) side. This might be due to a high contempt factor. In some test games, for example against Crafty (around 2800 rating), I noticed that this approach sometimes backfired. After a long positional attack near the end of the middle game (around move 35 or so), if Crafty managed to defend well and was not losing, it gradually took the upper hand, leading to P3's defeat in the endgame.

In human chess, the usual advice is to start attacking only once you have a (slightly) better position; otherwise, you should defend. However, evaluating the position can be somewhat subjective (for humans and engines alike).

Perhaps the contempt factor is set a bit too high. It might be a good idea for a future version (P3.1 or so) to include a UCI option to adjust this, instead of having all these tuning parameters, which I don't dare to touch, of course. Just my 2 cents.

Jef

PS: Some gambits tend to work better later in the game than others. I plan to read the played games into a small Polyglot book and then combine it with the other gambit books I have made over the past few years.
Whiskers
Posts: 231
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

A couple of datagen innovations that gain a little bit of Elo:

(1): Write positions where the qsearch score differs greatly from the search score into the file multiple times. This forces the net to fit more closely to positions where the eval was significantly wrong in search.

(2): Start 10% of the datagen games from an opening book (I used 4moves_noob.epd because it has so many positions). This has an advantage over starting only from random moves: most random-move games bear little resemblance to "normal" games that arise from typical openings, so including some book games helps the network fit those positions a bit better. (I did not expect this one to work and was shocked to see it gain about 20 Elo in small nets.)
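
A minimal sketch of how the two tricks above might look in a datagen loop. This is not Patricia's actual code: the `Sample` type, the function names, the 200-centipawn threshold, and the number of extra copies are all assumptions, since the post gives none of those values. Only the 10% book fraction comes from the post.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    fen: str
    search_score: int   # centipawns, from the full search
    qsearch_score: int  # centipawns, from quiescence search only


def copies_to_write(sample: Sample, threshold: int = 200,
                    extra_copies: int = 1) -> int:
    """Trick (1): write a position more than once when qsearch and search disagree.

    threshold and extra_copies are hypothetical tuning values.
    """
    gap = abs(sample.search_score - sample.qsearch_score)
    return 1 + extra_copies if gap >= threshold else 1


def expand(samples, threshold: int = 200, extra_copies: int = 1):
    """Return the list of samples as they would be written to the data file."""
    out = []
    for s in samples:
        out.extend([s] * copies_to_write(s, threshold, extra_copies))
    return out


def pick_start(book_fens, rng: random.Random, book_fraction: float = 0.10):
    """Trick (2): return a book FEN for about book_fraction of games.

    Returns None otherwise, meaning the caller should play random opening moves.
    """
    if book_fens and rng.random() < book_fraction:
        return rng.choice(book_fens)
    return None
```

Duplicating a sample is a simple way to up-weight it without changing the trainer; a per-sample loss weight would be an alternative if the data format supported one.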

This brings Patricia datagen on par with Willow datagen, which is pretty impressive given that Patricia is 200 Elo weaker than Willow on the CCRL list. EAS scores for small nets are down a little, but that's OK. What I really want to see is how well these innovations translate to full-sized nets.
Whiskers
Posts: 231
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

An idea from Serdra from the EP discord: unbalanced datagen, with a random side getting 10k nodes and the other side getting 1k nodes. This is done to maximize decisive games and especially to increase the likelihood of successful attacks in the training data.

On small net tests, it leads to *no strength loss* and a huge EAS improvement.

(Remember, this is before any retraining; this is just a regular network trained normally. I really hope this scales to large nets with lots of data...)

Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    250738  31.61%  35.82%  11.26%   55   Patricia 3.1  
   2     60064  06.90%  03.45%  24.35%   80   Princhess 0.17  
   3     30500  03.55%  11.24%  32.31%   63   Drofa 3.3  
   4     29537  01.04%  12.44%  29.90%   65   Combusken 1.4  
   5     19436  02.96%  11.24%  38.02%   68   Igel 2.6  
   6     18290  04.09%  15.20%  41.07%   69   Counter 4.0  
   7     11480  06.76%  04.73%  38.89%   76   Velvet 3.1  
-------------------------------------------------------------------
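
The unbalanced node budget described in this post could be sketched roughly as follows. The function name and the dictionary layout are assumptions for illustration; only the 10k/1k node counts and the random choice of the stronger side come from the post.

```python
import random


def assign_node_budgets(rng: random.Random,
                        strong_nodes: int = 10_000,
                        weak_nodes: int = 1_000) -> dict:
    """Give a randomly chosen side the larger per-move node budget.

    The returned dict maps "white" and "black" to the node limit each
    side would search with for the whole datagen game.
    """
    if rng.random() < 0.5:
        return {"white": strong_nodes, "black": weak_nodes}
    return {"white": weak_nodes, "black": strong_nodes}


# One budget assignment per self-play game.
budgets = assign_node_budgets(random.Random())
```

Choosing the stronger side at random per game keeps the dataset from learning a colour bias while still producing more decisive games.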
Whiskers
Posts: 231
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

Whiskers wrote: Sat Jul 27, 2024 5:44 am An idea from Serdra from the EP discord: unbalanced datagen, with a random side getting 10k nodes and the other side getting 1k nodes. This is done to maximize decisive games and especially to increase the likelihood of successful attacks in the training data.


Final test: balanced datagen with draw scores removed reduces both Elo and EAS compared to the unbalanced datagen method. I'm currently generating a medium-sized dataset of 300M FENs to see how my improvements scale to somewhat larger nets before I go ahead and do the full run.

I plan to release 3.1 as a version that just features QOL improvements: a MultiPV option, a "human mode" where Patricia will play more aggressive openings, a more accurate skill level system, etc. A common complaint has been that Patricia 3 isn't quite as fun for a human to play against as Patricia 2, so I want to fix that before I move on to all the network/strength work for Patricia 4, which could take a few months.
isidro02139
Posts: 6
Joined: Wed Jul 03, 2024 1:27 am
Full name: ARUN MEHTA

Re: patricia devlog

Post by isidro02139 »

Hey! This is really inspiring ^^ I'm glad to see your progress, please keep it up (sorry to hear about Willow)
Whiskers
Posts: 231
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

isidro02139 wrote: Mon Jul 29, 2024 1:54 am Hey! This is really inspiring ^^ I'm glad to see your progress, please keep it up (sorry to hear about Willow)
Thank you!
Sorry to hear about Willow? Willow is still alive and kicking (and doing quite well); Patricia is the rabbit who passed away late last year. I miss her, but I'm happy her name lives on in this project.
schack
Posts: 177
Joined: Thu May 27, 2010 3:32 am

Re: patricia devlog

Post by schack »

Any reason why this would show up as a false positive?

https://www.virustotal.com/gui/file/c2b ... /detection
jefk
Posts: 864
Joined: Sun Jul 25, 2010 10:07 pm
Location: the Netherlands
Full name: Jef Kaan

Re: patricia devlog

Post by jefk »

About false positives: I use Avast and got such a report as well, but on VirusTotal only 2 scanners or so flagged it, so I thought it would be safe to ignore (I also submitted it to Avast, but that might take some time).

Nowadays Avast flags quite a few chess downloads from GitHub as a possible 'trojan' (at least in my Avast, which I chose for its overall good security), e.g. pythonchessGui and some others.

PS: As for combichess, it is flagged by many scanners on VirusTotal, possibly because of some Python code (anyway, I haven't used combichess yet).