patricia devlog

lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: patricia devlog

Post by lithander »

Before releasing Leorik 3 I did a mini gauntlet with a few opponents and this is the EAS score when I stuff the PGN in the EAS tool:

Rank  EAS-Score   wins  moves   sacs    sacsQ    sacs5+   sacs4    sacs3    sacs2    sacs1    all shorts short40  short45  short50  short55  short60   draws    Engine/player
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   1    188464      900   69   20.11% =[00.00% + 00.44% + 00.56% + 01.78% + 05.11% + 12.22%]    33.78% = [07.89% + 05.56% + 05.11% + 07.11% + 08.11%]  16.92%   Leorik-3.0  
   2     71251      167   83   07.19% =[00.00% + 00.60% + 00.00% + 00.00% + 00.60% + 05.99%]    11.38% = [01.20% + 00.60% + 01.20% + 01.80% + 06.59%]  18.03%   Nalwald-18  
   3     51066      245   85   09.80% =[00.00% + 00.41% + 00.00% + 00.41% + 01.63% + 07.35%]    05.71% = [00.00% + 00.00% + 00.00% + 01.22% + 04.49%]  21.67%   frozenight-6  
   4     43004      160   89   05.00% =[00.00% + 00.63% + 00.00% + 00.00% + 01.25% + 03.13%]    06.88% = [00.00% + 00.63% + 01.25% + 02.50% + 02.50%]  25.00%   StockNemo-5.7  
   5     27105      227   91   02.64% =[00.00% + 00.00% + 00.00% + 00.44% + 00.00% + 02.20%]    06.17% = [00.44% + 00.44% + 00.00% + 00.44% + 04.85%]  26.98%   zahak-10  
   6     22320       46   84   00.00% =[00.00% + 00.00% + 00.00% + 00.00% + 00.00% + 00.00%]    13.04% = [00.00% + 00.00% + 00.00% + 08.70% + 04.35%]  32.17%   PeSTO
Based on this score, and comparing it to https://www.sp-cc.de/eas-ratinglist.htm, Leorik should be one of the most aggressive engines ever, without even trying? But I doubt that's really the case. :roll: I don't understand exactly how the EAS tool works, but I think it means that you need a wide range of opponents, both stronger and weaker, before you can draw meaningful conclusions.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
Whiskers
Posts: 186
Joined: Tue Jan 31, 2023 4:34 pm
Full name: Adam Kulju

Re: patricia devlog

Post by Whiskers »

lithander wrote: Mon Feb 12, 2024 5:47 pm Before releasing Leorik 3 I did a mini gauntlet with a few opponents and this is the EAS score when I stuff the PGN in the EAS tool: [...]

Those results look a bit odd. Leorik sacrificing twice as often as any other engine and winning way faster on average? And how did PeSTO not play a sacrifice even once? The raw data clearly suggests Leorik is really aggressive. Maybe a couple of its opponents are vulnerable to tactical shots? Thanks for the info, it gives me something to think about.
Whiskers

Re: patricia devlog

Post by Whiskers »

Leorik winning 8% of its games in under 40 moves seems to suggest there were one or two hopeless engines that Leorik crushed over and over again, while the other engines couldn’t enjoy that benefit because they only played Leorik.
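As a rough sanity check on that 8% figure, the percentages in the Leorik-3.0 row can be turned back into game counts. This is only a sketch: it assumes the shorts percentages are relative to the number of wins (900) rather than to all games played, which is what the table's numbers are consistent with, and the helper name pct_to_count is made up for illustration.

```python
# Figures copied from the Leorik-3.0 row of the EAS table in this thread.
WINS = 900
SHORT_PCT = {"short40": 7.89, "short45": 5.56, "short50": 5.11,
             "short55": 7.11, "short60": 8.11}


def pct_to_count(pct: float, total: int) -> int:
    """Convert a percentage of `total` back into a whole number of games."""
    return round(pct / 100 * total)


# Assumption: the percentages are relative to wins, not to all games played.
short_counts = {name: pct_to_count(pct, WINS) for name, pct in SHORT_PCT.items()}
total_shorts = sum(short_counts.values())

for name, count in short_counts.items():
    print(f"{name}: {count} wins")
print(f"all shorts: {total_shorts} of {WINS} wins "
      f"({total_shorts / WINS * 100:.2f}%)")
```

Under that assumption, short40 corresponds to 71 wins, and the five buckets add up to 304 wins, which matches the 33.78% in the "all shorts" column.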
lithander

Re: patricia devlog

Post by lithander »

Whiskers wrote: Mon Feb 12, 2024 9:55 pm Leorik winning 8% of its games in under 40 moves seems to suggest there were one or two hopeless engines that Leorik crushed over and over again, while the other engines couldn’t enjoy that benefit because they only played Leorik.
For context, here's how the match was set up:

./cutechess-cli.exe -engine conf="Leorik-3.0" -engine conf="frozenight-6" -engine conf="PeSTO" -engine conf="StockNemo-5.7" -engine conf="Nalwald-18" -engine conf="zahak-10" -each tc=40/30 book=varied.bin option.Hash=32 -pgnout leorik3_gauntlet1_30per40.pgn -rounds 1000 -games 2 -repeat -concurrency 7 -tournament gauntlet
Anchoring the engines to their CCRL ratings shows that the only really hopeless engine was PeSTO. Maybe PeSTO was providing Leorik with all the opportunities to play short, aggressive wins. That's a good theory!

 .\ordo-win64.exe -p .\leorik3_gauntlet1_30per40.pgn -m anchors_3.0.txt
   # PLAYER           :  RATING  POINTS  PLAYED   (%)
   1 frozenight-6     :  3374.0   406.5     626    65
   2 zahak-10         :  3342.0   366.0     625    59
   3 Leorik-3.0       :  3289.4  1591.5    3128    51
   4 Nalwald-18       :  3289.0   344.5     625    55
   5 StockNemo-5.7    :  3287.0   316.0     626    50
   6 PeSTO            :  3122.0   103.5     626    17
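To put the rating gaps in that list into perspective, here is a small sketch that converts a rating difference into an expected score using the standard logistic Elo formula. Whether Ordo applies exactly this model with its default settings is an assumption on my part; the ratings and score percentages are copied from the output above, and the function name expected_score is just for illustration.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard logistic Elo expectation for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


# Ratings and score percentages copied from the Ordo output in this thread.
LEORIK = 3289.4
OPPONENTS = {
    "frozenight-6": (3374.0, 65),
    "zahak-10": (3342.0, 59),
    "Nalwald-18": (3289.0, 55),
    "StockNemo-5.7": (3287.0, 50),
    "PeSTO": (3122.0, 17),
}

# In a gauntlet every opponent only plays Leorik, so each opponent's
# score percentage is its score against Leorik.
for name, (rating, actual_pct) in OPPONENTS.items():
    expected_pct = expected_score(rating, LEORIK) * 100
    print(f"{name:14s} expected {expected_pct:5.1f}%  actual {actual_pct}%")
```

Equal ratings give 50%, and PeSTO's deficit of about 167 points corresponds to an expected score a little under 28% under this formula.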
Guenther
Posts: 4643
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: patricia devlog

Post by Guenther »

lithander wrote: Mon Feb 12, 2024 5:47 pm Before releasing Leorik 3 I did a mini gauntlet with a few opponents and this is the EAS score when I stuff the PGN in the EAS tool: [...]
I have never tried this tool, but it seems obvious that it will only show meaningful results in matches or full round-robin tournaments (i.e. not in gauntlets, and not in tournaments where players have different numbers of games), provided its calculations are correct in the first place.

@ Adam
Thanks for the dev log of your new 'Patricia' - will enjoy your posts about it, as I did and still do with Thomas' Leorik :)
Whiskers

Re: patricia devlog

Post by Whiskers »

I added RFP, LMR, NMP, and history heuristics to Patricia; she's already at around 2900 Elo. She also remains more aggressive than the corresponding Willow version:

Rank  EAS-Score  sacs   shorts  draws  moves  Engine/player 
-------------------------------------------------------------------
   1    118857  15.97%  23.00%  16.59%   71   Patricia 0.1  
   2     70876  11.84%  24.49%  24.89%   70   Willow 2.8  
It's time to work a little more on Patricia's aggressiveness. This time around, I plan to retrain my network at a lower learning rate (LR) on "aggressive" data filtered from my Willow dataset. This should keep most of the knowledge the net already has while slanting it a little towards the new positions I'm adding. I expect the new net to be slightly worse at regular chess but much better at aggressive chess, which is what I want.
The only problem is that my data is all in binpack format, so I need to write a converter.
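To illustrate why a lower learning rate should only slant the net rather than overwrite it, here is a toy sketch. It is not Patricia's training code: a one-weight model y = w * x stands in for the network, and the data, step count, and learning rates are all made-up values.

```python
def fine_tune(w: float, data, lr: float, steps: int) -> float:
    """Plain gradient descent on mean squared error for the model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Stand-in for the existing net: w = 1.0 fits the "old" data y = x.
pretrained_w = 1.0
# Stand-in for the filtered "aggressive" data, which prefers w = 2.0.
new_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w_low = fine_tune(pretrained_w, new_data, lr=0.01, steps=5)
w_high = fine_tune(pretrained_w, new_data, lr=0.1, steps=5)

print(f"pretrained weight:   {pretrained_w:.3f}")
print(f"after low-LR tune:   {w_low:.3f}")
print(f"after high-LR tune:  {w_high:.3f}")
```

With the same number of steps, the low-LR run moves the weight only part of the way towards what the new data wants, while the high-LR run ends up almost entirely fitted to the new data.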
chesskobra
Posts: 216
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: patricia devlog

Post by chesskobra »

Interesting project, I will be following. Do the position filters work on EPD files or PGN files? Do they compile on Linux?
pohl4711
Posts: 2516
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: patricia devlog

Post by pohl4711 »

Whiskers wrote: Thu Feb 15, 2024 4:26 pm [...] This time around, I plan to retrain my network on "aggressive" data filtered from my Willow dataset at lower LR. [...]
Great! Can't wait to see the results.
Whiskers

Re: patricia devlog

Post by Whiskers »

chesskobra wrote: Thu Feb 15, 2024 10:45 pm Interesting project, I will be following. Do the position filters work on EPD files or PGN files? Do they compile on Linux?
They work on text files that contain FEN lines. I did this because I have a lot of Willow data lying around in that format (which I convert to binpacks for NNUE training). It would not be very difficult to change them to work with PGNs, but I don't see the point for my purposes right now. If demand is high enough I'll generalize the position filter programs and make them easy to build too.
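For readers wondering what a filter over FEN lines might look like, here is a minimal sketch. It is not the actual Patricia filter code: the criterion (the side to move is behind in material), the piece values, and the names material_balance, is_candidate, and filter_fens are all assumptions made for illustration.

```python
# Conventional piece values in pawns; the king is not counted.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def material_balance(fen: str) -> int:
    """Return White's material minus Black's, read from the FEN board field."""
    board = fen.split()[0]
    balance = 0
    for ch in board:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits, '/', and kings carry no material value here
        balance += value if ch.isupper() else -value
    return balance


def is_candidate(fen: str, min_deficit: int = 2) -> bool:
    """Hypothetical criterion: side to move is down at least `min_deficit` pawns."""
    fields = fen.split()
    if len(fields) < 2:
        return False  # not a usable FEN line
    balance = material_balance(fen)
    deficit = -balance if fields[1] == "w" else balance
    return deficit >= min_deficit


def filter_fens(lines, min_deficit: int = 2):
    """Yield the stripped lines whose position passes the criterion."""
    for line in lines:
        line = line.strip()
        if line and is_candidate(line, min_deficit):
            yield line


sample = [
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNB1KBNR w KQkq - 0 1",
]
kept = list(filter_fens(sample))
print(kept)
```

Any extra fields after the side-to-move field (such as an evaluation or game result) are ignored, so the same loop would also work on annotated lines as long as each line starts with the FEN.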
chesskobra

Re: patricia devlog

Post by chesskobra »

Of course, I am not asking you to generalise them. I was only curious whether such programs could be used to extract interesting positions from game databases for human training.