Just the normal score (success in percentage, or points/games, or whatever).
Whiskers wrote: ↑Thu Apr 04, 2024 5:45 pm
ELO scores you mean? Sure, I’ll do it as soon as I get home. Patricia is pretty much right in the middle strength wise.
Guenther wrote: ↑Thu Apr 04, 2024 8:55 am
Would you mind also to show the real scores for comparison?
Whiskers wrote: ↑Thu Apr 04, 2024 3:18 am
New baseline pool for Patricia. These engines are about equal strength or stronger, with the exception of poor Wahoo (I didn't expect it to perform that poorly!)
This is a very odd list to me, Velvet and Princhess so low? how? Princhess is one of the best style playing engines I've seen. Its draw rate on CCRL, at 31%, is lower than every single engine rated higher than 2865. And Velvet is, well, Velvet.
Code: Select all
Rank  EAS-Score  sacs    shorts  draws   moves  Engine/player
-------------------------------------------------------------------
   1     290153  31.42%  29.50%  08.47%     61  Patricia
   2     126973  16.04%  22.23%  15.86%     64  Svart 6
   3     116043  09.46%  31.21%  24.78%     60  Wahoo v4
   4     110608  12.72%  26.20%  20.54%     64  Leorik 3.0.1
   5     109185  13.46%  28.64%  21.11%     65  Velvet 3.1
   6      79544  08.61%  18.72%  25.89%     68  StockNemo 5.7
   7      52576  09.23%  08.87%  21.35%     83  Princhess 0.16
-------------------------------------------------------------------
Patricia's no longer able to wipe opponents out at the speed of light; as most of its sacrifices are at least somewhat unsound, it often wins despite the sacrifices rather than because of them against these stronger opponents. However, the EAS score is still way higher than any other engine in the gauntlet, so I am satisfied with her performance and will start development of 2.1/3.0 using this pool.
I really can't shake the feeling that optimizing for a particular pool of engines isn't the best idea... maybe I should test using a gauntlet against more (like 20) engines?
patricia devlog
Moderators: hgm, chrisw, Rebel
-
- Posts: 4718
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: patricia devlog
-
- Posts: 309
- Joined: Thu Jul 21, 2022 12:30 am
- Full name: Chesskobra
Re: patricia devlog
As I suggested in another thread, it would be interesting to look at points scored per 100 moves, because engines scoring short wins would do well on such metrics.
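To make the suggestion above concrete, here is a minimal sketch of a points-per-100-moves metric. The function name and the example numbers are made up for illustration, and it assumes "moves" means full moves summed over all of an engine's games.

```python
def points_per_100_moves(points: float, total_moves: int) -> float:
    """Points scored per 100 moves played (1 per win, 0.5 per draw)."""
    if total_moves <= 0:
        raise ValueError("total_moves must be positive")
    return 100.0 * points / total_moves


# Hypothetical engines with the same score but different game lengths.
quick_winner = points_per_100_moves(points=60.0, total_moves=100 * 61)
slow_grinder = points_per_100_moves(points=60.0, total_moves=100 * 83)
print(f"quick: {quick_winner:.3f}, slow: {slow_grinder:.3f}")
```

With equal scores, the engine that needs fewer moves gets the higher value, which is the effect the metric is meant to reward.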
-
- Posts: 231
- Joined: Tue Jan 31, 2023 4:34 pm
- Full name: Adam Kulju
Re: patricia devlog
Guenther wrote: ↑Thu Apr 04, 2024 5:53 pm
Just the normal score (success in percentage, or points/games, or whatever).
This is 1800 games per engine; I did it because halfway through my last test the power went out, so I had to start the cutechess script again.
Book is 4moves-noob.epd (slightly unbalanced), time control is 10+0.1.
Code: Select all
Rank Name Elo +/- Games Score Draw
1 Leorik 3.0.1 123 14 1800 67.1% 31.7%
2 Svart 6 104 14 1800 64.5% 29.3%
3 StockNemo 5.7 41 13 1800 55.9% 30.8%
4 Patricia -18 14 1800 47.4% 24.9%
5 Princhess 0.15 -68 14 1800 40.4% 28.1%
6 Velvet 3.1 -69 14 1800 40.3% 26.9%
7 Wahoo v4 -111 15 1800 34.5% 21.4%
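For reference, the Elo column is consistent with the usual logistic conversion from score to rating difference. A minimal sketch, assuming that formula (the function name is mine, and results can differ by a point from the table because the percentages shown are already rounded):

```python
import math


def elo_from_score(score: float) -> float:
    """Rating difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Score column from the table above, as fractions.
scores = {
    "Leorik 3.0.1": 0.671,
    "Svart 6": 0.645,
    "StockNemo 5.7": 0.559,
    "Patricia": 0.474,
    "Princhess 0.15": 0.404,
    "Velvet 3.1": 0.403,
    "Wahoo v4": 0.345,
}
for name, score in scores.items():
    print(f"{name:15s} {elo_from_score(score):+7.1f}")
```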
go and star https://github.com/Adam-Kulju/Patricia!
-
- Posts: 4718
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: patricia devlog
Thanks for posting the real scores too. After checking the opponents in CCRL Blitz it seems the result of Wahoo was just to be expected.
Patricia 2.0 should already be around 3200 (CCRL scale) now, and congrats on the successful tuning towards aggressivity!
Code: Select all
Name CCRL Games Err
------------------------------------
StockNemo 5.7 3285 1397 15
Leorik 3.0 3284 1047 17
Svart 6 3261 1199 16
(Patricia 2.0) ---- ---- ----
Velvet 3.1 3161 900 17
Princhess 0.16 3154 850 20
Wahoo v4 3081 1240 16
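A rough way to arrive at a figure near 3200 is a performance-rating estimate: the average rating of the opponents plus the Elo difference implied by the score. The sketch below assumes Patricia's 47.4% was scored against exactly these six engines in equal proportion, and it ignores the difference between CCRL Blitz conditions and the 10+0.1 gauntlet, so treat the number as a ballpark only. The function names are mine.

```python
import math

# CCRL Blitz ratings copied from the table above.
CCRL_BLITZ = {
    "StockNemo 5.7": 3285,
    "Leorik 3.0": 3284,
    "Svart 6": 3261,
    "Velvet 3.1": 3161,
    "Princhess 0.16": 3154,
    "Wahoo v4": 3081,
}
PATRICIA_SCORE = 0.474  # Patricia's score in the gauntlet results


def elo_from_score(score: float) -> float:
    """Rating difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def estimate_rating(opponent_ratings: list[int], score: float) -> float:
    """Average opponent rating shifted by the Elo difference implied by the score."""
    average = sum(opponent_ratings) / len(opponent_ratings)
    return average + elo_from_score(score)


estimate = estimate_rating(list(CCRL_BLITZ.values()), PATRICIA_SCORE)
print(f"Estimated rating: {estimate:.0f}")
```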
Whiskers wrote: ↑Fri Apr 05, 2024 1:30 am
This is 1800 games per engine; I did it because halfway through my last test the power went out, so I had to start the cutechess script again.
Book is 4moves-noob.epd (slightly unbalanced), time control is 10+0.1.
Code: Select all
Rank Name Elo +/- Games Score Draw
1 Leorik 3.0.1 123 14 1800 67.1% 31.7%
2 Svart 6 104 14 1800 64.5% 29.3%
3 StockNemo 5.7 41 13 1800 55.9% 30.8%
4 Patricia -18 14 1800 47.4% 24.9%
5 Princhess 0.15 -68 14 1800 40.4% 28.1%
6 Velvet 3.1 -69 14 1800 40.3% 26.9%
7 Wahoo v4 -111 15 1800 34.5% 21.4%
-
- Posts: 231
- Joined: Tue Jan 31, 2023 4:34 pm
- Full name: Adam Kulju
Re: patricia devlog
Firstly, I fixed a couple of bugs that were causing some undefined behavior (occasional attempts to access an index one past the end of an array). I could never get Patricia to crash on either of my machines, but now that it actually passes all Valgrind/UBSan checks I feel safe releasing a bugfix version.
I'm generating data for a friend and can't do testing at the moment, so I decided to try my hand at skill levels. The first, most obvious idea is to just limit depth.
At 60 + 0.6, depth 2 Patricia is about 1100-1200 CCRL, depth 3 is 1400-1500, and depth 4 is about 1700. When I play against Patricia, her mistakes and blunders feel relatively normal, like my own on a bad day in blitz (at least until she starts losing, at which point she starts throwing all her pieces into the garbage!). Depth 1 cannot spot a mate in 1 if the mating move is a quiet one, so I ruled out depth 1.
The next idea is to limit nodes; above all, I want to avoid putting in code that intentionally forces blunders. Patricia already makes enough funny mistakes at full strength.
-
- Posts: 231
- Joined: Tue Jan 31, 2023 4:34 pm
- Full name: Adam Kulju
Re: patricia devlog
Patricia now has several skill levels, which were roughly calibrated by testing at 60 + 0.6 (at shorter time controls these skill levels will be stronger; at longer time controls they will be weaker). The ratings are loosely anchored to CCRL 40|15 ratings minus 50.
These skill levels use the UCI_Elo option and are:
1200 Elo: depth 2 Patricia
1400 Elo: 1000 node Patricia
1600 Elo: 1600 node Patricia
1800 Elo: depth 4 Patricia
2000 Elo: 4000 node Patricia
2200 Elo: 8000 node Patricia
2600 Elo: 64000 node Patricia
3200 Elo (or any other value that doesn't fit in the above categories): Full strength Patricia
I played a couple test games and had a friend play some test games against the weaker versions of Patricia. She put up a good fight and played in quite a human manner, except for perhaps being a bit too tactically sharp and for having an unfortunate tendency to throw all her pieces away when losing.
I also implemented go depth and go nodes; next up is a proper PV and multithreading.
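The skill-level list above boils down to a lookup from a UCI_Elo value to a depth or node limit. Here is a minimal sketch of that idea in Python; it is not Patricia's actual code, the names `SearchLimit`, `SKILL_LEVELS` and `limit_for_uci_elo` are hypothetical, and treating "any other value" as an exact-match miss is my reading of the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchLimit:
    """A search cap; None means the corresponding limit is not applied."""
    max_depth: Optional[int] = None
    max_nodes: Optional[int] = None


# Values taken from the skill-level list above.
SKILL_LEVELS = {
    1200: SearchLimit(max_depth=2),
    1400: SearchLimit(max_nodes=1000),
    1600: SearchLimit(max_nodes=1600),
    1800: SearchLimit(max_depth=4),
    2000: SearchLimit(max_nodes=4000),
    2200: SearchLimit(max_nodes=8000),
    2600: SearchLimit(max_nodes=64000),
}
FULL_STRENGTH = SearchLimit()  # no caps at all


def limit_for_uci_elo(uci_elo: int) -> SearchLimit:
    """Return the cap for a listed Elo value, or full strength for anything else."""
    return SKILL_LEVELS.get(uci_elo, FULL_STRENGTH)


print(limit_for_uci_elo(1800))
print(limit_for_uci_elo(3200))
```

A table keeps the levels easy to retune later, which matters here since the values were only roughly determined at one time control.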
-
- Posts: 231
- Joined: Tue Jan 31, 2023 4:34 pm
- Full name: Adam Kulju
Re: patricia devlog
Multithreading support has now been added to Patricia; it supports up to 1024 threads. In doing so, however, I discovered something that I need some time to think on.
When I first tested it, multithreading did not give nearly as much Elo as I thought it would: just 60 Elo for 4 threads, on an unbalanced book. After verifying that there was nothing fishy locally, I wondered if it might be due to the sacrifices Patricia plays. So I removed the sacrifice bonuses in eval, and lo and behold, suddenly multithreading was giving the gains I had expected.
It seems that I've reached a point in strength where most of Patricia's losses come from the ridiculous sacrifices she's forced to make. A lot of Patricia's sacrifices have great compensation, but some are basically just giving the opponent piece odds. All the threads in the world can't stop Patricia from playing three bad sacrifices in a row and getting into a losing position.
It seems then that for Patricia 3, I'm going to have to take a step back and figure out how to get her to play into positions where good (especially close to best move) sacrifices are plentiful. In some positions, there just are no good sacrifices to be played, and in those scenarios I want to code Patricia so that she steers the game into more exciting waters and then sacrifices, instead of immediately forcing a "sacrifice" that is just hanging a pawn.
-
- Posts: 231
- Joined: Tue Jan 31, 2023 4:34 pm
- Full name: Adam Kulju
Re: patricia devlog
I decided to extract data from SPCC testing to get better data for retraining Patricia's net. To do this, I grabbed Patricia's games, as well as all the games played in SPCC testing (found on the site), used the interesting wins filter to search for, well, interesting games, used pgn-extract to grab the FENs (with best moves and scores) from the PGNs, and wrote a script to filter and convert those FENs. This yielded about 8.25m "interesting" FENs; if retraining Patricia's network on them yields positive results, I'll probably grab CCRL games as well.
For testing the newly retrained net I'm going to remove the features that directly force sacrifices in Patricia. I feel like they're a bit unhealthy for how she plays, especially as the bonuses get *huge* for some sacrifices. I think I'm also not going to let Patricia give bonuses for sacrifices if she's losing, because sacrifices in losing positions are really just throwing pieces in the garbage and do nothing for her style of play.
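The "no sacrifice bonus when losing" rule could look something like the sketch below. This is only an illustration of the gating idea, not Patricia's eval: the function name, the centipawn units and the `losing_threshold_cp` default are all hypothetical.

```python
def eval_with_sacrifice_bonus(
    base_eval_cp: int,
    sacrifice_bonus_cp: int,
    losing_threshold_cp: int = 0,
) -> int:
    """Add the sacrifice bonus only when the side to move is not already losing.

    base_eval_cp is the evaluation without any style bonus, from the point of
    view of the side that sacrificed. The threshold is a made-up tuning knob.
    """
    if base_eval_cp < losing_threshold_cp:
        # Losing: a "sacrifice" here is just shedding material, so no reward.
        return base_eval_cp
    return base_eval_cp + sacrifice_bonus_cp


print(eval_with_sacrifice_bonus(base_eval_cp=40, sacrifice_bonus_cp=80))
print(eval_with_sacrifice_bonus(base_eval_cp=-150, sacrifice_bonus_cp=80))
```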
-
- Posts: 2631
- Joined: Sat Sep 03, 2011 7:25 am
- Location: Berlin, Germany
- Full name: Stefan Pohl
Re: patricia devlog
Whiskers wrote: ↑Mon Apr 08, 2024 7:14 am
Patricia now has several skill levels, which were roughly determined by testing at 60 + 0.6 (at shorter time controls these skill levels will be stronger, at longer time controls they will be weaker). The ratings are somewhat anchored to CCRL 40|15 rating - 50.
My 2cents: fixed nodes or fixed depths are a good way to reduce the strength without damaging the playing style. But please note that in the endgame the number of nodes (or the max depth) must be increased; otherwise there is a huge Elo loss in the endgame. As I mentioned before: TheKing-Element Chesscomputer offers limited-node levels, too, but doubles and quadruples the node limit as the board gets more and more empty (=endgame).
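The doubling and quadrupling of the node limit described in this post could be sketched as follows. The piece-count thresholds (`double_at`, `quadruple_at`) are placeholders I picked for illustration; the post does not say at which material levels the limit is raised.

```python
def scaled_node_limit(
    base_nodes: int,
    piece_count: int,
    double_at: int = 16,
    quadruple_at: int = 8,
) -> int:
    """Raise a fixed node limit as the board empties.

    piece_count is the total number of pieces and pawns still on the board
    (32 at the start). Both thresholds are hypothetical values.
    """
    if piece_count <= quadruple_at:
        return base_nodes * 4
    if piece_count <= double_at:
        return base_nodes * 2
    return base_nodes


for pieces in (32, 16, 6):
    print(pieces, scaled_node_limit(4000, pieces))
```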
-
- Posts: 231
- Joined: Tue Jan 31, 2023 4:34 pm
- Full name: Adam Kulju
Re: patricia devlog
pohl4711 wrote: ↑Wed Apr 10, 2024 2:14 pm
My 2cents: fixed nodes or fixed depths are a good way to reduce the strength without damaging the playing style. But please note that in the endgame the number of nodes (or the max depth) must be increased; otherwise there is a huge Elo loss in the endgame. As I mentioned before: TheKing-Element Chesscomputer offers limited-node levels, too, but doubles and quadruples the node limit as the board gets more and more empty (=endgame).
I definitely understand this for max depth (and will come back around to revising Patricia's skill levels before releasing), but why do endgames need more nodes? Thanks to the transposition table, engines can hit very high depths with comparatively few nodes.