Pedantic Developer's Log Stardate...

JoAnnP38 · Post by **JoAnnP38** » Wed Mar 08, 2023 6:56 pm

lithander wrote: ↑Wed Mar 08, 2023 10:28 am When I'm trying to concentrate I find the noise small notebook fans make very irritating. That's the main reason I still use a "bulky" desktop with proper cooling. I suspect your mini-PCs are pretty loud under full load with the power they pack into such a small form factor?

LOL, living in Florida my air conditioner runs non-stop 9 months out of the year and I have three HEPA air filters running throughout my house as well. Yes, when running at high load you can hear the fan, but if it were mounted underneath my desk or attached behind my monitors via the VESA mounting screws, I'm sure it would be quieter. I just like having them close, so I have easy access. With all of the other ambient noise in my home I don't notice it, but if you live in a noise-free zone, I'm sure you might.

JoAnnP38 · Post by **JoAnnP38** » Sun Mar 12, 2023 1:50 pm

Note to Self
At your age maybe you shouldn't be burning the early morning oil any longer. *sigh*

I just lost a week of changes because a push to GitHub failed over a file larger than 100MB. I immediately went to revert the local commit, only to remember too late that reverting back to a previous branch would delete all my changes. And with my new PC I don't have scheduled scripts to make backups of my repository. Now I have to redo all those changes I made to support optimizing my evaluation (and a few bug fixes, if I can remember them). Luckily (if I dare to use that word) I recovered my "changes" via the dotPeek decompiler. Unfortunately, much of the code is gobbledygook, with variable names transformed into such lovely names as "num2". Well, you know what they say, if you want something done right -- do it twice.

JoAnnP38 · Post by **JoAnnP38** » Sat Mar 18, 2023 6:23 pm

After re-coding my lost functionality for evaluation tuning, I have decided to undertake a refactoring of my evaluation function. Looking at my code from the vantage point of replicating much of its functionality for the tuning function, I noticed that much of it was convoluted and inefficient. Also, I've decided to simplify my evaluation by removing the following functionality:

Pawns blocked by same side pieces.
KS/QS Pawn Majorities (tuning revealed only KS majority was important, but I decided to can it all the same)
Connected or passed pawns by rank (changing this to a simple bonus). Rank bonuses can be handled by the PST.
Penalty for King not being able to stop passed pawn promotion (outside its "promotion square")

And I am adding the following functionality:

Bonus for rooks on open file
Bonus for doubled rooks
Adding a per-piece mobility weight instead of treating all mobility the same.
Adding a mop-up replacement for the King PST that replaces the standard end game PST when there are no pawns on the board.
Adding a king-to-king distance weight that rewards the winning side for minimizing this distance during mop-up.
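
A minimal sketch of what such a king-to-king distance term might look like. It assumes squares are numbered 0 (a1) to 63 (h8), measures distance in king moves (Chebyshev distance), and uses a made-up weight of 10 centipawns per step. The function names and the weight are illustrative only and are not taken from Pedantic.

```python
def chebyshev_distance(sq_a: int, sq_b: int) -> int:
    """Number of king moves between two squares (0 = a1, 63 = h8)."""
    file_a, rank_a = sq_a % 8, sq_a // 8
    file_b, rank_b = sq_b % 8, sq_b // 8
    return max(abs(file_a - file_b), abs(rank_a - rank_b))


def mop_up_bonus(winning_king: int, losing_king: int, weight: int = 10) -> int:
    """Bonus for the winning side that grows as the kings get closer.

    `weight` is a hypothetical value; in practice it would be tuned.
    The maximum distance on the board is 7, so the bonus is never negative.
    """
    return weight * (7 - chebyshev_distance(winning_king, losing_king))


# Kings in opposite corners earn nothing; neighbouring kings earn the most.
print(mop_up_bonus(0, 63))   # a1 vs h8
print(mop_up_bonus(27, 36))  # d4 vs e5
```

Manhattan distance would work as well; Chebyshev is used here only because it matches how a king actually moves.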

Other functionality that I am investigating:

End game table bases
Different piece square tables for each position of the enemy king (actually divide the board into 2x2 areas and that will define the king's position). Having 16 king positions seems less daunting than 64 and should require less training data.
Change my version of Texel tuning to use gradient descent with mini-batches. This should speed up tuning and allow processing of more data. Currently a 2.5 million position tuning run requires about 8 hours to complete. Randomly selecting a mini-batch of 10-25K positions when calculating the next version of a weight will be much faster than using the entire 2.5 million position data set. However, over the course of the tuning session every mini-batch will be used.
Look at ML.NET (for the first time) to see if there is any chance it could be used to replace my evaluation function. I'm guessing it would have many more tools to train a logistic regression than I have currently built.
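
The mini-batch idea from the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: the evaluation is treated as a plain dot product of weights and features, the error is the mean squared difference between the game result and the usual Texel sigmoid, and `k`, `learning_rate`, and `batch_size` are made-up values. It is not Pedantic's tuner.

```python
import math
import random


def predict(weights, features, k=1.0):
    """Texel-style sigmoid: maps a centipawn score to an expected result."""
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + 10.0 ** (-k * score / 400.0))


def mean_squared_error(weights, data, k=1.0):
    return sum((predict(weights, f, k) - r) ** 2 for f, r in data) / len(data)


def tune(data, n_weights, batch_size=2, epochs=200,
         learning_rate=5000.0, k=1.0, seed=0):
    """Mini-batch gradient descent over (features, result) pairs.

    Each epoch shuffles the data and walks through it batch by batch,
    so every position is used once per epoch. The learning rate looks
    large because the /400 scaling makes the raw gradient very small.
    """
    rng = random.Random(seed)
    data = list(data)
    weights = [0.0] * n_weights
    scale = math.log(10.0) * k / 400.0  # derivative factor of the sigmoid
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad = [0.0] * n_weights
            for features, result in batch:
                p = predict(weights, features, k)
                common = 2.0 * (p - result) * p * (1.0 - p) * scale
                for j, f in enumerate(features):
                    grad[j] += common * f
            for j in range(n_weights):
                weights[j] -= learning_rate * grad[j] / len(batch)
    return weights


# Toy data: one feature (+1 = a pawn up, -1 = a pawn down) and game results.
toy = [([1], 1.0), ([1], 1.0), ([1], 0.5),
       ([-1], 0.0), ([-1], 0.0), ([-1], 0.5)]
tuned = tune(toy, n_weights=1)
print(mean_squared_error([0.0], toy), mean_squared_error(tuned, toy))
```

On real data the batch size would be in the 10-25K range mentioned above, and the learning rate would need experimentation.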

JoAnnP38 · Post by **JoAnnP38** » Wed Mar 22, 2023 1:55 pm

JoAnnP38 wrote: ↑Sat Mar 18, 2023 6:23 pm Removing the following functionality:

Pawns blocked by same side pieces. DONE

KS/QS Pawn Majorities (tuning revealed only KS majority was important, but I decided to can it all the same) DONE

Connected or passed pawns by rank (changing this to a simple bonus). Rank bonuses can be handled by the PST. DONE

Penalty for King not being able to stop passed pawn promotion (outside its "promotion square") DONE

And I am adding the following functionality:

Bonus for rooks on open file DONE

Bonus for doubled rooks DONE

Adding a per-piece mobility weight instead of treating all mobility the same. DONE

Adding a mop-up replacement for the King PST that replaces the standard end game PST when there are no pawns on the board.

Adding a king-to-king distance weight that rewards the winning side for minimizing this distance during mop-up.

I've made a lot of progress over the past couple of days, with Pedantic's estimated Elo rising from 1600 to over 2000. While I continue to fine-tune the evaluation function, it was the correction of a long-standing and pesky bug that did most of the heavy lifting. I am using staged move generation, which had a bug when generating the killer moves. Pawn moves listed as a killer would sometimes not be generated, resulting in the engine ignoring those moves altogether. I had been banging my head on the wall trying to figure out why it was ignoring a passed pawn's march to promotion! Now I know. If I had replaced the move generation in my Perft test with the newer staged generation, I would have caught this much sooner.
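
One way to catch this class of bug is to compare the output of the staged generator against a plain, trusted generator for the same position. The sketch below is a hypothetical helper that works on two lists of moves in UCI notation; how those lists are produced is engine-specific and not shown.

```python
from collections import Counter


def compare_generators(reference_moves, staged_moves):
    """Return (missing, extra) moves of the staged generator.

    `missing` are moves the reference generator produced but the staged
    one did not; `extra` are moves the staged one produced too often
    (duplicates) or that the reference never produced.
    """
    ref = Counter(reference_moves)
    staged = Counter(staged_moves)
    missing = sorted((ref - staged).elements())
    extra = sorted((staged - ref).elements())
    return missing, extra


# Example: the staged generator silently drops a pawn promotion.
missing, extra = compare_generators(
    ["e7e8q", "a2a3", "g1f3"],
    ["g1f3", "a2a3"],
)
print(missing, extra)
```

Running a check like this at every node of a Perft walk would flag the first position where the two generators disagree.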

This is what I believe is left to be done before an initial release:

Flesh out and tidy up my unit test suite.
Create an informative Readme.md.
Go back through and regression test all of the command-line functionality.

JoAnnP38 · Post by **JoAnnP38** » Wed Mar 29, 2023 9:08 pm

I am very close to my first official release of Pedantic 0.1. I am doing some final testing in cutechess so I can at least give some indication of playing strength. Finding a cohort of engines with Elo in the range of 2150 - 2400 that run reliably in cutechess for hundreds of games has been challenging. However, three engines have stood out from the pack in terms of preserving all their prior releases and being very reliable. If any of these are in the Elo range you are looking for, you can't go wrong with them:

At this point I'm hoping the following group can run without error in a 300 game/round, round-robin tournament:

2421 - Gaia 3.5
2385 - Horizon 4.4
2371 - MadChess 2.00
2310 - GOOB 1.0.0
2301 - Absolute Zero 2.4.7.2
2285 - Glass 1.2
2279 - CeeChess 1.3.2
2251 - Beowulf 2.4a
2244 - MinimalChess 0.5
2200 - MORA 1.1.0
???? - Pedantic 0.1 (of course!)

mvanthoor · Post by **mvanthoor** » Thu Mar 30, 2023 10:13 pm

JoAnnP38 wrote: ↑Mon Jan 09, 2023 2:55 pm Question to the group -- does anyone know of any UCI engines rated between 1800 and 2200? I'm guessing this is around the Elo strength of Pedantic, but I really need to see it win some games before I can be sure.

There are lots: http://ccrl.chessdom.com/ccrl/404/

The currently rated versions of my own engine Rustic are between 1700 and 1920 Elo, depending on the version you pick.

Rustic 4 is expected to hit somewhere between 2150 and 2210 Elo (adding a tapered/tuned evaluation on top of Alpha 3), but that one isn't released yet. You could compile the 4.0-beta branch from my GitHub repository though, if you want to try it.

JoAnnP38 · Post by **JoAnnP38** » Fri Mar 31, 2023 8:48 am

mvanthoor wrote: ↑Thu Mar 30, 2023 10:13 pm
JoAnnP38 wrote: ↑Mon Jan 09, 2023 2:55 pm Question to the group -- does anyone know of any UCI engines rated between 1800 and 2200? I'm guessing this is around the Elo strength of Pedantic, but I really need to see it win some games before I can be sure.
There are lots: http://ccrl.chessdom.com/ccrl/404/

There are not as many as you would think. Several reasons have led to me having to exclude many engines:

1) Engine's website no longer being active.
2) Engine having problems with my environment.
3) Engine throwing errors causing premature end of game (and match)
4) Engine making a false claim about the status of the game resulting in a forfeit.

Many of these engines have not been in active development for years (5-10) so they have grown creaky as things change around them.

mvanthoor · Post by **mvanthoor** » Fri Mar 31, 2023 9:02 am

JoAnnP38 wrote: ↑Fri Mar 31, 2023 8:48 am 1) Engine's website no longer being active.
2) Engine having problems with my environment.
3) Engine throwing errors causing premature end of game (and match)
4) Engine making a false claim about the status of the game resulting in a forfeit.

Many of these engines have not been in active development for years (5-10) so they have grown creaky as things change around them.

That is true enough, but I did manage to find a few. Engines I've used are Celestial, TSCP... I'd have to look them up.

But it is indeed one of the reasons why I didn't try to develop my engine to 2000+ in one go, but instead started out with the most basic engine and released versions from there. I also maintain the older versions.

lithander · Post by **lithander** » Fri Mar 31, 2023 1:06 pm

JoAnnP38 wrote: ↑Fri Mar 31, 2023 8:48 am
mvanthoor wrote: ↑Thu Mar 30, 2023 10:13 pm
JoAnnP38 wrote: ↑Mon Jan 09, 2023 2:55 pm Question to the group -- does anyone know of any UCI engines rated between 1800 and 2200? I'm guessing this is around the Elo strength of Pedantic, but I really need to see it win some games before I can be sure.
There are lots: http://ccrl.chessdom.com/ccrl/404/
There are not as many as you would think. Several reasons have led to me having to exclude many engines:

1) Engine's website no longer being active.
2) Engine having problems with my environment.
3) Engine throwing errors causing premature end of game (and match)
4) Engine making a false claim about the status of the game resulting in a forfeit.

Many of these engines have not been in active development for years (5-10) so they have grown creaky as things change around them.

You use MinimalChess and MadChess already, so I understand if you don't want another C# engine. But Leorik 1.0 should otherwise be interesting for you. It doesn't have any of the bugs or issues you list above, and it's interesting because it gives you the same results as a minimax algorithm, just faster. In other words, it never reduces or prunes and thus never misses a thing unless it's beyond the search horizon. Search depth of course ain't great if you have no selectivity whatsoever. But I think this is a rather unique property for an engine. It was 2150 CCRL Blitz Elo last I checked. Could be more now, as many engines in that list have seen their Elo go up by ~50 points recently.

JoAnnP38 · Post by **JoAnnP38** » Fri Mar 31, 2023 2:42 pm

Here are the results of my round-robin tournament using the following parameters:
Ponder off, Common book (ECO.pgn to 20-ply (*)), no tablebases
Time control: 40s+0.4s for entire game on AMD Ryzen 9 6900HX CPU

Rank Name                          Elo     +/-   Games    Wins  Losses   Draws   Points   Score    Draw
   1 Gaia 3.5                      107      15    1800    1045     507     248   1169.0   64.9%   13.8%
   2 Absolute Zero 2.4.7.2          63      15    1800     901     577     322   1062.0   59.0%   17.9%
   3 MadChess 2.00                  55      15    1800     890     605     305   1042.5   57.9%   16.9%
   4 Glass 1.2                      38      15    1800     844     646     310    999.0   55.5%   17.2%
   5 Horizon 4.4                    18      15    1800     782     691     327    945.5   52.5%   18.2%
   6 Pedantic 0.1                  -35      14    1800     629     808     363    810.5   45.0%   20.2%
   7 CeeChess 1.3.2                -40      14    1800     619     824     357    797.5   44.3%   19.8%
   8 GOOB 1.0.0                    -53      15    1800     593     864     343    764.5   42.5%   19.1%
   9 MinimalChess 0.5              -55      15    1800     598     882     320    758.0   42.1%   17.8%
  10 Beowulf 2.4a                  -98      15    1800     491     988     321    651.5   36.2%   17.8%

Finished match

Since Absolute Zero had the most precise rating from the CCRL Blitz scores of 2374 +/- 15 (over 1591 games), I used that as the reference for my Elo scores as predicted by cutechess and here are those results:

Engine                  CCRL Blitz Elo  +/-  Games                      cutechess est.
Gaia 3.5                2433            21   883                        2418
Absolute Zero 2.4.7.2   2374            15   1591   <= Tie to average   2374
MadChess 2.00                                                           2366
Glass 1.2                                                               2349
Horizon 4.4             2409            23   682                        2329
Pedantic 0.1                                                            2276
CeeChess 1.3.2          2266            21   855                        2271
GOOB 1.0.0              2313            17   1241                       2258
MinimalChess 0.5        2296            19   986                        2256
Beowulf 2.4a            2268            28   474                        2213
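
The arithmetic behind these estimates can be reproduced with a short sketch. It assumes the usual logistic relation between score fraction and Elo difference, and then shifts every engine by the same offset so that the reference engine lands on its CCRL Blitz rating. The function names are illustrative.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def anchored_rating(engine_diff: int, anchor_diff: int, anchor_rating: int) -> int:
    """Shift a relative Elo so that the anchor engine gets `anchor_rating`."""
    return anchor_rating + (engine_diff - anchor_diff)


# Points and games taken from the tournament table above.
gaia = round(elo_from_score(1169.0 / 1800))       # 107
abs_zero = round(elo_from_score(1062.0 / 1800))   # 63
pedantic = round(elo_from_score(810.5 / 1800))    # -35

# Tie Absolute Zero to its CCRL Blitz rating of 2374.
print(anchored_rating(gaia, abs_zero, 2374))      # 2418
print(anchored_rating(pedantic, abs_zero, 2374))  # 2276
```

Because every rating is shifted by the same constant, the choice of reference engine moves the whole list up or down without changing the gaps between engines.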

And just to double check I thought I would run the resulting PGN file through Ordo to see how it calculates the Elo ratings:

>ordo -p elo-gauntlet-2023-03-29.pgn -a 2374 -A "AbsoluteZero 2.4.7.2" -W -o ratings.txt

   # PLAYER                  : RATING    POINTS  PLAYED    (%)
   1 Gaia 3.5                : 2414.5    1169.0    1800   64.9%
   2 AbsoluteZero 2.4.7.2    : 2374.0    1062.0    1800   59.0%
   3 MadChess 2.00           : 2366.8    1042.5    1800   57.9%
   4 Glass 1.2               : 2350.9     999.0    1800   55.5%
   5 Horizon 4.4             : 2331.4     945.5    1800   52.5%
   6 Pedantic 0.1            : 2282.6     810.5    1800   45.0%
   7 CeeChess 1.3.2          : 2277.9     797.5    1800   44.3%
   8 GOOB 1.0.0              : 2265.8     764.5    1800   42.5%
   9 MinimalChess 0.5        : 2263.4     758.0    1800   42.1%
  10 Beowulf 2.4a            : 2223.4     651.5    1800   36.2%

White advantage = 19.80
Draw rate (equal opponents) = 50.00 %

So these are the ratings of my chosen Pedantic cohort, with Pedantic doing about as well as expected. (I say that because I endeavored to choose engines in the cohort such that I had an equal number of stronger engines and weaker engines. Good job at least on that front.) An Elo of 2276 (2282 per Ordo) in hyper-bullet chess is lower than I was initially shooting for, but I wanted to get my initial release packaged up so I could start planning my second release. I am going to do one more test with a five-engine cohort at longer time controls (180/40+6) just to make sure things don't fall apart for Pedantic. In the meantime I will be prepping my release on GitHub.

(*) BTW, you may recognize the file that I am using for my openings. It is the same file that pgn-extract uses to classify openings using the ECO system. This means it has well over 1000 different opening lines (although some may only be 3 ply long), and there is no guarantee that the resulting position will be balanced. I think that's okay because I set up cutechess to make each player play both sides of the opening. This also leads to a good deal of variation in my PGN, making it ideal for integrating with my training data!
