Pedantic Developer's Log Stardate...

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

JoAnnP38
Posts: 250
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Pedantic Developer's Log Stardate...

Post by JoAnnP38 »

Based on my self-play results and Pedantic 0.2's CCRL blitz rating of 2474, one might suppose that Pedantic 0.3's Elo will be around 2785 +/- 35. However, I think it's generally understood that self-play ratings are overstated. To help me form a more grounded estimate of Pedantic 0.3's strength, I composed my own gauntlet and let Pedantic fight through its peers. Here are the results of that gauntlet:

Code: Select all

Rank Name                          Elo     +/-   Games    Wins  Losses   Draws   Points   Score    Draw
   0 Pedantic 0.3 (????)            -5      18    1176     453     470     253    579.5   49.3%   21.5%
   1 K2 0.99 (2783)                 93      71      84      46      24      14     53.0   63.1%   16.7%
   2 GreKo 2021.06 (2801)           89      63      84      39      18      27     52.5   62.5%   32.1%
   3 Zevra 2.5 (2689)               42      61      84      33      23      28     47.0   56.0%   33.3%
   4 Apotheosis 4.01 (2779)         33      67      84      37      29      18     46.0   54.8%   21.4%
   5 Maverick 1.5 (2641)            29      67      84      37      30      17     45.5   54.2%   20.2%
   6 Alaric 707 (2680)              21      69      84      38      33      13     44.5   53.0%   15.5%
   7 Leorik 2.2 (2727)               8      67      84      34      32      18     43.0   51.2%   21.4%
   8 WyldChess 1.51 (2650)           8      68      84      35      33      16     43.0   51.2%   19.0%
   9 Peacekeeper 1.40 (2734)        -4      62      84      28      29      27     41.5   49.4%   32.1%
  10 KnightX 3.7b (2727)           -12      67      84      32      35      17     40.5   48.2%   20.2%
  11 MadChess 3.1 (2696)           -12      69      84      34      37      13     40.5   48.2%   15.5%
  12 Princhess 0.11.0 (2617)       -42      70      84      31      41      12     37.0   44.0%   14.3%
  13 Glass 2.0 (2663)              -58      72      84      30      44      10     35.0   41.7%   11.9%
  14 Myrddin 0.89 (2590)          -125      67      84      16      45      23     27.5   32.7%   27.4%
I'm not sure if I'm doing this right, but I suspect that the -5 +/- 18 Elo rating Pedantic achieved in the gauntlet needs to be applied to either the mean or the median of the other engines' Elo ratings in the gauntlet. In this case the mean is 2698 and the median is 2692. To be conservative I used the median, so I would estimate that Pedantic's Elo rating, at TC 60+0.5 with 3-5-man EGTBs for all engines that use them and a 128MB hash table, is 2687 +/- 18.
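The arithmetic above can be checked with a short sketch. This assumes the standard logistic Elo model, where a score fraction s implies an Elo difference of -400*log10(1/s - 1); the opponent ratings are copied from the gauntlet table.

```python
from math import log10
from statistics import mean, median

# Opponent ratings from the gauntlet table above (Pedantic excluded).
opponents = [2783, 2801, 2689, 2779, 2641, 2680, 2727,
             2650, 2734, 2727, 2696, 2617, 2663, 2590]

score = 0.493  # Pedantic's overall score fraction (49.3%)

# Standard logistic Elo model: Elo difference implied by a score fraction.
elo_diff = -400 * log10(1 / score - 1)   # roughly -5, matching the table

print(f"mean opponent:   {mean(opponents):.0f}")
print(f"median opponent: {median(opponents):.1f}")
print(f"implied diff:    {elo_diff:+.1f}")
print(f"estimate (median + diff): {median(opponents) + elo_diff:.0f}")
```

The estimate lands within a point of the 2687 figure quoted above, so applying the gauntlet performance to the median of the opponents' ratings is at least internally consistent.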

Hopefully, someone who actually knows what they are doing will chime in and critique my calculations and provide a "correct" answer.
KhepriChess
Posts: 93
Joined: Sun Aug 08, 2021 9:14 pm
Full name: Kurt Peters

Re: Pedantic Developer's Log Stardate...

Post by KhepriChess »

Congrats on the new release and the very nice strength improvement!

If you saved the game outputs for that gauntlet, you can download something like https://github.com/michiguel/Ordo, run the games through it, and tell it to output the ratings relative to a known one.
Puffin: Github
KhepriChess: Github
JoAnnP38
Posts: 250
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Pedantic Developer's Log Stardate...

Post by JoAnnP38 »

Announcement: Official release of Pedantic 0.3 is now available
I am happy to announce that the first official update to Pedantic is now available. Pedantic 0.3 is a UCI engine that I have tested in both Arena and CuteChess. This release is estimated to be stronger than v0.2 by about 214 Elo, which results in an overall playing strength of about 2687. I am looking forward to seeing how close my estimate is to the one CCRL eventually calculates.

This version includes the following updates:
  • It is no longer mandatory to pass the argument "uci" in order to use Pedantic in a chess GUI. This should alleviate issues where a GUI does not provide a method to pass this argument.
  • The option MaxThreads has been removed from Pedantic's configuration. This option (or the more standard Threads option) will be re-enabled when Pedantic supports SMP.
  • Static Null Move Pruning (Reverse Futility Pruning)
  • Internal Iterative Deepening
  • Enhanced time management (results in stronger performance)
  • Enhanced static position evaluation (2x2 king buckets, enhanced king safety and stronger center control)
  • Faster search (shorter time to depth)
  • Will now use PEXT/PDEP instructions to enhance move generation if available on the user's system.
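For anyone unfamiliar with the PEXT/PDEP item above: PEXT gathers the bits of a value selected by a mask into the low-order bits of the result, which lets a sliding-piece move generator turn the relevant occupancy bits into a dense attack-table index in a single instruction instead of a magic multiply/shift. A pure-Python emulation (the real BMI2 instruction does this in hardware; the mask and occupancy below are toy 4-bit examples, not Pedantic's actual tables):

```python
def pext(value: int, mask: int) -> int:
    """Software emulation of the BMI2 PEXT instruction: gather the bits of
    `value` selected by `mask` into the low-order bits of the result."""
    result, out_bit = 0, 0
    while mask:
        lowest = mask & -mask          # isolate the lowest set bit of the mask
        if value & lowest:
            result |= 1 << out_bit
        mask &= mask - 1               # clear that bit and continue
        out_bit += 1
    return result

# Toy example: extract the occupancy bits a rook on some square cares about.
rook_mask = 0b1010                     # hypothetical "relevant squares" mask
occupancy = 0b1011
index = pext(occupancy, rook_mask)     # bits 1 and 3 of occupancy -> 0b11
```

With hardware PEXT, `attacks[square][index]` replaces the magic-bitboard multiply, which is where the speedup comes from on supported CPUs.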
Download Pedantic 0.3.
JoAnnP38
Posts: 250
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Pedantic Developer's Log Stardate...

Post by JoAnnP38 »

Plans for Pedantic 0.4
So, after releasing Pedantic 0.3 and reviewing my future project plans, I will tentatively concentrate on the following items in Pedantic 0.4:
  • Syzygy End-Game Tablebase Support
  • Research/benchmark using SIMD to optimize evaluation function
  • Improve Search (Increase LMR depth support, Counter Move, History Gravity, Capture History)
  • Improve evaluation function (Increase king buckets from 2x2 to 4x4, Improve king attack/defense & center control, Queen oriented terms)
  • More UCI support (upper/lower bounds on info score; more config options: UCI_LimitStrength/UCI_Elo, UCI_Opponent, SyzygyPath)
  • Add the moving piece to the move encoding. Also research/benchmark changing the move data structure from a ulong to a struct
Well, those are the most important items. I pushed FRC (Chess960) support and .NET 8 support to the 0.5 release because I want to concentrate on standard chess first, and .NET 8 isn't scheduled to be officially released until this November.

That's all for now. I have a big nut to crack with EGTB support, so I am dealing with it first so that it doesn't slip to the 0.5 release.
JoAnnP38
Posts: 250
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Pedantic Developer's Log Stardate...

Post by JoAnnP38 »

The last time I posted, I reported that I would be working on end-game tablebase support. I am happy to say that this feature is now complete. I can now successfully probe Syzygy tablebase files, which would normally provide about +10 Elo of additional playing strength, though in an adjudicated tournament the advantage is much smaller. Either way, the feature is complete and will provide me, and hopefully others, with enhanced analysis capability if nothing else.

The tablebase feature was implemented using the Microsoft C++/CLI compiler, which produces mixed managed/unmanaged (native) assemblies. This allowed me to create a standalone assembly that implements a static Syzygy class wrapping the calls to the Syzygy probing code written in C/C++. Unlike other .NET assemblies, a mixed-mode assembly cannot be embedded in a .NET single-file deployment, so it must be provided separately in addition to Pedantic.exe. Moreover, the C/C++ runtime DLL must also be provided (C++/CLI cannot be statically linked with the runtime libraries), as well as a .NET host DLL. While the engine's entry point will continue to be Pedantic.exe, there will now be a few more files provided as part of its deliverable. The contents of the new zip file provided when 0.4 is released will look similar to this:

Code: Select all

03/24/2023  09:58 PM           155,256 Ijwhost.dll
01/29/2023  11:43 PM        11,193,520 Pedantic.bin
06/18/2023  06:19 PM           249,344 Pedantic.Chess.pdb
06/18/2023  06:19 PM            42,496 Pedantic.Collections.pdb
06/18/2023  06:19 PM        13,753,836 Pedantic.exe
06/18/2023  06:19 PM            30,208 Pedantic.Genetics.pdb
06/18/2023  06:19 PM            56,832 Pedantic.pdb
06/18/2023  06:18 PM           178,176 Pedantic.Tablebase.dll
06/18/2023  06:18 PM         1,134,592 Pedantic.Tablebase.pdb
06/18/2023  06:19 PM            30,208 Pedantic.Utilities.pdb
05/18/2023  10:10 AM           109,440 vcruntime140.dll
              11 File(s)     26,933,908 bytes
NOTE: My solution appears to be Windows-only. I do not believe the Pedantic.Tablebase.dll assembly can be used in other environments even where the CPU target is the same (e.g., Linux).

On another front, Pedantic 0.3.1 is currently competing in the CCRL Division 9 40/15 tournament and is doing remarkably well. I still nitpick about its play, but it has managed to work its way into first place, where it currently remains. Anyone wanting to watch the Division 9 engines duke it out is welcome to join us.
JoAnnP38
Posts: 250
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Pedantic Developer's Log Stardate... (Training Data)

Post by JoAnnP38 »

Training Data

I thought it might be interesting to review the training data that Pedantic uses to tune its HCE. Like most, my evaluation function is optimized using a logistic regression model fed with millions of chess positions from my ever-growing training set. It's funny that just because I don't use a neural network to evaluate Pedantic's board positions, my evaluation is classified as "hand-crafted". The only piece that is truly hand-crafted is the decision of which board features to evaluate. From that point on, it's really up to my tuner and training data set to define my evaluation. Granted, this seems to have worked rather well so far, but I have to admit that much of my progress has been haphazard. For instance, I don't restrict the size of my training data set. The entire data set currently contains about 45 million positions and is growing. It is currently an amalgamation of the following games:
  • Various engine games where both participants are rated 2800 and higher (6.6M)
  • Recent games from Lichess where both participants are rated 2500 or higher (5.0M)
  • OTB Grandmaster games where both participants are rated 2600 or higher (8.9M)
  • Older online games of an unknown origin where participants are rated 2500 or higher (17.6M)
  • Pedantic self-play & tournament play against other similarly rated engines (7.3M)
As the Pedantic self-play/tournament play file grows it will start replacing other datasets. The first file that will probably be replaced is the one with games from Lichess, and then the other online games followed by the GM games and finally the other engine games. In the end I hope to have only data where Pedantic is at least one of the opponents in each game.

You might ask, "How can you tune with so much data?" The answer is that I don't. When I start a tuning session I specify the sample size, and a random sample of that size is selected from my overall data set. A typical training sample is 10-12 million positions, though I have trained with up to 16 million. Only after the sample has been shuffled, to further randomize the order of its records, does the tuner start. It turns out that the shuffle step is very important because I use a mini-batch enhancement to increase the performance of my tuner. Shuffling reduces the probability that positions from the same game will appear in the same mini-batch, which increases the effectiveness of the mini-batch approach.
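To make the shuffle-then-mini-batch idea concrete, here is a minimal Texel-style tuning sketch. The scaling constant K = 0.81 appears in a tuning log later in this thread; everything else (toy feature vectors, learning rate, batch size, and all names) is invented for illustration and is not Pedantic's actual tuner.

```python
import math
import random

def sigmoid(score, k=0.81):
    """Texel-style mapping from a centipawn score to an expected result."""
    return 1.0 / (1.0 + 10.0 ** (-k * score / 400.0))

def mse(data, w, k=0.81):
    """Mean squared error between predicted and actual game results."""
    return sum((sigmoid(sum(f * x for f, x in zip(feats, w)), k) - r) ** 2
               for feats, r in data) / len(data)

def tune(data, w, k=0.81, batch_size=4, lr=5000.0, epochs=25, seed=1):
    """Mini-batch gradient descent over a shuffled copy of the sample."""
    rng = random.Random(seed)
    w = list(w)
    for _ in range(epochs):
        sample = data[:]
        # The important shuffle step: it keeps positions from the same game
        # from clustering in the same mini-batch.
        rng.shuffle(sample)
        for start in range(0, len(sample), batch_size):
            batch = sample[start:start + batch_size]
            grad = [0.0] * len(w)
            for feats, result in batch:
                p = sigmoid(sum(f * x for f, x in zip(feats, w)), k)
                # Chain rule: d(error)/d(score), including sigmoid derivative.
                dscore = 2 * (p - result) * p * (1 - p) * math.log(10) * k / 400
                for i, f in enumerate(feats):
                    grad[i] += dscore * f
            for i in range(len(w)):
                w[i] -= lr * grad[i] / len(batch)
    return w
```

Because each mini-batch's gradient only approximates the full-sample gradient, correlated positions in one batch bias that approximation; shuffling makes each batch closer to an unbiased sample.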

To stabilize Pedantic's performance as new releases are issued, starting with release 0.4 Pedantic will save the sample created for each tuning session. I will keep these data sets so that I can retune with the exact same data in the future as needed. Of course, that will take some of the fun out of it for me, because I am always curious to see how Pedantic's game play changes with each new tuning session. Also, I am working hard to grow my entire training set to over 100 million positions to further reduce the probability that more than a handful of positions from the same game will be used.

Okay, that's all for now.
adityachandra
Posts: 22
Joined: Sun Apr 23, 2023 9:26 am
Full name: Aditya Chandra

Re: Pedantic Developer's Log Stardate... (Training Data)

Post by adityachandra »

JoAnnP38 wrote: Wed Jun 28, 2023 1:15 am Training Data
[…]
Just curious, how long has your tuning process taken on 10-12 million positions without mini-batches (on my laptop even 1 million takes 2 days...)? Also, do you start the tuning from scratch or do you have some starting values?
JoAnnP38
Posts: 250
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Pedantic Developer's Log Stardate... (Training Data)

Post by JoAnnP38 »

adityachandra wrote: Mon Jul 03, 2023 7:40 pm Just curious, how long has your tuning process taken on 10-12 million positions without mini-batches (on my laptop even 1 million takes 2 days...)? Also do you start the tuning from scratch or have some starting values?
It typically takes about 3-4 minutes per iteration (epoch?), and I usually only retune when I add something new to my evaluation function. Then I start with whatever values/weights came from my prior tuning, plus my own guesses for the new weights. Typically this takes about 50-80 iterations (a few of them fail because of fidelity loss between the mini-batch and the full data set, but I retry those and eventually succeed when the next mini-batch has better fidelity). I will exit the tuning process when one of the following occurs:

a) When the improvement in my error reaches a certain point (by default, within 0.00000005).
b) When the tuning fails to lower the error 5 times in a row.
c) When the maximum iteration count I specified when starting the tuning session is reached (usually around 100-200).
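Sketched as a loop, the three exit conditions might look like the following. The `step` callback standing in for one tuning pass, and the treatment of a failed pass as a retry, are my own illustrative assumptions, not Pedantic's code:

```python
def optimize(step, initial_error, eps=5e-8, max_fails=5, max_iters=200):
    """Outer tuning loop with the three exit conditions described above.
    `step` runs one tuning pass and returns the new error."""
    error, fails, iters = initial_error, 0, 0
    while iters < max_iters:                 # c) iteration cap
        new_error = step()
        iters += 1
        if new_error < error:
            improved_by = error - new_error
            error, fails = new_error, 0
            if improved_by < eps:            # a) converged within tolerance
                break
        else:
            fails += 1                       # a "NO IMPROVEMENT" pass
            if fails >= max_fails:           # b) 5 failures in a row
                break
    return error, iters
```

A separate per-pass retry (as in the log below, where a failed pass keeps the same pass number) could be layered inside `step`; this sketch only shows the outer termination logic.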

Assuming maximum iterations × minutes per iteration (i.e., 80 × 4 = 320 minutes), that ends up being around 5.33 hours. I think many people have tuners that can do this even faster. Currently my evaluation has over 3000 weights that have to be tuned, and as my evaluation function grows, the time taken to tune grows as well. If your tuner is taking 2 days for 1 million positions, you definitely have some optimizing to do. Here are some ideas.

1) Most of your feature values will be zero. Use sparse arrays and only store the nonzero values.
2) Use SIMD to calculate the final dot-product for your opening/endgame phases.
3) Use multi-threading to calculate the sum of squared errors (split the positions into slices, one slice per thread, and add up the errors from each thread).
4) Don't re-evaluate a position each time you run the calculation. Extract all the features from a position once and then reuse them when calculating the error. This means you don't reload the FEN from your data every time; instead, extract just the minimum amount of information from each position and save it in a vector to be used over and over.
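A hedged sketch combining tips 1, 3, and 4: features are extracted from each FEN once, up front, into sparse maps; the error is then a cheap sparse dot product, and the sum of squared errors is split across threads. All names and values here are illustrative, and note that in CPython the GIL limits the threading benefit for CPU-bound work; in C# or C++ the same pattern scales with cores.

```python
from concurrent.futures import ThreadPoolExecutor

def sigmoid(score, k=0.81):
    return 1.0 / (1.0 + 10.0 ** (-k * score / 400.0))

def position_error(pos, weights, k=0.81):
    feats, result = pos
    # Sparse dot product: only the (few) nonzero features are stored,
    # so this touches a handful of weights, not all 3000+.
    score = sum(weights[i] * v for i, v in feats.items())
    return (sigmoid(score, k) - result) ** 2

def total_error(positions, weights, threads=4):
    """Split positions into slices and sum per-slice errors in parallel."""
    def slice_error(chunk):
        return sum(position_error(p, weights) for p in chunk)
    chunks = [positions[i::threads] for i in range(threads)]
    with ThreadPoolExecutor(max_workers=threads) as ex:
        return sum(ex.map(slice_error, chunks)) / len(positions)

# Features were extracted from each FEN exactly once, into sparse
# {feature_index: count} maps; the expensive parse never happens again.
positions = [({0: 1, 5: 2}, 1.0), ({0: -1, 3: 1}, 0.0), ({5: 1}, 0.5)]
weights = {0: 100.0, 3: 320.0, 5: 50.0}
err = total_error(positions, weights)
```

The key cost saving is tip 4: parsing a FEN and walking the board is far more expensive than the sparse dot product, and the tuner runs the error calculation thousands of times over the same positions.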

You are always welcome to look at my implementation, which is very messy, but it works.
JoAnnP38
Posts: 250
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Pedantic Developer's Log Stardate...

Post by JoAnnP38 »

Output from my last optimization session (not the weights):

The tuning session looks like it took about 8 hours, much longer than the 5 hours I quoted earlier.

Code: Select all

pedantic learn --data Pedantic_Sample_12000000_20230628_162041.csv --iter 100
Sample size: 12000000, Start time: 7/3/2023 9:19:16 AM
Loading data file...
Loading 12000000 of 12000000 (100%)
Shuffling data...
K = 0.81
Pass stats   0 - ε: 0.111053
Pass stats   1 - ε: 0.111034, δt: 03:28, eff: 0.408
Pass stats   2 - ε: 0.111006, δt: 03:21, eff: 0.395
Pass stats   3 - ε: 0.110987, δt: 03:18, eff: 0.409
Pass stats   4 - ε: 0.110966, δt: 03:16, eff: 0.415
Pass stats   5 - ε: 0.110944, δt: 03:18, eff: 0.405
Pass stats   6 - ε: 0.110933, δt: 03:17, eff: 0.412
Pass stats   7 - ε: 0.110909, δt: 03:16, eff: 0.399
Pass stats   8 - ε: 0.110896, δt: 03:19, eff: 0.399
Pass stats   9 - ε: 0.110885, δt: 03:21, eff: 0.397
Pass stats  10 - ε: 0.110860, δt: 03:21, eff: 0.403
Pass stats  11 - ε: 0.110852, δt: 03:22, eff: 0.401
Pass stats  12 - ε: 0.110835, δt: 03:23, eff: 0.396
Pass stats  13 - ε: 0.110827, δt: 03:27, eff: 0.401
Pass stats  14 - ε: 0.110818, δt: 03:22, eff: 0.408
Pass stats  15 - ε: 0.110804, δt: 03:27, eff: 0.404
Pass stats  16 - ε: 0.110799, δt: 03:25, eff: 0.387
Pass stats  17 - ε: 0.110787, δt: 03:25, eff: 0.395
Pass stats  18 - ε: 0.110780, δt: 03:31, eff: 0.383
Pass stats  19 - ε: 0.110773, δt: 03:27, eff: 0.392
Pass stats  20 - ε: 0.110763, δt: 03:28, eff: 0.390
Pass stats  21 - ε: 0.110757, δt: 03:27, eff: 0.386
Pass stats  22 - ε: 0.110754, δt: 03:28, eff: 0.394
Pass stats  23 - ε: 0.110753, δt: 03:27, eff: 0.391
Pass stats  24 - ε: 0.110749, δt: 03:28, eff: 0.391
Pass stats  25 - ε: 0.110741, δt: 03:27, eff: 0.407
Pass stats  26 - ε: 0.110732, δt: 03:26, eff: 0.397
Pass stats  27 - ε: 0.110723, δt: 03:30, eff: 0.383
Pass stats  28 - ε: 0.110717, δt: 03:27, eff: 0.395
Pass stats  29 - ε: 0.110712, δt: 03:24, eff: 0.390
Pass stats  29 - ε: 0.110713, δt: 03:24, NO IMPROVEMENT
Pass stats  29 - ε: 0.110716, δt: 03:24, NO IMPROVEMENT
Pass stats  30 - ε: 0.110708, δt: 03:23, eff: 0.391
Pass stats  31 - ε: 0.110707, δt: 03:22, eff: 0.389
Pass stats  32 - ε: 0.110705, δt: 03:24, eff: 0.397
Pass stats  33 - ε: 0.110702, δt: 03:24, eff: 0.374
Pass stats  33 - ε: 0.110705, δt: 03:24, NO IMPROVEMENT
Pass stats  34 - ε: 0.110695, δt: 03:23, eff: 0.388
Pass stats  35 - ε: 0.110693, δt: 03:23, eff: 0.388
Pass stats  36 - ε: 0.110689, δt: 03:22, eff: 0.383
Pass stats  37 - ε: 0.110683, δt: 03:23, eff: 0.382
Pass stats  37 - ε: 0.110687, δt: 03:21, NO IMPROVEMENT
Pass stats  38 - ε: 0.110683, δt: 03:24, eff: 0.373
Pass stats  39 - ε: 0.110680, δt: 03:25, eff: 0.389
Pass stats  40 - ε: 0.110679, δt: 03:24, eff: 0.380
Pass stats  41 - ε: 0.110678, δt: 03:23, eff: 0.388
Pass stats  42 - ε: 0.110675, δt: 03:22, eff: 0.404
Pass stats  43 - ε: 0.110674, δt: 03:25, eff: 0.382
Pass stats  44 - ε: 0.110668, δt: 03:30, eff: 0.382
Pass stats  45 - ε: 0.110663, δt: 03:27, eff: 0.390
Pass stats  46 - ε: 0.110661, δt: 03:25, eff: 0.369
Pass stats  46 - ε: 0.110662, δt: 03:28, NO IMPROVEMENT
Pass stats  47 - ε: 0.110656, δt: 03:34, eff: 0.378
Pass stats  47 - ε: 0.110658, δt: 03:27, NO IMPROVEMENT
Pass stats  48 - ε: 0.110655, δt: 03:28, eff: 0.374
Pass stats  49 - ε: 0.110655, δt: 03:27, eff: 0.385
Pass stats  50 - ε: 0.110654, δt: 03:29, eff: 0.375
Pass stats  51 - ε: 0.110650, δt: 03:27, eff: 0.385
Pass stats  52 - ε: 0.110645, δt: 03:29, eff: 0.378
Pass stats  53 - ε: 0.110645, δt: 03:29, eff: 0.377
Pass stats  54 - ε: 0.110639, δt: 03:29, eff: 0.381
Pass stats  54 - ε: 0.110642, δt: 03:30, NO IMPROVEMENT
Pass stats  55 - ε: 0.110633, δt: 03:25, eff: 0.378
Pass stats  55 - ε: 0.110636, δt: 03:26, NO IMPROVEMENT
Pass stats  55 - ε: 0.110633, δt: 03:25, NO IMPROVEMENT
Pass stats  55 - ε: 0.110635, δt: 03:24, NO IMPROVEMENT
Pass stats  55 - ε: 0.110639, δt: 03:25, NO IMPROVEMENT
Pass stats  56 - ε: 0.110631, δt: 03:24, eff: 0.377
Pass stats  56 - ε: 0.110634, δt: 03:32, NO IMPROVEMENT
Pass stats  56 - ε: 0.110632, δt: 03:31, NO IMPROVEMENT
Pass stats  56 - ε: 0.110636, δt: 03:32, NO IMPROVEMENT
Pass stats  56 - ε: 0.110634, δt: 03:34, NO IMPROVEMENT
Pass stats  56 - ε: 0.110631, δt: 03:28, NO IMPROVEMENT
Optimization complete at: 7/3/2023 5:30:37 PM, Elapsed: 4:11:22.8866073
JoAnnP38
Posts: 250
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Pedantic Developer's Log Stardate...

Post by JoAnnP38 »

JoAnnP38 wrote: Tue Jul 04, 2023 1:40 am Output from my last optimization session (not the weights):

The time it took for the tuning session looks like it took about 8 hours, much longer than the 5 hours I quoted earlier.
There was a bug in my code for writing the end time. The start time was output as local time while the end time was output as UTC. The previous estimate of 5 hours is probably correct.
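The bug described is the classic naive-datetime mix-up: subtracting a naive local timestamp from a naive UTC timestamp skews the elapsed time by the local UTC offset. A minimal Python illustration (not Pedantic's actual code, which is C#):

```python
from datetime import datetime, timezone

# The bug: one naive local timestamp and one naive UTC timestamp.
start_local = datetime.now()       # naive, local wall-clock time
end_utc = datetime.utcnow()        # naive, UTC wall-clock time
skewed = end_utc - start_local     # off by the local UTC offset

# The fix: take both timestamps in the same (aware) timezone.
start = datetime.now(timezone.utc)
end = datetime.now(timezone.utc)
elapsed = end - start              # correct elapsed time
```

In C# the analogous fix is to use the same `DateTimeKind` (or `DateTimeOffset`) for both endpoints of the measurement.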
