Pedantic

Guenther
Posts: 4718
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Pedantic

Post by Guenther »

adnoh wrote: Fri Jun 09, 2023 7:43 pm ...

Guenther: Here are the PGN, including a new 500 game match with Pedantic using its own opening book and full command line to the EXE

https://1drv.ms/f/s!Aqe2-2i2H5hegoBxZeC ... g?e=2eCfyI

...
I just quickly checked the first match you reported in this thread, and you really used no own book and no start positions either (despite the cutechess JSON showing own book = true). This led to just two different opening lines: basically 1. e4 e5 when 0.2.1 is Black, and 1. e4 d5 when 0.3.0 is Black, with an early pawn sac for activity.

More crucial, though, is that the depths seem to reverse very soon! In the beginning 0.3.0 was always several plies ahead in depth in all games (as expected), but very soon it changed to 0.3.0 often being down by 1-3 plies. This is not how it should be.
I can probably look at it more closely tomorrow with some stat tools I have.
https://rwbc-chess.de

[Trolls n'existent pas...]
JoAnnP38
Posts: 253
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Pedantic

Post by JoAnnP38 »

Guenther wrote: Fri Jun 09, 2023 10:43 pm ...
There are two things going on, one by design and one by necessity, that might be causing the search depth disparity (and maybe a third that I will explain).

First, the redesigned game clock component uses more time in the first 10 moves after the book moves end, trying to establish a positional advantage early. Also, 0.2.1 divides the remaining time by 40 and uses that as its per-move budget, while 0.3 divides the remaining time by 30. Once again, this makes more time available earlier in the game but less in the later stages.

Second, in the endgame (i.e., when total board material < 1600) 0.3 clears its transposition table between searches. This reduces the efficiency of the search but masks a pesky bug that crept in late in the dev cycle and still hasn't been fixed.

Lastly (and I'm still researching this), there may be an issue where, after a game reaches the endgame, the next game has reduced search efficiency due to some turmoil in the transposition table caused by the way it was being cleared in the previous game. So, if this last problem is truly a bug (which I haven't determined yet), restarting the engine after every game will fix the issue for now.
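To make the effect of the two divisors concrete, here is a minimal sketch in Python. It models only the "remaining time divided by N" budget described above; it deliberately ignores the extra time spent in the first 10 moves after book and anything else the real clock does. The function names are hypothetical and are not taken from the Pedantic source.

```python
def per_move_budget(remaining_s: float, divisor: int) -> float:
    """Budget for one move: remaining clock time divided by a fixed divisor."""
    return remaining_s / divisor


def simulate_budgets(initial_s: float, increment_s: float,
                     divisor: int, moves: int) -> list[float]:
    """Spend the full budget on every move and return the budget per move."""
    remaining = initial_s
    budgets = []
    for _ in range(moves):
        spend = per_move_budget(remaining, divisor)
        budgets.append(spend)
        # The increment is added back after each move (10+0.1 style control).
        remaining = remaining - spend + increment_s
    return budgets


# Time control from the test matches in this thread: 10 seconds + 0.1 increment.
old_style = simulate_budgets(10.0, 0.1, divisor=40, moves=80)  # 0.2.1-like
new_style = simulate_budgets(10.0, 0.1, divisor=30, moves=80)  # 0.3-like

for move in (0, 20, 40, 60):
    print(f"move {move + 1:2d}: /40 -> {old_style[move]:.3f}s, "
          f"/30 -> {new_style[move]:.3f}s")
```

In this simplified model the divisor of 30 gives the larger budget at the start of the game and the smaller one late in the game, which matches the "more time earlier, less later" description.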

Last question from my side: are we really playing just two openings for 500 games? While it is not 100% guaranteed that the exact same game will be played when the same opponents are given the same opening over and over, it is still somewhat likely. Do you not need more variation in your openings?
Guenther
Posts: 4718
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: Pedantic

Post by Guenther »

Guenther wrote: Fri Jun 09, 2023 10:43 pm ...
It seems my wording was not precise enough: by "the beginning" I meant the first 10 games, not the beginning of each game. After that it changed for all the following games (the depth disparity is not only in the first 10 moves, but mostly throughout!). The hash turmoil from not being restarted might be a candidate. BTW, if I am not mistaken, only cutechess-cli offers restarting between games; the cutechess GUI doesn't.
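For anyone who wants to rerun the match with engines restarted between games, here is a small Python sketch that builds a cutechess-cli command line using the per-engine `restart=on` option. The engine paths, engine names and PGN file name are placeholders, not the ones used in this thread, and the rest of the option set is only an assumption about a reasonable setup.

```python
import shlex


def cutechess_command(engine_a: str, engine_b: str, games: int = 100,
                      tc: str = "10+0.1", restart: str = "on") -> list[str]:
    """Build an argv list for a two-engine cutechess-cli match."""
    return [
        "cutechess-cli",
        "-engine", f"cmd={engine_a}", "name=Pedantic-new",
        "-engine", f"cmd={engine_b}", "name=Pedantic-old",
        # Options after -each apply to both engines.
        "-each", "proto=uci", f"tc={tc}", f"restart={restart}",
        "-rounds", str(games),
        "-concurrency", "1",
        "-pgnout", "match.pgn",  # placeholder output file
    ]


# Placeholder paths: substitute the real executables.
args = cutechess_command("./pedantic-0.3.0", "./pedantic-0.2.1")
print(shlex.join(args))
```

The list can be passed to `subprocess.run(args)` as is. Setting `restart="off"` instead gives the non-restart case for comparison.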

How likely is it that 0.3.0 was leading 9.5:1.5/10:0/7:1 and then lost all three matches with margins between 37-45%?
adnoh
Posts: 76
Joined: Tue Jun 26, 2007 6:31 am
Full name: Charles Wong

Re: Pedantic

Post by adnoh »

Guenther wrote: Fri Jun 09, 2023 10:43 pm ...
Thanks, Guenther, for looking into this. Please note there is now a third set of results and PGNs where I allowed Pedantic to use its own books, and a quick glance at the first few games seems to show more variety in the openings.

You are very thorough in catching the apparent discrepancy between the missing books and the settings in the JSON file. The reason? I renamed the Pedantic.bin file to guarantee the program could not find the book!!
adnoh
Posts: 76
Joined: Tue Jun 26, 2007 6:31 am
Full name: Charles Wong

Re: Pedantic

Post by adnoh »

The Pedantic 0.3.0 vs Pedantic 0.2.1 results using BanksiaGUI are what is expected, with a clear advantage for the newer version.

Tournament conditions are the same, except I used concurrency 1 so the engine logs are easier to browse: 100 games, 10+0.1 seconds TC, Pedantic uses its own books, and no adjudication.

Result:
----------------------------------------------------------------------------
  #  name            games    wins   draws  losses   score    elo    +    -
  1. Pedantic 0.3.0    100      75      16       9      83    130   37   33
  2. Pedantic 0.2.1    100       9      16      75      17   -130   33   37

Cross table:
----------------------------------------------------------------------------
  #  name               score   games         1         2
  1. Pedantic 0.3.0        83     100         x      83.0
  2. Pedantic 0.2.1        17     100      17.0         x

Tech:
----------------------------------------------------------------------------

Tech (average nodes, depths, time/m per move, others per game), counted for computing moves only, ignored moves with zero nodes:
  #  name              nodes/m         NPS  depth/m   time/m    moves     time
  1. Pedantic 0.3.0       366K     1722336     14.7      0.2     68.8     14.6
  2. Pedantic 0.2.1       361K     1536436     10.7      0.2     68.2     16.0
     all ---              355K     1625073     12.7      0.2     68.5     15.3

Results, PGN and engine logs are in the same folder, starting with the file name "BanksiaGUI - Pedantic 0.3.0 - Pedantic 0.2.1 (Concurrency 1)"

https://1drv.ms/f/s!Aqe2-2i2H5hegoBxZeC ... g?e=2eCfyI

I looked through the settings in BanksiaGUI and Cutechess GUI and could not find a setting in either GUI to restart the engine after each game. So if the differing results are caused by the turmoil in the transposition table, does that imply BanksiaGUI is in fact restarting the engine?
adnoh
Posts: 76
Joined: Tue Jun 26, 2007 6:31 am
Full name: Charles Wong

Re: Pedantic

Post by adnoh »

Pedantic 0.3.0 vs Pedantic 0.2.1 results using Cutechess GUI with engine logging on.

Tournament conditions are concurrency 1 so the engine logs are easier to browse. 100 games, 10+0.1 seconds TC, Pedantic uses its own books and no adjudication.

Score of Pedantic 0.3.0 vs Pedantic 0.2.1: 33 - 48 - 19 [0.425]
...      Pedantic 0.3.0 playing White: 16 - 26 - 8  [0.400] 50
...      Pedantic 0.3.0 playing Black: 17 - 22 - 11  [0.450] 50
...      White vs Black: 38 - 43 - 19  [0.475] 100
Elo difference: -52.5 +/- 62.4, LOS: 4.8 %, DrawRatio: 19.0 %
100 of 100 games finished.

Player: Pedantic 0.3.0
   "Draw by 3-fold repetition": 1
   "Draw by fifty moves rule": 6
   "Draw by insufficient mating material": 12
   "Loss: Black mates": 26
   "Loss: White mates": 22
   "Win: Black mates": 17
   "Win: White mates": 16
Player: Pedantic 0.2.1
   "Draw by 3-fold repetition": 1
   "Draw by fifty moves rule": 6
   "Draw by insufficient mating material": 12
   "Loss: Black mates": 17
   "Loss: White mates": 16
   "Win: Black mates": 26
   "Win: White mates": 22

PGN and engine logs are in the same folder, starting with the file name "CutechessGUI - Pedantic 0.3.0 - Pedantic 0.2.1 (Concurrency 1)"

Cutechess GUI seems to save only the latest x lines of the log, so that is all I managed to capture.
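As a cross-check on the Cutechess summary above, this Python sketch recomputes the score, Elo difference and LOS from the win/loss/draw counts. It uses the standard logistic Elo formula and the commonly used normal-approximation formula for LOS; that Cutechess computes its figures exactly this way is an assumption here. The error margin (+/- 62.4) is not reproduced.

```python
import math


def elo_difference(score: float) -> float:
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Normal-approximation LOS; draws do not enter this formula."""
    return 0.5 * (1.0 + math.erf((wins - losses)
                                 / math.sqrt(2.0 * (wins + losses))))


# Pedantic 0.3.0 vs Pedantic 0.2.1 in Cutechess GUI: 33 - 48 - 19
wins, losses, draws = 33, 48, 19
games = wins + losses + draws
score = (wins + 0.5 * draws) / games

print(f"score          = {score:.3f}")
print(f"Elo difference = {elo_difference(score):+.1f}")
print(f"LOS            = {100 * likelihood_of_superiority(wins, losses):.1f} %")
print(f"draw ratio     = {100 * draws / games:.1f} %")
```

With these inputs the values round to the reported 0.425, -52.5, 4.8 % and 19.0 %.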

https://1drv.ms/f/s!Aqe2-2i2H5hegoBxZeC ... g?e=2eCfyI
JoAnnP38
Posts: 253
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Pedantic

Post by JoAnnP38 »

adnoh wrote: Sat Jun 10, 2023 2:28 am Pedantic 0.3.0 vs Pedantic 0.2.1 results using Cutechess GUI with engine logging on. ...
I found the problem. Expect a bug-fix release in the next 16-24 hours. I apologize for your frustrations, but thank you very much for helping me address this issue.
JoAnnP38
Posts: 253
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Pedantic

Post by JoAnnP38 »

JoAnnP38 wrote: Sat Jun 10, 2023 5:14 am I found the problem. Expect a bug-fix release in the next 16-24 hours. I apologize for your frustrations, but thank you very much for helping me address this issue.
It looks like my fix will arrive closer to the 24 hr. mark than the 16 hr. mark. I know how to fix the issue; however, while my first few tries fixed it and raised Pedantic's Elo greatly in a non-restart environment, they actually lost 10-20 Elo against Pedantic 0.3 in a restart environment. I am currently trying another variation of the solution, aiming to lose no more than 5 Elo against Pedantic 0.3 in a restart environment. The last solution I tried ended up losing 7-8 Elo. I'm hoping the one I am now testing will be the charm. It currently takes a few hours for my system to run a 5-6K SPRT test, so if everything works out, I will have a patch available around 7-8pm EST.
adnoh
Posts: 76
Joined: Tue Jun 26, 2007 6:31 am
Full name: Charles Wong

Re: Pedantic

Post by adnoh »

Thanks, JoAnn, for the updates, but please do not feel any urgency about the fix on my behalf. I am just glad I could contribute in a small way to helping you identify an issue.
JoAnnP38
Posts: 253
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Pedantic

Post by JoAnnP38 »

Announcement: Official release of Pedantic 0.3.1 is now available
This version includes a bug fix for an issue that affects Pedantic 0.3 only when it is playing in a chess GUI or tournament manager that does not restart the engines after every game. On systems where the Pedantic 0.3 chess engine is not restarted every game, it underperforms by hundreds of Elo. The bug apparently does not affect Banksia, Arena where restart engines has been specified, or cutechess-cli where "restart=on" has been specified. Users may notice a slight performance increase over 0.3 (3-10 Elo) when running on systems that restart engines.

Download Pedantic 0.3.1.