New engine releases & news 2021

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

flok
Posts: 606
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: New engine releases & news 2021

Post by flok »

For the CCRL testers: don't bother testing CaffeinatedPawn (yet); it is still very weak.


   # PLAYER         :  RATING  POINTS  PLAYED   (%)
   1 dorpsgek       :  2195.0   976.5    1190    82
   2 Embla-5142     :  1937.6   356.0     860    41
   3 Embla-5140     :  1924.5   122.5     611    20
   4 tscp-1.8.1c    :  1627.1   135.0     279    48
   5 CP-a090da4     :  1286.7    18.5     248     7    <---
   6 CP-d031f3a     :  1124.8     0.5      30     2
For others: https://vanheusden.com/CaffeinatedPawn/ or https://github.com/folkertvanheusden/CaffeinatedPawn/
Gabor Szots
Posts: 1477
Joined: Sat Jul 21, 2018 7:43 am
Location: Budapest, Hungary
Full name: Gabor Szots

Re: New engine releases & news 2021

Post by Gabor Szots »

flok wrote: Sat Oct 16, 2021 9:32 am For the CCRL testers: don't bother (yet) to test CaffeinatedPawn: it is very weak still.
As you can see from our lists, strength is not a main concern for us.
Gabor Szots
CCRL testing group
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: New engine releases & news 2021

Post by mvanthoor »

Gabor Szots wrote: Sat Oct 16, 2021 8:56 am So in the case of Princhess the rating you find at CCRL is not an overestimation owing to adjudicating wins when it may not have been able to win. Either it checkmated the opponent or the opponent resigned of its own accord.
Adjudication is not the same as resigning of one's own accord. Adjudication is like a referee saying:

Ref: "Hm... White is up 10 points of material here. Black, do you agree with that?"
Black: "Yes."
Ref: "Game over, white wins."

If you score games like that, and even combine it with using Syzygy for mating (which is like a weak player consulting a book to look up the K+R vs K mate), you _will_ overestimate the real strength of the engine. The reason is obvious: the engine gets a lot of outside help, namely adjudication on material advantage and endgame book knowledge through the tablebases. Merely reaching a winning position is enough, because the game is then won either by the referee or by looking up how to actually win it.
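To make the referee analogy concrete, here is a minimal sketch of a purely material-based adjudication rule. It is hypothetical: the function names, the piece values and the thresholds (a 10-pawn lead held for 6 consecutive plies) are illustrative assumptions, not the rule any particular tester or GUI uses. It only reads the piece-placement field of each FEN, so it awards the win without ever checking whether the leading engine could actually deliver mate.

```python
from typing import List, Optional

# Hypothetical piece values in pawns; kings are not counted.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_balance(fen: str) -> int:
    """White's material minus Black's, in pawns, read from a FEN's piece-placement field."""
    balance = 0
    for ch in fen.split()[0]:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits (empty squares) and '/' rank separators
        balance += value if ch.isupper() else -value
    return balance


def adjudicate(fens: List[str], threshold: int = 10, plies: int = 6) -> Optional[str]:
    """Return "1-0" or "0-1" once one side has led by at least `threshold` pawns
    for `plies` consecutive positions; otherwise None (the game goes on)."""
    leader, streak = 0, 0  # leader: +1 White, -1 Black, 0 nobody
    for fen in fens:
        bal = material_balance(fen)
        side = 1 if bal >= threshold else -1 if bal <= -threshold else 0
        if side != 0 and side == leader:
            streak += 1
        else:
            leader, streak = side, (1 if side != 0 else 0)
        if streak >= plies:
            return "1-0" if leader == 1 else "0-1"
    return None


# White has K+Q+R against a lone king: a +14 material lead.
krq_vs_k = "4k3/8/8/8/8/8/8/QR2K3 w - - 0 1"
print(adjudicate([krq_vs_k] * 6))
```

K+Q+R vs K is trivially won anyway; the point being made above is that the same rule would also award the win in positions the engine might fail to convert on its own.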

At the moment, Princhess is like that generally strong club player who gets a winning position against almost anyone else in the club, but then doesn't know how to finish it off. Once that becomes common knowledge, no one will ever resign against this player, so he or she will probably never become club champion.

So yes, Princhess _would_ be around 1815-1865, if the engine could actually convert its winning positions.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Gabor Szots
Posts: 1477
Joined: Sat Jul 21, 2018 7:43 am
Location: Budapest, Hungary
Full name: Gabor Szots

Re: New engine releases & news 2021

Post by Gabor Szots »

mvanthoor wrote: Sat Oct 16, 2021 12:07 pm Adjudication is not the same as resigning on own accord.
That's exactly what I said.
amanjpro
Posts: 883
Joined: Sat Mar 13, 2021 1:47 am
Full name: Amanj Sherwany

Re: New engine releases & news 2021

Post by amanjpro »

mvanthoor wrote: Sat Oct 16, 2021 12:07 pm If you score games like that, and it's even combined with using Syzygy for mating (which would be a weak player consulting a book to look up K+R vs K mate), you _will_ overestimate the real strength of the engine.

Everyone is free to come up with their own rules... Adjudication and the use of EGTBs and books are generally the most widely accepted rules. I see no harm in using them for rating lists: they make the games significantly shorter, which makes it much easier to test more engines with more games.


On the other hand, for engine developers I agree with you: a developer probably needs to make sure that the engine can win games, not necessarily in the most correct way, but somehow.
Brunetti
Posts: 424
Joined: Tue Dec 08, 2009 1:37 pm
Location: Milan, Italy
Full name: Alex Brunetti

Re: New engine releases & news 2021

Post by Brunetti »

1'+1" rating ~1630.

Alex
Lana Samson
Posts: 28
Joined: Wed Sep 08, 2021 3:36 am
Full name: Lana Samson

Re: New engine releases & news 2021

Post by Lana Samson »

using Syzygy for mating (which would be a weak player consulting a book to look up K+R vs K mate)
For me, I have no problem with Princhess requiring Syzygy at this point in time. There is a case to be made that an engine that doesn't require an EGTB is stronger than one that does, but there are lots of things that could be improved about Princhess, and how she plays without an EGTB is not the most interesting one for me right now. So I probably won't look at that for some time (although hopefully some of the other improvements will have the side effect of improving her play without an EGTB, even if that is not their goal).

I did run the following overnight (10+0.1, Princhess using Syzygy):


all_versions_1  | Rank Name                          Elo     +/-   Games   Score    Draw
all_versions_1  |    1 Rustic Alpha 3.0.0            211      22    1000   77.1%   20.0%
all_versions_1  |    2 Princhess 0.4.0               -34      18    1000   45.1%   27.9%
all_versions_1  |    3 Princhess 0.3.1              -166      19    1000   27.8%   29.1%
So Princhess 0.4.0 doesn't seem to be a regression, but Rustic definitely has something over her. Perhaps there's something in its play that just hits a Princhess weakness consistently.
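For reference, the Elo column in this table is consistent with the usual logistic conversion of each engine's overall score into a rating difference, Elo = -400 * log10(1/score - 1). A minimal sketch, assuming that standard model (the match tool's exact calculation and its +/- error margins are not reproduced here; the function name is hypothetical):

```python
import math


def elo_from_score(score: float) -> float:
    """Rating difference implied by an expected score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        # A 0% or 100% score implies an infinite rating difference.
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Scores taken from the table above.
for name, score in [("Rustic Alpha 3.0.0", 0.771),
                    ("Princhess 0.4.0", 0.451),
                    ("Princhess 0.3.1", 0.278)]:
    print(f"{name:20s} {elo_from_score(score):+.0f}")
```

This prints +211, -34 and -166, matching the Elo column.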
Princess Lana
---
Author of Princhess
Discord ID: Lana#4231
amanjpro
Posts: 883
Joined: Sat Mar 13, 2021 1:47 am
Full name: Amanj Sherwany

Re: New engine releases & news 2021

Post by amanjpro »

Lana Samson wrote: Sat Oct 16, 2021 5:08 pm So Princhess 0.4.0 doesn't seem to be a regression, but Rustic definitely has something over her. Perhaps there's something in its play that just hits a Princhess weakness consistently.
Rustic has (more or less) no pruning, so the tree it searches is much wider than the one Princhess searches. Maybe that is it?
flok
Posts: 606
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: New engine releases & news 2021

Post by flok »

Brunetti wrote: Sat Oct 16, 2021 2:47 pm
1'+1" rating ~1630.

Alex
That's more than I expected!


   4 tscp-1.8.1c    :  1600.8  1807.5    2296    79
   5 CP-a8c074d     :  1307.8   308.0    1948    16
Brunetti
Posts: 424
Joined: Tue Dec 08, 2009 1:37 pm
Location: Milan, Italy
Full name: Alex Brunetti

Re: New engine releases & news 2021

Post by Brunetti »

flok wrote: Sat Oct 16, 2021 8:09 pm That's more than I expected!
Maybe you used a time control at which your engine performs worse, or that limited field of opponents is particularly bad for CaffeinatedPawn. My testing is based on a sort of Swiss system, and the resulting ratings look quite accurate (if you ask me, of course :)).
These are the current results:


Rank  Elo   ±  Engine                                      Score Games  Wins Draws  Loss  Oppo time stal ille
-------------------------------------------------------------------------------------------------------------
   1. 1627  33 CaffeinatedPawn 46203a4 64-bit                44%   451   193    15   243   -51   0%  22%   0%
-------------------------------------------------------------------------------------------------------------
Results of CaffeinatedPawn 46203a4 64-bit
-------------------------------------------------------------------------------------------------------------
Rank  Elo  Engine                                        Score   Games   Wins  Draws Losses
-------------------------------------------------------------------------------------------------------------
   1. 2211 Madeleine 0.2                                     0%      1      0      0      1                  
   2. 2208 Blunder 6.1.0 64-bit                              0%      1      0      0      1                  
   3. 2195 Kurt 0.9.2.2 beta 64-bit                          0%      1      0      0      1                  
   4. 2192 DrunkenMaster 1.2                                 0%      1      0      0      1                  
   5. 2181 Tiny Chess 1.4.6 64-bit                           0%      1      0      0      1                  
   6. 2067 Bad Chess Engine 0.4.4-ucifix 64-bit              0%      1      0      0      1                  
   7. 2062 ProChess 102 A doctor                             0%      1      0      0      1                  
   8. 2043 Sapeli 2.0 64-bit                                 0%      1      0      0      1                  
   9. 2042 DelphiMax 2.9                                     0%      1      0      0      1                  
  10. 2032 Etude 0.11                                        0%      1      0      0      1                  
  11. 1961 Gunborg 1.0 64-bit                                0%      1      0      0      1                  
  12. 1936 Lila 3.0.1 64-bit                                33%      3      1      0      2                  
  13. 1930 Odonata 0.3.26 64-bit                             0%      4      0      0      4                  
  14. 1923 FreeChess 2.17b                                  25%      4      1      0      3                  
  15. 1921 Roce 0.039                                       13%      4      0      1      3                  
  16. 1911 Clueless 1.4                                      0%      5      0      0      5                  
  17. 1905 Rustic Alpha 3.0.0 64-bit                        20%      5      1      0      4                  
  18. 1904 Deep Chess r113 64-bit                           10%      5      0      1      4                  
  19. 1901 Minace 1.0                                       20%      5      1      0      4                  
  20. 1898 Snowy 0.1 64-bit                                 20%      5      1      0      4                  
  21. 1894 GearHeart 0.16 TP 64-bit                          0%      5      0      0      5                  
  22. 1892 Deepov 0.4 64-bit                                10%      5      0      1      4                  
  23. 1890 Predateur2.2.1                                   20%      5      1      0      4                  
  24. 1886 CDrill 1800 Build 4                              20%      5      1      0      4                  
  25. 1885 HeavyChess 0.13 beta                              0%      5      0      0      5                  
  26. 1878 Chareth 2020-06-28 64-bit                         0%      5      0      0      5                  
  27. 1877 Heracles 0.5.9                                   10%      5      0      1      4                  
  28. 1874 Sayuri 2018-05-23 64-bit                         20%      5      1      0      4                  
  29. 1869 Ziggy 0.7 64-bit                                  0%      5      0      0      5                  
  30. 1869 Hopper 20211004 64-bit                           20%      5      1      0      4                  
  31. 1866 Pigeon 1.5.1 64-bit                              20%      5      1      0      4                  
  32. 1857 Apollo 1.2.1 64-bit                              20%      5      1      0      4                  
  33. 1855 Zurichess Appenzeller 64-bit                      0%      5      0      0      5                  
  34. 1835 Silke Chess 1.2.1209                             17%      6      1      0      5                  
  35. 1827 BeRoChess 1.00.2012.04.10.0112 64-bit            67%      6      4      0      2                  
  36. 1809 Hopper 20211003 64-bit                           33%      6      2      0      4                  
  37. 1801 Hedwig rev25 64-bit                              29%      7      2      0      5                  
  38. 1774 Rocinante 2.0 64-bit                             36%      7      2      1      4                  
  39. 1770 Testina 3c                                       71%      7      5      0      2                  
  40. 1767 Celestial 1.0 64-bit                              7%      7      0      1      6                  
  41. 1760 RapChessCs 2021-07-22 64-bit                     14%      7      1      0      6                  
  42. 1759 Shallow Blue 2.0.0 64-bit                         0%      7      0      0      7                  
  43. 1738 Pwned 1.3 64-bit                                 56%      8      4      1      3                  
  44. 1733 Small-C 90.3                                     13%      8      1      0      7                  
  45. 1712 Crabby 1.00 64-bit                               38%      8      3      0      5                  
  46. 1705 FracTal 1.0 64-bit                               28%      9      2      1      6                  
  47. 1692 CaSpEr rev4 64-bit                               33%      9      2      2      5                  
  48. 1680 Walleye 1.4 64-bit                               22%      9      2      0      7                  
  49. 1669 Chancellor 1.00d                                 33%      9      3      0      6                  
  50. 1657 ArcBishop 1.00d                                  67%      9      6      0      3                  
  51. 1631 Pigeon 1.37 64-bit                               20%     10      2      0      8                  
  52. 1623 Cicada 0.1 64-bit                                55%     10      4      3      3                  
  53. 1617 Pulse 1.7.2 64-bit                               60%     10      6      0      4                  
  54. 1609 Conqueror 1.2                                    70%     10      7      0      3                  
  55. 1599 Koedem 1.1 64-bit                                44%      9      4      0      5                  
  56. 1586 SebLague Chess-AI 0.1 .Net5                      56%      9      5      0      4                  
  57. 1559 ChessEngine (SeminarEngine) 0.1-alpha 64-bit     63%      8      5      0      3                  
  58. 1553 DoctorB 1.2.1 64-bit                             50%      8      4      0      4                  
  59. 1549 Piranha 0.5                                      75%      8      6      0      2                  
  60. 1531 Jacky Chess 0.9.14 64-bit                        75%      8      6      0      2                  
  61. 1529 Chessier 0.1.0 64-bit                            88%      8      7      0      1                  
  62. 1526 Irina 0.15                                       88%      8      7      0      1                  
  63. 1508 Superpawn build 110 64-bit                       63%      8      5      0      3                  
  64. 1498 Walleye 1.3.0 64-bit                             71%      7      5      0      2                  
  65. 1470 Quokka 2.1 64-bit                                57%      7      4      0      3                  
  66. 1459 Lynx 5.2 64-bit                                 100%      6      6      0      0                  
  67. 1459 Madlenka 2015.02.13 DC 64-bit                    75%      6      4      1      1                  
  68. 1453 Lynx 5.1 64-bit                                  67%      6      4      0      2                  
  69. 1452 Hactar 0.9.0 64-bit                             100%      6      6      0      0                  
  70. 1451 Lynx 5.3 64-bit                                  86%      7      6      0      1                  
  71. 1445 Eden 0.0.13                                      83%      6      5      0      1                  
  72. 1427 Neocortex 1.0 64-bit                             83%      6      5      0      1                  
  73. 1426 Endamat 20210217 64-bit                          83%      6      5      0      1                  
  74. 1390 Overhead 64-bit                                  60%      5      3      0      2                  
  75. 1384 Sargon 1978 1.01                                100%      5      5      0      0                  
  76. 1372 KillerQueen 2 beta 3                             63%      4      2      1      1                  
  77. 1362 AXWChessbot 0.1.0 64-bit                        100%      5      5      0      0                  
  78. 1354 Haddock 0.2.1 64-bit                            100%      5      5      0      0                  
  79. 1331 Rumney 0.2.1 64-bit                              75%      4      3      0      1                  
  80. 1320 Belofte 2.1.3 64-bit                            100%      1      1      0      0 
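The Score column follows the usual convention that a win counts 1 point and a draw half a point. A minimal sketch of that calculation (the function name is hypothetical):

```python
def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Fraction of available points scored: a win is 1 point, a draw 1/2."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return (wins + 0.5 * draws) / games


# CaffeinatedPawn's summary row: 193 wins, 15 draws, 243 losses in 451 games.
print(f"{score_fraction(193, 15, 243):.1%}")
```

This prints 44.5%, which the table shows rounded to 44%. The per-opponent rows work the same way. For example, 4 wins, 3 draws and 3 losses against Cicada give 5.5 / 10 = 55%.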
  
The bad thing is the 22% crash rate. Have you ever had any crashes yourself?

Alex