EN-Test 2022 - new testsuite

Moderator: Ras

Vinvin
Posts: 5298
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: EN-Test 2022 - new testsuite

Post by Vinvin »

Is Leptir private, like your previous engine?
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

Leptir 2.1 (Leptir = butterfly)
06 Nov 22: Private Engine. It cannot be bought. It's free for my friends. Stockfish derivative Engine for Rapid Chess games and intensive analyses. Active playing style with excellent balance between positional and tactical motifs. Less selective than Solista Attack v4. Expanded Options: MinimumThinkingTime, Polyglot Books. Updated to Normalize evaluation.
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

I would like to emphasize again that I am planning a new test for 2023. It may then have a few more positions, with fewer side solutions, and may be a little more difficult. Still, I'm not dissatisfied with this test suite. When I selected the individual positions, I wasn't thinking only of Stockfish and its clones, nor of special mate-solver engines. My plan was to test as many engines as possible, and I think I succeeded. Had the test been more difficult, the weaker engines would have solved even fewer positions.

I have just tested the new Corchess from 061122. It's nice to see that this engine now solves 2 more positions than before. Still, I see some blind spots: some positions that were solved some time ago are no longer solved.

Example (position 44):

[fen]4q1kr/p6p/1prQPppB/4n3/4P3/2P5/PP2B2P/R5K1 w - - 0 1[/fen]


44. EN 044 (Gusev), Qxe5 > 60s.

Analysis by CorChess 3 061122:

1.Qxe5
= (0.10 ++) Depth: 45/36 00:01:48 1992MN, tb=180774
The position is equal

Certainly this position is not really important in itself. The point is that it was solved a few months ago and is not anymore. I don't like such engines, sorry. My engine needs to be able to solve positions today that Stockfish could solve in the past. Corchess didn't manage it on 20 threads in 60s today.

Analysis by Solista Attack v5 Beta:

1.Qxe5 fxe5 2.Rf1 Qe7 3.Bd1 b5 4.Bb3 Rc4 5.Kg2 Qe8 6.h3 Qe7 7.Rf3 Qe8 8.Kh2 Qe7 9.Rf2 Qe8 10.Rf1 Qe7 11.a4 a6 12.Kh1 Qe8 13.axb5 axb5 14.Rf6 Qe7 15.Rf3 Qd8 16.Rf7 Qe8 17.Rg7+ Kf8 18.Ra7+ Kg8 19.Bxc4 bxc4 20.Rg7+ Kf8 21.Rb7+ Kg8
+/- (1.10) Depth: 31/48 00:00:02 34375kN, tb=6
White is better

I would like to see that my analysis engine can solve such a position. The engine doesn't play weaker if I improve this!
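To make the sacrifice in position 44 concrete, here is a minimal sketch that tallies the material in the FEN quoted above. The piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9) are the usual textbook convention and are my assumption, not something an engine in this thread uses; `material` is a hypothetical helper name.

```python
# Textbook piece values in pawns (an assumption for illustration only).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material(fen: str) -> dict:
    """Sum the material for each side from the piece-placement field of a FEN."""
    placement = fen.split()[0]
    totals = {"white": 0, "black": 0}
    for rank in placement.split("/"):
        squares = 0
        for ch in rank:
            if ch.isdigit():
                # A digit stands for that many empty squares.
                squares += int(ch)
            else:
                # Uppercase letters are White pieces, lowercase are Black.
                side = "white" if ch.isupper() else "black"
                totals[side] += PIECE_VALUES[ch.lower()]
                squares += 1
        if squares != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")
    return totals


POSITION_44 = "4q1kr/p6p/1prQPppB/4n3/4P3/2P5/PP2B2P/R5K1 w - - 0 1"
print(material(POSITION_44))  # {'white': 26, 'black': 27}
```

On this crude count the sides are roughly level before 1.Qxe5, and after 1...fxe5 White has given a queen for a knight, so the count alone says nothing about why the move works.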

Then there are positions that are solved after a few seconds, such as this position:

Position 77:

[fen]2b5/1pr4p/3bp1pk/1p6/1PpN2PP/K1P1n3/P3N1R1/3R4 w - - 0 1[/fen]


EN 077 (Mihai Neghina), Nxb5 Solved in 4.94s



...but then positions such as position 79:


[fen]3r2k1/p4pP1/1ppr4/5Bp1/P2qPnQP/4R3/5P2/5RK1 w - - 0 1[/fen]


79. EN 79 (Tauber vs Sikorsky), e5 > 60s.

are not solved in 60s (at least on 20 threads).

Differences like that bother me. I would much prefer if position 77 was solved in 30s (instead of 5s), but position 79 was solved too.
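For anyone who wants to compare such result lines across engines, here is a small sketch that parses them into records. The line format is inferred only from the three result lines quoted in this post, so the regular expression is an assumption and may not match every line a GUI writes; `parse_result` is a hypothetical helper.

```python
import re

# Pattern inferred from the quoted lines, e.g.
#   "44. EN 044 (Gusev), Qxe5 > 60s."
#   "EN 077 (Mihai Neghina), Nxb5 Solved in 4.94s"
LINE_RE = re.compile(
    r"^(?:\d+\.\s*)?EN\s+(?P<num>\d+)\s+\((?P<source>[^)]*)\),\s+(?P<move>\S+)\s+"
    r"(?:>\s*(?P<limit>\d+(?:\.\d+)?)s|Solved in (?P<time>\d+(?:\.\d+)?)s)\.?$"
)


def parse_result(line: str) -> dict:
    """Turn one result line into a dict; raise ValueError if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    solved = match.group("time") is not None
    return {
        "position": int(match.group("num")),
        "source": match.group("source"),
        "key_move": match.group("move"),
        "solved": solved,
        # Solve time in seconds, or None when the time limit was exceeded.
        "seconds": float(match.group("time")) if solved else None,
    }


lines = [
    "44. EN 044 (Gusev), Qxe5 > 60s.",
    "EN 077 (Mihai Neghina), Nxb5 Solved in 4.94s",
    "79. EN 79 (Tauber vs Sikorsky), e5 > 60s.",
]
records = [parse_result(line) for line in lines]
unsolved = [r["position"] for r in records if not r["solved"]]
print(f"solved {len(records) - len(unsolved)} of {len(records)}, unsolved: {unsolved}")
```

With records like these from two runs of the same engine, it is easy to list the positions that were solved before and are not solved now.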

Those are my thoughts. This is also the reason why I started to deal more intensively with the engines.
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

I have now started testing with a thinking time of 60 seconds. Only the top ten engines are included in this list, and only one engine per author. Computer: AMD Ryzen 3900X, 20 threads, hash 4 GB, all 3-4-5-6-men Syzygy tablebases. Current TOP 10 LIST:

New: Kayra 1.7 and Leptir 4.

1) Leptir 4, Result: 115 out of 120 = 95.8%. Leptir 4.txt (ZIP)
2) Crystal 5 KWK, Result: 113 out of 120 = 94.2%. Crystal 5 KWK.txt (ZIP)
3) Corchess 3 061122, Result: 109 out of 120 = 90.8%. Corchess 3 061122.txt (ZIP)
4) Blue Marlin 15.4, Result: 107 out of 120 = 89.2%. Blue Marlin 15.4.txt (ZIP)
5-7) Dark Sister 1.9a, Result: 106 out of 120 = 88.3%. Dark Sister 1.9a.txt (ZIP)
5-7) Kayra 1.7, Result: 106 out of 120 = 88.3%. Kayra 1.7.txt (ZIP)
5-7) ProteusSF-Piranha 220904, Result: 106 out of 120 = 88.3%. ProteusSF-Piranha 220904.txt (ZIP)
8) Shashchess 25.3 GoldDigger, Result: 105 out of 120 = 87.5%. ShashChess 25.3GoldDigger.txt (ZIP)
9-10) Eman 8.40, Result: 101 out of 120 = 84.2%. Eman 8.40.txt (ZIP)
9-10) Stockfish dev 051122, Result: 101 out of 120 = 84.2%. Stockfish 051122.txt (ZIP)
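The list can be rebuilt from the raw solve counts with a short sketch like the one below. The counts are the ones posted above; the helper name `rank_with_ties` is hypothetical, and the percentages are rounded to one decimal, so they can differ in the last digit from truncated figures.

```python
TOTAL_POSITIONS = 120

# Solved positions per engine, as posted in the list above.
solved_counts = {
    "Leptir 4": 115,
    "Crystal 5 KWK": 113,
    "Corchess 3 061122": 109,
    "Blue Marlin 15.4": 107,
    "Dark Sister 1.9a": 106,
    "Kayra 1.7": 106,
    "ProteusSF-Piranha 220904": 106,
    "Shashchess 25.3 GoldDigger": 105,
    "Eman 8.40": 101,
    "Stockfish dev 051122": 101,
}


def rank_with_ties(scores: dict, total: int) -> list:
    """Return (rank label, engine, solved, percent) rows; tied engines share a label."""
    ordered = sorted(scores.items(), key=lambda item: -item[1])
    rows = []
    i = 0
    while i < len(ordered):
        # Extend j to the last engine with the same score as engine i.
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        label = str(i + 1) if i == j else f"{i + 1}-{j + 1}"
        for name, solved in ordered[i:j + 1]:
            rows.append((label, name, solved, round(100 * solved / total, 1)))
        i = j + 1
    return rows


for label, name, solved, percent in rank_with_ties(solved_counts, TOTAL_POSITIONS):
    print(f"{label}) {name}, Result: {solved} out of {TOTAL_POSITIONS} = {percent}%")
```

Engines with the same number of solved positions get a shared label such as "5-7", the same way the three engines with 106 solutions are listed.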

TXT files on my homepage.

https://solistachess.jimdosite.com/testing/

The new No. 1 is Leptir 4. It is private, but the engine plays live on PlayChess, where anyone can watch.

3 nice games by Leptir 4:

[Event "Rated game, 5 min"]
[Site "Engine Room"]
[Date "2022.11.15"]
[Round "?"]
[White "Ivers, Lc0 v0.29.0-rc0"]
[Black "Solista, Leptir 4"]
[Result "0-1"]
[ECO "C54"]
[WhiteElo "2945"]
[BlackElo "2949"]
[Annotator "-0.08;-0.28"]
[PlyCount "178"]
[EventDate "2022.11.15"]
[SourceTitle "playchess.com"]
[TimeControl "300"]

{Lc0 v0.29.0-rc0 (2 cores): 19.7 plies; 87kN/s AMD Ryzen 9 5950X 16-Core Processor 3394MHz, (16 cores, 32 threads), Solista Attack v3.2.ctg, 2048 MB} 1. e4 {B 0} e5 {B 0} 2. Nf3 {B 0} Nc6 {B 0} 3. Bc4 {B 0} Bc5 {B 0} 4. O-O {B 0} d6 {B 0} 5. c3 {B 0} Bb6 {B 0} 6. d3 {B 0} Nf6 {B 0} 7. Bg5 {B 0} h6 {B 0} 8. Bh4 {B 0} g5 {B 0} 9. Bg3 {B 0} Ne7 {B 0} 10. Na3 {B 0} Ng6 {-0.28/34 14} 11. Bb3 {-0.08/17 12} c6 {-0.39/32 7} 12. Nc4 {-0.08/19 1} Bc7 {-0.24/28 2} 13. a4 {-0.07/19 6 (d4)} Qe7 {-0.46/29 4} 14. Ne3 {-0.09/19 1} Be6 {-0.23/33 42} 15. a5 {-0.11/23 31} a6 {-0.31/33 3} 16. Re1 {-0.10/36 0} O-O-O {-0.28/29 4} 17. d4 {-0.09/35 0} Kb8 {-0.23/30 5} 18. d5 {-0.09/32 1 (Bxe6)} cxd5 {-0.21/31 7} 19. exd5 {-0.07/30 0} Bd7 {-0.25/28 3} 20. Nc4 {-0.03/27 4 (Bc2)} Bg4 {-0.41/26 7 (Bc8)} 21. Ba4 {0.06/23 14} Bxf3 {0.00/33 15 (Bd7)} 22. gxf3 {-0.38/32 6 (Qxf3)} h5 {-0.55/29 4} 23. b4 {-0.39/38 6 (Rb1)} h4 {-0.62/30 5} 24. Rb1 {-0.38/36 0} hxg3 {-0.97/32 20} 25. fxg3 {-0.37/41 0} Rc8 {-0.98/25 2 (b5)} 26. b5 {-0.45/23 17} Bd8 {-1.08/32 0} 27. bxa6 {-0.53/26 10} Rxc4 {-1.24/31 0} 28. Rxb7+ {-0.68/23 13 (Re2)} Qxb7 {-1.52/27 5} 29. axb7 {-0.74/19 0} e4 {-1.55/25 3} 30. Qb3 {-0.83/21 4} Rc5 {-1.57/29 2} 31. fxe4 {-0.86/19 2} Ng4 {-1.60/27 2} 32. Qb1 {-0.93/19 7 (Bb5)} N6e5 {-2.21/29 5 (Rxa5)} 33. Bd1 {-0.87/23 11} Rxh2 {-2.62/31 0 (Rxa5)} 34. Qb4 {-1.57/22 19 (Bxg4)} Bxa5 {-2.88/27 3} 35. Qa4 {-1.66/22 0} Rh8 {-2.91/30 4} 36. Bxg4 {-1.79/22 0} Nxg4 {-2.93/32 2} 37. Qd7 {-1.90/27 8 (Rc1)} Bxc3 {-3.04/32 7} 38. Qxd6+ {-2.02/29 0} Rc7 {-3.14/29 3} 39. Qa3 {-2.02/30 0 (Rd1)} Bd4+ {-3.22/27 3} 40. Kf1 {-2.06/29 0} Rh1+ {-3.44/28 2} 41. Ke2 {-2.18/1 0} Rh2+ {-3.88/28 3} 42. Kd3 {-2.23/26 2} Rc3+ {-4.03/25 1} 43. Qxc3 {-2.29/26 2} Bxc3 {-4.06/28 1} 44. Kxc3 {-2.32/24 2} Kxb7 {-4.29/26 1} 45. e5 {-2.42/22 3} Kc7 {-4.37/26 1} 46. e6 {-2.59/22 42} fxe6 {-7.68/29 0} 47. Re4 {-2.49/20 11} Nf6 {-9.73/27 0 (Nf2)} 48. Rxe6 {-2.39/20 1} Nxd5+ {-10.93/25 3 (Rf2)} 49. 
Kd4 {-2.86/15 6} Kd7 {-15.41/27 2} 50. Re5 {-3.18/16 1} Nc7 {-90.77/35 1} 51. Ra5 {-3.37/16 2} Rg2 {-90.99/36 0} 52. Ke3 {-4.39/16 6} Ne6 {-91.07/44 0} 53. Kf3 {-4.84/14 5 (Ra7+)} Rd2 {-91.10/50 3 (Rc2)} 54. Kg4 {-4.82/13 3} Ke7 {-91.10/50 0} 55. Kf5 {-5.34/14 1} Rf2+ {-91.11/48 1} 56. Kg6 {-5.86/14 0} Rf6+ {-91.12/49 17} 57. Kh5 {-7.30/14 0} Rf3 {-91.13/49 7} 58. Kg4 {-7.91/12 0} Rf2 {-91.13/47 8} 59. Kh5 {-8.23/11 1 (Ra3)} Kf6 {-91.14/44 7} 60. Ra1 {-9.00/11 5} Ng7+ {-91.14/44 9} 61. Kg4 {-11.40/1 0} Re2 {-91.15/41 3} 62. Ra6+ {-11.90/10 2} Re6 {-91.16/40 2} 63. Ra8 {-12.89/9 2} Re4+ {-91.16/39 0} 64. Kf3 {-16.46/8 1} Rc4 {-91.17/38 2 (Rb4)} 65. Ra6+ {-14.40/7 1} Ne6 {-91.18/37 0} 66. Rb6 {-15.68/7 1 (Rd6)} Kf5 {-91.20/34 1 (g4+)} 67. Rb5+ {-17.59/7 0 (g4+)} Nc5 {-91.20/34 1} 68. g4+ {-23.46/7 0} Ke5 {-91.21/33 2 (Ke6)} 69. Ke3 {-15.04/7 0 (Kg3)} Rc3+ {-91.21/37 2 (Kd5)} 70. Kd2 {-17.36/7 0} Rg3 {-91.22/36 3 (Kd4)} 71. Ke2 {-16.81/2 0} Kd5 {-91.23/34 3} 72. Ra5 {-21.01/7 0 (Rb4)} Rxg4 {-91.24/34 1} 73. Kf3 {-15.86/6 0} Rb4 {-1/0 0 (Rc4)} 74. Kg3 {-13.43/5 0} g4 {-10/1 0 (Rf4)} 75. Ra7 {-6.95/6 0} Nb7 {-7/0 0 (Nd3)} 76. Ra8 {-12.79/7 0} Nd6 {-6/1 0} 77. Rf8 {-9.33/1 0} Ke5 {-3/1 0} 78. Kh4 {-9.69/1 0} Re4 {-2/1 0 (Nf5+)} 79. Rg8 {-10.13/1 0} Nf5+ {-1/1 0} 80. Kg5 {-15.24/1 0} g3 {-2/1 0} 81. Re8+ {-19.14/1 0} Ne7 {-1/1 0 (Kd4)} 82. Ra8 {-21.60/1 0} g2 {-5/1 0} 83. Ra1 {-24.75/1 0} Nd5 {-4/0 0} 84. Rg1 {-36.00/1 0} Nf4 {-3/1 0 (Ne3)} 85. Kh6 {-30.45/1 0} Kf6 {-2/1 0} 86. Ra1 {-33.56/1 0} Ne6 {-1/1 0} 87. Ra5 {-#105/1 0} g1=Q {-1/1 0 (Rh4+)} 88. Rf5+ {-#104/1 0} Kxf5 {-1/1 0} 89. Kh7 {-125.99/1 0} Rh4# {-#1/0 0} 0-1

[Event "Rated game, 16 min"]
[Site "Engine Room"]
[Date "2022.11.15"]
[Round "?"]
[White "Solista, Leptir 4"]
[Black "Abcom, Lc0 v0.29.0-rc0"]
[Result "1-0"]
[ECO "A28"]
[WhiteElo "2425"]
[BlackElo "2458"]
[Annotator "0.14;0.08"]
[PlyCount "80"]
[EventDate "2022.11.15"]
[SourceTitle "playchess.com"]
[TimeControl "960"]

{Lc0 v0.29.0-rc0 (36 threads): 13.9 plies; 21kN/s Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz 2592MHz, (18 cores, 36 threads), Solista Attack v3.2.ctg, 2048 MB} 1. c4 {B 0} e5 {B 0} 2. Nc3 {B 0} Nf6 {B 0} 3. Nf3 {B 0} Nc6 {B 0} 4. e4 {B 0} Bb4 {B 0} 5. d3 {B 0} d6 {B 0} 6. a3 {B 0} Bc5 {B 0} 7. Be3 {B 0} Bb6 {B 0} 8. Be2 {0.14/46 61} Nd4 {0.08/12 80 (Bg4)} 9. Bxd4 {0.20/33 8} Bxd4 {0.09/15 6} 10. Nxd4 {0.14/36 11} exd4 {0.09/15 2} 11. Nd5 {0.16/35 10} Nxd5 {0.09/14 47 (Nd7)} 12. cxd5 {0.15/38 13} c6 {0.08/15 3 (Bd7)} 13. Qa4 {0.18/33 9} Qb6 {0.09/15 5} 14. Qb4 {0.13/31 7} c5 {0.09/14 19 (Ke7)} 15. Qd2 {0.12/33 22} a5 {0.08/14 7 (0-0)} 16. f4 {0.14/36 56} Bd7 {0.08/13 26} 17. O-O {0.07/34 0} O-O {0.09/13 7} 18. Rae1 {0.18/32 4} Rac8 {0.10/13 52 (c4)} 19. Bd1 {0.23/34 32} f6 {0.10/11 1} 20. b3 {0.16/39 7} Ra8 {0.09/12 69 (Qa6)} 21. h3 {0.18/37 52 (h4)} h6 {0.09/12 1 (a4)} 22. a4 {0.39/32 14 (h4)} Qb4 {0.14/14 20 (Qa6)} 23. Qe2 {0.32/35 11} Rae8 {0.17/17 24} 24. Qf3 {0.22/42 14 (Qf2)} b5 {0.01/16 18} 25. axb5 {0.19/37 0 (Qg3)} Bxb5 {0.02/13 18 (Qxb5)} 26. Qg3 {0.50/34 12} Re7 {0.02/12 1 (Qc3)} 27. e5 {0.85/33 14 (Bg4)} Qd2 {0.40/21 41} 28. exf6 {0.99/37 0} Ref7 {0.39/23 7} 29. fxg7 {0.94/38 24} Rxg7 {0.47/28 0} 30. Bg4 {0.71/40 40} Qxd3 {0.44/22 1 (h5)} 31. Rf3 {1.01/33 12 (Qh4)} Qd2 {0.24/18 43} 32. Qh4 {1.24/37 7} Bd7 {0.49/31 48 (Be2)} 33. Re6 {2.37/32 14} Bxe6 {0.57/24 1} 34. dxe6 {2.92/30 29} Qc2 {0.49/39 5 (Qd1+)} 35. f5 {3.60/31 17} Qc1+ {0.48/40 3} 36. Kh2 {3.84/34 11} Qg5 {0.46/39 10} 37. Qxg5 {3.94/32 21} Rxg5 {0.46/39 3} 38. f6 {4.18/31 10} Rb8 {0.45/37 4} 39. Bf5 {4.35/31 8 (Rf5)} a4 {1.44/20 67 (c4)} 40. bxa4 {5.45/29 13} c4 {1.84/21 6 (d3) Abcom,Lc0 v0.29.0-rc0 resigns (Lag: Av=0.22s, max=0.7s)} 1-0

[Event "5 min, rated"]
[Site "Engine Room"]
[Date "2022.11.17"]
[Round "?"]
[White "Auryn, Dark Sister 1.9b"]
[Black "A2-a3, Leptir 4-avx2"]
[Result "0-1"]
[ECO "D05"]
[WhiteElo "2942"]
[BlackElo "2951"]
[Annotator "0.00;-0.11"]
[PlyCount "66"]
[EventDate "2022.11.17"]
[EventType "blitz"]
[TimeControl "300"]

1. c3 {B 0} d5 {B 0} 2. d4 {B 0} Nf6 {B 0} 3. e3 {B 0} e6 {B 0} 4. Nf3 {B 0} Bd6 {-0.11/34 23} 5. c4 {B 0 (Bd3)} b6 {B 0} 6. b3 {B 0} Qe7 {B 0} 7. Nc3 {B 0} Bb7 {B 0} 8. Bd3 {B 0} Nbd7 {B 0} 9. O-O {0.00/30 0} e5 {B 0} 10. Be2 {0.17/27 0} e4 {B 0} 11. Nd2 {0.30/28 0} a6 {B 0} 12. cxd5 {0.26/28 0} Bb4 {B 0} 13. Qc2 {0.16/26 0} Bxc3 {B 0} 14. Qxc3 {0.00/30 0} Nxd5 {B 0} 15. Qc2 {0.00/33 0} f5 {B 0} 16. Nc4 {0.00/30 0} O-O-O {B 0} 17. Bd2 {0.00/33 0} Kb8 {B 0} 18. a4 {0.00/32 0} Rhf8 {0.00/32 5} 19. Rfc1 {0.00/32 0} Rf6 {-0.25/33 24} 20. b4 {0.32/26 5} Rdf8 {B 0 (f4)} 21. b5 {0.00/28 12} a5 {B 0} 22. Bf1 {-0.12/30 23} Rh6 {0.00/32 14 (De6)} 23. Bxa5 {0.00/28 22 (g3)} f4 {-1.49/29 11 (bxa5)} 24. Be1 {-0.15/23 9 (exf4)} f3 {-3.28/24 5 (Tff6)} 25. Nd2 {-2.07/32 34} Qh4 {-4.21/30 3} 26. h3 {-4.22/27 10} fxg2 {-4.37/28 0} 27. Bxg2 {-5.23/25 17} Rg6 {-4.47/30 8} 28. Qxe4 {-5.49/23 3} Qh5 {-4.47/27 1} 29. Rc6 {-6.02/26 24 (a5)} Nc3 {-4.64/26 5} 30. Rxc3 {-6.07/25 6} Bxe4 {-4.64/30 3} 31. Nxe4 {-6.56/23 4} Rxg2+ {-4.81/29 2} 32. Kxg2 {-6.67/25 4} Qf3+ {-4.92/29 4} 33. Kh2 {-7.00/27 24} Qxe4 {-5.08/33 0 Auryn,Dark Sister 1.9b abbandona (Lag: Av=0.30s, max=0.9s)} 0-1
DrEinstein
Posts: 75
Joined: Wed Sep 15, 2021 8:50 pm
Full name: Albert Einstein

Re: EN-Test 2022 - new testsuite

Post by DrEinstein »

An amateur programmer's engine is in first place, and there is no reaction from any professional SF developer.
WHY?
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

Unfortunately, the professional (Stockfish) programmers only test with bullet games. On the one hand they are geniuses; on the other hand they are satisfied with bullet games. Everything stands or falls on whether the bullet match is won or not.

Example "fishcooking":
https://tests.stockfishchess.org/tests/ ... 7a36a6698f

LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 198520 W: 52673 L: 52632 D: 93215
Ptnml(0-2): 645, 22241, 53499, 22178, 697

num_games 800000
tc 10+0.1
new_tc 10+0.1
threads 1
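As a sanity check on the numbers in this paste, here is a minimal sketch that recomputes the score and an Elo estimate from them. It uses the standard logistic Elo formula; reading the five Ptnml counts as game pairs that scored 0, 0.5, 1, 1.5 and 2 points is my understanding of the fishtest output and should be treated as an assumption here.

```python
import math

# Figures copied from the fishtest paste above.
wins, losses, draws = 52673, 52632, 93215
reported_total = 198520
ptnml = [645, 22241, 53499, 22178, 697]  # pairs scoring 0, 0.5, 1, 1.5, 2 points


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an average score per game in (0, 1)."""
    return 400 * math.log10(score / (1 - score))


games = wins + losses + draws
score = (wins + 0.5 * draws) / games

# The same score computed from game pairs: two games per pair.
pairs = sum(ptnml)
pair_points = sum(p * n for p, n in zip((0, 0.5, 1, 1.5, 2), ptnml))
pair_score = pair_points / (2 * pairs)

print(f"games: {games} (reported {reported_total}), pairs: {pairs}")
print(f"score: {score:.6f}, from pairs: {pair_score:.6f}")
print(f"Elo estimate: {elo_from_score(score):+.3f}")
```

The win, loss and draw counts add up to the reported total, the pair counts add up to half of it, and the measured difference over nearly 200,000 games at 10+0.1 comes out well under a tenth of an Elo point.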

Changed code:

  // Increase reduction if next ply has a lot of fail high
- if ((ss+1)->cutoffCnt > 3 && !PvNode)
+ if ((ss+1)->cutoffCnt > 3)
      r++;

  ss->statScore = 2 * thisThread->mainHistory[us][from_to(move)]
---------------------------------------------------------------------------

I take note of the bullet games, but that's not enough for me. In my test, Stockfish dev only solves 101 positions, but Leptir 4 solves 115. That alone wouldn't be enough for me either. The engine also plays very well live on the server, where games are played with ponder on.
DrEinstein
Posts: 75
Joined: Wed Sep 15, 2021 8:50 pm
Full name: Albert Einstein

Re: EN-Test 2022 - new testsuite

Post by DrEinstein »

Thanks for your answer. But what is (or would be) the answer from the SF dev side?
I'm quite sure that one or another of them still reads here from time to time.
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

There is Discord. Many thousands of Stockfish fans and programmers discuss things there on many channels.

What would be the answer?

Example:
4q1kr/p6p/1prQPppB/4n3/4P3/2P5/PP2B2P/R5K1 w - - 0 1

It's not difficult to program Stockfish so that the solution comes immediately. So why would that be bad? Because such a version would win fewer bullet games, and thus be weaker than the master version, which is a giant in bullet play. :x
DrEinstein
Posts: 75
Joined: Wed Sep 15, 2021 8:50 pm
Full name: Albert Einstein

Re: EN-Test 2022 - new testsuite

Post by DrEinstein »

I certainly know the SF discord channels.
So your assumption is that Stockfish has to be optimized for each TC separately? Hm... Maybe I will ask this on Discord, but will I also get an answer here from any SF dev or SF expert?
Eduard
Posts: 1439
Joined: Sat Oct 27, 2018 12:58 am
Location: Germany
Full name: N.N.

Re: EN-Test 2022 - new testsuite

Post by Eduard »

The basic question is: What do I need my engine for?

I need my engine for home analysis of normal chess openings. I play Freestyle tournaments and need an engine for deep analysis. I like playing engine prize tournaments where the time control is at least 12 minutes plus bonus time.

Everyone can do whatever they want, OK! Why keep nagging about which engine is supposed to be better than the other? For me, my engine is clearly better for analysis and better at longer time controls, yet strong enough to be the best engine on fast hardware even in blitz. We play with 12 to 64 cores, not bullet on one core. Leptir is less selective than Stockfish. This is also the reason why this engine finds more solutions than any other engine! That is the secret. In the game above (against Dark Sister, see PGN) Leptir 4 played on 30 threads at about 35,000 kN/s. Another friend is playing with 100,000 kN/s. I have only 20,000 kN/s.