EN-Test 2022 - new testsuite
Moderator: Ras
-
- Posts: 75
- Joined: Wed Sep 15, 2021 8:50 pm
- Full name: Albert Einstein
Re: EN-Test 2022 - new testsuite
Shouldn't these tests be run with only one core and a cleared hash to get a deterministic result?
-
- Posts: 232
- Joined: Thu Aug 06, 2009 8:04 pm
- Location: UK
Re: EN-Test 2022 - new testsuite
DrEinstein wrote: ↑Sun Nov 20, 2022 8:42 pm
Shouldn't these tests be run with only one core and a cleared hash to get a deterministic result?
I think the engine process needs to be restarted to clear all data, not just the hash tables. However, testing with one core is probably not useful unless that is what is typically used for analysis. If we want to see what is better for analysis with many cores, we need to test with that number of cores.
I used a test set to see how analysis differed with various numbers of threads, hash sizes, hyperthreading on and off, etc., on some new hardware. The results often vary wildly for the same setup, and a large enough sample is required to gain some accuracy. I also observed, as others have pointed out in the past, that it isn't always reliable to stop as soon as the key move is found. Sometimes many extra plies, or a meaningful enough evaluation, are required to make sure the move is indeed being chosen for the right reason and doesn't get dropped.
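For anyone who wants to try the "restart the engine for every position" idea, here is a minimal Python sketch. It is only an illustration under assumptions: the function names and the default values are made up for this example, `Hash` is a standard UCI option, and `Threads` is an option most Stockfish-family engines expose but not every engine does.

```python
import subprocess


def uci_commands(fen, threads=1, hash_mb=256, movetime_ms=60_000):
    """UCI commands for one search from a clean engine state (assumed defaults)."""
    return [
        "uci",
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mb}",
        "ucinewgame",
        "isready",
        f"position fen {fen}",
        f"go movetime {movetime_ms}",
    ]


def best_move_fresh_process(engine_path, fen, **options):
    """Start a new engine process, search one position, return its bestmove."""
    engine = subprocess.Popen(
        [engine_path], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    try:
        # Send the whole command sequence; the engine processes it in order.
        engine.stdin.write("\n".join(uci_commands(fen, **options)) + "\n")
        engine.stdin.flush()
        for line in engine.stdout:
            if line.startswith("bestmove"):
                return line.split()[1]
        return None  # engine exited without reporting a move
    finally:
        engine.kill()  # discard the process so no state carries over
        engine.wait()
```

Even with one thread and a fresh process, `go movetime` depends on wall-clock time, so a fixed `go depth` or `go nodes` limit is likely to be more reproducible. With many threads the variation described above should be expected regardless.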
-
- Posts: 3652
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: EN-Test 2022 - new testsuite
I don't know of any clone that is stronger than SF dev. AFAIK there has never been one! If a clone is better on test positions, it plays weaker because of its lower search depth. Example: Crystal:
Score of SFdev vs Crystal 5: 17 - 4 - 179 [0.532]
... SFdev playing White: 12 - 1 - 87 [0.555] 100
... SFdev playing Black: 5 - 3 - 92 [0.510] 100
... White vs Black: 15 - 6 - 179 [0.522] 200
Elo difference: 22.6 +/- 15.3, LOS: 99.8 %, DrawRatio: 89.5 %
200 of 200 games finished.
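The Elo figure in that output can be checked from the win/loss/draw counts alone. A small Python sketch, assuming the usual logistic Elo formula (the function names are only for this example):

```python
import math


def score_fraction(wins, losses, draws):
    """Points scored per game, counting a draw as half a point."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Logistic Elo difference implied by a score fraction in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# SFdev vs Crystal 5, from the match output above.
wins, losses, draws = 17, 4, 179
score = score_fraction(wins, losses, draws)
elo = elo_difference(score)
draw_ratio = draws / (wins + losses + draws)
print(f"score={score:.4f} elo={elo:+.1f} draws={draw_ratio:.1%}")
```

This should land very close to the reported 22.6 Elo and 89.5 % draw ratio. The error margin is not computed here.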
Jouni
-
- Posts: 1439
- Joined: Sat Oct 27, 2018 12:58 am
- Location: Germany
- Full name: N.N.
Re: EN-Test 2022 - new testsuite
There are other reasons for me to build my own engine. I'm a book expert. My engine can use multiple Polyglot books. My engine can also learn. The advantage of a learning file is this: On a slow computer, the engine will play better if it can use a learning file that has been trained with 64 cores! I have two slower PCs. Also some other players on the server play very successfully with the use of the learning file. Even old computers with only 2 cores are difficult to defeat. This is even more fun than playing with super fast computers. 

-
- Posts: 1439
- Joined: Sat Oct 27, 2018 12:58 am
- Location: Germany
- Full name: N.N.
Re: EN-Test 2022 - new testsuite
I am currently working on a new test set. Unfortunately, there are always positions where it seems that a special move wins very nicely, but the engine proves the opposite. What should I do with these two positions?
Position 1:
[fen]7k/8/7p/2p1p1pP/1pPpPpP1/1P1P1P2/4K3/2N5 w - - 0 1[/fen]
I found this position in a test suite. The solution move is Kd2! But then I analyzed further, and an engine now tells me that Kf1! also wins.
Position 2:
[fen]4K1k1/8/1p5p/1Pp3b1/8/1P3P2/P1B2P2/8 w - - 0 1[/fen]
In this position, the beautiful move f4 wins! Until recently, only special engines could find this move. But on a fast computer, new normal engines find it in less than 60 seconds! I then analyzed a bit further, and an engine said that Bd1 wins here too. After about 6 minutes the eval for Bd1 rose to +9! So both f4 and Bd1 win.
As beautiful as the positions are, when there are two solutions, especially in the endgame, I can't expect an engine to look for a specific move.
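One option for such positions is to accept several solution moves. The EPD format allows more than one move after the `bm` opcode, so a test runner can count either move as solved. A rough Python sketch, assuming the two positions above are stored as EPD lines; the `id` strings and function names are invented for this example, and the second solutions (Kf1, Bd1) are only what the engine analysis above suggests:

```python
def normalize(san):
    """Drop check, mate and annotation marks so 'Kf1!' compares equal to 'Kf1'."""
    return san.rstrip("+#!?")


def parse_epd_best_moves(epd_line):
    """Return (position, set of accepted SAN moves) from one EPD line."""
    fields = epd_line.split(None, 4)
    position = " ".join(fields[:4])
    operations = fields[4] if len(fields) > 4 else ""
    accepted = set()
    # Naive split on ';' (would break on a ';' inside a quoted operand).
    for operation in operations.split(";"):
        parts = operation.split()
        if parts and parts[0] == "bm":
            accepted.update(normalize(move) for move in parts[1:])
    return position, accepted


def is_solved(engine_move, accepted):
    """True if the engine's move is one of the accepted solutions."""
    return normalize(engine_move) in accepted


SUITE = [
    '7k/8/7p/2p1p1pP/1pPpPpP1/1P1P1P2/4K3/2N5 w - - bm Kd2 Kf1; id "Position 1";',
    '4K1k1/8/1p5p/1Pp3b1/8/1P3P2/P1B2P2/8 w - - bm f4 Bd1; id "Position 2";',
]

for line in SUITE:
    position, accepted = parse_epd_best_moves(line)
    print(position, sorted(accepted))
```

Whether a position with two winning moves still belongs in a test suite is a separate question; this only shows that the format does not force a single key move.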
Eduard
-
- Posts: 436
- Joined: Thu Aug 02, 2012 7:48 pm
- Location: Germany
Re: EN-Test 2022 - new testsuite
as for the 1st pos: i looked at it for 2 sec (without an engine) and i'm pretty sure that there isn't a single move that does NOT win!?
as for the 2nd pos i totally agree
Truths are illusions of which we have forgotten that they are illusions.
-
- Posts: 3652
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: EN-Test 2022 - new testsuite
Just one note: Crystal's result (113 solved, average time 4 s) is much better than Leptir's (115 solved, average time 5 s). Just calculate the total time used.
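How the totals come out depends on what "average time" means, which is not stated here. A short Python sketch of the arithmetic under two possible readings, assuming the 120 positions and 60-second limit from the results list in this thread (the function names are only for this example):

```python
POSITIONS = 120  # size of the test suite, per the results list
LIMIT_S = 60     # thinking time per position, per the results list


def total_if_avg_over_solved(solved, avg_s):
    """Reading A (assumption): the average covers solved positions only,
    and every unsolved position costs the full time limit."""
    return solved * avg_s + (POSITIONS - solved) * LIMIT_S


def total_if_avg_over_all(avg_s):
    """Reading B (assumption): the average already covers all positions."""
    return POSITIONS * avg_s


for name, solved, avg_s in [("Crystal", 113, 4), ("Leptir", 115, 5)]:
    print(name, total_if_avg_over_solved(solved, avg_s), total_if_avg_over_all(avg_s))
```

Under reading A the two totals are nearly identical; under reading B Crystal uses clearly less time. Which reading applies is not settled by the numbers given.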
Jouni
-
- Posts: 1439
- Joined: Sat Oct 27, 2018 12:58 am
- Location: Germany
- Full name: N.N.
Re: EN-Test 2022 - new testsuite
Info:
I am working on a new test suite that I will present in January 2023. I am collecting good test positions from previous test sets, but I have also already included some positions that have never been published before! Currently there are 46 positions altogether. I have a large database with many interesting computer games. From this I have selected the best ones, about 2500 games. So far I have checked only 560 of them. It's fun to look for new positions.
-
- Posts: 1439
- Joined: Sat Oct 27, 2018 12:58 am
- Location: Germany
- Full name: N.N.
Re: EN-Test 2022 - new testsuite
EN-Test 2022 - Starts on 06 Nov 2022:
I have now started testing with a thinking time of 60 seconds. Only the top ten engines are included in this list, and only one engine per author. Computer: AMD Ryzen 3900X, 20 threads, 4 GB hash, all 3-4-5-6-men Syzygy tablebases.
New in the list: Brainlearn 20.1 vulkan (download on my homepage under Solista News, or here in the ShashChess thread).
Top 10:
1) Leptir 4, Result: 115 out of 120 = 95.8%. Leptir 4.txt (ZIP)
2) Crystal 5 KWK, Result: 113 out of 120 = 94.1%. Crystal 5 KWK.txt (ZIP)
3-4) Brainlearn 20.1 vulkan (AM/EN), Result: 110 out of 120 = 91.6%. Brainlearn 20.1 vulkan.txt (ZIP)
3-4) Corchess 3 201122, Result: 110 out of 120 = 91.6%. Corchess 3 201122.txt (ZIP)
5) Blue Marlin 15.4, Result: 107 out of 120 = 89.1%. Blue Marlin 15.4.txt (ZIP)
6-8) Dark Sister 1.9a, Result: 106 out of 120 = 88.3%. Dark Sister 1.9a.txt (ZIP)
6-8) Kayra 1.7, Result: 106 out of 120 = 88.3%. Kayra 1.7.txt (ZIP)
6-8) ProteusSF-Piranha 220904, Result: 106 out of 120 = 88.3%. ProteusSF-Piranha 220904.txt (ZIP)
9-10) Stockfish dev 231122, Result: 105 out of 120 = 87.5%. Stockfish dev 231122.txt (ZIP)
9-10) Shashchess 25.3 GoldDi, Result: 105 out of 120 = 87.5%. ShashChess 25.3GoldDigger.txt (ZIP)
Textfiles on my Homepage:
https://solistachess.jimdosite.com/testing/
-
- Posts: 1439
- Joined: Sat Oct 27, 2018 12:58 am
- Location: Germany
- Full name: N.N.
Re: EN-Test 2022 - new testsuite
Vulkan 021222: The engine is based on Brainlearn vulkan, but with a more aggressive playing style. The other changes are tuned for time controls of at least 5-minute blitz, and work even better at longer time controls. Those who love bullet time controls can forget this engine. In my EN-Test 2022, Vulkan 021222 is impressive: it solves 114 of 120 positions at 60 s and is now in second place!
Download (64-Bit Windows):
Filehorst.de:
https://filehorst.de/d/eqlbtmgI
Pixeldrain:
https://pixeldrain.com/u/GBgYWMg8
And on my homepage:
https://solistachess.jimdosite.com/solista-news/
New in the list: Vulkan 021222:
Starts on 06 Nov 2022
I have now started testing with a thinking time of 60 seconds. Only the top ten engines are included in this list, and only one engine per author. Computer: AMD Ryzen 3900X, 20 threads, 4 GB hash, all 3-4-5-6-men Syzygy tablebases. Current top 10 list:
1) Leptir 4, Result: 115 out of 120 = 95.8%. Leptir 4.txt (ZIP)
2) Vulkan 021222 (EN/AM), Result: 114 out of 120 = 95.0%. Vulkan 021222.txt (ZIP)
3) Crystal 5 KWK, Result: 113 out of 120 = 94.1%. Crystal 5 KWK.txt (ZIP)
4) Corchess 3 201122, Result: 110 out of 120 = 91.6%. Corchess 3 201122.txt (ZIP)
5) Blue Marlin 15.4, Result: 107 out of 120 = 89.1%. Blue Marlin 15.4.txt (ZIP)
6-8) Dark Sister 1.9a, Result: 106 out of 120 = 88.3%. Dark Sister 1.9a.txt (ZIP)
6-8) Kayra 1.7, Result: 106 out of 120 = 88.3%. Kayra 1.7.txt (ZIP)
6-8) ProteusSF-Piranha 220904, Result: 106 out of 120 = 88.3%. ProteusSF-Piranha 220904.txt (ZIP)
9-10) Stockfish dev 231122, Result: 105 out of 120 = 87.5%. Stockfish dev 231122.txt (ZIP)
9-10) Shashchess 25.3 GoldDi, Result: 105 out of 120 = 87.5%. ShashChess 25.3GoldDigger.txt (ZIP)
Text files on my homepage
https://solistachess.jimdosite.com/testing/