EN-Test 2022 - new testsuite

DrEinstein · Post by **DrEinstein** » Sun Nov 20, 2022 8:42 pm

Shouldn't these tests be run with only one core and a cleared hash to get a deterministic result?

gordonr · Post by **gordonr** » Sun Nov 20, 2022 8:57 pm

DrEinstein wrote: ↑Sun Nov 20, 2022 8:42 pm Shouldn't these tests be run with only one core and a cleared hash to get a deterministic result?

I think the engine process needs to be restarted to clear all data and not just the hash tables. However, testing with one core is probably not useful unless that is what is typically used for analysis. If we want to see what is better for analysis with many cores, we need to test with that number of cores.

I used a test set to see how analysis differed with various numbers of threads, hash sizes, hyperthreading on and off, etc. on some new hardware. The results often vary wildly for the same setup, and it takes a large enough sample set to gain some accuracy. I also observed, as others have pointed out in the past, that it isn't always reliable to stop as soon as the key move is found. Sometimes many extra plies, or a meaningful enough evaluation, are required to make sure the move is indeed being chosen for the right reason and doesn't get dropped.
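The point about not stopping at the first appearance of the key move can be sketched in code. This is only an illustration under assumptions: the function name `stable_solve_time`, the `min_hold` threshold, and the log format (one `(elapsed_seconds, best_move)` pair per completed iteration) are hypothetical and not taken from any testing tool mentioned in this thread.

```python
from typing import List, Optional, Set, Tuple

def stable_solve_time(
    iterations: List[Tuple[float, str]],
    key_moves: Set[str],
    min_hold: int = 3,
) -> Optional[float]:
    """Return the time at which the engine settled on a key move, or None.

    A position counts as solved only if a key move is still the best move
    at the end of the search and has been held for at least `min_hold`
    consecutive final iterations.
    """
    start = None  # time at which the current run of key moves began
    held = 0      # length of the current run of key moves
    for elapsed, move in iterations:
        if move in key_moves:
            if held == 0:
                start = elapsed
            held += 1
        else:
            # The key move was dropped, so any earlier "solution" is discarded.
            start, held = None, 0
    return start if held >= min_hold else None

# Hypothetical log: the key move appears early, is dropped, then comes back.
log = [(0.5, "Kd2"), (1.0, "Na2"), (2.0, "Kd2"), (4.0, "Kd2"), (8.0, "Kd2")]
print(stable_solve_time(log, {"Kd2"}))  # 2.0
```

With this rule the early hit at 0.5 s does not count, because the move was dropped again at 1.0 s. If several moves are accepted, switching between them still counts as holding a solution.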

Jouni · Post by **Jouni** » Mon Nov 21, 2022 9:16 am

I don't know of any clone that is stronger than SF dev. AFAIK there has never been one! If a clone does better in test positions, it plays weaker because it reaches a lower depth. Example, Crystal:

Score of SFdev vs Crystal 5: 17 - 4 - 179 [0.532]
... SFdev playing White: 12 - 1 - 87 [0.555] 100
... SFdev playing Black: 5 - 3 - 92 [0.510] 100
... White vs Black: 15 - 6 - 179 [0.522] 200
Elo difference: 22.6 +/- 15.3, LOS: 99.8 %, DrawRatio: 89.5 %
200 of 200 games finished.
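For readers who want to check the numbers, the Elo difference and draw ratio above follow from the win/loss/draw counts. The sketch below uses the standard logistic Elo formula; the function name `elo_from_wdl` is made up for illustration, and the error margin (+/- 15.3) and LOS are not reproduced here.

```python
import math

def elo_from_wdl(wins: int, losses: int, draws: int) -> float:
    """Point estimate of the Elo difference from a match result.

    Uses the logistic model: score = 1 / (1 + 10 ** (-elo / 400)).
    Undefined for a score of exactly 0 or 1.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))

# SFdev vs Crystal 5: 17 wins, 4 losses, 179 draws
wins, losses, draws = 17, 4, 179
print(round(elo_from_wdl(wins, losses, draws), 1))       # 22.6
print(round(100 * draws / (wins + losses + draws), 1))   # 89.5
```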

Eduard · Post by **Eduard** » Mon Nov 21, 2022 6:11 pm

There are other reasons for me to build my own engine. I'm a book expert. My engine can use multiple Polyglot books. My engine can also learn. The advantage of a learning file is this: On a slow computer, the engine will play better if it can use a learning file that has been trained with 64 cores! I have two slower PCs. Also some other players on the server play very successfully with the use of the learning file. Even old computers with only 2 cores are difficult to defeat. This is even more fun than playing with super fast computers.

Eduard · Post by **Eduard** » Tue Nov 22, 2022 6:36 pm

I am currently working on a new test set. Unfortunately, there are always positions where it seems that a special move wins very nicely, but the engine proves the opposite. What should I do with these two positions?

Position 1:

[fen]7k/8/7p/2p1p1pP/1pPpPpP1/1P1P1P2/4K3/2N5 w - - 0 1[/fen]

I found this position in a test suite. The solution move is Kd2! But then I analyzed further, and an engine now tells me that Kf1! also wins.

Position 2:

[fen]4K1k1/8/1p5p/1Pp3b1/8/1P3P2/P1B2P2/8 w - - 0 1[/fen]

In this position, the beautiful move f4 wins! Until recently, only special engines could find this move. But on a fast computer, current standard engines find it in less than 60 seconds! I then analyzed a bit more, and an engine said that Bd1 wins here too. After about 6 minutes the eval rose to +9 for Bd1! So both f4 and Bd1 win.

As beautiful as the positions are, when there are two solutions, especially in the endgame, I can't expect an engine to look for a specific move.

Eduard
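One way to keep such positions in a suite is to accept every winning move as a solution. EPD allows several moves after the `bm` (best move) opcode, so a checker can treat them as a set. The sketch below is a simplified illustration: the helper names `parse_bm` and `is_solved` are hypothetical, the parser does not handle semicolons inside quoted strings, and the second solutions are only what the engine analysis above suggests.

```python
from typing import Set

def parse_bm(epd: str) -> Set[str]:
    """Extract the moves listed under the EPD 'bm' opcode (simplified)."""
    fields = epd.split(None, 4)  # four position fields, then the operations
    if len(fields) < 5:
        return set()
    for operation in fields[4].split(";"):
        parts = operation.split()
        if parts and parts[0] == "bm":
            return set(parts[1:])
    return set()

def is_solved(epd: str, engine_move: str) -> bool:
    """True if the engine's move (in SAN) is one of the accepted solutions."""
    return engine_move in parse_bm(epd)

# Positions 1 and 2 from this post, with both candidate solutions accepted.
pos1 = "7k/8/7p/2p1p1pP/1pPpPpP1/1P1P1P2/4K3/2N5 w - - bm Kd2 Kf1;"
pos2 = "4K1k1/8/1p5p/1Pp3b1/8/1P3P2/P1B2P2/8 w - - bm f4 Bd1;"

print(is_solved(pos1, "Kf1"))  # True
print(is_solved(pos2, "Kd7"))  # False
```

Whether a position with two solutions still discriminates between engines is a separate question; this only shows how both moves could be scored as correct.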

Spliffjiffer · Post by **Spliffjiffer** » Tue Nov 22, 2022 7:27 pm

As for the 1st position, I looked at it for 2 seconds (without an engine) and I'm pretty sure there isn't a single move that does NOT win!?

As for the 2nd position, I totally agree.

Jouni · Post by **Jouni** » Wed Nov 23, 2022 10:35 pm

Just one note: Crystal's result of 113 with an average time of 4s is much better than Leptir's 115 with an average time of 5s. Just calculate the total time used.

Eduard · Post by **Eduard** » Fri Nov 25, 2022 6:20 am

Info:
I am working on a new test suite that I will present in January 2023. I am collecting good test positions from previous test sets, but I have also already included some positions that have never been published before! Currently there are 46 positions altogether. I have a large database with many interesting computer games, from which I have selected the best ones, about 2500 games. So far I have checked only 560 of them. It's fun to look for new positions.

Eduard · Post by **Eduard** » Thu Dec 01, 2022 11:20 am

EN-Test 2022 - Starts on 06 Nov 2022:
I have now started testing with a thinking time of 60 seconds. Only the top ten engines are included in this list, and only one engine per author. Computer: AMD Ryzen 3900X, 20 threads, 4 GB hash, all 3-4-5-6-men Syzygy tablebases.

New in the list: Brainlearn 20.1 vulkan (download from my homepage under Solista News, or here in the Shashchess thread).

Top 10:

1) Leptir 4, Result: 115 out of 120 = 95.8%. Leptir 4.txt (ZIP)
2) Crystal 5 KWK, Result: 113 out of 120 = 94.1%. Crystal 5 KWK.txt (ZIP)
3-4) Brainlearn 20.1 vulkan (AM/EN), Result: 110 out of 120 = 91.6%. Brainlearn 20.1 vulkan.txt (ZIP)
3-4) Corchess 3 201122, Result: 110 out of 120 = 91.6%. Corchess 3 201122.txt (ZIP)
5) Blue Marlin 15.4, Result: 107 out of 120 = 89.1%. Blue Marlin 15.4.txt (ZIP)
6-8) Dark Sister 1.9a, Result: 106 out of 120 = 88.3%. Dark Sister 1.9a.txt (ZIP)
6-8) Kayra 1.7, Result: 106 out of 120 = 88.3%. Kayra 1.7.txt (ZIP)
6-8) ProteusSF-Piranha 220904, Result: 106 out of 120 = 88.3%. ProteusSF-Piranha 220904.txt (ZIP)
9-10) Stockfish dev 231122, Result: 105 out of 120 = 87.5%. Stockfish dev 231122.txt (ZIP)
9-10) Shashchess 25.3 GoldDi, Result: 105 out of 120 = 87.5%. ShashChess 25.3GoldDigger.txt (ZIP)
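The shared places (3-4, 6-8, 9-10) and the percentages in this list can be reproduced from the raw solve counts. The sketch below is an illustration only: `rank_table` is a hypothetical helper, and it assumes the percentages are truncated to one decimal rather than rounded, which is what the figures in the list suggest (113 of 120 is shown as 94.1%).

```python
from itertools import groupby
from typing import Dict, List, Tuple

def rank_table(results: Dict[str, int], total: int) -> List[Tuple[str, str, str]]:
    """Return (place, engine, percent) rows, with shared places for ties."""
    ordered = sorted(results.items(), key=lambda item: -item[1])
    rows = []
    place = 1
    for solved, group in groupby(ordered, key=lambda item: item[1]):
        names = [name for name, _ in group]
        last = place + len(names) - 1
        label = str(place) if last == place else f"{place}-{last}"
        tenths = solved * 1000 // total  # integer math, truncates to 0.1%
        percent = f"{tenths // 10}.{tenths % 10}%"
        rows.extend((label, name, percent) for name in names)
        place = last + 1
    return rows

# Solve counts taken from the list above (120 positions).
results = {
    "Leptir 4": 115,
    "Crystal 5 KWK": 113,
    "Brainlearn 20.1 vulkan": 110,
    "Corchess 3 201122": 110,
    "Blue Marlin 15.4": 107,
    "Dark Sister 1.9a": 106,
    "Kayra 1.7": 106,
    "ProteusSF-Piranha 220904": 106,
    "Stockfish dev 231122": 105,
    "Shashchess 25.3 GoldDi": 105,
}
for label, name, percent in rank_table(results, 120):
    print(f"{label}) {name}: {percent}")
```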

Text files on my homepage:
https://solistachess.jimdosite.com/testing/

Eduard · Post by **Eduard** » Fri Dec 02, 2022 9:23 pm

Vulkan 021222: The engine is based on Brainlearn vulkan, but with a more aggressive playing style. The other changes are tuned for at least 5-minute blitz, and work even better at longer time controls. Those who love bullet time controls can forget this engine. In my EN-Test 2022, Vulkan 021222 is impressive: it solves 114 of 120 positions at 60s and is now in second place!

Download (64-Bit Windows):

Filehorst.de:
https://filehorst.de/d/eqlbtmgI
Pixeldrain:
https://pixeldrain.com/u/GBgYWMg8

And on my homepage:
https://solistachess.jimdosite.com/solista-news/

New in the list: Vulkan 021222:

Starts on 06 Nov 2022
I have now started testing with a thinking time of 60 seconds. Only the top ten engines are included in this list, and only one engine per author. Computer: AMD Ryzen 3900X, 20 threads, 4 GB hash, all 3-4-5-6-men Syzygy tablebases. Current top 10 list:

1) Leptir 4, Result: 115 out of 120 = 95.8%. Leptir 4.txt (ZIP)
2) Vulkan 021222 (EN/AM), Result: 114 out of 120 = 95.0%. Vulkan 021222.txt (ZIP)
3) Crystal 5 KWK, Result: 113 out of 120 = 94.1%. Crystal 5 KWK.txt (ZIP)
4) Corchess 3 201122, Result: 110 out of 120 = 91.6%. Corchess 3 201122.txt (ZIP)
5) Blue Marlin 15.4, Result: 107 out of 120 = 89.1%. Blue Marlin 15.4.txt (ZIP)
6-8) Dark Sister 1.9a, Result: 106 out of 120 = 88.3%. Dark Sister 1.9a.txt (ZIP)
6-8) Kayra 1.7, Result: 106 out of 120 = 88.3%. Kayra 1.7.txt (ZIP)
6-8) ProteusSF-Piranha 220904, Result: 106 out of 120 = 88.3%. ProteusSF-Piranha 220904.txt (ZIP)
9-10) Stockfish dev 231122, Result: 105 out of 120 = 87.5%. Stockfish dev 231122.txt (ZIP)
9-10) Shashchess 25.3 GoldDi, Result: 105 out of 120 = 87.5%. ShashChess 25.3GoldDigger.txt (ZIP)

Text files on my homepage:
https://solistachess.jimdosite.com/testing/
