A big SF15 experiment

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Werewolf
Posts: 2060
Joined: Thu Sep 18, 2008 10:24 pm

A big SF15 experiment

Post by Werewolf »

Take or leave this as you wish:

I've been working on tactical testsuites for well over 10 years, and lately it has become harder and harder to find challenging positions. But after much work over the last year, and with help from other testsuites :P, I did manage to assemble 100 hard positions, checked many times over for cooks in Aquarium IDeA.
It's currently private, but you'll have seen 70% of the positions before.

Then, I got hold of 3 powerful machines:

1) Threadripper 3990X 64 cores / 128 threads (SF15 running on 60 threads, 70-90 MN/s averaged)
2) Cloud rental SF15 Cluster 512 cores (the company claims 1 BN/s, in reality 800 MN/s is closer)
3) Cloud rental SF15 Cluster 1024 cores (the company claims 2 BN/s, in reality 1.6 BN/s is closer)

Each contender got 2 mins/position on the Testsuite with the following results:

1. SF Cluster "2 BN/s" 96/100 average solve time 3.10 seconds
2. SF Cluster "1 BN/s" 92/100 average solve time 4.49 seconds
3. Threadripper 3990X 90/100 average solve time 5.86 seconds

The difficulty of a few positions disguises how far apart these scores are.

So far, nothing too surprising.

Then I decided to give the Threadripper 16x as much time (32 mins per position) and it scored
94/100 with an average solve time of 59 seconds.

The thing I'm surprised by is that, despite the vast difference in thread counts (and the associated search-efficiency loss, even with Lazy SMP?), the results depend surprisingly closely on the total number of nodes searched! On 16x more time, the Threadripper searched a total node count that falls between the two cluster runs at 2 mins/position, and its testsuite score was in line with that. I am quite surprised by this.
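A back-of-envelope check of that node-count claim (a sketch using the NPS figures quoted above; taking 80 MN/s as the midpoint of the Threadripper's 70-90 MN/s range is my assumption):

```python
# total nodes per position = NPS * time budget per position
setups = {
    "Threadripper 3990X, 32 min": (80e6, 32 * 60),  # ~70-90 MN/s, midpoint assumed
    "Cluster '1 BN/s', 2 min":    (800e6, 2 * 60),  # ~800 MN/s measured
    "Cluster '2 BN/s', 2 min":    (1.6e9, 2 * 60),  # ~1.6 BN/s measured
}

for name, (nps, seconds) in setups.items():
    total = nps * seconds
    print(f"{name}: {total / 1e9:.0f} BN nodes")

# Threadripper 3990X, 32 min: 154 BN nodes
# Cluster '1 BN/s', 2 min: 96 BN nodes
# Cluster '2 BN/s', 2 min: 192 BN nodes
```

So the Threadripper's 32-minute budget (~154 BN nodes) does land between the two clusters' 2-minute budgets (96 BN and 192 BN), matching its 94/100 score landing between their 92 and 96.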

Also, time to depth was pretty meaningless: the cluster versions were only a few plies deeper, and d=35 on a cluster means much more than d=35 on a PC, I think.
Jouni
Posts: 3770
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: A big SF15 experiment

Post by Jouni »

Easy to extrapolate that "4 BN/s" scores 100/100 and chess is solved :wink: . Seriously, what's ShashChess's score on a weak machine?
Jouni
Werewolf
Posts: 2060
Joined: Thu Sep 18, 2008 10:24 pm

Re: A big SF15 experiment

Post by Werewolf »

Jouni wrote: Tue Apr 26, 2022 6:09 pm Easy to extrapolate that "4 BN/s" scores 100/100 and chess is solved :wink: . Seriously, what's ShashChess's score on a weak machine?
The main purpose of the exercise was to measure the search speed-up of SF15, not to see who is the best tactician. However, after you suggested ShashChess, I did test it on the Threadripper:

ShashChess 21.1: 89/100, average solve time 4.52 seconds.

The best tactician, by far, that I have tested is Black Diamond 11 with a special net.
Jouni
Posts: 3770
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: A big SF15 experiment

Post by Jouni »

Thanks for the test. But I meant with special settings, like in this test here:

ShashChess21.1-GoldDigger 94/128
Stockfish15 87/128
Jouni
peter
Posts: 3491
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: A big SF15 experiment

Post by peter »

Jouni wrote: Wed Apr 27, 2022 8:20 am Thanks for the test. But I meant with special settings, like in this test here:

ShashChess21.1-GoldDigger 94/128
Stockfish15 87/128
You mean some like this one?

https://forum.computerschach.de/cgi-bin ... #pid154529

You saw CorChess too, and that it's run with only 15"/pos.?
:)
BTW, the positions are linked within the posting too.
I had 222 positions in most of my earlier trials, but it turned out that the 128 are more discriminating as for Elo; of course the error bar is higher too with fewer but harder positions. I'm pondering adding Marek's 100 from his IQ test, or at least some of them, to get back up to 222, or adding some more from my problem database to reach maybe 250 at about that average difficulty.
Having only very difficult positions (e.g. composed studies) isn't as good for a broader range of engines and short TC either. Regards
Peter.
Jouni
Posts: 3770
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: A big SF15 experiment

Post by Jouni »

Hi, I follow your tests in the CSS forum. But in my tests ShashChess 20.2 is totally unbeaten! It scores 91/110 on HTC110; CorChess only 80.
Jouni
peter
Posts: 3491
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: A big SF15 experiment

Post by peter »

Jouni wrote: Wed Apr 27, 2022 7:55 pm Hi, I follow your tests in the CSS forum. But in my tests ShashChess 20.2 is totally unbeaten! It scores 91/110 on HTC110; CorChess only 80.
Afair 20.2 (that was the version that didn't have the Gold Digger option but had Tal without the other GM options, wasn't it?) indeed had one or two solutions more; now only the 21.1 version is in my list. CorChess, if you have a closer look, indeed has 2 solutions fewer, but it gets the edge of 5 Elo (only about half the error bar, so not significant) by being a little faster on all measured time values. That's the biggest advantage of EloStat (Frank Schubert) to me: not only the numbers of solutions are counted. And you get nice tables of results for as many engines as you compare to each other; the more runs, the lower the error bar, especially for those with many similarly strong engines at the top of the list. Regards
Peter.
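Peter's point that fewer (but harder) positions mean a wider error bar can be illustrated with a plain binomial standard error on the solve rate. This is only a rough sketch, not EloStat's actual method (which also weights solve times); the 94/128 figure is from the thread above, and 163/222 is a hypothetical score at the same ~73% solve rate on the larger suite:

```python
import math

def solve_rate_stderr(solved: int, total: int) -> float:
    """Binomial standard error of a testsuite solve rate: sqrt(p*(1-p)/n)."""
    p = solved / total
    return math.sqrt(p * (1 - p) / total)

# Same ~73% solve rate, different suite sizes:
print(f"{solve_rate_stderr(94, 128):.3f}")   # 0.039
print(f"{solve_rate_stderr(163, 222):.3f}")  # 0.030
```

Since the standard error shrinks like 1/sqrt(n), cutting the suite from 222 down to 128 positions widens the error bar by roughly a factor of sqrt(222/128) ≈ 1.3, which is why a 5 Elo gap inside half an error bar isn't significant.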