The Strategic Test Suite by Swaminathan and Corbit
https://sites.google.com/site/strategic ... e/sts-stat
had its own evaluation program, written by Philippe Gailhac (see the linked site), since several solutions to each strategically meaningful position were judged with different numbers of points, instead of a single best move only.
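To illustrate how this differs from single-best-move scoring, here is a minimal Python sketch of awarding points for several acceptable moves. It is not Philippe Gailhac's program. The function names, move lists and point values are hypothetical examples, not data taken from the STS files.

```python
from typing import Dict, List


def score_move(solutions: Dict[str, int], engine_move: str) -> int:
    """Return the points for engine_move, or 0 if it is not a listed solution."""
    return solutions.get(engine_move, 0)


def score_suite(positions: List[Dict[str, int]], engine_moves: List[str]) -> int:
    """Sum the points over all positions, one engine move (SAN) per position."""
    if len(positions) != len(engine_moves):
        raise ValueError("need exactly one engine move per position")
    return sum(score_move(sol, mv) for sol, mv in zip(positions, engine_moves))


# Hypothetical data: the best move earns full points, alternatives earn fewer.
example_positions = [
    {"Nf3": 10, "Nc3": 6, "e4": 3},
    {"Rd1": 10, "Re1": 8},
]
print(score_suite(example_positions, ["Nc3", "Rd1"]))  # prints 16
```

With a single-best-move rule, the same two answers would count as only one solved position, because the second-best choice "Nc3" would score nothing.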
Unfortunately, the solutions and point values of most of the positions, stored in 15 thematically sorted blocks of 100 positions each, no longer hold up when checked with modern engines. Still, many of the positions remain worth trying for positional testing of the kind the suite was originally meant for, as I understood it back then.
They were never meant for tactical single-best-move testing. On the contrary, an engine should judge them almost without search, at as short a time control (TC) as possible, so that its move ordering comes almost directly from its "static eval".
For this kind of testing at very short TC, I selected 594 of the original 1500 positions as follows:
Only the best move was kept, and only positions where the best move's eval exceeds the second-best move's eval by at least 50%, measured relative to the lower of the two. For example, with the best move at 0.75 (75 cp), the second-best move must be no better than 0.50 (50 cp).
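Written as a formula, the selection rule is (best - second) / second >= 0.5. Here is a minimal Python sketch of that check. It assumes both evals are in pawns and positive, because a gap measured relative to the lesser eval is not well defined for zero or negative values. The function name is hypothetical.

```python
def passes_gap_filter(best: float, second: float, min_gap: float = 0.5) -> bool:
    """Keep a position if the best eval beats the second-best eval by at least
    min_gap (50% by default), measured relative to the lower (second-best) eval.

    ASSUMPTION: evals are in pawns and both are positive; the relative rule
    is not defined otherwise, so such input is rejected.
    """
    if best <= 0 or second <= 0:
        raise ValueError("relative gap needs positive evals")
    return (best - second) / second >= min_gap


print(passes_gap_filter(0.75, 0.50))  # prints True  (gap exactly 50%)
print(passes_gap_filter(0.75, 0.55))  # prints False (gap about 36%)
```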
Stockfish (SF) was the main engine I used for evaluation. Note that the absolute size of a given engine's evals shouldn't really matter. What counts is that the relation between the evals also holds for other engines, even if their output numbers are much higher or much lower.
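The relative criterion explains why the absolute scale shouldn't matter. In the idealized case where another engine reports the same evals multiplied by some positive factor c, with b the best-move eval and s the second-best eval, the factor cancels:

```latex
\frac{c\,b - c\,s}{c\,s} \;=\; \frac{c\,(b - s)}{c\,s} \;=\; \frac{b - s}{s} \;\ge\; 0.5,
\qquad c > 0,\; s > 0 .
```

Real engines do not rescale by a single constant, so this only shows why the relation needs to fit, not that every engine will agree exactly.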
In any case, the final evaluation was mainly my own rather than a slavish adoption of the engines' verdicts. Most of the time I relied on a short forward-backward analysis and a positional assessment of the candidate moves from my own point of view.
With all that said, I hope Dann Corbit has a look at the link. I don't know whether Mr. Swaminathan, whom I also hold in high regard, still reads here at times, though I still see his nick swami in the members list.
https://www.dropbox.com/s/khn8tkonb68kcel/594.epd?dl=0
is the .epd file of the 594 positions, each with its best move. The numbers given as sources are the ones I got by putting all 1500 positions together into one .pgn file, numbered 1 to 1500 in the order of the original suite's block numbers.
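The original suite has 15 blocks of 100 positions each, so a source number in the combined 1 to 1500 file can be converted to and from a block number plus a position index within the block. Here is a minimal Python sketch, assuming both block and index are counted from 1. The function names are hypothetical.

```python
from typing import Tuple

BLOCK_SIZE = 100  # positions per block in the original suite
NUM_BLOCKS = 15   # blocks in the original suite (15 * 100 = 1500)


def to_source_number(block: int, index: int) -> int:
    """Map (block 1..15, index 1..100) to the combined number 1..1500."""
    if not (1 <= block <= NUM_BLOCKS and 1 <= index <= BLOCK_SIZE):
        raise ValueError("block or index out of range")
    return (block - 1) * BLOCK_SIZE + index


def from_source_number(number: int) -> Tuple[int, int]:
    """Inverse mapping: combined number 1..1500 back to (block, index)."""
    if not 1 <= number <= NUM_BLOCKS * BLOCK_SIZE:
        raise ValueError("number out of range")
    block, index = divmod(number - 1, BLOCK_SIZE)
    return block + 1, index + 1


print(to_source_number(3, 42))   # prints 242
print(from_source_number(1500))  # prints (15, 100)
```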
I hope this selection is OK with you, dear Dann. Please let me know your verdict if you find the time to go through the positions at some point.
Here are the results of a first test run:
    Program                        Elo  +/-  Matches   Score  Av.Op.   S.Pos.  MST1  MST2  RIndex
  1 Lc0v0.30.0-dag+git.c91bf77 :  3500    0      471  50.0 %    3500  486/512  1.0s  1.0s    1.00
  2 Stockfish310722            :  3500    0      471  50.0 %    3500  482/512  1.0s  1.0s    0.99
MST1 : Mean solution time (solved positions only)
MST2 : Mean solution time (solved and unsolved positions)
RIndex: Score according to solution time ranking for each position
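Here is a minimal Python sketch of the MST1 and MST2 definitions above. The post does not say how EloStatTS charges unsolved positions in MST2, so this sketch simply assumes each unsolved position counts at the full time limit. The function name and the times are made up.

```python
from typing import List, Optional, Tuple


def mean_solution_times(
    times: List[Optional[float]], time_limit: float
) -> Tuple[float, float]:
    """Return (MST1, MST2) for one engine.

    times[i] is the solution time in seconds for position i, or None if the
    position was not solved. ASSUMPTION: for MST2, unsolved positions are
    counted at the full time_limit.
    """
    solved = [t for t in times if t is not None]
    if not solved:
        raise ValueError("MST1 is undefined when nothing is solved")
    mst1 = sum(solved) / len(solved)
    all_times = [t if t is not None else time_limit for t in times]
    mst2 = sum(all_times) / len(all_times)
    return mst1, mst2


print(mean_solution_times([0.25, 0.5, None, 0.75], time_limit=1.0))  # prints (0.5, 0.625)
```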
The test run was just to see how many positions would get solved and whether EloStatTS by Frank Schubert would show any difference in rating and ranking. I fear that without a program of that kind a test suite like this would be of no use at all, since counting unsolved positions alone wouldn't carry any statistical meaning. That there is essentially no difference between LC0 and SF so far is fine with me. Both are probably the best engines at the moment as far as their "static eval" is concerned, provided the TC is short enough to almost entirely avoid deeper search. LC0 was run with weights_run3_784822.lc0.
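Here is one way to see why raw solve counts say little on their own. The sketch applies a pooled two-proportion z-test to the solved counts from the table (486/512 for Lc0, 482/512 for Stockfish). This textbook test is chosen here for illustration; it is not what EloStatTS does, and the function name is hypothetical.

```python
import math


def two_proportion_z(solved_a: int, solved_b: int, total: int) -> float:
    """Pooled two-proportion z statistic for two solve rates on the same suite size."""
    p_a = solved_a / total
    p_b = solved_b / total
    pooled = (solved_a + solved_b) / (2 * total)
    se = math.sqrt(pooled * (1 - pooled) * (2 / total))
    return (p_a - p_b) / se


z = two_proportion_z(486, 482, 512)
print(f"z = {z:.2f}")  # prints z = 0.55
```

A |z| this far below 1.96 means the four-position difference is well within noise at the usual 5% level. The test also treats the two runs as independent samples even though both engines faced the same positions, so a paired test on per-position results would be more appropriate if that data were available.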
With more positions (I am trying to bring the next version up to at least 650 by adding early opening positions from books and databases with good statistical results from high-class games) and more engines to run (which would also increase discrimination through the "Matches" count of EloStatTS), I hope to test more engines of interest, single-threaded only and again at very short TC. That could give results of interest in their own right, though of course, as always, not directly comparable with those of other suites, e.g. tactical single-best-move suites, which at the very least have to be run at quite different hardware TCs.
I will keep you informed, but I will need at least a few more weeks to get to the next version, which will hopefully be more useful than this first beta test.
Enjoy!