Swami,
I'd like to do some analysis on the STS results you have from various programs and their ability to predict playing strength.
I don't believe it will be highly successful, because the suites don't test many things that are important for computer chess, such as pondering
and timing algorithms. Timing algorithms especially can make a big difference.
So, I'd like the data from as many programs as you have on the same hardware under the same conditions. Preferably, all in one file.
Format should be similar to this:
Name & version score on testset1 testset2 ........ testset7
Only raw scores are needed, not percentages.
My planned analysis will probe for correlations with the overall score as well as with individual tests and combination scores. I hope to find
which tests are the most predictive, or which combination of tests is the most predictive.
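A minimal sketch of the kind of analysis described above, assuming each line of the file has the form "Name & version s1 s2 ... s7" and that a separate list of ratings (for example from a rating list) is available for the same engines. The function names, the `max_size` parameter, and all numbers in the usage part are made up for illustration; they are not STS results.

```python
from itertools import combinations
from math import sqrt


def parse_line(line, n_tests=7):
    """Split 'Name & version s1 ... sN' into (name, [raw scores])."""
    parts = line.split()
    scores = [int(p) for p in parts[-n_tests:]]
    name = " ".join(parts[:-n_tests])
    return name, scores


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_combinations(scores, ratings, max_size=2):
    """Correlate the summed raw score of each test-set combination with the ratings.

    Returns a list of (correlation, combination) sorted from most to least
    positively correlated. Combinations are tuples of 0-based test indices.
    """
    n_tests = len(scores[0])
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(range(n_tests), size):
            totals = [sum(row[i] for i in combo) for row in scores]
            results.append((pearson(totals, ratings), combo))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    # Placeholder lines and ratings, invented purely to show the call pattern.
    lines = [
        "EngineA 1.0 70 65 80 55 60 75 68",
        "EngineB 2.3 60 70 72 50 58 66 61",
        "EngineC 0.9 50 52 60 45 49 58 55",
    ]
    ratings = [2900, 2750, 2600]
    scores = [parse_line(line)[1] for line in lines]
    for r, combo in rank_combinations(scores, ratings)[:5]:
        print(combo, round(r, 3))
```

With only a handful of engines many combinations will correlate almost perfectly by chance, so a ranking like this only becomes meaningful with a large number of engines tested under the same conditions.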
Need STS data
Moderator: Ras
-
CRoberson
- Posts: 2094
- Joined: Mon Mar 13, 2006 2:31 am
- Location: North Carolina, USA
-
swami
- Posts: 6663
- Joined: Thu Mar 09, 2006 4:21 am
Re: Need STS data
Hi Charles,
I will certainly run the tests, collect the results, and post them on a website. But I can only start testing on the 21st of December; right now I have a lot of other things keeping me busy. I have about 2 weeks of holiday, from December 23rd until the first week of January. I plan to test as many engines as possible (>250 or so).
The goal is to determine an engine's knowledge of specific strategic themes. Surprisingly, the results so far seem to correlate closely with the engines' actual playing strength.
I'm not sure what you meant by timing algorithms; is that something implemented within an engine? I'm also interested in the statistics and distributions, and in your perspective on the data. Dann and I are currently working on the 8th version, called "Advancement of King Side Pawn Cover". We will probably release it in a month's time.
I will certainly send you the data in the first week of January, if that's not too late.
I will make the complete data publicly available, and will also inform and remind you via email.
Regards,
Swami
-
swami
- Posts: 6663
- Joined: Thu Mar 09, 2006 4:21 am
Re: Need STS data
BTW, in case you want data on a very limited number of engines, I have done the tests on Division 2 and Division 3 engines for STS v1 to STS v6.
You can get the Excel file with the results of the Division 2 engines here:
http://sites.google.com/site/strategict ... ects=0&d=1
The total results of the Division 3 engines can be found here:
http://www.talkchess.com/forum/viewtopi ... 5&start=10
Results for the latest versions of various engines can be found by searching this forum for the keyword "sts".
-
Jouni
- Posts: 3722
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: Need STS data
I was lazy and tested only about 10 top engines. The best correlation with the CEGT/CCRL ratings came from STS1-3... But note that, as always with test suites, the number of solutions isn't the best measure; total time used is!
One example in STS3:
              Fritz 10-128MB    Stockfish 1.5.1 JA-128MB
Total time    0:26:45           0:25:30
Solved        83                78
Which one is better? When you use time, you also get more precision in the calculations, since the figures have 4-5 digits.
Jouni
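To make the comparison above concrete, here is a small sketch that converts the two total times to seconds so they can be set next to the solution counts. The times and counts are the ones quoted in the post; the `to_seconds` helper and the dictionary layout are only illustrative, and how each engine's total time was accumulated (for instance, what is charged for an unsolved position) is not stated in the thread.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Figures quoted above for STS3.
results = {
    "Fritz 10-128MB": {"time": "0:26:45", "solved": 83},
    "Stockfish 1.5.1 JA-128MB": {"time": "0:25:30", "solved": 78},
}

for name, r in results.items():
    print(f"{name}: {to_seconds(r['time'])} s total, {r['solved']} solved")
```

Fritz solves more positions but takes 1605 s in total against 1530 s for Stockfish, so the two measures rank the engines differently here.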
-
swami
- Posts: 6663
- Joined: Thu Mar 09, 2006 4:21 am
Re: Need STS data
Not many test suites are able to predict even the rough strength of chess engines.
WAC, WSAC, PET, LAPUCE and the like are popular tactical test sets, but as you can see, many engines score nearly 290/300 on these suites,
so they only establish that an engine has a working knowledge of basic tactics.
They can't be used for comparison purposes.
They can't be used to determine the Elo strength of an engine.
Factors:
= STS tests only an engine's strategic strength, not its tactical strength. Therefore you can't compare the scores with standard Elo.
= 7 STS suites are not enough. We need 25-30 STS suites for more accuracy.
Within the next 2 to 3 years, we will have about 25-30 STS test suites. An engine's score out of 3000 positions will, I believe, give a nearly accurate measure of its strength _in strategy_.
-
CRoberson
- Posts: 2094
- Joined: Mon Mar 13, 2006 2:31 am
- Location: North Carolina, USA
Re: Need STS data
Here is what I meant when talking about timing algorithms and such.
swami wrote:
Hi Charles,
I will certainly run the tests, collect the results, and post them on a website. But I can only start testing on the 21st of December; right now I have a lot of other things keeping me busy. I have about 2 weeks of holiday, from December 23rd until the first week of January. I plan to test as many engines as possible (>250 or so).
The goal is to determine an engine's knowledge of specific strategic themes. Surprisingly, the results so far seem to correlate closely with the engines' actual playing strength.
I'm not sure what you meant by timing algorithms; is that something implemented within an engine? I'm also interested in the statistics and distributions, and in your perspective on the data. Dann and I are currently working on the 8th version, called "Advancement of King Side Pawn Cover". We will probably release it in a month's time.
I will certainly send you the data in the first week of January, if that's not too late.
I will make the complete data publicly available, and will also inform and remind you via email.
Regards,
Swami
To get a perfect test that correlates with playing strength, one must test every feature of a chess program that is used during
a match. Test suites do test the search and eval, but they don't test the quality of things like the timing algorithm. Without testing
everything, you can't guarantee an accurate correlation. For the same reason, a suite could be used to catch the few clones that copy
everything and then purposely botch the timing algorithm to hide the cloning.
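For readers unsure what a timing algorithm is: it is the part of an engine that decides how much clock time to spend on the current move. The sketch below shows one very simple, generic scheme (divide the remaining time over an expected number of moves and add the increment). It is an illustration only, not the method of any engine mentioned in this thread; the function name, the `default_horizon` of 30 moves, and the 50 ms safety margin are all assumptions.

```python
def allocate_time(remaining_ms, increment_ms=0, moves_to_go=None,
                  default_horizon=30, safety_ms=50):
    """Return a time budget in milliseconds for the current move.

    remaining_ms    -- time left on the clock
    increment_ms    -- time added to the clock after each move
    moves_to_go     -- moves until the next time control, if known
    default_horizon -- assumed number of moves left when moves_to_go is unknown
    safety_ms       -- margin kept back so the engine never flags
    """
    horizon = moves_to_go if moves_to_go else default_horizon
    budget = remaining_ms / horizon + increment_ms
    # Never plan to use more than what is on the clock minus the safety margin.
    return max(0, min(budget, remaining_ms - safety_ms))


# One minute left, no increment, 30 moves to the time control.
print(allocate_time(60_000, moves_to_go=30))
```

A fixed-time-per-position test suite never exercises code like this, which is why two engines with the same search and eval can score alike on a suite and still differ in match play.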