Partial credit scores could be a good idea: currently, if an engine finds a slightly inferior move, it is penalized as much as an engine that blunders in the position.
Maybe some of the dispersion comes from that.
I was thinking about it yesterday, but believed it was too difficult to implement.
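One way the partial-credit idea could look, as a minimal sketch and not how STS or any existing tool actually scores: each position carries a table of acceptable moves with a credit between 0 and 1, and any move not in the table scores 0. The function names, position ids, moves and credit values below are all made up for illustration.

```python
# Hypothetical partial-credit scoring for a test suite.
# Every name and number here is an illustrative assumption.

def score_position(engine_move, credit):
    """Return the credit for the move played; unlisted moves score 0."""
    return credit.get(engine_move, 0.0)


def score_suite(answers, suite):
    """Sum the credits over all positions in the suite.

    answers: position id -> move chosen by the engine
    suite:   position id -> {move: credit}
    """
    return sum(
        score_position(answers.get(pos_id), credit)
        for pos_id, credit in suite.items()
    )


# Made-up suite: the best move earns 1.0, a slightly inferior one 0.5.
suite = {
    "pos1": {"Nf5": 1.0, "Rd1": 0.5},
    "pos2": {"c5": 1.0},
}

near_miss = {"pos1": "Rd1", "pos2": "c5"}  # second-best move on pos1
blunder = {"pos1": "h4", "pos2": "c5"}     # unlisted move on pos1

print(score_suite(near_miss, suite))
print(score_suite(blunder, suite))
```

With all-or-nothing scoring both engines would get 1 point on this toy suite; with partial credit the engine that picked the near-best move is separated from the one that blundered.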
STS - List the Order of Importance
-
- Posts: 364
- Joined: Sun Oct 04, 2009 1:27 pm
- Location: Italy
Re: STS - List the Order of Importance
I understand that the new software, Sts stat, would allow partial scores.
What do you suggest I do? I can go on looking for outliers (as I told you, Typhoon (?) and Cerebro are strange: much better in your tests than in Elo; bad time management in matches?), other behaviours, and analyses.
The best tentative complete analysis should, however, also include partial ratings, total time to solve (just as a correcting factor) and, IMHO, the Elo ratings together with the match time control they refer to (another correcting factor); Cerebro, e.g., came from a 40/20.
A lot of work. How to proceed?
0) I go on analyzing and understanding (hopefully relations)
1) I could wait for your lower and higher division tournaments.
2) If you want, already put the partial scores in there too... I'll try to see whether they seem to improve the fit.
OK?
noctiferus (e.)
ciao enrico
-
- Posts: 6640
- Joined: Thu Mar 09, 2006 4:21 am
Re: STS - List the Order of Importance
Excel file now updated with 11 additional engines!
Please download:
http://sites.google.com/site/strategict ... STS1-8.xls
Let me know what the statistical assessment is.
Meanwhile I will now begin testing 11 more engines and keep updating the Excel file.
-
- Posts: 364
- Joined: Sun Oct 04, 2009 1:27 pm
- Location: Italy
Re: STS - List the Order of Importance
Got it. Tomorrow I'll work on it.
ciao
e.
-
- Posts: 2071
- Joined: Thu May 04, 2006 3:40 am
- Location: Dune
Re: STS - List the Order of Importance
The highest CEGT rated engine in your list, Critter 0.42 (2754 40/4), also received the highest total score, 562. However Bison, which is CEGT rated 2727, scored 499, while Hamsters 0.7 which is CEGT rated 2626 scored higher with a 502. List 5.12 is CEGT rated 2652 and scored 503. So correlation not yet perfect.
swami wrote:
Excel File now updated with 11 Engines addition!
Please download:
http://sites.google.com/site/strategict ... STS1-8.xls
Let me know what the statistical assessment is.
Meanwhile I will now begin testing 11 more engines and keep updating the Excel file.
-
- Posts: 6640
- Joined: Thu Mar 09, 2006 4:21 am
Re: STS - List the Order of Importance
I forgot to update the scores for Bison!
Leto wrote:
The highest CEGT rated engine in your list, Critter 0.42 (2754 40/4), also received the highest total score, 562. However Bison, which is CEGT rated 2727, scored 499, while Hamsters 0.7 which is CEGT rated 2626 scored higher with a 502. List 5.12 is CEGT rated 2652 and scored 503. So correlation not yet perfect.
swami wrote:
Excel File now updated with 11 Engines addition!
Please download:
http://sites.google.com/site/strategict ... STS1-8.xls
Let me know what the statistical assessment is.
Meanwhile I will now begin testing 11 more engines and keep updating the Excel file.
I had tested Bison 9.8 and posted the results in the same thread:
http://www.talkchess.com/forum/viewtopi ... 74&t=31700
The Excel page is now updated with a score of 533 for Bison:
http://sites.google.com/site/strategict ... STS1-8.xls
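The comparison above (does the STS ordering follow the CEGT ordering?) can be made concrete with a rank correlation. What follows is only a sketch using the four engines and figures quoted in this thread; the helper names are made up, Spearman's coefficient is my choice and not something the posters said they use, and the simple formula assumes there are no tied values.

```python
# Spearman rank correlation between CEGT rating and STS total score.
# Helper names are illustrative; the formula assumes no tied values.

def ranks(values):
    """Rank values from 1 (smallest) to n (largest), assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Figures as quoted in the posts above.
engines = ["Critter 0.42", "Bison", "List 5.12", "Hamsters 0.7"]
cegt = [2754, 2727, 2652, 2626]
sts_stale = [562, 499, 503, 502]      # Bison with the outdated score
sts_corrected = [562, 533, 503, 502]  # Bison with the corrected 533

rho_stale = spearman(cegt, sts_stale)
rho_corrected = spearman(cegt, sts_corrected)
print(f"stale Bison score:     rho = {rho_stale:.2f}")
print(f"corrected Bison score: rho = {rho_corrected:.2f}")
```

For these four engines the corrected Bison score restores the same ordering in both lists, so the coefficient goes from 0.40 to 1.00. Four data points say very little on their own; the full spreadsheet would be the meaningful input.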
-
- Posts: 364
- Joined: Sun Oct 04, 2009 1:27 pm
- Location: Italy
Re: STS - List the Order of Importance
Just what I was about to post. Today, or tomorrow at the latest, I may bother you for some details about the anomalous behaviour of a few engines.
Please, would you be so kind as to specify which engine versions you are using (some are missing), so that I can verify them against CEGT? And can you confirm that all ratings come from 40/4?
I'm getting some outliers, and would like to check that everything is OK.
Anyway, as far as I can see, a test set evaluating only positional ability, not taking into account tactics, endgames or time management in matches, agrees well with the general CEGT evaluation.
-
- Posts: 6640
- Joined: Thu Mar 09, 2006 4:21 am
Re: STS - List the Order of Importance
Little Goliath Nemesis
Tornado 2.5
Slowchess WV2
Movei 0.4.438
Let me know if there are still engines there whose version numbers were not listed.
Yes, it's good to know that for the most part the test suite matches the CEGT ratings despite not taking time management, tactics or endgames into account.
-
- Posts: 364
- Joined: Sun Oct 04, 2009 1:27 pm
- Location: Italy
Re: STS - List the Order of Importance
Can you help me, so that I can save some time on data verification? (Sorry, this is a data miner's paranoia.) After a first quick-and-dirty evaluation, I'm now just preprocessing the data.
Can you confirm that all CEGT ratings come from 40/4?
For MP engines, which rating do you use (if available on the web): 4, 2 or 1 CPUs? Is it homogeneous, or do some engines use fewer CPUs?
With the initial unchecked data I'm getting rather good fits (not enough to make you jump out of your seat; however, it's interesting).
Also, I would like to ask: why are there such heavy discrepancies for some engines?
Maybe it would help if this testing approach were extended to other test sets covering game phases or characteristics that aren't taken into account in your mainly "positional test set" (BTW, great), to help authors find their engines' weaknesses and improve them...
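The "fit" and "discrepancy" questions can be illustrated with an ordinary least-squares line of CEGT rating against STS score, where the residual of each engine shows how far it sits from the line. This is only a sketch of one possible approach, not the analysis actually being run here: the function names are made up, and the input is just the four engines quoted earlier in the thread (with Bison at its corrected 533), which is far too little data for real conclusions.

```python
# Least-squares line CEGT ~ STS and per-engine residuals.
# Function names are illustrative assumptions; the data are the four
# engines quoted earlier in the thread, used only as a toy input.

def linear_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(names, xs, ys):
    """Map each engine to its actual rating minus the fitted rating."""
    slope, intercept = linear_fit(xs, ys)
    return {
        name: y - (intercept + slope * x)
        for name, x, y in zip(names, xs, ys)
    }


names = ["Critter 0.42", "Bison", "List 5.12", "Hamsters 0.7"]
sts = [562, 533, 503, 502]
cegt = [2754, 2727, 2652, 2626]

slope, intercept = linear_fit(sts, cegt)
res = residuals(names, sts, cegt)

# Engines furthest from the line come first: candidates for a closer look.
for name, r in sorted(res.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:14s} residual {r:+7.1f} Elo")
```

A positive residual means the engine is rated higher in matches than its STS score would predict, a negative one the opposite. On the full list, engines with unusually large residuals would be the ones to check for version mismatches, different time controls or CPU counts.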
-
- Posts: 6640
- Joined: Thu Mar 09, 2006 4:21 am
Re: STS - List the Order of Importance
Hi Enrico, Edmund and others who may be interested in doing statistical calculations, please see this file with the complete results of STS 1 to 10.
I've recently tested 53 engines on STS 9 and 10 and appended the results to the Excel file.
http://sites.google.com/site/strategict ... esults.xls
Would be interested to know what the result/change is like.
Thanks so much!
10 seconds per position.
CCRL/CEGT ratings as reference in Blitz time controls