STS - List the Order of Importance

noctiferus
Posts: 364
Joined: Sun Oct 04, 2009 1:27 pm
Location: Italy

Re: STS - List the Order of Importance

Post by noctiferus »

Partial-credit scores could be a good idea: currently, if an engine finds a slightly inferior move, it is penalized as much as an engine that blunders in the position.
Maybe some of the dispersion comes from that.

I was thinking about it yesterday, but believed it was too difficult to implement.
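
For what it's worth, the scoring side of partial credit may be simpler than it looks. A minimal sketch in Python, assuming each position carries a table of acceptable moves and their points (the moves, point values and function names below are made up for illustration and are not taken from STS):

```python
# Hypothetical credit tables: one dict per position, move -> points.
# Moves and point values are invented for illustration only.
SUITE = [
    {"f5": 10, "Bf2": 3, "Bg4": 2},
    {"Rd1": 10, "Qe2": 7},
    {"h4": 10},
]

def partial_score(answers, suite):
    """Sum the points of each chosen move; unlisted moves score 0."""
    return sum(credits.get(move, 0) for move, credits in zip(answers, suite))

def strict_score(answers, suite):
    """All-or-nothing: full points only when the top-valued move is played."""
    total = 0
    for move, credits in zip(answers, suite):
        best = max(credits.values())
        if credits.get(move, 0) == best:
            total += best
    return total

answers = ["Bf2", "Rd1", "a3"]        # slightly inferior move, best move, blunder
print(partial_score(answers, SUITE))  # 3 + 10 + 0
print(strict_score(answers, SUITE))   # 0 + 10 + 0
```

Under strict scoring the first and third answers are both worth 0; with partial credit the near miss is worth more than the blunder, which is exactly the distinction that is lost today.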
noctiferus
Posts: 364
Joined: Sun Oct 04, 2009 1:27 pm
Location: Italy

Re: STS - List the Order of Importance

Post by noctiferus »

I understand that the new software Sts stat would allow partial scores.

What do you suggest I do? I can go on looking for outliers (as I told you, Typhoon (?) and Cerebro are strange: they do much better in your tests than their Elo suggests; bad time management in matches?), other behaviours, and further analyses.

The best attempt at a complete analysis should, however, also include partial ratings and total time to solve (just as a correcting factor) but, IMHO, also the match time control that the Elo ratings refer to (another correcting factor); Cerebro, e.g., came from a 40/20.

A lot of work. How should we proceed?

0) I go on analyzing and (hopefully :-) understanding the relations.
1) I could wait for your lower- and higher-division tournaments.
2) If you want, add the partial scores there as well... I'll try to see whether they improve the fit.
OK?
noctiferus (e.)

ciao enrico
swami
Posts: 6640
Joined: Thu Mar 09, 2006 4:21 am

Re: STS - List the Order of Importance

Post by swami »

Excel File now updated with 11 Engines addition! :)

Please download:
http://sites.google.com/site/strategict ... STS1-8.xls

Let me know what the statistical assessment is.

Meanwhile I will now begin testing 11 more engines and keep updating the Excel file.
noctiferus
Posts: 364
Joined: Sun Oct 04, 2009 1:27 pm
Location: Italy

Re: STS - List the Order of Importance

Post by noctiferus »

Got it. Tomorrow I'll work on it.
ciao
e.
Leto
Posts: 2071
Joined: Thu May 04, 2006 3:40 am
Location: Dune

Re: STS - List the Order of Importance

Post by Leto »

swami wrote:Excel File now updated with 11 Engines addition! :)

Please download:
http://sites.google.com/site/strategict ... STS1-8.xls

Let me know what the statistical assessment is.

Meanwhile I will now begin testing 11 more engines and keep updating the Excel file.
The highest CEGT rated engine in your list, Critter 0.42 (2754 40/4), also received the highest total score, 562. However Bison, which is CEGT rated 2727, scored 499, while Hamsters 0.7 which is CEGT rated 2626 scored higher with a 502. List 5.12 is CEGT rated 2652 and scored 503. So correlation not yet perfect.
swami
Posts: 6640
Joined: Thu Mar 09, 2006 4:21 am

Re: STS - List the Order of Importance

Post by swami »

Leto wrote:
swami wrote:Excel File now updated with 11 Engines addition! :)

Please download:
http://sites.google.com/site/strategict ... STS1-8.xls

Let me know what the statistical assessment is.

Meanwhile I will now begin testing 11 more engines and keep updating the Excel file.
The highest CEGT rated engine in your list, Critter 0.42 (2754 40/4), also received the highest total score, 562. However Bison, which is CEGT rated 2727, scored 499, while Hamsters 0.7 which is CEGT rated 2626 scored higher with a 502. List 5.12 is CEGT rated 2652 and scored 503. So correlation not yet perfect.
I forgot to update the scores for Bison!

I had tested Bison 9.8 and posted the results in the same thread:

http://www.talkchess.com/forum/viewtopi ... 74&t=31700

The Excel page is now updated with a score of 533 for Bison:

http://sites.google.com/site/strategict ... STS1-8.xls
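
As a rough check of how much this correction matters, here is a small Python sketch that computes Spearman's rank correlation for just the four engines Leto quoted, before and after the Bison update. The function names are my own, the formula used assumes no tied values, and the CEGT figures are simply the ones quoted above. Four engines are far too few for any real conclusion.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Order: Critter 0.42, Bison, Hamsters 0.7, List 5.12 (numbers from this thread)
cegt = [2754, 2727, 2626, 2652]
sts_before = [562, 499, 502, 503]   # Bison score before the update
sts_after = [562, 533, 502, 503]    # Bison score after the update

print(round(spearman(cegt, sts_before), 2))  # 0.4
print(round(spearman(cegt, sts_after), 2))   # 1.0
```

With the corrected score, these four engines are ordered identically by STS total and by CEGT rating.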
noctiferus
Posts: 364
Joined: Sun Oct 04, 2009 1:27 pm
Location: Italy

Re: STS - List the Order of Importance

Post by noctiferus »

Just what I was about to post. Today, or tomorrow at the latest, I may bother you for some details about a few engines' anomalous behaviour.

Please, would you be so kind as to specify which engine versions you are using (some are missing), so that I can verify them against CEGT? And can you confirm that all ratings come from 40/4?
I'm getting some outliers, and would like to check that everything is OK.

Anyway, as far as I can see, a test set that evaluates only positional ability, without taking into account tactics, endgames, or time management in matches, agrees closely with the overall CEGT evaluation.
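
To make the outlier hunt concrete, this is roughly the kind of check I mean, sketched in Python: fit a straight line of rating against test score by least squares and flag engines whose residual exceeds k standard deviations of the residuals. The engine names and numbers below are invented placeholders, not real results, and the threshold k = 2 is an arbitrary assumption.

```python
from statistics import mean, pstdev

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def find_outliers(names, scores, ratings, k=2.0):
    """Return engines whose rating is more than k sigma away from the fit."""
    slope, intercept = fit_line(scores, ratings)
    residuals = [r - (intercept + slope * s) for s, r in zip(scores, ratings)]
    sigma = pstdev(residuals)
    return [n for n, res in zip(names, residuals) if abs(res) > k * sigma]

# Placeholder data, invented for illustration only (not from the Excel file).
names = ["EngineA", "EngineB", "EngineC", "EngineD", "EngineE", "EngineF"]
scores = [400, 450, 500, 550, 600, 500]
ratings = [2400, 2500, 2600, 2700, 2800, 2300]

print(find_outliers(names, scores, ratings))
```

Here only EngineF is flagged: it is rated far below what its test score would predict. Note that a single large outlier also inflates sigma, so a more robust spread estimate might be preferable on the real data.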
swami
Posts: 6640
Joined: Thu Mar 09, 2006 4:21 am

Re: STS - List the Order of Importance

Post by swami »

Little Goliath Nemesis
Tornado 2.5
Slowchess WV2
Movei 0.4.438

Let me know if there are still engines there whose version numbers were not listed.

Yes, it's good to know that, for the most part, the test suite matches the CEGT ratings despite not taking time management/tactics/endgames into account :)
noctiferus
Posts: 364
Joined: Sun Oct 04, 2009 1:27 pm
Location: Italy

Re: STS - List the Order of Importance

Post by noctiferus »

Can you help me save some time on data verification? (Sorry, this is a data miner's paranoia.) After a first quick-and-dirty evaluation, I'm now just preprocessing the data.

Can you confirm that all CEGT ratings come from 40/4?
In the case of MP engines, which rating do you use (if available on the web): 4, 2, or 1 CPU? Is it homogeneous, or do some engines use fewer CPUs?

I'm getting, with the initial unchecked data, rather good fits (not enough to make you jump out of your seat; however, it's interesting).

Also, I would like to ask: why are there such heavy discrepancies for some engines?
Maybe it would help if this testing approach were extended to other test sets covering game phases or characteristics that aren't taken into account in your mainly "positional" test set (BTW, great), to help authors find their engines' weaknesses and improve them...
swami
Posts: 6640
Joined: Thu Mar 09, 2006 4:21 am

Re: STS - List the Order of Importance

Post by swami »

Hi Enrico, Edmund and others who may be interested in doing statistical calculations, please see this file with the complete results of STS 1 to 10.

I've recently tested 53 engines on STS 9 and 10 and appended the results to the Excel file.

http://sites.google.com/site/strategict ... esults.xls

I would be interested to know how the results change.

Thanks so much!

Test conditions: 10 seconds per position, with CCRL/CEGT blitz ratings as reference.