I don't think it's fair that all test suites in STS get _equal_ weight. Some test suites are obviously more important than others; I just don't know how to assign proper, balanced weights. Therefore, I gave all the test suites equal weight just to get a rough idea of the strength ratings of new engines, or of the rough strength improvement of one version over another.
In cases where two engines are nearly equal in strength, there is a need to place emphasis on each engine's knowledge of certain test suites.
Do this calculation with all the results you have, always replacing the STS_scores. This way you can find the set of weights that minimizes the sum of differences between the STS_elo and the real engine_elo over all engines.
If you lay out the data properly in a table (all the STS_scores and Elos of the engines), any statistics program could calculate the weights in no time.
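If I understand the suggestion, the weight fit could be sketched like this with NumPy's least-squares solver. This is a minimal sketch, assuming ordinary least squares is the intended fit; the engine scores and Elo numbers below are invented placeholders, not real STS results.

```python
import numpy as np

# Rows = engines, columns = STS test suites.
# All numbers are made up purely to illustrate the layout.
sts_scores = np.array([
    [88.0, 75.0, 90.0],
    [82.0, 71.0, 86.0],
    [74.0, 66.0, 80.0],
    [68.0, 60.0, 72.0],
    [61.0, 52.0, 65.0],
    [55.0, 47.0, 58.0],
])
elo = np.array([3100.0, 2950.0, 2780.0, 2630.0, 2480.0, 2350.0])  # hypothetical rating-list Elos

# One column per suite weight, plus a column of ones for the intercept;
# lstsq minimises sum((X @ w - elo)**2) over the weight vector w.
X = np.column_stack([sts_scores, np.ones(len(elo))])
w, *_ = np.linalg.lstsq(X, elo, rcond=None)

sts_elo = X @ w  # the "STS Elo" predicted from the weighted suite scores
print(np.round(w, 2))
print(np.round(sts_elo))
```

The fitted weights then give each suite its relative importance, instead of the equal weighting used so far.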
Thank you! This looks like a brilliant suggestion.
I will experiment with this for a while with the set of ten engines, but I think it might be best to wait until I release 7 more test suites over the next 6 months: there are more important strategic ideas yet to be put into practice via new test suites, and the weights may vary greatly once I release the next suite, and so on. STS is still at an early stage.
Swami:
Pay attention to the overfitting problem when doing the regression!
If you have results from n test suites, you need to estimate (n+1) parameters, so you must test a number of engines much bigger than that...
Right now you have 10 engines and 8 (+1, if you allow for a nonzero intercept) parameters, which leaves you only 2 degrees of freedom at most...
It boils down to a very good descriptive model, but one with poor predictive power.
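The warning can be illustrated with synthetic numbers. Everything below is invented: with 9 engines and 9 parameters (8 suite weights plus an intercept) the regression reproduces the "training" Elos essentially exactly, yet the weights it learns by fitting noise can still mispredict an engine it has not seen.

```python
import numpy as np

rng = np.random.default_rng(0)
n_suites = 8   # 8 suite weights + 1 intercept = 9 parameters
n_train = 9    # with only 9 engines, zero residual degrees of freedom

# Invented data: the "true" Elo depends on only two suites, plus noise.
scores = rng.uniform(40.0, 95.0, size=(n_train + 1, n_suites))
elo = 20.0 * scores[:, 0] + 15.0 * scores[:, 1] + 500.0
elo = elo + rng.normal(0.0, 30.0, size=n_train + 1)

# Fit on the first 9 engines, hold the last one out.
X = np.column_stack([scores[:n_train], np.ones(n_train)])
w, *_ = np.linalg.lstsq(X, elo[:n_train], rcond=None)

in_sample_err = np.abs(X @ w - elo[:n_train]).max()   # ~0: a "perfect" description
held_out = np.append(scores[n_train], 1.0)
out_of_sample_err = abs(held_out @ w - elo[n_train])  # typically far larger
print(in_sample_err, out_of_sample_err)
```

The in-sample error is essentially zero by construction, which is exactly why it says nothing about predictive power on new engines.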
swami wrote:
I will experiment with this for a while with the set of ten engines, but I think it might be best to wait until I release 7 more test suites over the next 6 months: there are more important strategic ideas yet to be put into practice via new test suites, and the weights may vary greatly once I release the next suite, and so on. STS is still at an early stage.
Swami, I suggest you post a (long) list with all the engines in the first column, then the score of each test suite, one suite per column, and at the end the CEGT/CCRL Elo estimation, something like this:
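Perhaps in a shape like the following, which any statistics package can read directly. The engine names and all numbers here are placeholders of my own, not actual STS results:

```
engine,    STS1, STS2, STS3, ..., CCRL_elo
Engine A,    88,   75,   90, ...,     3100
Engine B,    82,   71,   86, ...,     2950
Engine C,    74,   66,   80, ...,     2780
```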
I might do it. I would also like to try some other, less classical methodologies on these data.
It's my academic area.
However, the basic problem, as I said, is having data on enough engines in order to have a good predictive model.
noctiferus wrote: I might do it. I would also like to try some other, less classical methodologies on these data.
It's my academic area.
However, the basic problem, as I said, is having data on enough engines in order to have a good predictive model.
Hi Enrico,
That sounds great! I'd like to know the total number of engines you need data for. Does 100 sound good? I'd be willing to test 150 or more if that satisfies the requirements for processing the data set.
I would suggest only engines for which a reliable Elo estimation by CCRL and CEGT exists; otherwise they are useless.