Alternative methods of testing

Discussion of chess software programming and technical issues.

Moderator: Ras

Andres Valverde
Posts: 587
Joined: Sun Feb 18, 2007 11:07 pm
Location: Almeria. SPAIN
Full name: Andres Valverde Toresano

Re: Alternative methods of testing

Post by Andres Valverde »

Hart wrote:
Andres Valverde wrote:1) Imagine a slow but good engine that solves 1000 positions at an average of 3" per position. Total time:

1000 x 3" = 3000 s

2) A _fast_ but not so good engine finds only 500 solutions out of 1000, but it does so at 2" per position. Total time:

500 x 2" + 500 x 4" = 3000 s

Both engines would have the same rating, but the former solved twice as many positions as the latter!

So the number of positions solved has to be used somehow in your formula, or I'm missing something.
A better model might include the number or percentage of solved positions. However, what I have consistently seen is a markedly high correlation between rated time and positions solved. In other words, they are both measuring, to a large extent, the same thing: playing strength. In most of my analyses I also get better rankings using rated time, go figure.

As for your example, I am not quite sure I understand it. From what I can see it does not look like you account for the law of averages or you are using an extreme example where my model would of course fail. No engine will solve x positions in one discrete moment of time and then y at another discrete amount of time in these tests. They will actually be distributed unevenly throughout the time period t largely depending on their relative strength, in which case this distribution will be captured in rated time and will correlate with playing strength.
I think it is not such an extreme case: one engine may be faster but have a bad evaluation, so it solves fewer positions than the other, but solves them faster (in the positive cases). I think it is a possible scenario.

I ran a test with Buzz (slightly faster) and Dirty (a bit slower, same search, a more elaborate eval). The results are as follows:

Buzz 0.08 (2300 CCRL), total time: 2237", solved: 356/714
Dirty 099OW5 (+2450 CCRL), total time: 2245", solved: 367/714

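In this case, total time is clearly not correlated with Elo: the two totals are almost identical even though the CCRL ratings differ by roughly 150 points. It was a quick test on 17 ICCF games (around 2400 Elo, ending in draws), analyzed from move 20 to move 40.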

I don't want to say your idea is bad. I think it's very interesting, but I still believe that the number of positions solved must at least be taken into account.
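As a rough sketch of the arithmetic in this exchange, the snippet below computes a total-time score, assuming (as the 500 x 4" term in the quoted example suggests) that every unsolved position is charged a fixed 4-second penalty. The function name `rated_time` and its parameters are illustrative only and are not Hart's actual formula.

```python
def rated_time(solve_times, unsolved, penalty):
    """Sum of the solve times plus a fixed penalty per unsolved position."""
    return sum(solve_times) + unsolved * penalty

# The two hypothetical engines from the quoted example (times in seconds).
# The 4-second penalty for an unsolved position is an assumption.
slow_good = rated_time([3.0] * 1000, unsolved=0, penalty=4.0)
fast_weak = rated_time([2.0] * 500, unsolved=500, penalty=4.0)
print(slow_good, fast_weak)  # 3000.0 3000.0

# Figures reported above for the quick Buzz / Dirty test (714 positions).
results = {"Buzz 0.08": (2237, 356), "Dirty 099OW5": (2245, 367)}
for name, (total_seconds, solved) in results.items():
    print(f"{name}: {total_seconds} s total, {solved / 714:.1%} solved")
```

Both hypothetical engines end up with the same total even though one solves twice as many positions, which is the reason for wanting the solved count as a second variable.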
Saludos, Andres
Hart

Re: Alternative methods of testing

Post by Hart »

With my limited grasp of the statistics software I have, I have only been including one in/dependent variable, rated time, because it gave the best results. If I can figure out how to include both, in a sort of (X1, X2, Y1 / X, Y, Z) fashion, maybe I will get even smaller errors.

Do you know how to do this in QtiPlot or another free statistics software package, that is, linear mapping/curve fitting with multiple coefficients?
Andres Valverde

Re: Alternative methods of testing

Post by Andres Valverde »

Hart wrote:With my limited grasp of the statistics software I have, I have only been including one in/dependent variable, rated time, because it gave the best results. If I can figure out how to include both, in a sort of (X1, X2, Y1 / X, Y, Z) fashion, maybe I will get even smaller errors.

Do you know how to do this in QtiPlot or another free statistics software package, that is, linear mapping/curve fitting with multiple coefficients?
No idea, I asked Pradu, let's wait for his answer :)
Saludos, Andres
MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: Alternative methods of testing

Post by MattieShoes »

I don't know the correct way to do it, but it's easy to cheat with just two: generate all the different weightings (X1*.01 + X2*.99, X1*.02 + X2*.98, etc.), then find the correlation coefficient for each set. Hopefully, when graphed, the coefficients will form a nice upside-down parabolic shape and you pick the maximum.

If it's not scripted, one could zero in on the max by trial and error, but that runs into problems if the shape isn't a nice upside-down parabola.
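A scripted version of this sweep might look like the sketch below. It makes a few assumptions that are not in the post: both variables are standardized first so that seconds and solved counts are on a comparable scale, total time is negated so that both variables increase with strength, and the five data points are invented purely for illustration.

```python
from math import sqrt

def zscore(values):
    """Rescale values to zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def sweep_weights(x1, x2, y, steps=100):
    """Try w*X1 + (1-w)*X2 for w = 0, 1/steps, ..., 1 and keep the best w."""
    z1, z2 = zscore(x1), zscore(x2)
    best_w, best_r = 0.0, float("-inf")
    for i in range(steps + 1):
        w = i / steps
        blend = [w * a + (1 - w) * b for a, b in zip(z1, z2)]
        r = pearson(blend, y)
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r

# Invented numbers for five hypothetical engines.
neg_time = [-2600, -2450, -2300, -2250, -2100]  # negated total time (s)
solved = [330, 350, 356, 367, 380]              # positions solved
rating = [2200, 2300, 2350, 2450, 2500]         # reference ratings

w, r = sweep_weights(neg_time, solved, rating)
print(f"best weight on time: {w:.2f}, correlation: {r:.4f}")
```

Because every weight on the grid is tried, this does not depend on the curve being a clean upside-down parabola, unlike the trial-and-error search.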
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: Alternative methods of testing

Post by Pradu »

Andres Valverde wrote:No idea, I asked Pradu, let's wait for his answer :)
You could use linear least squares fitting: http://mathworld.wolfram.com/LeastSquar ... omial.html
http://en.wikipedia.org/wiki/Linear_least_squares

I'm not familiar with free software that does this (perhaps Scilab or Octave?). I usually just do this in MATLAB. Perhaps you can read up online on linear least squares and how to implement it, or search the web for a least squares fitting tool.
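For a small data set, a linear least squares fit with several coefficients can also be done without a statistics package. The following is a minimal sketch in plain Python that solves the normal equations by Gaussian elimination. The helper names `solve` and `fit_linear` and the engine numbers are made up for illustration, and the model rating ≈ b0 + b1*time + b2*solved is only one possible choice.

```python
def solve(a, b):
    """Solve the square system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_linear(rows, y):
    """Least squares fit of y ~ b0 + b1*x1 + b2*x2 + ... via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept
    k = len(design[0])
    ata = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    aty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(ata, aty)

# Invented data: (total time in s, positions solved) for five hypothetical engines.
predictors = [(2600, 330), (2450, 350), (2300, 356), (2250, 367), (2100, 380)]
ratings = [2200, 2300, 2350, 2450, 2500]

b0, b_time, b_solved = fit_linear(predictors, ratings)
predicted = [b0 + b_time * t + b_solved * s for t, s in predictors]
print("coefficients:", b0, b_time, b_solved)
print("residuals:", [round(p - r, 1) for p, r in zip(predicted, ratings)])
```

The normal equations are adequate for a handful of engines, but they can lose precision when the predictors are strongly correlated, as time and solved count tend to be. Octave's backslash operator or NumPy's `numpy.linalg.lstsq` solve the same problem with better numerical behavior.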