Yes, I have, but I am not at all satisfied: there are too many outliers, which may heavily influence the results. Since Swami told me about some possible errors, I didn't do a full analysis: there is no out-of-sample prediction, which I would like to have. Anyway, I'll post a rather unreliable model here, where you can see the importance of each test.
One of the reasons I am suspicious is that the last three are used as corrections in interaction terms (which could be reasonable, even under a Type III test...)
http://docs.google.com/leaf?id=0B04Ub6O ... M2I1&hl=en
(done with JMP8)
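The missing out-of-sample check can be sketched in a few lines. This is a minimal illustration, not the JMP model above: it assumes a single predictor (for example, an engine's STS score used to predict its rating), fits ordinary least squares on most rows, and measures the prediction error only on rows held out from the fit. The function names and the "every 4th row" holdout rule are hypothetical choices.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def holdout_rmse(xs, ys, test_every=4):
    """Fit on all rows except every `test_every`-th one, then return the
    root-mean-square prediction error on those held-out rows only."""
    rows = list(enumerate(zip(xs, ys)))
    train = [(x, y) for i, (x, y) in rows if i % test_every != 0]
    test = [(x, y) for i, (x, y) in rows if i % test_every == 0]
    a, b = fit_line([x for x, _ in train], [y for _, y in train])
    squared_errors = [(y - (a + b * x)) ** 2 for x, y in test]
    return sqrt(mean(squared_errors))
```

A small held-out error suggests the model generalizes; with many outliers, as described above, it would typically be much larger than the in-sample error.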
I would suggest that Swami add some columns specifying the CEGT test time, the number of CPUs, and 32- or 64-bit builds when they are not the usual choice. This would help explain some anomalies.
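The suggested extra columns amount to a small record per engine, plus a check for settings that differ from the usual choice. A hypothetical sketch: the field names and the example "usual" values are placeholders, not CEGT's actual defaults.

```python
from dataclasses import dataclass, fields


@dataclass
class RunSettings:
    """Hypothetical extra spreadsheet columns for one engine's CEGT entry."""
    cegt_time_control: str  # placeholder label for the CEGT time control
    cpus: int               # number of CPUs used
    bits: int               # 32 or 64


def unusual_columns(run: RunSettings, usual: RunSettings) -> list[str]:
    """Return the names of the columns where `run` differs from the usual choice."""
    return [f.name for f in fields(RunSettings)
            if getattr(run, f.name) != getattr(usual, f.name)]


usual = RunSettings(cegt_time_control="40/4", cpus=1, bits=64)  # assumed baseline
print(unusual_columns(RunSettings("40/4", 4, 32), usual))
```

Flagging only the non-default columns keeps the sheet readable while still pointing to a possible cause for an anomalous rating.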
noctiferus wrote:Yes, I have, but I am not at all satisfied: there are too many outliers, which may heavily influence the results.
Yes, I don't put much weight on the results in that Excel file, since the test was done in Arena, which had a bug: it introduced a new move and awarded points for correct guesses to the newly introduced move.
I will use GradualTest and enter the results in the Excel file the next time I run the test, which will begin after I finish STS 10. We will have scores out of 1000 and out of 10,000, which would be superb! It makes the data easier to organize and easier to understand.
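The /1000 and /10,000 totals follow from a simple tally. This sketch assumes each position is worth 0 to 10 points and each STS theme has 100 positions (so 1000 per theme, and 10,000 for ten themes); the function names are hypothetical and this is not GradualTest's own output format.

```python
def theme_score(points: list[int]) -> int:
    """Sum the per-position points (assumed 0-10 each) for one STS theme."""
    for p in points:
        if not 0 <= p <= 10:
            raise ValueError(f"point value out of range: {p}")
    return sum(points)


def report(themes: dict[str, list[int]]) -> str:
    """Format each theme as score/max and append the suite total."""
    lines = []
    total = max_total = 0
    for name, pts in themes.items():
        score, max_score = theme_score(pts), 10 * len(pts)
        lines.append(f"{name}: {score}/{max_score}")
        total += score
        max_total += max_score
    lines.append(f"Total: {total}/{max_total}")
    return "\n".join(lines)


print(report({"STS 1": [10] * 100, "STS 2": [5] * 100}))
```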
Regarding the STS suite: with 900 positions, it takes at least 9000 s to test. That is 2.5 hours. If you want an estimate of engine strength, I think you would get a better estimate by playing blitz games for that time.
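The 9000 s figure corresponds to 10 s per position (900 × 10 = 9000 s = 2.5 h). A quick sketch of how the total scales with the time per position, assuming one engine, a fixed time per position, and no setup overhead:

```python
def suite_hours(positions: int, seconds_per_position: float) -> float:
    """Wall-clock hours for a suite at a fixed time per position (no overhead)."""
    return positions * seconds_per_position / 3600


for secs in (10, 2, 1):
    print(f"{secs:>2} s/position: {suite_hours(900, secs):.2f} h")
```

At 10, 2 and 1 s per position this gives 2.50 h, 0.50 h and 0.25 h respectively.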
Jouni wrote:Regarding the STS suite: with 900 positions, it takes at least 9000 s to test. That is 2.5 hours. If you want an estimate of engine strength, I think you would get a better estimate by playing blitz games for that time.
Jouni
I don't want to estimate the strength.
I want to estimate the knowledge present in each theme: dissect the engine into parts and examine the knowledge present in each underlying theme.
And help authors improve the engine in the areas where it is weak.
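Per-theme results make it easy to point authors at the weakest areas. A minimal sketch, assuming each theme is scored out of 1000 as above; the function name and the default of three themes are hypothetical.

```python
def weakest_themes(scores: dict[str, int], max_per_theme: int = 1000,
                   n: int = 3) -> list[tuple[str, float]]:
    """Return the n themes with the lowest percentage score, weakest first."""
    pct = [(name, 100 * s / max_per_theme) for name, s in scores.items()]
    return sorted(pct, key=lambda item: item[1])[:n]


print(weakest_themes({"STS 1": 800, "STS 2": 450, "STS 3": 620, "STS 4": 900}, n=2))
```

Comparing percentages rather than raw points keeps the ranking fair if themes ever have different numbers of positions.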
Jouni wrote:Regarding the STS suite: with 900 positions, it takes at least 9000 s to test. That is 2.5 hours.
You could also test it at 1 or 2 seconds per position; the results will be nearly the same, and it is certainly faster than testing with a blitz gauntlet.