Tennison wrote: I completely agree with you, Robert: it's impossible to get a huge gain over the original settings because you have tested them a lot and chosen the best you can.
But maybe a small gain (up to +5/+10 Elo) is possible, though not more, I think.
As an example, I have another test running at the moment:
Houdini 1.5a - Houdini 1.5a T3
It's a 10000 games test ... As soon as it is finished I can give you the games if you want.
As you can see, there is no significant plus, as above.
But each time I run a test with the Storm values set to 40, 50, 60, I get slightly better results than with the original.
Maybe a way to follow.
Don't get me wrong, if you enjoy what you do, far be it from me to cast aspersions. But I just wondered if I am the first person to ever ask you whether you had considered helping a programmer who might really need it. God knows, there are plenty who could use your "cpu time" in their testing. I imagine Robert has got things covered with Houdini.
Tennison wrote: T3 settings and Z settings seem (as in the Sedat Rating List) a little bit better than the original. A small plus for Z here, but more games are needed to confirm because it is so close!
If you want to make the claims above, you need to play more games. For the moment everything is buried under the tower of statistical uncertainty.
For T3 and Z settings there is only one valid conclusion after 1000 games: they are not significantly different from the default settings.
On the Rybka Forum I published the results obtained with the "z", "s" and your "T4" settings in 16,000-game matches against the default Houdini 2.0c.
Here are the results against the standard Houdini 2.0c.
- "s" scored 7797-8203 (38% draws), -8 Elo (+/- 4 Elo)
- "z" scored 7922-8078 (39% draws), -4 Elo (+/- 4 Elo)
- "T4" scored 7914-8086 (40% draws), -4 Elo (+/- 4 Elo)
Again, the only conclusion can be that these settings change the objective strength of the engine very little.
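For anyone who wants to check figures like the ones above, here is a minimal sketch of how an Elo difference and its error margin can be estimated from a match score and draw ratio. It assumes the standard logistic Elo formula and a normal approximation with a 95% interval (z = 1.96); the function names are made up for illustration, and this is not necessarily the method used to produce the numbers quoted in this thread.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction (0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_elo(points_for, points_against, draw_ratio, z=1.96):
    """Return (elo, margin) for a match result.

    points_for / points_against are match points (win = 1, draw = 0.5).
    draw_ratio is the fraction of drawn games.
    z = 1.96 gives an approximate 95% interval (an assumption).
    """
    n = points_for + points_against          # total number of games
    s = points_for / n                       # score fraction
    d = draw_ratio
    w = s - d / 2.0                          # win fraction
    l = 1.0 - w - d                          # loss fraction
    # Per-game variance of the result (1, 0.5 or 0) around the mean s
    var = w * (1.0 - s) ** 2 + d * (0.5 - s) ** 2 + l * s ** 2
    se = math.sqrt(var / n)                  # standard error of the score
    low = elo_diff(s - z * se)
    high = elo_diff(s + z * se)
    return elo_diff(s), (high - low) / 2.0

# The three 16,000-game results quoted above
for name, pts_for, pts_against, draws in [
    ("s", 7797, 8203, 0.38),
    ("z", 7922, 8078, 0.39),
    ("T4", 7914, 8086, 0.40),
]:
    elo, margin = match_elo(pts_for, pts_against, draws)
    print(f"{name}: {elo:+.1f} Elo (+/- {margin:.1f})")
```

Under these assumptions the output lands close to the quoted figures: a few Elo below zero, with a margin of roughly 4 Elo either way.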
No
The only conclusion is that they change very little at the time control that you tested.
I read that with longer time controls things are different.
+172,=240,-113 seems convincing but we are going to have 1000 games at 5+3 blitz time control.
We are discussing Ben's tests and my tests.
There may be different tests played with different conditions on different computers by different people leading to different results.
But that's a different thread.
Agreed...
I'd just like to add that we need to concentrate on game results with:
-Latest fast CPU machines (engines should be tested with at least 6 cores; more cores are even better...)
-Popular time controls (ultra-fast time controls are not recommended)
-The strongest modern openings (the winning percentage should be at least 40% for Black and at least 50% for White)
-For reliable rating min 1000 games is required
Sedat Canbaz wrote:-For reliable rating min 1000 games is required
That is only correct if "reliable" means "with a 20 Elo confidence interval".
The number of games has to match the precision or Elo difference you want to measure. To measure to 10 Elo precision you need at least 4000 games.
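To illustrate how the error margin shrinks with the number of games, here is a rough sketch. It assumes two engines of about equal strength, a 40% draw ratio (similar to the match results quoted earlier in the thread), a normal approximation and a 95% interval; the function name is made up for illustration.

```python
import math

def elo_margin(games, draw_ratio=0.4, z=1.96):
    """Approximate 95% Elo error margin for an evenly matched pairing.

    Assumes an expected score of 50%, so wins and losses are equally
    likely and the per-game variance is 0.25 * (1 - draw_ratio).
    """
    var = 0.25 * (1.0 - draw_ratio)
    half_width = z * math.sqrt(var / games)   # margin on the score fraction
    upper = 0.5 + half_width
    # Convert the upper score bound into an Elo difference
    return 400.0 * math.log10(upper / (1.0 - upper))

for n in (500, 1000, 4000, 16000):
    print(f"{n:6d} games: +/- {elo_margin(n):.1f} Elo")
```

The key point is the square-root scaling: the margin is proportional to 1/sqrt(games), so four times as many games are needed to halve it. With these particular assumptions the margin comes out at about 17 Elo for 1000 games and about 8 Elo for 4000 games; a lower draw ratio makes both numbers larger.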
Yeah Robert, but the one thing left out could be the most important. How do we know that the results from 16,000 30-second games are as valuable as the results from 4,000 2-minute games? I mean, who is supposed to decide the time control below which anything faster would lose accuracy?
Sedat Canbaz wrote: Unfortunately, so far I have not seen any such rating list (including SCCT) where each participant has played 4000 games.
That's why most rating lists are only accurate to about 20 Elo.
And that is just the random error, we're not even talking about the systematic errors arising from hardware and opening choice.
Sedat Canbaz wrote: I mean the SCCT results: e.g. after 1000 games per player we see approx. +/- 5 or 10 Elo, no more, no less...
Obviously you don't accept the significance of the "+" and "-" columns in your own rating list which show around +/- 20 Elo.
Why would your rating list somehow escape the basic laws of statistics?
Robert
Of course... this is my view of the SCCT results.
Note also that my opinion applies to the Top 20 engines.
And once more I'd like to mention that so far I have not noticed a +/- 20 Elo or 40 Elo difference (after 1000 games per player).
For example, with up to 300-500 games per player, it is possible to see such a large Elo difference.
Sedat Canbaz wrote:-For reliable rating min 1000 games is required
That is only correct if "reliable" means "with a 20 Elo confidence interval".
The number of games has to match the precision or Elo difference you want to measure. To measure to 10 Elo precision you need at least 4000 games.
Yeah Robert, but the one thing left out could be the most important. How do we know that the results from 16,000 30-second games are as valuable as the results from 4,000 2-minute games? I mean, who is supposed to decide the time control below which anything faster would lose accuracy?
Best,
george
The most relevant are 20 games at extremely long time controls like 2 hours/40 moves. The errors in this case are less than 3-4 Elo points; in fact, there are almost no errors.