Kirill Kryukov wrote: {snip} Do you seriously look for useful novelties in engine-engine games, even in long time controls?

Dann Corbit wrote: Yes. Found some, too.

Not surprising to me.

Kempelen wrote: And when you repeat tests at fast time controls, do you see the result repeat? I have noticed that when repeating tests at fast time controls, the result changes more than when repeating at slow ones. Have you noticed something similar?

bob wrote: That only means you are not playing enough games. To get to the +/- 4 Elo level, you need to play 40,000 games or so. And +/- 4 gives a significant margin for error even with that many games...

How many games would be required for about a +/- 20 Elo margin?
Steelman wrote: {snip} How many games would be required for about a +/- 20 Elo margin?

The basic idea is that if you double the number of games, you reduce the error by a factor of four, as a pretty close estimate. So to get to +/- 1, 64K games would be needed. To reduce the error by a factor of 2 (to get to +/- 2) you would need approximately 46,000 games, considering that 32,000 games is +/- 4 (sqrt(2) * 32000).
And test at both fast and then at a slower speed? Slower being no less than 20- to 30-minute games. I wish these games could be played at more like 60- or 90-minute games, but that seems like it would take (even for Bob) some time.

I ask this because at these fast speeds the playing strength is affected a great deal. The positional and even tactical abilities are reduced. Would that not also affect the evaluation of positions? I have "tuned" my evaluations for play at much slower speeds, not fast ones. I would think that some values would need to be adjusted for fast speeds? Or is this not true?

So my vote is no. I don't think testing at fast speeds is giving you the "real" data you are looking for. The data at slower speeds would have to be more accurate and a truer test of playing strength. Unless of course the program is intended to play speed chess all the time.
Code:

Rank Name                Elo    +    -   games  score   oppo  draws
   1 Toga2              2665    2    4  428010    59%   2601    22%
   2 Glaurung 2.1       2663    3    2  428010    58%   2601    21%
   3 Crafty-22.9R17-12  2608    3    3   93384    51%   2597    21%
bob wrote: Notice the first two lines, with 428010 games each, and a +2 -4 or +3 -2 error margin.

This would make me suspicious of BayesElo. At the very least, the quoted error cannot mean what we think it means. With 428,000 games the 2-sigma error in the win percentage should be 80%/sqrt(428000) = 0.12%, which should result in a 2-sigma Elo confidence of 0.85 Elo (in the 30-70% score range).
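For anyone who wants to check that arithmetic, here is a small Python sketch. The 0.40 per-game score sigma and the logistic-curve slope of roughly 7 Elo per percentage point are the assumptions behind the 80%/sqrt(N) and 0.85 Elo figures, not something stated explicitly above.

Code:

import math

games = 428_010          # games played by Toga2 / Glaurung 2.1 in the table above
sigma_per_game = 0.40    # assumed per-game standard deviation of the score

# 2-sigma error of the average score over all games
score_err_2sigma = 2 * sigma_per_game / math.sqrt(games)
print(f"2-sigma score error: {score_err_2sigma * 100:.3f}%")   # ~0.122%

# Convert a small score error near 50% into Elo using the slope of the
# logistic Elo curve elo(p) = 400*log10(p/(1-p)) at p = 0.5.
elo_per_score = 400 / (math.log(10) * 0.25)                    # ~695 Elo per 1.00 of score
print(f"2-sigma Elo confidence: {score_err_2sigma * elo_per_score:.2f} Elo")  # ~0.85 Elo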
hgm wrote: This would make me suspicious of BayesElo. {snip}

All I can say is there were N versions of Crafty, each playing 8K games against Glaurung 2, Toga2, Fruit 2 and Glaurung 1. Knowing there are exactly 3891 positions in my starting position test set, and double that many games per match, you could compute how many different versions of Crafty there are. I assume that in this case, since each version of Crafty is distinct, and each version only plays G1, G2, F2 and T2, that is where the extra uncertainty comes from. But that is just an assumption; in this setup those 4 never play each other, and no two versions of Crafty play each other. Given that, it is not so surprising. I believe that the last time I tried, two opponents seemed to follow the expected +/- error as the number of games increased. Here the number of games is high, but the number of games between the programs is a bit warped.
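Taking those numbers at face value, the arithmetic bob invites is straightforward. This is only a sketch: it assumes the doubling comes from playing each position with both colors, and that every Crafty version met Toga2 in exactly one such match.

Code:

positions = 3891                 # positions in the starting-position test set
games_per_match = 2 * positions  # each position assumed played with both colors -> 7782 games per opponent

toga_games = 428_010             # total games shown for Toga2 in the table above
crafty_versions = toga_games / games_per_match
print(crafty_versions)           # 55.0 -> 55 distinct Crafty versions, under these assumptions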
The quoted error might reflect the uncertainty in the ratings of the opponents, from which the current rating is derived. The error in the rating difference between two players with 428,000 games each should be about 1.2 Elo, and the covariance given by BayesElo should reflect that.

If not, it is simply wrong.
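If the two single-player uncertainties are roughly independent, the uncertainty of their rating difference combines in quadrature; a minimal sketch of that step, reusing the 0.85 Elo estimate from above:

Code:

import math

# 2-sigma Elo confidence for each of the two 428,000-game players (estimate from above)
single_player_err = 0.85

# Independent errors add in quadrature, so the error of the rating *difference* is larger
difference_err = math.sqrt(single_player_err**2 + single_player_err**2)
print(f"{difference_err:.2f} Elo")   # ~1.20 Elo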
Hart wrote: If the error is a function of the square of the sample size, then don't you need 4x as many games to get half the error?

It's the other way around. To divide the error by 2, you need ngames * sqrt(2). To divide by 4, you need 2x as many games.
hgm wrote: Hart is right, Bob is wrong.

You are right. I was thinking about it from a "backward" direction... Although it doesn't really explain the other issue we were discussing...
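With the corrected relationship (the error shrinks as 1/sqrt(games), so halving the error takes four times as many games), Steelman's earlier +/- 20 question has a quick answer. A sketch, taking the "+/- 4 Elo at about 32,000 games" figure quoted earlier in the thread as the reference point:

Code:

ref_games = 32_000    # reference match length quoted earlier in the thread
ref_err = 4.0         # +/- Elo error margin at that length

def games_needed(target_err):
    # error ~ 1/sqrt(N), so halving the error requires 4x the games
    return ref_games * (ref_err / target_err) ** 2

print(games_needed(2.0))   # ~128,000 games for +/- 2
print(games_needed(1.0))   # ~512,000 games for +/- 1
print(games_needed(20.0))  # ~1,280 games for +/- 20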