Toga The Killer 1Y MP 4CPU is the strongest Toga....

bob · Post by **bob** » Fri Jun 26, 2009 8:29 pm

krazyken wrote:
bob wrote:
krazyken wrote:
bob wrote:
yanquis1972 wrote:the results he posted are quite a bit better than nothing, obviously. you can glance at them & get a fairly good idea of TK's strength (id guess about 3000+ CCRL elo). even if i'm off by a long shot, i'm going to be a lot closer than i would if we just had random results, because chess is not random chaos. in fact it's much farther removed from chance than almost any game i can think of.

anyway, it's anyone's choice how to use their hardware & software & by looking at the results posted & combining them with mine i can see that naum is probably not some kind of special poison for TK, but that its performance was what should be expected.
My point was that 10 games is worthless for determining anything. In a 1000 game match, you can probably find 10 games in a row where each side wins. If you trust 10 game results, that's up to you. I know the inaccuracy this involves.
Data points are never worthless. You are going to need much more than a 1000 game match to declare that both players get a 10 in a row as probable. If the two players are equal the chance of one of them getting a 10 in a row is far less than 1%. The chance of both is far lower. True, 10 games has a large confidence interval, but it is far from worthless.
Sorry, but if the programs are within 100 (or even 200) Elo of each other, 10 games _is_ absolutely worthless for identifying which is best. Absolutely worthless...

BTW, as far as 10 in a row needing more than 1000 games? See "the birthday paradox". You don't even need 1000 to be reasonably sure...
A. You are changing your statement. I was responding to "10 games is worthless for determining anything" now you are switching to "10 games _is_ absolutely worthless for identifying which is best." Which is statistically a completely different question. The first is false, the second is frequently true, especially given the qualification you added to it.

B. Picking 2 people out of a group in the Birthday Paradox has nothing to do with the problem of finding a particular streak in a series. The formula for finding the probability of a streak of wins with independent trials is:

(N - x + 1)(p^x)

Where N is the number of trials, x is the length of the streak, and p is the probability of winning. Depending on what p is, I slightly overstated the case before: the probability of finding a streak of 10 wins in 1000 games is close to 1%, not far less. The probability of a streak of 10 wins and a streak of 10 losses is still far less though.
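
As a rough check on the streak estimate, the sketch below compares the expression (N - x + 1)(p^x) with an exact figure from a small dynamic program. The expression sums the chance of a streak starting at each possible position, so it acts as an upper bound on the chance of at least one streak. The function names and the sample values of p are illustrative assumptions, and games are treated as independent with a fixed win probability.

```python
def streak_bound(n_games, streak_len, p_win):
    """The (N - x + 1)(p^x) expression: an upper bound on the streak probability."""
    if n_games < streak_len:
        return 0.0
    return (n_games - streak_len + 1) * p_win ** streak_len


def streak_probability(n_games, streak_len, p_win):
    """Exact chance of at least one run of `streak_len` wins in `n_games` games."""
    # state[k] = probability that the current run of wins has length k
    # and no full streak has been seen so far.
    state = [0.0] * streak_len
    state[0] = 1.0
    hit = 0.0  # probability that a full streak has already occurred
    for _ in range(n_games):
        nxt = [0.0] * streak_len
        for k, prob in enumerate(state):
            nxt[0] += prob * (1.0 - p_win)   # a non-win resets the run
            if k + 1 == streak_len:
                hit += prob * p_win          # this win completes the streak
            else:
                nxt[k + 1] += prob * p_win   # this win extends the run
        state = nxt
    return hit


if __name__ == "__main__":
    # Assumed win probabilities; the right value depends on the draw rate.
    for p in (0.3, 1.0 / 3.0, 0.5):
        print(p, streak_bound(1000, 10, p), streak_probability(1000, 10, p))
```

The two figures stay close when p is small. With p = 0.5 (no draws at all) the bound approaches 1 and overstates the exact value considerably, so the answer really does depend on what p is.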

Tell me what you can determine from 10 games between A and B. You can't, with any confidence at all, decide which is best. You can't, with any confidence, decide that A and B are at least reliable and can play long matches without one crashing. You can't, with any confidence, decide that neither A or B will uncork an illegal move or fail to recognize an opponent's legal move in a long match.

So exactly _what_ can you conclude from 10 games?

mhull · Post by **mhull** » Fri Jun 26, 2009 9:01 pm

bob wrote: Tell me what you can determine from 10 games between A and B. You can't, with any confidence at all, decide which is best. You can't, with any confidence, decide that A and B are at least reliable and can play long matches without one crashing. You can't, with any confidence, decide that neither A or B will uncork an illegal move or fail to recognize an opponent's legal move in a long match.

So exactly _what_ can you conclude from 10 games?

Maybe a GM could tell something from a few games. Like when Roman would advise where crafty was not understanding something.

However, I like your having a zillion games where a GM is not on-call to the project.

I wonder if there are procedural ways to analyze lost games among the zillions. Mates in middlegame, various kinds of endings, pawns for pieces, rooks for queens, etc. Stats for each category. Maybe that could guide tuning. Maybe you already do something like this.

krazyken · Post by **krazyken** » Fri Jun 26, 2009 9:55 pm

bob wrote: Tell me what you can determine from 10 games between A and B. You can't, with any confidence at all, decide which is best. You can't, with any confidence, decide that A and B are at least reliable and can play long matches without one crashing. You can't, with any confidence, decide that neither A or B will uncork an illegal move or fail to recognize an opponent's legal move in a long match.

So exactly _what_ can you conclude from 10 games?

That will depend on the results. 10-0-0 tells you one thing, 0-0-10 tells you another.

For example:

Rybka 2.2n2 mp vs. Glaurung 2.1 +6 -1 =3
Rank Name             Elo    +    - games score oppo. draws 
   1 Rybka 2.2n2 mp   134  165  165    10   75%     0   30% 
   2 Glaurung 2.1       0  165  165    10   25%   134   30%

gives you a 90% confidence that Rybka is better than Glaurung. If you want to know how much better or if 90% isn't good enough, you will need more games to answer that.

Kiwi 0.6d vs. ZCTx64.MP0.3.2486 +6 -3 =1
Rank Name                Elo    +    - games score oppo. draws 
   1 Kiwi 0.6d            90  172  172    10   65%     0   10% 
   2 ZCTx64.MP0.3.2486     0  172  172    10   35%    90   10%

a closer result, about 84% chance Kiwi is better, not a strong conclusion, but a conclusion nevertheless.
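
One common way to arrive at a figure of this kind is the "likelihood of superiority" normal approximation, which looks only at decisive games. The sketch below is an assumption about a widely used formula, not a statement of how the percentages above were computed; the function name is made up for illustration.

```python
import math


def likelihood_of_superiority(wins, losses):
    """Normal approximation to the chance that the first engine is stronger.

    Draws are ignored: only the difference between wins and losses,
    relative to the number of decisive games, carries information here.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, no evidence either way
    z = (wins - losses) / math.sqrt(2.0 * decisive)
    return 0.5 * (1.0 + math.erf(z))


if __name__ == "__main__":
    print(likelihood_of_superiority(6, 3))  # the +6 -3 =1 match
    print(likelihood_of_superiority(6, 1))  # the +6 -1 =3 match
```

For +6 -3 =1 this gives about 84%. A rating tool that applies a prior or models draws explicitly can report a different value for such short matches, so the two need not agree.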

As a final note, if I post 10 games and a few others post their 10 games matches, I would have much more info than if nobody posted their 10 games for fear of having their time called worthless.

ernest · Post by **ernest** » Fri Jun 26, 2009 10:02 pm

krazyken wrote:As a final note, if I post 10 games and a few others post their 10 games matches, I would have much more info than if nobody posted their 10 games for fear of having their time called worthless.

Why always let the others finish the work...
To speak a 10-word sentence you do not need 10 people to utter one word, unless you are severely disabled!

krazyken · Post by **krazyken** » Fri Jun 26, 2009 10:19 pm

ernest wrote:
krazyken wrote:As a final note, if I post 10 games and a few others post their 10 games matches, I would have much more info than if nobody posted their 10 games for fear of having their time called worthless.
Why always let the others finish the work...
To speak a 10-word sentence you do not need 10 people to utter one word, unless you are severely disabled!

Helping others is fun! Why always let the others do the work on their own?

bob · Post by **bob** » Fri Jun 26, 2009 11:11 pm

mhull wrote:
bob wrote: Tell me what you can determine from 10 games between A and B. You can't, with any confidence at all, decide which is best. You can't, with any confidence, decide that A and B are at least reliable and can play long matches without one crashing. You can't, with any confidence, decide that neither A or B will uncork an illegal move or fail to recognize an opponent's legal move in a long match.

So exactly _what_ can you conclude from 10 games?
Maybe a GM could tell something from a few games. Like when Roman would advise where crafty was not understanding something.

However, I like your having a zillion games where a GM is not on-call to the project.

I wonder if there are procedural ways to analyze lost games among the zillions. Mates in middlegame, various kinds of endings, pawns for pieces, rooks for queens, etc. Stats for each category. Maybe that could guide tuning. Maybe you already do something like this.

This is a computer chess data mining panacea. But so far, unexploited, unfortunately. The amount of data I throw away in a week's worth of games is astounding.

bob · Post by **bob** » Fri Jun 26, 2009 11:15 pm

krazyken wrote:
bob wrote: Tell me what you can determine from 10 games between A and B. You can't, with any confidence at all, decide which is best. You can't, with any confidence, decide that A and B are at least reliable and can play long matches without one crashing. You can't, with any confidence, decide that neither A or B will uncork an illegal move or fail to recognize an opponent's legal move in a long match.

So exactly _what_ can you conclude from 10 games?
That will depend on the results. 10-0-0 tells you one thing, 0-0-10 tells you another.

Yes, one tells me I won ten games, the other tells me I lost ten games. I can draw _no_ conclusions about which is better. I've shown the data in the past. Many times the first 100 games show something _entirely_ different from what I find after 32,000 games.

For example:
Rybka 2.2n2 mp vs. Glaurung 2.1 +6 -1 =3
Rank Name             Elo    +    - games score oppo. draws 
   1 Rybka 2.2n2 mp   134  165  165    10   75%     0   30% 
   2 Glaurung 2.1       0  165  165    10   25%   134   30% 
gives you a 90% confidence that Rybka is better than Glaurung. If you want to know how much better or if 90% isn't good enough, you will need more games to answer that.
Kiwi 0.6d vs. ZCTx64.MP0.3.2486 +6 -3 =1
Rank Name                Elo    +    - games score oppo. draws 
   1 Kiwi 0.6d            90  172  172    10   65%     0   10% 
   2 ZCTx64.MP0.3.2486     0  172  172    10   35%    90   10% 
a closer result, about 84% chance Kiwi is better, not a strong conclusion, but a conclusion nevertheless.

As a final note, if I post 10 games and a few others post their 10 games matches, I would have much more info than if nobody posted their 10 games for fear of having their time called worthless.

However, that is _not_ what is happening. Someone is posting ten game results, and drawing a conclusion that is not supported by any statistical analysis of any kind. 10 games. Almost a +/- 200 error bar. Not very informative or useful.
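
To put a number on the size of that error bar, here is a minimal sketch that turns a win/loss/draw count into an approximate 95% interval on the Elo difference. It assumes the usual logistic Elo curve and a normal approximation to the mean score; the function names are hypothetical, and a rating tool may compute its intervals differently.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins, losses, draws, z=1.96):
    """Approximate (low, high) Elo interval; z = 1.96 is roughly 95% coverage."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    half_width = z * math.sqrt(var / n)
    eps = 1e-9  # keep scores strictly inside (0, 1) so the logarithm is defined
    low = min(max(score - half_width, eps), 1.0 - eps)
    high = min(max(score + half_width, eps), 1.0 - eps)
    return elo_from_score(low), elo_from_score(high)


if __name__ == "__main__":
    print(elo_interval(6, 3, 1))           # a 10-game match
    print(elo_interval(6000, 3000, 1000))  # same proportions, 10,000 games
```

On a +6 -3 =1 result the interval spans several hundred Elo and includes zero; with the same proportions over 10,000 games it shrinks to a narrow band.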

krazyken · Post by **krazyken** » Fri Jun 26, 2009 11:56 pm

bob wrote:
bob wrote: Tell me what you can determine from 10 games between A and B. You can't, with any confidence at all, decide which is best. You can't, with any confidence, decide that A and B are at least reliable and can play long matches without one crashing. You can't, with any confidence, decide that neither A or B will uncork an illegal move or fail to recognize an opponent's legal move in a long match.

So exactly _what_ can you conclude from 10 games?

That will depend on the results. 10-0-0 tells you one thing, 0-0-10 tells you another.

Yes, one tells me I won ten games, the other tells me I lost ten games. I can draw _no_ conclusions about which is better. I've shown the data in the past. Many times the first 100 games show something _entirely_ different from what I find after 32,000 games.

Your setup is different. You are using various starting positions, so your results will be dependent upon the order of the starting positions.

krazyken · Post by **krazyken** » Sat Jun 27, 2009 12:18 am

bob wrote:
krazyken wrote:
As a final note, if I post 10 games and a few others post their 10 games matches, I would have much more info than if nobody posted their 10 games for fear of having their time called worthless.
However, that is _not_ what is happening. Someone is posting ten game results, and drawing a conclusion that is not supported by any statistical analysis of any kind. 10 games. Almost a +/- 200 error bar. Not very informative or useful.

I didn't see anybody drawing conclusions in this thread from only 10 games. It seems the person who was being ridiculed for posting only 10 games has already posted 70, and still nobody has drawn any conclusions based on the data.

bob · Post by **bob** » Sat Jun 27, 2009 2:14 am

krazyken wrote:
bob wrote:
bob wrote: Tell me what you can determine from 10 games between A and B. You can't, with any confidence at all, decide which is best. You can't, with any confidence, decide that A and B are at least reliable and can play long matches without one crashing. You can't, with any confidence, decide that neither A or B will uncork an illegal move or fail to recognize an opponent's legal move in a long match.

So exactly _what_ can you conclude from 10 games?

That will depend on the results. 10-0-0 tells you one thing, 0-0-10 tells you another.

Yes, one tells me I won ten games, the other tells me I lost ten games. I can draw _no_ conclusions about which is better. I've shown the data in the past. Many times the first 100 games show something _entirely_ different from what I find after 32,000 games.
Your setup is different. You are using various starting positions, so your results will be dependent upon the order of the starting positions.

Except that the tests start in a random order... so there is no predicting.
