Another Firebird-Rybka match (120 games @ 20mn+5s)
Moderator: Ras
-
- Posts: 98
- Joined: Thu Jul 23, 2009 5:40 am
Conditions: Win 2003 w32 / 2CPU / 128MB Hash / HS 8 Moves / 20mn+5s / 120 games
Firebird 1.0 - Rybka 3.0 : 69,5-50,5 (+33/=73/-14) 57,92%-42,08%
-
- Posts: 2094
- Joined: Mon Mar 13, 2006 2:31 am
- Location: North Carolina, USA
Re: Another Firebird-Rybka match (120 games @ 20mn+5s)
Andre wrote:Conditions: Win 2003 w32 / 2CPU / 128MB Hash / HS 8 Moves / 20mn+5s / 120 games
Firebird 1.0 - Rybka 3.0 : 69,5-50,5 (+33/=73/-14) 57,92%-42,08%
Interesting data. That suggests a 57 Elo improvement. However, the margins on 120 games may be around 58 Elo. So, insufficient data to
prove an improvement. I don't know the exact margins for 120 games. For 96 games it is 60 and for 200 games it is 42. From that, I guessed at 58 for 120.
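As a rough cross-check on the guessed margin, here is a minimal Python sketch. It assumes the error margin shrinks with the square root of the number of games (a common approximation, not something stated in this thread) and uses the logistic Elo formula to turn a score fraction into a rating difference. The function names are hypothetical.

```python
import math

def elo_diff(score_fraction):
    """Elo difference implied by a score fraction, using the logistic Elo model."""
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)

def scale_margin(known_margin, known_games, target_games):
    """Rescale an error margin, assuming it shrinks like 1/sqrt(games)."""
    return known_margin * math.sqrt(known_games / target_games)

# Match result from the first post: 69.5 points out of 120 games.
print(round(elo_diff(69.5 / 120), 1))
# Margins quoted above: 60 Elo at 96 games, 42 Elo at 200 games.
print(round(scale_margin(60, 96, 120), 1))
print(round(scale_margin(42, 200, 120), 1))
```

Under these assumptions both reference points put the 120-game margin nearer 54 than 58, and the logistic formula gives roughly 55 Elo for a 57.92% score. Rating tools may use a different conversion, so small differences are expected; either way the result sits right at the edge of the margin.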
-
- Posts: 2331
- Joined: Mon Apr 09, 2007 5:36 pm
Re: Another Firebird-Rybka match (120 games @ 20mn+5s)
CRoberson wrote:Interesting data. That suggests a 57 Elo improvement. However, the margins on 120 games may be around 58 Elo. So, insufficient data to prove an improvement. I don't know the exact margins for 120 games. For 96 games it is 60 and for 200 games it is 42. From that, I guessed at 58 for 120.
I have new information:
Combined score after 402 games
227.0 - 175.0 in favor of RobboLito
+46 Elo for RobboLito
    Program                    Elo    +    -   Games   Score    Av.Op.   Draws
  1 RobboLite 0.085d3 x64   : 3148   20   20     402   56.5 %     3102   65.2 %
  2 Rybka 3 sp              : 3102   20   20     402   43.5 %     3148   65.2 %
PGN available
"Well, I´m just a soul whose intentions are good,
Oh Lord, please don´t let me be misunderstood."
-
- Posts: 317
- Joined: Mon Nov 02, 2009 12:05 am
- Location: Alabama
Re: Another Firebird-Rybka match (120 games @ 20mn+5s)
Hi Andre,
Nice results, which clearly seem to indicate that Firebird is stronger on your machine with your settings. There is nothing that can dispute that. Elostat, or whatever rating tool you use, calculates the average rating over a given number of games. The further you move away from the middle of the plus/minus range, the more extreme and unlikely the result becomes. For example, in the next 120 games at your settings Rybka would have to win 33 and lose 14 for the Elo to be the same for both engines, which is probably unlikely based on what you have experienced so far on your machine, right?
If the margin is +/- 58 and your performance difference is 57, then you are grasping at straws if you seriously think that the other program will miraculously turn the next 120 games upside down against a difference like the one you are showing.
Therefore the only question is whether the 57 Elo remains stable or fluctuates. The fact that Firebird is stronger on your machine with your settings is hard to argue against.
best regards
Nick
-
- Posts: 2094
- Joined: Mon Mar 13, 2006 2:31 am
- Location: North Carolina, USA
Re: Another Firebird-Rybka match (120 games @ 20mn+5s)
Spacious_Mind wrote:Hi Andre,
Nice results, which clearly seem to indicate that on your machine with your settings that Firebird is stronger. There is nothing that can dispute that. Elostat ratings or whatever you use, calculates the average ratings over a given number of games. The plus/minus difference the further you move away from the middle become more and more extreme and unlikely. For example in the next 120 games at your setting Rybka would have to win 33 and lose 14 in order for the ELO to be the same. Which is probably unlikely based on what you have experienced so far on your machine right?
If the variance is +/- 58 and your performance difference is 57 then you begin to start grasping at straws if you seriously think that the other program will miraculously turn the next 120 games upside down with a difference as you are showing.
Therefore the only question is does the 57 elo remain stable or are there fluctuations. The fact that Firebird is stronger on your machine with your settings is hard to argue against.
best regards
Nick
You don't understand. If A outperforms B by N-1 Elo and the margins are +/- N, then there is insufficient evidence to say that A is better than B. A must outperform B by more than N Elo with margins at +/- N. With margins at 58, you must score 59 or better.
If your score is within the margins (even by a little) then you are within the fluctuation range for that number of games.
-
- Posts: 317
- Joined: Mon Nov 02, 2009 12:05 am
- Location: Alabama
Re: Another Firebird-Rybka match (120 games @ 20mn+5s)
CRoberson wrote:You don't understand. If A outperforms B by N-1 Elo and the margins are +/- N, then there is insufficient evidence to say that A is better than B. A must outperform B by more than N Elo with margins at +/- N. With margins at 58, you must score 59 or better.
If your score is within the margins (even by a little) then you are within the fluctuation range for that number of games.
Yes I fully understand. You are going from -1 to +115 therefore both -1 and +115 are unlikely extremes.
regards
Nick
Last edited by Spacious_Mind on Sun Jan 24, 2010 9:17 pm, edited 2 times in total.
-
- Posts: 2094
- Joined: Mon Mar 13, 2006 2:31 am
- Location: North Carolina, USA
Re: Another Firebird-Rybka match (120 games @ 20mn+5s)
slobo wrote:I have new information:
Combined score after 402 games
227.0 - 175.0 in favor of RobboLito
+46 Elo for RobboLito
    Program                    Elo    +    -   Games   Score    Av.Op.   Draws
  1 RobboLite 0.085d3 x64   : 3148   20   20     402   56.5 %     3102   65.2 %
  2 Rybka 3 sp              : 3102   20   20     402   43.5 %     3148   65.2 %
PGN available
Now, you are talking about Robbo instead of Firebird. ok.
Two things come to mind with your data.
1) 46 Elo is a far cry from the 100 Elo you were claiming.
2) At 400 games the margins are +/- 30 Elo. So, Robbo is above the margins in this case. But, the question is how much better is it?
The answer is in the data. It could be 46 Elo stronger or as little as 16 Elo stronger or as much as 76 Elo stronger. The odds are that it
is far from the 100 Elo that you have been claiming in the past.
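The table above lists a margin of 20 Elo, while this reply works with 30. One possible reason is the draw rate, since draws reduce the per-game variance. Here is a minimal sketch, assuming a normal approximation, a 95% level (z = 1.96), and per-game outcomes of 1, 0.5 and 0. The function name is hypothetical, and this is not necessarily how the rating tool computes its margins.

```python
import math

def elo_margin(score, draw_ratio, games, z=1.96):
    """Approximate +/- Elo margin from score fraction, draw ratio and game count."""
    win_ratio = score - draw_ratio / 2.0
    # Variance of a single game result taking the values 1, 0.5 and 0.
    variance = win_ratio + 0.25 * draw_ratio - score ** 2
    std_error = math.sqrt(variance / games)
    # Slope of the logistic Elo curve at this score (delta method).
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * std_error * slope

# Figures from the table: 56.5 % score, 65.2 % draws, 402 games.
print(round(elo_margin(0.565, 0.652, 402), 1))
# Same score and game count, but with no draws at all.
print(round(elo_margin(0.565, 0.0, 402), 1))
```

With the table's draw ratio this comes out close to the 20 Elo shown in the table; without draws the same formula gives a noticeably wider margin.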
-
- Posts: 2094
- Joined: Mon Mar 13, 2006 2:31 am
- Location: North Carolina, USA
Re: Another Firebird-Rybka match (120 games @ 20mn+5s)
Spacious_Mind wrote:Yes I fully understand. You are going from -1 to +115 therefore both -1 and +115 are unlikely extremes.
regards
Nick
If you understand that, then why did you say that Firebird is clearly better and that it is unlikely for the other engine to turn the tables in the next 120 games? If the results are within the margins, then the results are within the fluctuation range and it is possible for the other engine to turn the tables.
Re: Another Firebird-Rybka match (120 games @ 20mn+5s)
CRoberson wrote:If you understand that, then why did you say that Firebird is clearly better and that it is unlikely for the other engine to turn the tables in the next 120 games? If the results are within the margins, then the results are within the fluctuation range and it is possible for the other engine to turn the tables.
What is the confidence level attached with your 58 elo margin? Is it the commonly used 95% ?
If so, if he gets 57 elo, the probability that Robbolito is stronger should be around 93-94%.
There is no "proof" with perfect certainty using statistics. Even if he got +100 elo, there would still be a very small probability that the engine is not stronger and that the results could be overturned.
As the elo gap increases, this probability becomes vanishingly small.
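The confidence question can be made concrete with a small sketch. It assumes the 58 Elo margin is a 95% margin (so one standard error is about 58 / 1.96) and that the measured difference is normally distributed. Both are assumptions, and the helper name is hypothetical.

```python
from statistics import NormalDist

def confidence_levels(observed_elo, margin_elo, margin_level=0.95):
    """Return the one-sided and two-sided confidence levels at which zero is just excluded."""
    std_normal = NormalDist()
    # z-value matching the two-sided level of the quoted margin (about 1.96 for 95%).
    z_margin = std_normal.inv_cdf(0.5 + margin_level / 2.0)
    std_error = margin_elo / z_margin
    z_observed = observed_elo / std_error
    one_sided = std_normal.cdf(z_observed)
    two_sided = 2.0 * one_sided - 1.0
    return one_sided, two_sided

one_sided, two_sided = confidence_levels(57, 58)
print(f"one-sided: {one_sided:.3f}, two-sided: {two_sided:.3f}")
```

Under these assumptions the two-sided level is a little under 95% and the one-sided level is about 97%, which is in the neighbourhood of the 93-94% estimate above.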
-
- Posts: 317
- Joined: Mon Nov 02, 2009 12:05 am
- Location: Alabama
Re: Another Firebird-Rybka match (120 games @ 20mn+5s)
CRoberson wrote:If you understand that, then why did you say that Firebird is clearly better and that it is unlikely for the other engine to turn the tables in the next 120 games? If the results are within the margins, then the results are within the fluctuation range and it is possible for the other engine to turn the tables.
First of all your 58 might not be right because I see other examples where I see 115 games as +/- 56.
But that's beside the point. You surely have to agree that -1 and +115 are too extreme and statistically unlikely, so the next 120 games will likely show similar results given exactly the same settings and exactly the same computer, whatever those are? Either that, or we might as well throw the Elo system out the window.
Therefore the next 120 games will more likely than not again show that Firebird is better under those EXACT same conditions.
I really don't care if it is Tom, Dick or Harry playing; it makes no difference to me. It's the impression that the other engine, whichever it is, will miraculously turn things around in the next 120 games which I find baffling.
best regards
Nick