1. Doubling the time
It is usually accepted that doubling the time for a program brings around +70 Elo.
I wanted to see if this is still true with Rybka3 - Rybka3 matches.
On my Intel Core2 Duo @3GHz, XP Pro x64, Fritz11 GUI,
Match of 400 games Rybka3 (4'+2") (called Rybka3 2x time), against Rybka3 (2'+1").
Of course 64-bit Rybkas, 512MB hash each, no ponder (2 cores each), all 5-men TB.
Opening book: 200 games with 5moves_0.ctg, and 200 games with the more recent PB5moves.ctg (both from "Permanent Brain", only the first 5 moves are from book). No learning.
Result for Rybka3 2xtime: 268.5 - 131.5 (67.125%) +162 -25 =213
This corresponds to a better performance of +132 Elo for 2xtime, quite a bit higher than the "usual" +70 Elo
Actually, Vasik Rajlich and Larry Kaufman have already mentioned this in the past, but perhaps not to this extent.
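For reference, a score percentage can be converted to an Elo difference with the standard logistic Elo model. This is only a sketch: the post does not say which conversion produced +132, and the helper name `elo_from_score` is made up for illustration. The plain logistic formula gives a somewhat lower figure (about +124) for 67.125%, so the tool used here probably applies a slightly different table or model.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (0 < score < 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# 2x time match from the post: +162 -25 =213 over 400 games
wins, losses, draws = 162, 25, 213
points = wins + 0.5 * draws                      # 268.5
score = points / (wins + losses + draws)
print(f"{points} points, {score:.3%}, {elo_from_score(score):+.0f} Elo (logistic)")
```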
2. Doubling the number of cores
In the case of Rybka 3, going from single to dual core is +52 Elo in Blitz CCRL, +74 in Blitz CEGT
What do we get with Rybka3- Rybka3 matches?
Match of 400 games Rybka3 DualCore against Rybka3 SingleCore
Of course 64-bit Rybkas, time 2'+1", 512MB hash each, no ponder, all 5-men TB.
Opening book: 200 games with 5moves_0.ctg, and 200 games with the more recent PB5moves.ctg. No learning.
Result for Rybka3 DualCore : 243 - 157 (60.75%) +126 -40 =234
This corresponds to a better performance of +78 Elo for DualCore over SingleCore.
Compared to the +132 Elo for double time, the +78 Elo advantage of DualCore over SingleCore is equivalent to a time advantage of x1.54
(using the logarithmic formula time vs Elo). This does not indicate a terrific scaling (should be at least x1.7).
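The "logarithmic formula" is not spelled out in the post. One plausible reading, assumed here, is that Elo gain is linear in log2 of the time factor, so a gain of E Elo is worth a time factor of 2^(E/D), where D is the Elo gain per doubling. The function name `time_factor` is hypothetical. With D = 132 this reading gives roughly x1.5 for +78 Elo; the quoted x1.54 is what it gives for D close to 125, so the post may have used a slightly different doubling value or formula.

```python
def time_factor(elo_gain: float, elo_per_doubling: float) -> float:
    """Time multiplier worth `elo_gain`, assuming Elo is linear in log2(time)."""
    if elo_per_doubling <= 0:
        raise ValueError("elo_per_doubling must be positive")
    return 2.0 ** (elo_gain / elo_per_doubling)

# +78 is the dual-core result; +94 is the 64-bit result reported in section 3.
for doubling in (132.0, 125.0):
    print(f"D={doubling:.0f}: +78 Elo -> x{time_factor(78, doubling):.2f}, "
          f"+94 Elo -> x{time_factor(94, doubling):.2f}")
```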
3. Doubling the number of bits (64-bit vs. 32-bit)
In the case of Rybka 3, going from 32-bit to 64-bit version is +45 Elo in Blitz CCRL, and an incredible(?) +99 to +117 Elo in Blitz CEGT.
In infinite analysis on the start_position, Rybka3 64-bit is x1.72 faster than Rybka3 32-bit.
(note: for Rybka 232a, the ratio was x1.85)
What do we get with Rybka3- Rybka3 matches?
Match of 400 games Rybka3 64-bit against Rybka3 32-bit
Time 2'+1", 512MB hash each, no ponder(2 cores each), all 5-men TB.
Opening book: 200 games with 5moves_0.ctg, and 200 games with the more recent PB5moves.ctg. No learning.
Result for Rybka3 64-bit : 251.5 - 148.5 (62.875%) +154 -51 =195
This corresponds to a better performance of +94 Elo for 64-bit over 32-bit.
Compared to the +132 Elo for double time, the +94 Elo advantage of 64-bit over 32-bit is equivalent to a time advantage of x1.68
(using the logarithmic formula time vs Elo), not very far from the x1.72 of infinite analysis in the start_position.
Conclusion: the results of Rybka3 - Rybka3 games, with different parameters, are quite consistent with the usual Rybka - non-Rybka tests, but the Elo scale of the differences has to be almost doubled.
Experimenting with Ryb3-Ryb3 games (2xtime, 2xcores, 2xbits)
Re: Experimenting with Ryb3-Ryb3 games (2xtime, 2xcores, 2xb
Interesting.
You did use a very short time control for your tests; I'll bet that with increased time the gain will taper off.
Re: Experimenting with Ryb3-Ryb3 games (2xtime, 2xcores, 2xb
Or you simply need more games. I have tons of results where after 500 games, A is better than B by 50 Elo. But after 30,000 games, B is better by 30.
ernest wrote:
1. Doubling the time
It is usually accepted that doubling the time for a program brings around +70 Elo.
I wanted to see if this is still true with Rybka3 - Rybka3 matches.
On my Intel Core2 Duo @3GHz, XP Pro x64, Fritz11 GUI,
Match of 400 games Rybka3 (4'+2") (called Rybka3 2x time), against Rybka3 (2'+1").
Of course 64-bit Rybkas, 512MB hash each, no ponder (2 cores each), all 5-men TB.
Opening book: 200 games with 5moves_0.ctg, and 200 games with the more recent PB5moves.ctg (both from "Permanent Brain", only the first 5 moves are from book). No learning.
Result for Rybka3 2xtime: 268.5 - 131.5 (67.125%) +162 -25 =213
This corresponds to a better performance of +132 Elo for 2xtime, quite a bit higher than the "usual" +70 Elo
Actually, Vasik Rajlich and Larry Kaufman have already mentioned this in the past, but perhaps not to this extent.
2. Doubling the number of cores
In the case of Rybka 3, going from single to dual core is +52 Elo in Blitz CCRL, +74 in Blitz CEGT
What do we get with Rybka3- Rybka3 matches?
Match of 400 games Rybka3 DualCore against Rybka3 SingleCore
Of course 64-bit Rybkas, time 2'+1", 512MB hash each, no ponder, all 5-men TB.
Opening book: 200 games with 5moves_0.ctg, and 200 games with the more recent PB5moves.ctg. No learning.
Result for Rybka3 DualCore : 243 - 157 (60.75%) +126 -40 =234
This corresponds to a better performance of +78 Elo for DualCore over SingleCore.
Compared to the +132 Elo for double time, the +78 Elo advantage of DualCore over SingleCore is equivalent to a time advantage of x1.54
(using the logarithmic formula time vs Elo). This does not indicate a terrific scaling (should be at least x1.7).
3. Doubling the number of bits (64-bit vs. 32-bit)
In the case of Rybka 3, going from 32-bit to 64-bit version is +45 Elo in Blitz CCRL, and an incredible(?) +99 to +117 Elo in Blitz CEGT.
In infinite analysis on the start_position, Rybka3 64-bit is x1.72 faster than Rybka3 32-bit.
(note: for Rybka 232a, the ratio was x1.85)
What do we get with Rybka3- Rybka3 matches?
Match of 400 games Rybka3 64-bit against Rybka3 32-bit
Time 2'+1", 512MB hash each, no ponder(2 cores each), all 5-men TB.
Opening book: 200 games with 5moves_0.ctg, and 200 games with the more recent PB5moves.ctg. No learning.
Result for Rybka3 64-bit : 251.5 - 148.5 (62.875%) +154 -51 =195
This corresponds to a better performance of +94 Elo for 64-bit over 32-bit.
Compared to the +132 Elo for double time, the +94 Elo advantage of 64-bit over 32-bit is equivalent to a time advantage of x1.68
(using the logarithmic formula time vs Elo), not very far from the x1.72 of infinite analysis in the start_position.
Conclusion: the results of Rybka3 - Rybka3 games, with different parameters, are quite consistent with the usual Rybka - non-Rybka tests, but the Elo scale of the differences has to be almost doubled.
I am sometimes tempted to abort a test after a few hundred games because the new version looks bad, but after waiting, it will slowly climb and turn out to be better...
Re: Experimenting with Ryb3-Ryb3 games (2xtime, 2xcores, 2xb
ernest wrote: Conclusion: the results of Rybka3 - Rybka3 games, with different parameters, are quite consistent with the usual Rybka - non-Rybka tests, but the Elo scale of the differences has to be almost doubled.
bob wrote: Or you simply need more games. I have tons of results where after 500 games, A is better than B by 50 Elo. But after 30,000 games, B is better by 30.
Yes, but I also have tons of results where after 30,000 games A is better than B by 50 Elo, but after 200,000 games B is better than A by 30 Elo.

My point/question is: where to stop? After how many games?
I know it depends on the width of the error bars on the calculated average Elo value, but how narrow should these bars be to be satisfactory?
Obviously it's a matter of personal taste: 50±0.2 Elo is enough for some, while for others it has to be 50±0.02, etc.
After his son's birth they've asked him:
"Is it a boy or girl?"
YES! He replied.....
Re: Experimenting with Ryb3-Ryb3 games (2xtime, 2xcores, 2xb
ernest wrote: Conclusion: the results of Rybka3 - Rybka3 games, with different parameters, are quite consistent with the usual Rybka - non-Rybka tests, but the Elo scale of the differences has to be almost doubled.
bob wrote: Or you simply need more games. I have tons of results where after 500 games, A is better than B by 50 Elo. But after 30,000 games, B is better by 30.
George Tsavdaris wrote: Yes, but I also have tons of results where after 30,000 games A is better than B by 50 Elo, but after 200,000 games B is better than A by 30 Elo.
My point/question is: where to stop? After how many games?
I know it depends on the width of the error bars on the calculated average Elo value, but how narrow should these bars be to be satisfactory?
Obviously it's a matter of personal taste: 50±0.2 Elo is enough for some, while for others it has to be 50±0.02, etc.
That result would be impossible. By the time you get to 40,000 games, you are at +/- 4 Elo.

But to answer your question, I would not stop until the error bar has no effect on the comparison. If the two ratings are 70 apart, you need enough games that the error bars still leave high confidence that A is better than B.
Otherwise, might as well flip a coin...
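To put rough numbers on "how many games", here is a back-of-the-envelope sketch. It assumes independent games, two nearly equal engines (score near 50%), a normal approximation, and the logistic Elo curve; the function name `games_needed` and its parameters are made up for illustration. Under these assumptions, a ±4 Elo bar at 95% confidence needs somewhat under 30,000 games even with no draws, which is in line with the 40,000-game figure above.

```python
import math

def games_needed(margin_elo: float, draw_ratio: float, z: float = 1.96) -> int:
    """Games needed for a +/- margin_elo error bar between near-equal engines.

    z is the normal quantile (1.96 for 95%, 1.645 for 90%).
    """
    # Per-game score variance at a 50% expected score is (1 - draw_ratio) / 4.
    per_game_sd = math.sqrt((1.0 - draw_ratio) / 4.0)
    # Slope of the logistic Elo curve at 50%: Elo points per unit of score.
    elo_per_score = 1600.0 / math.log(10.0)
    margin_score = margin_elo / elo_per_score
    return math.ceil((z * per_game_sd / margin_score) ** 2)

for margin in (20, 10, 4):
    print(f"+/-{margin} Elo at 95%, 50% draws: {games_needed(margin, 0.5)} games")
```

Since the required number of games grows with the inverse square of the margin, halving the error bar costs four times as many games.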
Re: Experimenting with Ryb3-Ryb3 games (2xtime, 2xcores, 2xb
krazyken wrote: You did use very short time for your tests
Well, even with 2'+1" (and 4'+2" for 2xtime), this used 12 computer nights (100 games/night).

Re: Experimenting with Ryb3-Ryb3 games (2xtime, 2xcores, 2xb
bob wrote: Or you simply need more games. I have tons of results where after 500 games, A is better than B by 50 Elo. But after 30,000 games, B is better by 30.
Well, still... 400 games with 50% draws gives a standard deviation of 7 points, or 1.75%. This gives only a ±20 Elo error at 90% confidence, so the results are not invalidated.
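The arithmetic behind those figures can be checked with a short sketch. It assumes independent games, an expected score of 50%, a normal approximation, and the logistic Elo conversion; the helper names are hypothetical. The real matches scored 60-67%, so this is only an approximation of their error bars.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction, logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

def score_std(games: int, draw_ratio: float, score: float = 0.5) -> float:
    """Standard deviation of the match score fraction over `games` games."""
    win = score - draw_ratio / 2.0            # win probability implied by score and draws
    loss = 1.0 - draw_ratio - win
    if win < 0.0 or loss < 0.0:
        raise ValueError("draw_ratio is incompatible with score")
    variance = win * 1.0 + draw_ratio * 0.25 - score ** 2   # E[X^2] - E[X]^2
    return math.sqrt(variance / games)

Z_90 = 1.645                                  # two-sided 90% normal quantile
sd = score_std(400, 0.5)
margin_elo = elo_from_score(0.5 + Z_90 * sd)
print(f"sd = {sd * 400:.2f} points ({sd:.2%}), 90% margin = +/-{margin_elo:.0f} Elo")
```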
