Experimenting with Ryb3-Ryb3 games (2xtime, 2xcores, 2xbits)

ernest · Post by **ernest** » Mon Nov 24, 2008 8:17 pm

1. Doubling the time

It is generally accepted that doubling a program's thinking time gains around +70 Elo.

I wanted to see whether this still holds in Rybka3-Rybka3 matches.

On my Intel Core2 Duo @ 3 GHz, XP Pro x64, Fritz11 GUI:
Match of 400 games, Rybka3 at 4'+2" (called Rybka3 2xtime) against Rybka3 at 2'+1".
Of course 64-bit Rybkas, 512MB hash each, no ponder (2 cores each), all 5-men TB.
Opening book: 200 games with 5moves_0.ctg, and 200 games with the more recent PB5moves.ctg (both from "Permanent Brain", only the first 5 moves are from book). No learning.
Result for Rybka3 2xtime: 268.5 - 131.5 (67.125%) +162 -25 =213
This corresponds to a better performance of +132 Elo for 2xtime, quite a bit higher than the "usual" +70 Elo.

Actually, Vasik Rajlich and Larry Kaufman have already mentioned this in the past, but perhaps not to this extent.
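The "+132 Elo" figure is a conversion from match score to rating difference, and the post does not say which conversion was used. A minimal sketch, assuming the standard logistic Elo model (expected score p = 1 / (1 + 10^(-D/400))), is shown below. Note that this model gives about +124 Elo for 67.125%, somewhat below the quoted +132, so the exact number depends on the convention chosen.

```python
import math


def score_fraction(wins: int, losses: int, draws: int) -> float:
    """Fraction of the available points scored, counting a draw as half a point."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_difference(p: float) -> float:
    """Rating gap D implied by a score fraction p, inverting the logistic
    expectation p = 1 / (1 + 10 ** (-D / 400))."""
    if not 0.0 < p < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)


# Rybka3 2xtime vs Rybka3: +162 -25 =213 (from the match above)
p = score_fraction(162, 25, 213)
print(f"{100 * p:.3f}% -> {elo_difference(p):+.0f} Elo")
```

Applied to the other two matches in this thread (243/400 and 251.5/400), the same function gives roughly +76 and +92 Elo.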

2. Doubling the number of cores

In the case of Rybka 3, going from single core to dual core is worth +52 Elo in CCRL Blitz and +74 Elo in CEGT Blitz.

What do we get with Rybka3-Rybka3 matches?

Match of 400 games Rybka3 DualCore against Rybka3 SingleCore
Of course 64-bit Rybkas, time 2'+1", 512MB hash each, no ponder, all 5-men TB.
Opening book: 200 games with 5moves_0.ctg, and 200 games with the more recent PB5moves.ctg. No learning.
Result for Rybka3 DualCore : 243 - 157 (60.75%) +126 -40 =234
This corresponds to a better performance of +78 Elo for DualCore over SingleCore.

Compared to the +132 Elo for double time, the +78 Elo advantage of DualCore over SingleCore is equivalent to a time advantage of x1.54
(using the logarithmic relation between time and Elo). This does not indicate terrific scaling (it should be at least x1.7).
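The "logarithmic formula" is not spelled out in the post. Assuming it means that Elo grows linearly with log2 of the thinking time, calibrated so that one doubling is worth the measured +132 Elo, the conversion can be sketched as follows. With exactly these inputs it gives about x1.51 for +78 Elo rather than x1.54, so the post's formula may differ in detail; the inverse function shows that x1.7 would correspond to roughly +101 Elo under the same assumption.

```python
import math

# Assumed calibration: the measured 2xtime gain from this thread.
ELO_PER_DOUBLING = 132.0


def equivalent_time_factor(
    elo_gain: float, elo_per_doubling: float = ELO_PER_DOUBLING
) -> float:
    """Time multiplier worth `elo_gain`, assuming Elo = elo_per_doubling * log2(time)."""
    return 2.0 ** (elo_gain / elo_per_doubling)


def elo_for_time_factor(
    factor: float, elo_per_doubling: float = ELO_PER_DOUBLING
) -> float:
    """Inverse: Elo gain expected from multiplying thinking time by `factor`."""
    return elo_per_doubling * math.log2(factor)


print(f"DualCore +78 Elo ~ x{equivalent_time_factor(78.0):.2f} time")
print(f"x1.7 time ~ +{elo_for_time_factor(1.7):.0f} Elo")
```

Called with +94 (the 64-bit result in section 3), `equivalent_time_factor` gives about x1.64 under this assumption.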

3. Doubling the number of bits (64-bit vs. 32-bit)

In the case of Rybka 3, going from the 32-bit to the 64-bit version is worth +45 Elo in CCRL Blitz, and an incredible(?) +99 to +117 Elo in CEGT Blitz.
In infinite analysis of the starting position, Rybka3 64-bit is x1.72 faster than Rybka3 32-bit.
(note: for Rybka 232a, the ratio was x1.85)

What do we get with Rybka3-Rybka3 matches?

Match of 400 games Rybka3 64-bit against Rybka3 32-bit
Time 2'+1", 512MB hash each, no ponder (2 cores each), all 5-men TB.
Opening book: 200 games with 5moves_0.ctg, and 200 games with the more recent PB5moves.ctg. No learning.
Result for Rybka3 64-bit : 251.5 - 148.5 (62.875%) +154 -51 =195
This corresponds to a better performance of +94 Elo for 64-bit over 32-bit.

Compared to the +132 Elo for double time, the +94 Elo advantage of 64-bit over 32-bit is equivalent to a time advantage of x1.68
(using the logarithmic relation between time and Elo), not very far from the x1.72 speedup in infinite analysis of the starting position.

Conclusion: the results of Rybka3-Rybka3 games, with different parameters, are quite consistent with the usual Rybka vs. non-Rybka tests, but the Elo scale of the differences has to be almost doubled.

krazyken · Post by **krazyken** » Mon Nov 24, 2008 8:43 pm

Interesting.

You did use very short time controls for your tests; I'll bet that with longer time controls the gain will taper off.

bob · Post by **bob** » Mon Nov 24, 2008 9:28 pm

ernest wrote: Conclusion: the results of Rybka3-Rybka3 games, with different parameters, are quite consistent with the usual Rybka vs. non-Rybka tests, but the Elo scale of the differences has to be almost doubled.

Or you simply need more games. I have tons of results where after 500 games, A is better than B by 50 Elo. But after 30,000 games, B is better by 30.

I am sometimes tempted to abort a test after a few hundred games because the new version looks bad, but after waiting, it will slowly climb and turn out to be better...

George Tsavdaris · Post by **George Tsavdaris** » Mon Nov 24, 2008 10:31 pm

bob wrote:
ernest wrote: Conclusion: the results of Rybka3-Rybka3 games, with different parameters, are quite consistent with the usual Rybka vs. non-Rybka tests, but the Elo scale of the differences has to be almost doubled.
Or you simply need more games. I have tons of results where after 500 games, A is better than B by 50 Elo. But after 30,000 games, B is better by 30.

Yes, but I also have tons of results where after 30,000 games A is better than B by 50 Elo, but after 200,000 games B is better than A by 30 Elo.

My point/question is: where to stop? After how many games?
I know it actually depends on the width of the error bars on the computed Elo value, but how narrow should these bars be to be satisfying?
Obviously it's a matter of personal taste: 50±0.2 Elo is enough for some, while for others it has to be 50±0.02, etc.

bob · Post by **bob** » Mon Nov 24, 2008 11:01 pm

George Tsavdaris wrote:
Yes, but I also have tons of results where after 30,000 games A is better than B by 50 Elo, but after 200,000 games B is better than A by 30 Elo.

My point/question is: where to stop? After how many games?
I know it actually depends on the width of the error bars on the computed Elo value, but how narrow should these bars be to be satisfying?
Obviously it's a matter of personal taste: 50±0.2 Elo is enough for some, while for others it has to be 50±0.02, etc.

That result would be impossible. By the time you get to 40,000 games, you are at +/- 4 Elo.

But to answer your question, I would not stop until the error bar has no effect on the comparison. If the two ratings are 70 apart, you need enough games that the error bars no longer leave any real doubt about whether A is better than B...

Otherwise, might as well flip a coin...
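bob's stopping rule can be made concrete with a normal approximation. The sketch below assumes two roughly equal opponents (score near 50%), independent games, and a per-game score variance of (1 - draw_rate) / 4; the 95% level (z = 1.96) and the draw rates used are illustrative choices, not values from the thread.

```python
import math

# Slope of the logistic Elo curve at a 50% score: Elo per unit of score fraction.
ELO_PER_UNIT_SCORE = 400.0 / (math.log(10) * 0.25)


def half_width(games: int, draw_rate: float, z: float = 1.96) -> float:
    """Approximate +/- Elo error bar after `games` games between near-equal players."""
    sigma_game = math.sqrt((1.0 - draw_rate) / 4.0)  # SD of a single game's score
    return z * sigma_game / math.sqrt(games) * ELO_PER_UNIT_SCORE


def games_needed(target_half_width: float, draw_rate: float, z: float = 1.96) -> int:
    """Smallest game count whose error bar is no wider than +/- target_half_width Elo."""
    sigma_game = math.sqrt((1.0 - draw_rate) / 4.0)
    return math.ceil((z * sigma_game * ELO_PER_UNIT_SCORE / target_half_width) ** 2)


print(f"40,000 games, no draws: +/-{half_width(40_000, 0.0):.1f} Elo")
print(f"+/-10 Elo with 50% draws: {games_needed(10.0, 0.5)} games")
```

The first line comes out near ±3.4 Elo, in the same range as the ±4 Elo bob quotes for 40,000 games; a higher draw rate lowers the per-game variance and shrinks the error bar further.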

ernest · Post by **ernest** » Tue Nov 25, 2008 7:46 pm

krazyken wrote: You did use very short time for your tests

Well, even with 2'+1" (and 4'+2" for 2xtime), this took 12 computer nights (100 games per night).

ernest · Post by **ernest** » Thu Nov 27, 2008 1:46 am

bob wrote: Or you simply need more games. I have tons of results where after 500 games, A is better than B by 50 Elo. But after 30,000 games, B is better by 30.

Well, still... 400 games with 50% draws give a standard deviation of about 7 points, or 1.75% of the score. That corresponds to an error of only ±20 Elo at 90% confidence, so the results are not invalidated.
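ernest's figures can be checked with the same kind of normal approximation. This sketch assumes independent games, the 50% draw rate from his estimate, a per-game score variance of (1 - draw_rate) / 4 (valid near a 50% score), and linearises the logistic Elo curve at the score p; z = 1.645 corresponds to a two-sided 90% interval.

```python
import math


def score_sd(games: int, draw_rate: float) -> float:
    """SD of the total score in points for near-equal players,
    using a per-game variance of (1 - draw_rate) / 4."""
    return math.sqrt(games * (1.0 - draw_rate) / 4.0)


def elo_error(
    games: int, draw_rate: float, p: float = 0.5, z: float = 1.645
) -> float:
    """Half-width in Elo of a two-sided interval around score fraction p
    (z = 1.645 -> 90%), from the slope of the logistic Elo curve at p."""
    sd_fraction = score_sd(games, draw_rate) / games
    elo_per_unit_score = 400.0 / (math.log(10) * p * (1.0 - p))
    return z * sd_fraction * elo_per_unit_score


sd = score_sd(400, 0.5)
print(f"SD = {sd:.2f} points ({100 * sd / 400:.2f}% of the score)")
print(f"90% error bar ~ +/-{elo_error(400, 0.5):.1f} Elo")
```

This reproduces the SD of about 7 points and the ±20 Elo error bar. Passing the observed score instead (e.g. p=0.67125 for the 2xtime match) widens the bar somewhat, because the Elo curve is steeper away from 50%; the sketch ignores the smaller change in per-game variance at that score.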
