I found this test between Deep Rybka 4.1 x64 and Deep Rybka 4.1 (960) x64 at the Chess2U forum:

http://www.chess2u.com/t5502-the-tennis ... iria#31681

I think that this Tennison is the same Ben Tennison from Talkchess. Here is the test:

```
Games Completed = 4000 of 4000 (Avg game length = 6.667 sec)
Settings = RR/32MB/Book/500ms+50ms/M 1000cp for 12 moves, D 150 moves/EPD:openings.epd(4000)
1. Deep Rybka 4.1 1953.5/4000 1062-1155-1783 (L: m=571 t=1 i=0 a=583) (D: r=1312 i=281 f=133 s=10 a=47) (tpm=52.2 d=8.8 nps=127344)
2. Deep Rybka 4.1 960 2046.5/4000 1155-1062-1783 (L: m=483 t=0 i=0 a=579) (D: r=1312 i=281 f=133 s=10 a=47) (tpm=52.3 d=8.8 nps=129394)
```

First of all: I am not an expert in tests. The output is clearly from LittleBlitzer, and the EPD used seems to be '4000 openings' by Bob Hyatt, IIRC. A good question would be to ask how many cores/threads each engine used, and also about the hardware. Deep Rybka 4.1 960 scores 51.16 %.
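That 51.16 % figure follows directly from the results line: 2046.5 points out of 4000 games. A one-line check in Python:

```python
# Deep Rybka 4.1 960 scored 2046.5 points out of 4000 games
# (1155 wins + 1783/2 draws), per the LittleBlitzer output above.
print(f"{100 * 2046.5 / 4000:.2f} %")  # 51.16 %
```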

Is this only a statistical margin?

Is this a strength difference?

Is this a small error margin in the opening book (openings.epd)?

...

Have a nice debate ...

Test 001:

How does Deep Rybka 4.1 (x64) play against Deep Rybka 4.1 960 (x64)? Is there a difference?

AFAIK, the only difference between the *standard* version and the *960* one is that the latter is able to play Chess960, aka FRC (Fischer Random Chess), while the former can not. So, I am a bit surprised about the speed:

```
127344 nps ~ 129394 nps - 1.58%.
129394 nps ~ 127344 nps + 1.61%.
```
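These percentages can be checked directly from the two reported nps figures (a quick sketch in Python, using only the numbers from the LittleBlitzer output above):

```python
# Nodes-per-second figures from the LittleBlitzer output above.
nps_standard = 127344  # Deep Rybka 4.1
nps_960 = 129394       # Deep Rybka 4.1 960

# Relative difference, taking each engine as the reference in turn.
print(f"{(nps_standard - nps_960) / nps_960 * 100:+.2f}%")       # -1.58%
print(f"{(nps_960 - nps_standard) / nps_standard * 100:+.2f}%")  # +1.61%
```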

**a)**

*Is this only a statistical margin?*

According to my math, the results are inside the statistical margin. Here are some rounded numbers, after working with many decimals (hoping there are no typos in my calculations, done with a Casio calculator):

```
(Referred to non-960 version):
n = 4000 games (+1062 -1155 = 1783)
(Rating difference) = 400·log(1953.5/2046.5) ~ -8.08
(Standard deviation or sigma) = sqrt{(1/4000)·[(1953.5) · (2046.5)/(4000)² - (1783)/(4000 · 4)]} ~ 0.005883 ~ 0.5883%
2-sigma confidence ~ 95.45% confidence (a usual value):
2n·sigma ~ 2 · 4000 · 0.005883 ~ 47.0621
(Lower bound of the rating difference) = 400·log[(1953.5 - 47.0621)/(2046.5 + 47.0621)] ~ -16.27
(Upper bound of the rating difference) = 400·log[(1953.5 + 47.0621)/(2046.5 - 47.0621)] ~ +0.1
(2-sigma confidence interval for rating difference) ~ ]-16.27, +0.1[
```
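The numbers above can be reproduced with a short script (a minimal sketch that simply re-implements the same formulas; only Python's standard math module is used):

```python
import math

n = 4000
wins, losses, draws = 1062, 1155, 1783   # non-960 version's results

score = wins + draws / 2                 # 1953.5
score_960 = n - score                    # 2046.5

# Rating difference from the logistic Elo model.
elo_diff = 400 * math.log10(score / score_960)                       # ~ -8.08

# Standard deviation (sigma) of the score fraction.
sigma = math.sqrt((score * score_960 / n**2 - draws / (4 * n)) / n)  # ~ 0.005883

# 2-sigma (~95.45% confidence) margin in points, and the interval bounds.
margin = 2 * n * sigma                                               # ~ 47.06
lower = 400 * math.log10((score - margin) / (score_960 + margin))    # ~ -16.27
upper = 400 * math.log10((score + margin) / (score_960 - margin))    # ~ +0.10

print(f"Elo difference: {elo_diff:.2f}")
print(f"2-sigma interval: ]{lower:.2f}, {upper:.2f}[")
```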

**b)**

*Is this a strength difference?*

With my limited knowledge of statistics, I would say that there is not an *easily measurable* difference, even with 4000 games; I suppose that there is no kind of bias in this test. If I had to choose, I would bet NO regarding a strength difference (other than statistical uncertainties).

**c)**

*Is this a small error margin in the opening book (openings.epd)?*

I do not fully understand the question, but I suspect that this EPD file is very balanced and therefore very trustworthy. Of course, people with more knowledge than me can answer this question better.

Other comments:

· The time control is very short from my POV, although I do not have any problem with it. One loss on time by the non-960 version and no losses by illegal moves... not bad.

· The number of losses by adjudication is very high (more than half of the losses for each engine). I say very high because I usually get 0 losses by adjudication in my few, short and clumsy tests, but I set 'M 777777 cp for 7 moves' instead of 'M 1000 cp for 12 moves' (which is the default setting). I do not know if changing this setting leads to more or fewer adjudicated losses.

· The draw statistics seem very normal from my inexperienced POV. The split between threefold repetition, insufficient material, the fifty-move rule, stalemate and adjudication (with the condition 'D 150 moves', the default one) looks very logical.

Any comments, corrections... are welcome, as usual.

Regards from Spain.

Ajedrecista.