ChessUSA.com TalkChess.com
Hosted by Your Move Chess & Games
 

The Tennison's Deliria ...
TalkChess.com Forum Index -> Computer Chess Club: Tournaments and Matches
Jesús Muñoz



Joined: 13 Jul 2011
Posts: 1021
Location: Madrid, Spain.

Post subject: The Tennison's Deliria ...    Posted: Thu Mar 01, 2012 5:16 pm

Hello:

I found this test between Deep Rybka 4.1 x64 and Deep Rybka 4.1 (960) x64 at the Chess2U forum:

http://www.chess2u.com/t5502-the-tennison-s-deliria#31681

I think that this Tennison is the same Ben Tennison from TalkChess. Here is the test:

Code:
Games Completed = 4000 of 4000 (Avg game length = 6.667 sec)
Settings = RR/32MB/Book/500ms+50ms/M 1000cp for 12 moves, D 150 moves/EPD:openings.epd(4000)

 1.   Deep Rybka 4.1             1953.5/4000   1062-1155-1783     (L: m=571 t=1 i=0 a=583)   (D: r=1312 i=281 f=133 s=10 a=47)   (tpm=52.2 d=8.8 nps=127344)

 2.   Deep Rybka 4.1 960         2046.5/4000   1155-1062-1783     (L: m=483 t=0 i=0 a=579)   (D: r=1312 i=281 f=133 s=10 a=47)   (tpm=52.3 d=8.8 nps=129394)


Quote:
Deep Rybka 4.1 960 scores 51,16 %.

Is this only a statistical margin ?
Is this a strength difference ?
Is this a little error margin in the opening book (openings.epd) ?
...

Have a nice debate ...


First of all: I am not an expert in testing. The output is clearly from LittleBlitzer, and the EPD used seems to be '4000 openings' by Bob Hyatt, IIRC. A good question to ask is how many cores/threads each engine used, and on what hardware.

Quote:
Test 001 :

How is playing Deep Rybka 4.1 (x64) against Deep Rybka 4.1 960 (x64)? Is there a difference ?


AFAIK, the only difference between the standard version and the 960 one is that the latter is able to play Chess960, aka FRC (Fischer Random Chess), while the former is not. So, I am a bit surprised about the speed:

Code:
127344 nps ~ 129394 nps - 1.58%.
129394 nps ~ 127344 nps + 1.61%.
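
As a cross-check of the two percentages, here is a minimal Python sketch (not part of the original post; the function name is only illustrative). It takes the two average nps values from the LittleBlitzer output and expresses each as a relative difference with respect to the other.

```python
# Average nodes per second reported in the LittleBlitzer output.
nps_standard = 127344  # Deep Rybka 4.1
nps_960 = 129394       # Deep Rybka 4.1 960


def relative_difference_percent(value, reference):
    """Return how far `value` is from `reference`, as a percentage of `reference`."""
    return 100.0 * (value - reference) / reference


# The standard version relative to the 960 version, and the other way round.
slower = relative_difference_percent(nps_standard, nps_960)
faster = relative_difference_percent(nps_960, nps_standard)

print(f"standard vs 960: {slower:+.2f}%")
print(f"960 vs standard: {faster:+.2f}%")
```

The two figures differ slightly in magnitude only because each one uses a different reference value in the denominator.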


These are small differences, but I expected even smaller ones. Anyway, they do not seem exaggerated. I will try to answer some questions asked by Tennison:

a) Is this only a statistical margin?

According to my math, the results are inside the statistical margin. Here are some numbers, rounded after working with many decimals (hopefully with no typos in my calculations, which I did on a Casio calculator):

Code:
(Referred to non-960 version):
n = 4000 games (+1062 -1155 = 1783)

(Rating difference) = 400·log(1953.5/2046.5) ~ -8.08
(Standard deviation or sigma) = sqrt{(1/4000)·[(1953.5) · (2046.5)/(4000)² - (1783)/(4000 · 4)]} ~ 0.005883 ~ 0.5883%

2-sigma confidence ~ 95.45% confidence (a usual value):
2n·sigma ~ 2 · 4000 · 0.005883 ~ 47.0621

(Lower bound of the rating difference) = 400·log[(1953.5 - 47.0621)/(2046.5 + 47.0621)] ~ -16.27
(Upper bound of the rating difference) = 400·log[(1953.5 + 47.0621)/(2046.5 - 47.0621)] ~  +0.1

(2-sigma confidence interval for rating difference) ~ ]-16.27, +0.1[


So the two are pretty much equal: according to my results (~ 95.45% confidence), Deep Rybka 4.1 x64 should score between ~ 47.66% (-16.27 Elo) and ~ 50.01% (+0.1 Elo) against Deep Rybka 4.1 (960) x64 under the conditions of the test. It looks normal to me.
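
For anyone who wants to repeat the calculation without a calculator, the following Python sketch follows the same formulas as the numbers above. It assumes that 'log' means the base-10 logarithm and that the bounds are obtained by shifting the points by 2n·sigma; the function names are my own and only illustrative.

```python
import math


def elo_from_points(points_for, points_against):
    """Elo difference implied by a points ratio (base-10 logarithm)."""
    return 400.0 * math.log10(points_for / points_against)


def score_sigma(wins, losses, draws):
    """Standard deviation of the mean score: sqrt{(1/n)[mu(1 - mu) - D/4]}."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n   # mean score per game
    draw_ratio = draws / n          # D in the formula
    return math.sqrt((mu * (1.0 - mu) - draw_ratio / 4.0) / n)


def elo_interval(wins, losses, draws, k=2.0):
    """Return (centre, lower, upper) Elo values for a k-sigma interval."""
    n = wins + losses + draws
    points = wins + 0.5 * draws
    margin = k * n * score_sigma(wins, losses, draws)  # k·n·sigma, in points
    centre = elo_from_points(points, n - points)
    lower = elo_from_points(points - margin, n - points + margin)
    upper = elo_from_points(points + margin, n - points - margin)
    return centre, lower, upper


# Referred to the non-960 version: +1062 -1155 =1783.
centre, lower, upper = elo_interval(1062, 1155, 1783)
print(f"rating difference: {centre:+.2f}")
print(f"2-sigma interval:  ]{lower:+.2f}, {upper:+.2f}[")
```

Small differences in the last decimal with respect to the hand calculation are expected, since the values above were rounded at intermediate steps.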

b) Is this a strength difference?

With my limited knowledge of statistics, I would say that there is no easily measurable difference, even with 4000 games; I am assuming that there is no bias of any kind in this test. If I had to choose, I would bet NO regarding a strength difference (other than statistical uncertainties).
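
Another way to look at the same numbers is to express the distance between the observed score and 50% in units of sigma. This is only a rough sketch under a normal approximation, using the same sigma formula as in a); it is an extra check and not part of the original calculations.

```python
import math

# Results referred to the non-960 version.
wins, losses, draws = 1062, 1155, 1783
n = wins + losses + draws

mu = (wins + 0.5 * draws) / n                          # observed mean score
sigma = math.sqrt((mu * (1.0 - mu) - (draws / n) / 4.0) / n)

# Number of standard deviations between the observed score and an even 50%.
z = (mu - 0.5) / sigma

print(f"score = {100 * mu:.2f}%, sigma = {100 * sigma:.4f}%, z = {z:+.2f}")
print("inside 2 sigma" if abs(z) < 2.0 else "outside 2 sigma")
```

The value of z comes out just under 2 in absolute value, which agrees with the 2-sigma interval in a) barely including a rating difference of zero.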

c) Is this a little error margin in the opening book (openings.epd)?

I do not fully understand the question, but I suspect that this EPD file is very balanced and therefore very trustworthy. Of course, people with more knowledge than me can answer this question better.

Other comments:

· The time control is very short from my POV, although I have no problem with it. One loss on time by the non-960 version and no losses by illegal moves... not bad.

· The number of losses by adjudication is very high (more than half of the losses for each engine). I say very high because I usually get 0 losses by adjudication in my few, short and clumsy tests, but I set 'M 777777 cp for 7 moves' instead of 'M 1000 cp for 12 moves' (which is the default setting). I do not know whether changing this leads to more or fewer loss adjudications.

· The draw statistics seem very normal from my inexperienced POV. The split between threefold repetition, insufficient material, the fifty-move rule, stalemate and adjudication (with the condition 'D 150 moves', the default one) looks very logical.
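
The shares mentioned in these comments can be checked with a short Python sketch. It assumes that the letters in the LittleBlitzer output mean m = mate, t = time, i = illegal move and a = adjudication for losses, and r = repetition, i = insufficient material, f = fifty-move rule, s = stalemate and a = adjudication for draws; that reading is my interpretation of the output, and the helper names are only illustrative.

```python
import re


def parse_breakdown(text):
    """Turn a string such as 'm=571 t=1 i=0 a=583' into a dict of integer counts."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", text)}


def share(counts, key):
    """Percentage of the total that falls under `key`."""
    return 100.0 * counts[key] / sum(counts.values())


# Breakdowns copied from the result lines of the test.
losses_standard = parse_breakdown("m=571 t=1 i=0 a=583")
losses_960 = parse_breakdown("m=483 t=0 i=0 a=579")
draws = parse_breakdown("r=1312 i=281 f=133 s=10 a=47")

# The totals should match the 1155 losses, 1062 losses and 1783 draws.
print(sum(losses_standard.values()), sum(losses_960.values()), sum(draws.values()))

print(f"adjudicated losses, standard: {share(losses_standard, 'a'):.1f}%")
print(f"adjudicated losses, 960:      {share(losses_960, 'a'):.1f}%")
print(f"draws by repetition:          {share(draws, 'r'):.1f}%")
```

Both adjudication shares come out slightly above 50% of each engine's losses, which is what the second comment refers to.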

Any comments, corrections... are welcome, as usual.

Regards from Spain.

Ajedrecista.
_________________
Six Fortran 95 tools.

Online Checkers Library.

Chess will never be solved.