Something wrong with the testing at CEGT?

bigo

Something wrong with the testing at CEGT?

Post by bigo » Wed Aug 15, 2007 7:20 pm

I find it rather strange that Deep Fritz 10 is rated more than 30 Elo higher when running on two CPUs than on four CPUs. Something has got to be wrong; this doesn't make sense. Anyone with an explanation?

Werner
Posts: 2593
Joined: Wed Mar 08, 2006 9:09 pm

Re: Something wrong with the testing at CEGT?

Post by Werner » Thu Aug 16, 2007 3:40 pm

Very "nice" headline :evil:

What if there is something wrong with the 4CPU implementation inside Deep Fritz 10?
Werner

Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 7:44 pm
Location: Amman,Jordan

Re: Something wrong with the testing at CEGT?

Post by Dr.Wael Deeb » Thu Aug 16, 2007 3:56 pm

Werner wrote:Very "nice" headline :evil:

What if there is something wrong with the 4CPU implementation inside Deep Fritz 10?
Actually, there are a lot of engines that perform better using 2 CPUs, and CEGT is not guilty at all; this phenomenon is observed in other rating lists too.

Wolfgang
Posts: 370
Joined: Fri May 12, 2006 11:08 pm

Re: Something wrong with the testing at CEGT?

Post by Wolfgang » Fri Aug 17, 2007 9:26 am

bigo wrote:I find it rather strange that Deep Fritz 10 is rated more than 30 Elo higher when running on two CPUs than on four CPUs. Something has got to be wrong; this doesn't make sense. Anyone with an explanation?
This has nothing to do with CEGT or other rating lists.

Btw, CCRL shows the same phenomenon: the difference is +25 in favor of the 2CPU version in their 40/40 list. So I agree with Werner that (maybe) the implementation of 4CPU support is suboptimal. I agree with you that this is not "normal"; others, like Naum for example, gain more benefit from 4 CPUs.

Would be nice if you could inform yourself before using such weird headlines.
Best
Wolfgang
CEGT-Team

Steve B

Re: Something wrong with the testing at CEGT?

Post by Steve B » Fri Aug 17, 2007 9:40 am

Wolfgang wrote:
bigo wrote:I find it rather strange that Deep Fritz 10 is rated more than 30 Elo higher when running on two CPUs than on four CPUs. Something has got to be wrong; this doesn't make sense. Anyone with an explanation?
This has nothing to do with CEGT or other rating lists.

Btw, CCRL shows the same phenomenon: the difference is +25 in favor of the 2CPU version in their 40/40 list. So I agree with Werner that (maybe) the implementation of 4CPU support is suboptimal. I agree with you that this is not "normal"; others, like Naum for example, gain more benefit from 4 CPUs.

Would be nice if you could inform yourself before using such weird headlines.
Hi Wolfgang,

Can I ask why I do not see as many posts here from the CEGT group as I see from the CCRL group?

I have never followed the goings-on between the different testing groups before, but as I have a slight, almost infinitesimal chance of becoming a mod here, I would like to ask that question.

Is there any particular reason for that, or is it that you guys simply don't post often?

Best Regards
Steve

bob
Posts: 20923
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Something wrong with the testing at CEGT?

Post by bob » Fri Aug 17, 2007 3:23 pm

Dr.Wael Deeb wrote:
Werner wrote:Very "nice" headline :evil:

What if there is something wrong with the 4CPU implementation inside Deep Fritz 10?
Actually, there are a lot of engines that perform better using 2 CPUs, and CEGT is not guilty at all; this phenomenon is observed in other rating lists too.
It speaks to some testing issues. I can't imagine why a program on 4 cpus would be _weaker_ than on 2. I can easily imagine that a program on 4 would be no stronger than on 2 (or even 1) if the parallel search is not effective. But playing worse makes no sense. I've never personally observed that except in simply horrible parallel searches, which I somehow doubt Frans did in Fritz.

bigo

Re: Something wrong with the testing at CEGT?

Post by bigo » Fri Aug 17, 2007 11:46 pm

bob wrote:
Dr.Wael Deeb wrote:
Werner wrote:Very "nice" headline :evil:

What if there is something wrong with the 4CPU implementation inside Deep Fritz 10?
Actually, there are a lot of engines that perform better using 2 CPUs, and CEGT is not guilty at all; this phenomenon is observed in other rating lists too.
It speaks to some testing issues. I can't imagine why a program on 4 cpus would be _weaker_ than on 2. I can easily imagine that a program on 4 would be no stronger than on 2 (or even 1) if the parallel search is not effective. But playing worse makes no sense. I've never personally observed that except in simply horrible parallel searches, which I somehow doubt Frans did in Fritz.

Thanks, Dr. Hyatt!

I never meant the question as an insult to CEGT; it seems these guys are hypersensitive. I was just seeking an educated explanation of how this could happen.

Uri Blass
Posts: 8940
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Something wrong with the testing at CEGT?

Post by Uri Blass » Fri Aug 17, 2007 11:59 pm

bob wrote:
Dr.Wael Deeb wrote:
Werner wrote:Very "nice" headline :evil:

What if there is something wrong with the 4CPU implementation inside Deep Fritz 10?
Actually, there are a lot of engines that perform better using 2 CPUs, and CEGT is not guilty at all; this phenomenon is observed in other rating lists too.
It speaks to some testing issues. I can't imagine why a program on 4 cpus would be _weaker_ than on 2. I can easily imagine that a program on 4 would be no stronger than on 2 (or even 1) if the parallel search is not effective. But playing worse makes no sense. I've never personally observed that except in simply horrible parallel searches, which I somehow doubt Frans did in Fritz.
Both CEGT and CCRL found the same thing.
Testing was done with ponder off.
I do not know about testing with ponder on, and I guess that Frans did not care much about 4 CPUs because only a small minority of customers use 4 CPUs; the majority use 1 or 2 CPUs.

Uri

bob
Posts: 20923
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Something wrong with the testing at CEGT?

Post by bob » Sat Aug 18, 2007 1:18 am

Uri Blass wrote:
bob wrote:
Dr.Wael Deeb wrote:
Werner wrote:Very "nice" headline :evil:

What if there is something wrong with the 4CPU implementation inside Deep Fritz 10?
Actually, there are a lot of engines that perform better using 2 CPUs, and CEGT is not guilty at all; this phenomenon is observed in other rating lists too.
It speaks to some testing issues. I can't imagine why a program on 4 cpus would be _weaker_ than on 2. I can easily imagine that a program on 4 would be no stronger than on 2 (or even 1) if the parallel search is not effective. But playing worse makes no sense. I've never personally observed that except in simply horrible parallel searches, which I somehow doubt Frans did in Fritz.
Both CEGT and CCRL found the same thing.
Testing was done with ponder off.
I do not know about testing with ponder on, and I guess that Frans did not care much about 4 CPUs because only a small minority of customers use 4 CPUs; the majority use 1 or 2 CPUs.

Uri
The problem with that is that it would be _very_ difficult to write a parallel search that plays better with two processors than with one, but then plays worse with 4 processors than with two. It would not be that hard to write one that plays no better with 4 than with 2. But worse? That boggles the mind...

Dann Corbit
Posts: 11978
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Something wrong with the testing at CEGT?

Post by Dann Corbit » Sat Aug 18, 2007 3:49 am

bigo wrote:I find it rather strange that Deep Fritz 10 is rated more than 30 Elo higher when running on two CPUs than on four CPUs. Something has got to be wrong; this doesn't make sense. Anyone with an explanation?
Deep Fritz 10 2CPU 2892 11 11 2270 55.1 % 2857 36.5 %
Deep Fritz 10 4CPU 2853 23 23 573 40.5 % 2920 34.6 %
-39 Elo with a window of +/- 34 Elo. Probably a bug of some kind (in testing, in the program, or wherever).

Deep Shredder 10 x64 2CPU 2859 11 11 2510 50.7 % 2854 38.3 %
Deep Shredder 10 x64 4CPU 2884 21 21 618 45.1 % 2918 42.6 %
+25 Elo with a window of +/- 32 Elo. It is uncertain that 4 CPUs is stronger than 2 CPUs, but more probable that it is.

Hiarcs 11.1 2CPU 2874 15 15 1224 48.1 % 2887 41.0 %
Hiarcs 11.1 4CPU 2912 24 24 468 47.8 % 2928 43.8 %
+38 Elo with a window of +/- 39 Elo. It is uncertain that 4 CPUs is stronger than 2 CPUs, but more probable that it is.

Loop 13.6 x64 2CPU 2834 20 20 684 47.4 % 2852 41.7 %
Loop 13.6 x64 4CPU 2888 31 31 300 39.7 % 2961 39.3 %
+54 Elo with a window of +/- 51 Elo. There is a strong chance that Loop is stronger with 4 CPUs than with 2 CPUs (given 2 standard deviations).

Loop M1-P 2CPU 2858 19 19 736 46.1 % 2885 42.9 %
Loop M1-P 4CPU 2891 28 28 373 41.3 % 2952 37.0 %
+33 Elo with a window of +/- 47 Elo. It is uncertain that 4 CPUs is stronger than 2 CPUs, but more probable that it is.

Loop M1-T 2CPU 2863 17 17 952 50.6 % 2859 41.0 %
Loop M1-T 4CPU 2876 28 28 364 39.3 % 2951 41.2 %
+13 Elo with a window of +/- 45 Elo. It is uncertain that 4 CPUs is stronger than 2 CPUs.

Naum 2.1 x64 2CPU 2840 12 12 1941 48.6 % 2850 43.3 %
Naum 2.1 x64 4CPU 2912 27 27 350 44.7 % 2949 44.9 %
+72 Elo with a window of +/- 39 Elo. There is a very strong chance that Naum 2.1 x64 is stronger with 4 CPUs than with 2 CPUs (given 2 standard deviations).

Naum 2.2 x64 2CPU 2880 16 16 984 52.3 % 2864 42.6 %
Naum 2.2 x64 4CPU 2937 25 26 384 48.7 % 2946 46.4 %
+57 Elo with a window of +/- 41 Elo. There is a strong chance that Naum 2.2 x64 is stronger with 4 CPUs than with 2 CPUs (given 2 standard deviations).

Zap!Chess Paderborn x64 2CPU 2839 12 12 1917 50.2 % 2838 40.6 %
Zap!Chess Paderborn x64 4CPU 2886 30 30 312 40.7 % 2951 39.1 %
+47 Elo with a window of +/- 42 Elo. There is a strong chance that Zap!Chess Paderborn x64 is stronger with 4 CPUs than with 2 CPUs (given 2 standard deviations).


Zap!Chess Zanzibar x64 2CPU 2930 13 13 1694 56.9 % 2881 42.7 %
Zap!Chess Zanzibar x64 4CPU 2994 27 27 339 56.2 % 2950 46.3 %
+64 Elo with a window of +/- 40 Elo. There is a very strong chance that Zap!Chess Zanzibar x64 is stronger with 4 CPUs than with 2 CPUs (given 2 standard deviations).

Overall, these results are not unexpected in the least. In every case but one, there is no indication that 4 CPUs are weaker than 2 CPUs. The average gain seems to be about 50 Elo, as usual for a 2x power increase.

I think that the most likely reason for Deep Fritz's result is a bug in Deep Fritz.
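
For anyone who wants to recheck the arithmetic, here is a minimal Python sketch of the comparison used above. It assumes the first three numbers in each rating-list row are the Elo and its "+" and "-" error margins, and that the "window" is simply the sum of the two "+" margins, which is what reproduces the figures quoted in the post. That is a conservative way to combine the margins, not necessarily how the rating lists themselves would do it. The names `ROWS`, `compare` and `verdict` are made up for illustration.

```python
# Rating-list rows from the post above:
# engine -> ((Elo, +, -) on 2 CPUs, (Elo, +, -) on 4 CPUs).
ROWS = {
    "Deep Fritz 10": ((2892, 11, 11), (2853, 23, 23)),
    "Deep Shredder 10 x64": ((2859, 11, 11), (2884, 21, 21)),
    "Hiarcs 11.1": ((2874, 15, 15), (2912, 24, 24)),
    "Loop 13.6 x64": ((2834, 20, 20), (2888, 31, 31)),
    "Loop M1-P": ((2858, 19, 19), (2891, 28, 28)),
    "Loop M1-T": ((2863, 17, 17), (2876, 28, 28)),
    "Naum 2.1 x64": ((2840, 12, 12), (2912, 27, 27)),
    "Naum 2.2 x64": ((2880, 16, 16), (2937, 25, 26)),
    "Zap!Chess Paderborn x64": ((2839, 12, 12), (2886, 30, 30)),
    "Zap!Chess Zanzibar x64": ((2930, 13, 13), (2994, 27, 27)),
}


def compare(two_cpu, four_cpu):
    """Return (4CPU Elo minus 2CPU Elo, sum of the two '+' margins)."""
    elo2, plus2, _minus2 = two_cpu
    elo4, plus4, _minus4 = four_cpu
    return elo4 - elo2, plus2 + plus4


def verdict(diff, window):
    """Classify a difference against its window; anything inside is uncertain."""
    if diff > window:
        return "4 CPUs likely stronger"
    if diff < -window:
        return "4 CPUs likely weaker"
    return "uncertain"


for engine, (two_cpu, four_cpu) in ROWS.items():
    diff, window = compare(two_cpu, four_cpu)
    print(f"{engine}: {diff:+d} Elo, window +/- {window}, {verdict(diff, window)}")
```

With this rule, Deep Fritz 10 is the only engine whose difference falls below the negative edge of its window.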
