4'+2" UPDATE: Loop 13.5 is here... matches underway

Erik Roggenburg · Post by **Erik Roggenburg** » Sat Apr 07, 2007 11:05 pm

So far, Loop 13.5.32 2CPU vs. Toga II 1.2.1a is as follows:
+24/-7/=16, 68%
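For anyone checking the percentage: it follows from the usual convention that a win counts as one point and a draw as half a point. A minimal sketch of that arithmetic (the function name `score_percent` is just for illustration):

```python
def score_percent(wins: int, losses: int, draws: int) -> float:
    """Match score as a percentage, counting each draw as half a point."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("no games played")
    return 100.0 * (wins + 0.5 * draws) / games


# Running tally quoted above: +24/-7/=16 over 47 games
print(f"{score_percent(24, 7, 16):.1f} %")  # prints "68.1 %"
```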

I'm really interested in seeing where Loop stands when the dust settles.

Will Loop capture the #2 spot, or will HIARCS 11.1 MP UCI hang on to it?

Erik Roggenburg · Post by **Erik Roggenburg** » Sun Apr 08, 2007 4:34 am

Code: Select all

2 Loop 13.5.32 2CPU         : 2729  1100 (+521,=367,-212), 64.0 %

Deep Shredder 10 UCI          : 100 (+ 42,= 38,- 20), 61.0 %
Fruit 2.2.1                   : 100 (+ 50,= 41,-  9), 70.5 %
Glaurung 1.2 SMP              : 100 (+ 57,= 25,- 18), 69.5 %
Deep Fritz 10                 : 100 (+ 34,= 36,- 30), 52.0 %
Deep Pharaon 3.5.1            : 100 (+ 60,= 28,- 12), 74.0 %
Deep Frenzee 3.0              : 100 (+ 71,= 17,- 12), 79.5 %
Chess Tiger 2007 UCI          : 100 (+ 59,= 34,-  7), 76.0 %
Deep Junior 10.1              : 100 (+ 46,= 30,- 24), 61.0 %
Scorpio 1.91 2CPU             : 100 (+ 65,= 31,-  4), 80.5 %
HIARCS 11.1 MP UCI            : 100 (+ 20,= 47,- 33), 43.5 %
Rybka 2.3.1 mp 32-bit         : 100 (+ 17,= 40,- 43), 37.0 %

Code: Select all

5 LoopMP 12.32 2CPU         : 2706  920 (+373,=338,-209), 58.9 %

Deep Shredder 10 UCI          :  60 (+ 24,= 16,- 20), 53.3 %
Fruit 2.2.1                   :  60 (+ 21,= 34,-  5), 63.3 %
Spike 1.2 Turin               :  60 (+ 26,= 23,- 11), 62.5 %
Toga II 1.2.1a                :  60 (+ 23,= 30,-  7), 63.3 %
Glaurung 1.2 SMP              :  60 (+ 34,= 19,-  7), 72.5 %
Deep Fritz 10                 :  60 (+ 17,= 23,- 20), 47.5 %
Deep Pharaon 3.5.1            :  60 (+ 43,= 13,-  4), 82.5 %
Deep Frenzee 3.0              :  60 (+ 42,=  9,-  9), 77.5 %
Chess Tiger 2007 UCI          :  60 (+ 34,= 21,-  5), 74.2 %
Deep Junior 10.1              :  60 (+ 22,= 21,- 17), 54.2 %
Naum 2.1 MP                   :  60 (+ 21,= 27,- 12), 57.5 %
Scorpio 1.91 2CPU             :  60 (+ 30,= 20,- 10), 66.7 %
HIARCS 11.1 MP UCI            : 100 (+ 25,= 39,- 36), 44.5 %
Rybka 2.3.1 mp 32-bit         : 100 (+ 11,= 43,- 46), 32.5 %

That's what I have compiled thus far. I'll have full data for the engines probably by tomorrow.

Erik Roggenburg · Post by **Erik Roggenburg** » Sun Apr 08, 2007 10:42 am

WHEW! It is finally over! Loop 13.5 is an improvement over the previous version, LoopMP 12.32 - according to the games played on my hardware:

FULL LIST:

Code: Select all

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Rybka 2.3.1 mp 32-bit          : 2828   16  15  1600    74.3 %   2643   27.9 %
  2 Rybka 2.2 mp 32-bit            : 2817   15  15  1580    72.7 %   2647   30.4 %
  3 Rybka 2.3 LK mp 32-bit         : 2811   18  18  1140    73.1 %   2638   28.8 %
  4 Rybka 2.3 mp 32-bit            : 2811   18  18  1140    73.0 %   2638   30.4 %
  5 HIARCS 11.1 MP UCI             : 2731   13  13  1700    61.2 %   2652   35.6 %
  6 Loop 13.5.32 2CPU              : 2729   14  14  1500    62.8 %   2638   36.5 %
  7 Deep Fritz 10                  : 2723   14  14  1680    59.6 %   2655   30.7 %
  8 LoopMP 12.32 2CPU              : 2704   13  13  1680    56.9 %   2656   36.5 %
  9 HIARCS 11 MP UCI               : 2696   14  14  1480    58.1 %   2640   37.2 %
 10 Deep Shredder 10 UCI           : 2676   14  14  1680    52.7 %   2657   29.6 %
 11 Naum 2.1 MP                    : 2666   14  14  1500    50.2 %   2665   38.2 %
 12 HIARCS 11 UCI                  : 2654   15  15  1380    52.7 %   2635   35.7 %
 13 Deep Junior 10.1               : 2640   14  14  1680    47.4 %   2658   31.0 %
 14 Toga II 1.2.1a                 : 2638   14  14  1680    47.1 %   2658   33.9 %
 15 Spike 1.2 Turin                : 2633   13  13  1680    46.3 %   2658   36.2 %
 16 Hiarcs X54 UCI                 : 2629   16  16  1140    51.2 %   2621   35.7 %
 17 Hiarcs X50 UCI                 : 2628   16  16  1140    51.0 %   2621   36.7 %
 18 Fritz 9                        : 2623   15  15  1380    48.1 %   2636   29.9 %
 19 Fruit 2.2.1                    : 2621   14  14  1680    44.6 %   2659   33.3 %
 20 Glaurung 1.2 SMP               : 2585   14  14  1680    39.3 %   2660   27.9 %
 21 Chess Tiger 2007 UCI           : 2562   14  14  1680    36.1 %   2661   32.0 %
 22 Naum 2.0                       : 2549   16  16  1200    38.9 %   2627   36.2 %
 23 Scorpio 1.91 2CPU              : 2542   16  16  1440    31.8 %   2675   30.3 %
 24 Deep Pharaon 3.5.1             : 2527   15  15  1680    31.5 %   2662   29.0 %
 25 Chess Tiger 15.0               : 2525   17  17  1140    35.8 %   2626   34.5 %
 26 Deep Frenzee 3.0               : 2508   15  15  1680    29.0 %   2663   24.6 %
 27 Scorpio 1.8 2CPU               : 2503   16  16  1380    31.1 %   2641   30.0 %

Always remember:

Code: Select all

Time control: 4'+2"
Hash: 128 MB
EGTBs: 3, 4, 5, and some 6 piece tables are available. Engines access or do not access based upon their default settings.
Testset: Noomen 2006 (through 2007 Mar 03); Silver Suite v2 (FULL)
Ponder: OFF
Hardware: AMD X2 4400+ with 2GB of RAM

GUIs: Primarily Deep Fritz 10, but I have used Arena, Lokasoft's ERT, and Shredderchess too. Received a version of Scorpio 1.91 from Daniel Shawul on 3/16/07. That version is being used in DF10 GUI, and so far, no crashes!

Cores and such: If an engine is capable of running on multiple cores, it does so in my tests. Since my hardware is dual core, I can't run an engine on anything greater than 2 cores.

I wouldn't call an engine "Blah-blah 3.98 MP" or "Deep Flarbin Blah 7.3XY" if it were not running on 2 cores. If I ever did run Deep Fritz 10 on 1 CPU, I'd add the tag "1CPU" to its name. Similar to how I handle Scorpio 1.8 - I have to add the 2CPU tag so everyone knows it is running on 2 cores.

Erik Roggenburg · Post by **Erik Roggenburg** » Sun Apr 08, 2007 11:03 am

In the Pruned List, which consists only of the "best/strongest" engine versions that have been tested, Loop 13.5.32 2CPU takes over the #2 position from HIARCS 11.1 MP UCI. It is very, very close, and statistically speaking, the engines are identical in strength.

From a stability standpoint, Loop did not crash once in 1,400 games, so that makes me very happy.

What is going to be interesting going forward is seeing whether engines like HIARCS, Loop, and Rybka can maintain the rate at which they improve from one version to the next.

HIARCS 11.1 MP UCI was a 35 Elo improvement over HIARCS 11 MP UCI according to the FULL LIST. That's about a 1.3% improvement.

Loop 13.5.32 2CPU was a 25 Elo improvement over LoopMP 12.32 2CPU, a .92% improvement.

Rybka 2.3.1 mp 32-bit was an 11 Elo improvement over Rybka 2.2 mp 32-bit, a .39% improvement.
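The three figures above can be reproduced from the FULL LIST ratings. A small sketch of the arithmetic, with the Elo values copied from that list (the helper name `improvement` is only for illustration):

```python
# (new version, new Elo, old version, old Elo), taken from the FULL LIST
pairs = [
    ("HIARCS 11.1 MP UCI", 2731, "HIARCS 11 MP UCI", 2696),
    ("Loop 13.5.32 2CPU", 2729, "LoopMP 12.32 2CPU", 2704),
    ("Rybka 2.3.1 mp 32-bit", 2828, "Rybka 2.2 mp 32-bit", 2817),
]


def improvement(new_elo: int, old_elo: int) -> tuple[int, float]:
    """Return (Elo gain, gain as a percentage of the old rating)."""
    gain = new_elo - old_elo
    return gain, 100.0 * gain / old_elo


for new_name, new_elo, old_name, old_elo in pairs:
    gain, pct = improvement(new_elo, old_elo)
    print(f"{new_name} over {old_name}: +{gain} Elo ({pct:.2f} %)")
```

This gives +35 (1.30 %), +25 (0.92 %), and +11 (0.39 %).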

What might this indicate? Well, it indicates that it could get harder and harder to squeeze Elo improvement out of an existing engine. Sort of a law of diminishing returns in the programming world, I guess.

Let's speculate for fun! What if the HIARCS team squeezes out a .75% improvement in their next version over HIARCS 11.1 MP UCI? That would put them at 2751 Elo.

Now, what if Vas and company eke out the same .39% improvement in their next version over Rybka 2.3.1 mp 32-bit? That would put them at 2839 Elo.

Rybka still would have a commanding lead in the Elo race, but it would be shrinking.

What if the HIARCS team pulls off another fine improvement, and ekes out 1.3% instead of the previously assumed .75%? That puts HIARCS at about 2766 Elo. AHA! Now things start to get interesting!

As a fan of computer chess engines, I truly hope that one or two engines can continue to make the kinds of strides that they have been able to make lately. I'd like to see the performance gap closed a bit between Rybka and the rest of the field. I bet the playchess.com server would become a more interesting place if Rybka were no longer a clear #1. Remember the days when Shredder was #1? It was #1, but it wasn't a DOMINANT #1, so folks could toss out HIARCS, Fritz, Chess Tiger, Junior, etc. and still be competitive.

We'll see what develops!

PRUNED LIST:

Code: Select all

    Program                          Elo    +   -   Games   Score   Av.Op.  Draws

  1 Rybka 2.3.1 mp 32-bit          : 2825   17  17  1400    75.9 %   2626   25.0 %
  2 Loop 13.5.32 2CPU              : 2730   15  15  1400    63.7 %   2632   34.9 %
  3 HIARCS 11.1 MP UCI             : 2729   15  15  1400    63.5 %   2633   33.6 %
  4 Deep Fritz 10                  : 2721   19  19   960    60.2 %   2649   29.2 %
  5 Naum 2.1 MP                    : 2665   18  18   960    51.8 %   2653   36.5 %
  6 Deep Shredder 10 UCI           : 2664   19  19   960    51.6 %   2653   28.8 %
  7 Deep Junior 10.1               : 2636   18  18   960    47.3 %   2654   31.1 %
  8 Toga II 1.2.1a                 : 2633   18  18   960    46.9 %   2655   32.1 %
  9 Fruit 2.2.1                    : 2628   18  18   960    46.2 %   2655   32.8 %
 10 Spike 1.2 Turin                : 2626   18  18   960    45.8 %   2655   36.8 %
 11 Glaurung 1.2 SMP               : 2597   19  19   960    41.4 %   2657   28.6 %
 12 Chess Tiger 2007 UCI           : 2557   19  19   960    35.7 %   2659   30.3 %
 13 Scorpio 1.91 2CPU              : 2541   19  19   960    33.5 %   2660   32.3 %
 14 Deep Pharaon 3.5.1             : 2536   19  19   960    32.9 %   2661   29.3 %
 15 Deep Frenzee 3.0               : 2509   20  20   960    29.3 %   2662   24.1 %

Spock · Post by **Spock** » Sun Apr 08, 2007 11:08 am

Erik Roggenburg wrote: In the Pruned List, which consists only of the "best/strongest" engine versions that have been tested, Loop 13.5.32 2CPU takes over the #2 position from HIARCS 11.1 MP UCI. It is very, very close, and statistically speaking, the engines are identical in strength.

You could probably add Deep Fritz 10 to that statement as well. It is also within the statistical margin of error
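A rough way to check this against the PRUNED LIST above is to see whether the Elo ± error ranges of two engines overlap. The sketch below copies four rows from that list; the helper name `ranges_overlap` is made up for illustration, and overlapping error bars are only a heuristic, not a formal significance test.

```python
# (Elo, +/- margin) copied from the PRUNED LIST
pruned = {
    "Loop 13.5.32 2CPU": (2730, 15),
    "HIARCS 11.1 MP UCI": (2729, 15),
    "Deep Fritz 10": (2721, 19),
    "Naum 2.1 MP": (2665, 18),
}


def ranges_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if the intervals [elo - margin, elo + margin] of a and b intersect."""
    (elo_a, err_a), (elo_b, err_b) = a, b
    return abs(elo_a - elo_b) <= err_a + err_b


fritz = pruned["Deep Fritz 10"]
for name in ("Loop 13.5.32 2CPU", "HIARCS 11.1 MP UCI", "Naum 2.1 MP"):
    print(f"Deep Fritz 10 vs {name}: overlap = {ranges_overlap(fritz, pruned[name])}")
```

With these numbers Deep Fritz 10 overlaps both Loop and HIARCS (gaps of 9 and 8 Elo against combined margins of 34), but not Naum 2.1 MP (a gap of 56 against 37).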

Erik Roggenburg · Post by **Erik Roggenburg** » Sun Apr 08, 2007 11:13 am

Spock wrote:
Erik Roggenburg wrote: In the Pruned List, which consists only of the "best/strongest" engine versions that have been tested, Loop 13.5.32 2CPU takes over the #2 position from HIARCS 11.1 MP UCI. It is very, very close, and statistically speaking, the engines are identical in strength.
You could probably add Deep Fritz 10 to that statement as well. It is also within the statistical margin of error

Oh, yes! Very true. Deep Fritz 10 has been a very nice addition to the stable, and it remains one of my choices for analysis today.

Dr.Wael Deeb · Post by **Dr.Wael Deeb** » Sun Apr 08, 2007 11:14 am

Spock wrote:
Erik Roggenburg wrote: In the Pruned List, which consists only of the "best/strongest" engine versions that have been tested, Loop 13.5.32 2CPU takes over the #2 position from HIARCS 11.1 MP UCI. It is very, very close, and statistically speaking, the engines are identical in strength.
You could probably add Deep Fritz 10 to that statement as well. It is also within the statistical margin of error

I totally agree, Ray.

Yarget · Post by **Yarget** » Sun Apr 08, 2007 12:33 pm

Hello Erik!

Thanks for your detailed test of the latest Loop engine. Keep up your excellent test work.

Best regards
Per

Shaun · Post by **Shaun** » Sun Apr 08, 2007 1:33 pm

Hi Erik,

Currently in the CCRL testing 13.5 is behind 12.32 - I will be playing more games with 13.5 to increase our opponent overlap.

Currently we have the following overlap (our testing is at 40/4).

Loop's score, your testing/our testing:

Chess Tiger 2007 UCI 76.0%/70.3%
Rybka 2.3.1 mp 32-bit 37.0%/20.3%
Toga 1.2.1a ?%/65.6%
Spike 1.2 Turin 67.0%/76.6%

So up and down (although I could not find your Toga score; I probably missed it).

One thing I have noticed is that on my Core 2 Duo the 64-bit compile gives no improvement.

As the overlap in opponents increases, it will be interesting to see how this unfolds.

Thanks again for your interesting rating list

Shaun

dadij · Post by **dadij** » Sun Apr 08, 2007 5:03 pm

Erik Roggenburg wrote: HIARCS 11.1 MP UCI was a 35 Elo improvement over HIARCS 11 MP UCI according to the FULL LIST. That's about a 1.3% improvement.

Loop 13.5.32 2CPU was a 25 Elo improvement over LoopMP 12.32 2CPU, a .92% improvement.

Rybka 2.3.1 mp 32-bit was an 11 Elo improvement over Rybka 2.2 mp 32-bit, a .39% improvement.

The Elo rating scale is an open-ended floating scale without reproducible fixed points. Therefore it is best to stay away from percentage calculations like those above. Only the actual rating difference (as number of rating points) is meaningful.
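To illustrate the point, here is a sketch that assumes the conventional logistic Elo model with a 400-point scale (the thread does not say which model the rating tool used, so treat that as an assumption). Under that model the expected score depends only on the rating difference, so shifting both ratings by the same amount changes the "percent improvement" but not what the gap means over the board.

```python
def expected_score(diff: float) -> float:
    """Expected score for the higher-rated side under the standard logistic
    Elo model (assumed here): E = 1 / (1 + 10 ** (-diff / 400))."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


# HIARCS 11.1 MP UCI vs HIARCS 11 MP UCI from the FULL LIST, then the same
# pair with both ratings shifted down by a hypothetical 1000 points.
for offset in (0, -1000):
    new_elo, old_elo = 2731 + offset, 2696 + offset
    diff = new_elo - old_elo
    pct_of_rating = 100.0 * diff / old_elo
    print(
        f"{new_elo} vs {old_elo}: diff = {diff}, "
        f"expected score = {expected_score(diff):.3f}, "
        f"'percent improvement' = {pct_of_rating:.2f} %"
    )
```

Both lines show the same 35-point difference and the same expected score, while the percentage moves from roughly 1.3 % to roughly 2.1 %, purely because the scale was shifted.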

Apart from that, Erik, thank you for your interesting tests
