A match between the past (Fruit) and the present (Houdini 3)

Yarget · Post by **Yarget** » Fri Dec 28, 2012 11:26 pm

Hello everyone!

How would you expect the current number one (Houdini 3) to perform against the number one engine in seven years’ time, in 2019? Perhaps a decent defeat (a score around 20%), or perhaps worse? It is of course impossible to predict, but considering the strength of Houdini 3 it’s simply hard to imagine that the current number one would be totally outplayed. However, a small test I have made suggests it might just come true…

Let’s go seven years back in time. In December 2005 (before the Rybka era) engines like Fritz 9, Shredder 9, Hiarcs 10 and Fruit 2.2.1 were fighting for the top spot in the rating lists. It was a tight struggle, but Fruit was number one in at least two rating lists from that time:

http://www.stmintz.com/ccc/index.php?id=485608

http://www.computerschach.de/index.php? ... Itemid=248

Seven years ago I was also testing a lot, and I ran my own rating list (the PEJ Ratinglist) before I had the chance to join Klaus Wlotzka and produce an MP version of the CSS Ratinglist. In this period I tested thousands of games, and I must confess that Fruit was one of my favourite engines (which made my disappointment all the greater when a multicore version was never released). In my opinion Fruit was (and still is) a strong all-round engine, and I was impressed that it could compete with its strong German rivals (Fritz 9 and Shredder 9). If someone had asked me exactly seven years ago how my favourite engine Fruit would perform against the number one seven years later, I think I would have predicted a score of 20 or maybe even 25 percent…

I have had a very long break from testing, but once or twice a month I have always checked the Talkchess forum and some of the leading rating lists. When I discovered that Houdini 3 had made a huge step forward I couldn’t help buying this extraordinary engine, and soon I decided to run a match between the past (Fruit 2.2.1) and the present (Houdini 3), modelled on the way Klaus Wlotzka ran his great rating list. The tests were made without opening books, using 10 carefully selected opening positions that each engine had to play with both white and black (20 games altogether in one match). Here you can see the fixed opening positions:

http://www.computerschach.de/index.php? ... Itemid=169

And here are the exact test conditions:

Operating system: Windows 7 64-bit
CPU: Intel Core i7-2670QM 2.20 GHz (each engine using one core)
GUI: Fritz 13
Time control: 10+10 (10 minutes for the whole game plus 10 seconds for each move)
Ponder: ON
Tablebases: 3+4 pieces (8 MB cache)
Hash tables: 1024 MB for each engine
Books: none allowed; engines play on their own from the starting point of each opening position

Before starting the match I checked a couple of rating lists to see what final result I should expect, and the prediction was a clear victory, something like 18.5 - 1.5 or even 19 - 1 in favour of Houdini. It was (and still is) hard for me to believe that good old Fruit was about to face not only a defeat but a true massacre.

So how did the match go between the past and the present? Well, first of all I must say that I really enjoyed following these games. Again and again Houdini surprised and amazed both Fruit and me, and very soon I realized that the difference between the two engines was bigger than I expected; in fact I had the feeling that there were more than seven years between them! In a typical middlegame position Fruit would evaluate the position as minus 0.6 or 0.7, while Houdini had an evaluation around 1.5 or even more. And time and again Houdini proved that its evaluation was correct and won game after game. The final result was:

Houdini 3 – Fruit 2.2.1 19.5 - 0.5

In fact it was a small miracle that Fruit achieved a draw. After 19 games Houdini was leading 19 to 0, but then in the 20th and final game Fruit managed to secure a hard-fought draw! The total massacre was avoided, but the match proved to someone like me (who hasn’t been testing for several years) that chess engines have gained a huge amount of strength in the last seven years.
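
For a rough sense of scale, a match score can be converted into an Elo difference with the standard logistic Elo formula. The sketch below is only an illustration and was not part of the test itself; the function names are made up for this example, and 20 games is far too small a sample for the resulting figure to be more than indicative.

```python
import math


def elo_diff_from_score(points: float, games: int) -> float:
    """Elo difference implied by a match score under the logistic Elo model."""
    score = points / games
    if not 0.0 < score < 1.0:
        # A 100% or 0% score has no finite Elo equivalent.
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def expected_score(elo_diff: float) -> float:
    """Inverse mapping: expected score fraction for a given Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


if __name__ == "__main__":
    # Houdini 3 scored 19.5 points out of 20 games.
    print(round(elo_diff_from_score(19.5, 20)))  # about 636
```

By the same formula, predicted results of 18.5 - 1.5 and 19 - 1 correspond to roughly 435 and 510 Elo, so the actual result sits well above both.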

I wonder if this development can continue at the same speed over the next seven years. Will the number one engine in December 2019 be able to humiliate Houdini 3 the way this engine humiliated Fruit 2.2.1? I really don’t know, but I do know that I will run this match in December 2019 and publish it here in the Talkchess forum!

Best regards
Per

lucasart · Post by **lucasart** » Sat Dec 29, 2012 4:30 am

Yarget wrote: Let’s go seven years back in time. In December 2005 (it was before the Rybka era) engines like Fritz 9, Shredder 9, Hiarcs 10 and Fruit 2.2.1 were fighting for the top spot in the rating lists. A tight struggle, but with Fruit as the number one in at least two rating lists from that time:

http://www.stmintz.com/ccc/index.php?id=485608

http://www.computerschach.de/index.php? ... Itemid=248

The best version of Fruit at that time was Fruit 05/11/03, the last "beta" version by Fabien Letouzey. This version is stronger than Fruit 2.2.1, and even stronger than many of Ryan Benitez's private versions released afterwards:
http://www.computerchess.org.uk/ccrl/40 ... t_all.html

But, as always, rating lists lag behind. **Especially** SSDF, which was updated once a year or so (I can't remember exactly, but I remember it took forever, so I guess everybody has lost interest in SSDF by now).

Anyway, seen from Houdini 3's perspective, Fruit 2.2.1, Fruit 05/11/03, Toga II, etc. are all newbie opponents...

Graham Banks · Post by **Graham Banks** » Sat Dec 29, 2012 4:33 am

lucasart wrote:The best version of Fruit at that time is Fruit 05/11/03, the last "beta" version by Fabien Letouzey. This version is stronger than Fruit 2.2.1, and even stronger than Fruit 2.3.1 by Ryan Benitez:
http://www.computerchess.org.uk/ccrl/40 ... t_all.html....

Not in our 40/40 testing, it isn't. Fruit 2.3.1 is the strongest. And yes, I know that error margins have to be taken into account, but that works both ways.

CCRL 40/40 Rating List - Custom engine selection
427978 games played by 1259 programs, run by 18 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 40 minutes on Athlon 64 X2 4600+ (2.4 GHz)
Computed on December 22, 2012 with Bayeselo based on 427'978 games
Tested by CCRL team, 2005-2012, http://computerchess.org.uk/ccrl/4040/

Rank                Engine                 Elo   +    -   Score  AvOp  Games
1 Fruit 2.3.1                          2798  +13  -13  50.3%   -2.3  1737
  Fruit 051103                         2781  +19  -19  49.9%   -1.5   761
  Fruit 2.2.1                          2749  +12  -12  52.3%  -16.6  2037

Ajedrecista · Post by **Ajedrecista** » Sat Dec 29, 2012 1:01 pm

Hello:

Graham Banks wrote:
lucasart wrote:The best version of Fruit at that time is Fruit 05/11/03, the last "beta" version by Fabien Letouzey. This version is stronger than Fruit 2.2.1, and even stronger than Fruit 2.3.1 by Ryan Benitez:
http://www.computerchess.org.uk/ccrl/40 ... t_all.html....
Not in our 40/40 testing, it isn't. Fruit 2.3.1 is the strongest. And yes, I know that error margins have to be taken into account, but that works both ways.

I just took a glance at your CCRL 40/40 data. The difference between two engines, including their error bars, can be modelled as a normal difference distribution, if I am not wrong: the new mean is the rating difference, while the new error bar is the square root of the sum of the squares of the two error bars:

2798 - 2781 ± sqrt(|13|² + |19|²) = 17 ± sqrt(530) ~ 17 ± 23.02 ~ 17 ± 23 ~ [-6, 40]

Looking at the LOS (likelihood of superiority), it gives 96% for Fruit 2.3.1 against the 051103 version, which obviously gets 4%.
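
The calculation above can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, and the LOS estimate assumes that the + and - columns are 95% bounds of a normal distribution and that the two ratings are independent. Bayeselo computes its LOS differently, so this approximation need not reproduce the 96% quoted above.

```python
import math


def rating_difference(elo_a: float, err_a: float, elo_b: float, err_b: float):
    """Return (difference, combined error bar) for two rated engines."""
    diff = elo_a - elo_b
    # Error bars of independent estimates add in quadrature.
    margin = math.sqrt(err_a ** 2 + err_b ** 2)
    return diff, margin


def approx_los(diff: float, margin: float, z95: float = 1.96) -> float:
    """Normal-approximation likelihood that engine A is the stronger one.

    Assumes `margin` is a 95% bound, i.e. z95 standard deviations.
    """
    sigma = margin / z95
    return 0.5 * (1.0 + math.erf(diff / (sigma * math.sqrt(2.0))))


if __name__ == "__main__":
    # Fruit 2.3.1 (2798 +/- 13) against Fruit 051103 (2781 +/- 19).
    diff, margin = rating_difference(2798, 13, 2781, 19)
    print(f"{diff} ± {margin:.2f} -> [{diff - margin:.0f}, {diff + margin:.0f}]")
    print(f"approximate LOS: {approx_los(diff, margin):.1%}")
```

Because the interval includes zero, the data does not settle which of the two versions is stronger, which is the point about error margins working both ways.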

------------------------

Just one thing I have noticed: since the pgn tags were added, the underline button and even the shortcut alt+u now print pgn instead of u (for underlining), so I had to type the tag manually. I usually type opening and closing tags on the keyboard anyway, but I mention it for other people on the forum.

Regards from Spain.

Ajedrecista.

Yarget · Post by **Yarget** » Sat Dec 29, 2012 1:01 pm

The best version of Fruit at that time is Fruit 05/11/03, the last "beta" version by Fabien Letouzey.

Not in our 40/40 testing, it isn't. Fruit 2.3.1 is the strongest.

True, I remember that after the release of Fruit 2.2.1 quite a lot of Fruit versions appeared (Fruit 2.3.1, Fruit 05/11/03, Fruit 2.3.4n, Fruit 2.3.5m and several more), and some of them are stronger than the original version 2.2.1. However, my aim was not to test Houdini 3 against the strongest Fruit version but to run a match between the current number one and the engine that was number one exactly seven years ago. In December 2005 that was Fruit 2.2.1, although it was very close to Hiarcs 10 and Fritz 9.

Naturally I agree with you, Lucas: no matter which Fruit version Houdini faces, the result will be the same. I'm still amazed by the brutal way Houdini crushed Fruit, and I still wonder whether the engine leading the rating lists in December 2019 will be able to humiliate Houdini 3 the way this engine humiliated Fruit 2.2.1. Quite frankly I don't think so, but that would probably also have been my answer if I had been asked the same question seven years ago!

Best regards
Per

Yarget · Post by **Yarget** » Sat Dec 29, 2012 1:04 pm

Oops, sorry, I made a small mistake. The first two lines in my previous post were quotes from Lucas and Graham...

Vinvin · Post by **Vinvin** » Sat Dec 29, 2012 4:24 pm

And here are all the Fruit versions from the unified list with more than 500 games:

Rank Name                           Elo        +       -       Games  Score   Oppo. 
 429 Fruit 090705 64-bit          2936.26    12.69   10.35     6047  44.22%  2976.98
 450 Fruit 2.3.5m p15             2925.92    18.77   18.12     1150  49.17%  2930.33
 496 Fruit 2.4 Beta A             2908.94    15.62   15.67     1922  49.92%  2909.33
 516 Grapefruit 1.0               2902.56    13.02   10.14     4637  48.56%  2911.93
 540 Fruit 090705                 2892.31    15.28   15.35     2303  43.75%  2937.54
 599 Fruit 2.3.3f Beta            2872.45    13.07   12.99     2900  48.22%  2883.26
 626 Fruit 2.3.3j Beta            2863.41    19.87   20.14      959  46.98%  2882.12
 640 Fruit 2.3 Lac                2858.89    21.33   21.36      895  47.71%  2873.19
 668 Fruit 2.3.1                  2850.58    14.87   14.83     2615  50.08%  2849.18
 694 Fruit 05/11/03               2844.69    12.75   10.35     5459  44.06%  2884.99
 703 Fruit 2.3 agg                2841.23    23.02   22.55      762  48.43%  2850.52
 773 Fruit 2.2.1                  2813.95    11.15   11.75    12957  55.51%  2774.81
 803 Fruit 2.2 Uri                2802.91    26.30   25.70      578  64.97%  2702.23
 843 Fruit 1.0 Gambit Beta 4bx    2787.90    19.77   19.50     1106  50.27%  2788.19
 931 Fruit 2.1                    2754.06    12.35   11.08     5377  55.18%  2718.58
1308 Fruit 2.0                    2656.34    17.00   16.29     1841  48.94%  2664.16

S.Taylor · Post by **S.Taylor** » Sat Dec 29, 2012 7:39 pm

When it knows the job, it knows the job.
Rybka 1 was way above everything that came before it.
Rybka 2 did almost the same thing again, and then Rybka 3 made Rybka 2 look like a joke, as Rybka 4.1 did to Rybka 3.
Then Houdini 1.5a toppled Rybka 4.1 convincingly, and Houdini 3 is a little stronger than Houdini 1.5a.

I would have expected Houdini 3 to win every game against anything that came before Rybka 1. At worst, perhaps a very rare draw against the best of them.

Even if there were a few draws in a hundred games, I would expect it to take many more games to see a single win for the older engine. (This has always been my speculation, because Houdini ought to know its job against Fruit, whilst a draw might happen if Fruit did not make a bad enough mistake and Houdini did not provoke one strongly enough.)

Besides, I thought there was something better than Fruit before Rybka came out. Maybe Toga, or something similar or even better than that?

lucasart · Post by **lucasart** » Sun Dec 30, 2012 5:39 am

S.Taylor wrote: Besides, I thought there was something which was better than Fruit, before Rybka came out. Maybe Toga? or something similar or even better than that?

Versions of Toga that were stronger than Fruit 05/11/03 eventually appeared, but that was later. I still say that, as of December 2005 (which is the context of this thread), Fruit 05/11/03 was the strongest.

Adam Hair · Post by **Adam Hair** » Sun Dec 30, 2012 11:26 am

lucasart wrote:
S.Taylor wrote: Besides, I thought there was something which was better than Fruit, before Rybka came out. Maybe Toga? or something similar or even better than that?
Versions of Toga that were stronger than Fruit 05/11/03 eventually appeared, but that was later. I still say that, as of December 2005 (which is the context of this thread), Fruit 05/11/03 was the strongest.

Fruit 05/11/03 was the strongest Fruit that existed in December 2005, but it does not appear to have been available until 2007. Fruit 2.2.1 was the strongest version available to the community in December 2005.