Long TC matches with Houdini 3 Beta


Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Long TC matches with Houdini 3 Beta

Post by Houdini »

The 3 matches have now finished.

Houdini 3 - Komodo 5 match: +43 -19 =58
72-48 (+68 Elo ± 42 Elo).

Houdini 3 - Stockfish 2.3.1 match: +44 -16 =60
74-46 (+82 Elo ± 42 Elo).

Houdini 3 - Houdini 2.0c match: +48 -15 =57
76.5-43.5 (+94 Elo ± 42 Elo).

Download All the Games (single zip containing the 3 matches).

Overall result: Houdini 3 scored 61.8% (+81 Elo ± 24 Elo) against the average of Houdini 2.0c, Komodo 5 and Stockfish 2.3.1.
The individual match results have very large confidence intervals; please keep these in mind when discussing the results.

Cheers,
Robert
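For readers who want to check the figures above, here is a minimal sketch that derives a score, an Elo difference and an error margin from the win/loss/draw counts. It assumes the standard logistic Elo formula and a 95% normal-approximation interval; the post does not say which tool or confidence level produced its numbers, and this sketch gives values close to, but not identical with, the posted ones. The function names are illustrative only.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction under the logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

def match_summary(wins, losses, draws, z=1.96):
    """Return (score, elo, margin) from win/loss/draw counts.

    z=1.96 assumes a 95% interval; the post does not state its level.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0), then the standard
    # error of the mean score over the whole match.
    variance = (wins + 0.25 * draws) / games - score ** 2
    stderr = math.sqrt(variance / games)
    # Map the score interval to Elo. The Elo interval is slightly
    # asymmetric, so report half of its total width.
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return score, elo_from_score(score), (high - low) / 2.0

# (wins, losses, draws) for Houdini 3, copied from the post.
results = {
    "Komodo 5": (43, 19, 58),
    "Stockfish 2.3.1": (44, 16, 60),
    "Houdini 2.0c": (48, 15, 57),
}

summaries = {name: match_summary(*wld) for name, wld in results.items()}
totals = [sum(column) for column in zip(*results.values())]
overall = match_summary(*totals)

for name, (score, elo, margin) in summaries.items():
    print(f"{name}: {score:.1%}, {elo:+.0f} Elo, +/- {margin:.0f}")
print(f"Overall: {overall[0]:.1%}, {overall[1]:+.0f} Elo, +/- {overall[2]:.0f}")
```

Pooling all 360 games narrows the margin by roughly a factor of sqrt(3) compared with a single 120-game match, which is why the combined result is far more informative than any individual match.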
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: Long TC matches with Houdini 3 Beta

Post by carldaman »

Nice result, Robert. Also, ironic and interesting that the win % =~ phi, wonder if there is a significance :wink:

Regards,
CL
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Long TC matches with Houdini 3 Beta

Post by Laskos »

Houdini wrote:The 3 matches have now finished.

Houdini 3 - Komodo 5 match: +43 -19 =58
72-48 (+68 Elo ± 42 Elo).

Houdini 3 - Stockfish 2.3.1 match: +44 -16 =60
74-46 (+82 Elo ± 42 Elo).

Houdini 3 - Houdini 2.0c match: +48 -15 =57
76.5-43.5 (+94 Elo ± 42 Elo).

Download All the Games (single zip containing the 3 matches).

Overall result: Houdini 3 scored 61.8% (+81 Elo ± 24 Elo) against the average of Houdini 2.0c, Komodo 5 and Stockfish 2.3.1.
The individual match results have very large confidence intervals; please keep these in mind when discussing the results.

Cheers,
Robert
Seems like a 50-60 point improvement at long TC, which is impressive. Also, in tactical mode it beats everything on test suites.
lkaufman
Posts: 5966
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Long TC matches with Houdini 3 Beta

Post by lkaufman »

Based on the "Grand unified list (Ordo)", and excluding the self-play match between Houdinis since that is known to exaggerate rating differences, I get +47.5 over H2 and +45 over the higher-rated (!) Houdini 1.5. If we assume that H2 is really +15 over H1.5, then this would work out to about +40 over H2 and +55 over H1.5, with of course a large margin of error. We'll have to judge by the 40/40 and 40/20 rating lists of CCRL and CEGT (since no slower testing is done quickly enough and with large enough samples to be useful). If you really do show a 40+ elo gain over the top Houdini version on these lists, I would say "well done". Even in a year, it's not easy to improve a top engine by forty elo points; Komodo has done so, but Rybka, Stockfish, and Critter have not with their latest versions.
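The adjustment in this post can be reproduced with a few lines of arithmetic. The sketch below is one plausible reading, not something the post spells out: it assumes the two estimates of Houdini 3's rating (one via H2, one via H1.5) are simply averaged once H2 is fixed at +15 over H1.5. The variable names are illustrative.

```python
# Figures taken from the post.
gain_over_h2 = 47.5    # H3 minus H2 on the unified list
gain_over_h15 = 45.0   # H3 minus H1.5 on the unified list
h2_minus_h15 = 15.0    # assumed "true" gap between H2 and H1.5

# Anchor Houdini 1.5 at 0 and place Houdini 2 at +15.
h15 = 0.0
h2 = h15 + h2_minus_h15

# Two independent estimates of Houdini 3 on this scale.
h3_via_h2 = h2 + gain_over_h2      # 62.5
h3_via_h15 = h15 + gain_over_h15   # 45.0

# Assumption: average the two estimates with equal weight.
h3 = (h3_via_h2 + h3_via_h15) / 2.0

print(f"H3 over H2:   {h3 - h2:+.2f}")
print(f"H3 over H1.5: {h3 - h15:+.2f}")
```

Under this reading the gains come out at +38.75 over H2 and +53.75 over H1.5, consistent with the "about +40" and "+55" quoted in the post.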
Uri Blass
Posts: 10410
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Long TC matches with Houdini 3 Beta

Post by Uri Blass »

I think it would be interesting if Robert repeated the same matches against the same opponents at a significantly faster time control (I suggest 6+2 or 3+1).
Uri Blass
Posts: 10410
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Long TC matches with Houdini 3 Beta

Post by Uri Blass »

Don wrote:
Houdini wrote:
Don wrote:Hi Robert,

The results so far are pretty impressive. It looks like you have a nice gain here.

I think you have to run some long tests like you are doing here in order to really know that you have improved the program. I have noticed that a lot of programs are coming out with new versions that have impressive ELO gains until they are tested at "real" time controls. It's almost certainly a by-product of the fact that you are forced to test this fast to resolve small ELO improvements. It's more and more difficult to get big ELO improvements from a single change.

What makes your results impressive, ignoring the large error margin of course, is that at long time controls the relative ELO difference between programs tends to close up significantly.

Don
Don, thank you, we both know how hard work every Elo point gain is.
Before this run my slowest test match with Houdini 3 was at 2'+2" so I'm very happy that at about 30 times longer TC the gain is still significant.
Hopefully you'll catch up with Komodo, it's more fun for everyone if there's a good competition at the top.

Robert
Actually, I would not want everyone to lay down and die. I want the competition and I doubt Komodo would be very strong if everything had stagnated 5 years ago.

So yes, we are trying to catch Houdini - our current dev version is almost certainly better than Houdini 1.5 but it's difficult to catch a moving target so please sit still for a minute or two.

Here are some results based on my distributed tester, where volunteers use their machines to help me test at much longer time controls (the time controls are adjusted to the hardware where the stated time control represents a very fast overclocked machine.) In these tests Komodo never plays other versions of itself.

60+1

Rank Name                       Elo      +      -    games   score   oppo.   draws 
   1 Komodo 4485.00 64 bit    3032.5    6.8    6.8    8867   56.7%  2972.4   42.6% 
   2 Komodo 4476.04 64 bit    3022.3    6.3    6.3   10206   54.6%  2982.3   44.1% 
   3 Komodo 4483.00 64 bit    3021.5    6.3    6.3   10181   54.7%  2979.3   44.0% 
   4 Komodo 4485.19 64 bit    3020.2    6.8    6.8    8848   54.2%  2983.2   44.8% 
   5 Komodo 4477.45 64 bit    3019.6    6.8    6.8    8714   54.8%  2977.5   44.5% 
   6 Komodo 4487.06 64 bit    3019.3    6.4    6.4    9859   54.1%  2983.2   43.9% 
   7 Houdini 1.5a x64         3018.0    3.7    3.7   30090   50.6%  3012.1   41.9% 
   8 Komodo 4481.02 64 bit    3017.7    6.9    6.9    8555   54.5%  2978.3   44.5% 
   9 Komodo 4482.02 64 bit    3017.3   11.0   11.0    3358   54.0%  2981.2   44.3% 
  10 Komodo 4481.00 64 bit    3017.0    5.8    5.8   12144   53.8%  2983.2   43.9% 
  11 Komodo 4477.08 64 bit    3015.6    9.0    9.0    5051   54.7%  2974.5   42.6% 
  12 Komodo 4479.00 64 bit    3014.6   11.0   11.0    3349   53.9%  2979.5   44.3% 
  13 Komodo 4477.15 64 bit    3013.8    5.4    5.4   13705   54.6%  2973.8   44.7% 
  14 Critter 1.4 64-bit SSE4  3000.0    3.0    3.0   44334   48.6%  3011.8   45.8% 
  15 Stockfish 2.2.2 JA       2931.7    3.0    3.0   44317   40.3%  3017.8   44.6% 


90+1

Rank Name                       Elo      +      -    games   score   oppo.   draws 
   1 Komodo 4471.02 64 bit    3060.9   12.9   12.9    2530   57.9%  2990.0   35.4% 
   2 Komodo 4467.01 64 bit    3027.6    8.7    8.7    5300   54.4%  2990.0   42.0% 
   3 Houdini 1.5a x64         3025.2    7.2    7.2    7884   49.7%  3027.6   39.1% 
   4 Komodo 4468.00 64 bit    3024.8    8.7    8.7    5298   54.1%  2990.1   42.4% 
   5 Komodo 4471.01 64 bit    3021.6    8.6    8.6    5321   53.7%  2990.0   43.5% 
   6 Komodo 5 64 bit dev      3020.7    8.7    8.7    5313   53.7%  2990.1   43.1% 
   7 Critter 1.4 64-bit SSE4  3000.0    7.1    7.1    7957   46.7%  3027.6   44.4% 
   8 Stockfish 2.2.2 JA       2945.0    7.1    7.1    7921   40.4%  3027.6   42.4% 


120+2

Rank Name                       Elo      +      -    games   score   oppo.   draws 
   1 Komodo 4467.01 64 bit    3036.1    8.5    8.5    5594   55.0%  2992.1   43.9% 
   2 Houdini 1.5a x64         3030.6    6.0    6.0   11500   50.4%  3027.7   42.1% 
   3 Komodo 4463.00 64 bit    3029.7    6.1    6.1   10939   54.3%  2992.3   44.6% 
   4 Komodo 4466.02 64 bit    3027.1    7.5    7.5    7127   53.9%  2992.2   45.0% 
   5 Komodo 5 64 bit dev      3021.9    6.1    6.1   10906   53.4%  2992.2   44.4% 
   6 Critter 1.4 64-bit SSE4  3000.0    5.9    5.9   11530   46.8%  3027.7   45.5% 
   7 Stockfish 2.2.2 JA       2946.2    5.9    5.9   11536   40.7%  3027.7   45.9% 
Based on these results, Komodo 4471.02 64 bit seems to be significantly stronger than the other versions, but for some reason you tested it only in the 90+1 list.

I wonder whether it was really a big improvement relative to the other versions of Komodo, or whether there is some mistake in the data or some problem with the machine that tested it.
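One rough way to check the "significantly stronger" impression is to see whether the error bars in the 90+1 table overlap. The sketch below copies the Komodo rows from that table and uses non-overlap of the +/- intervals as a heuristic. This is an assumption-laden shortcut: the error bars in such lists are not independent of each other, and the check says nothing about whether the underlying data is sound.

```python
# (name, Elo, margin) for the Komodo versions in the 90+1 table.
ratings_90_1 = [
    ("Komodo 4471.02 64 bit", 3060.9, 12.9),
    ("Komodo 4467.01 64 bit", 3027.6, 8.7),
    ("Komodo 4468.00 64 bit", 3024.8, 8.7),
    ("Komodo 4471.01 64 bit", 3021.6, 8.6),
    ("Komodo 5 64 bit dev", 3020.7, 8.7),
]

def intervals_overlap(a, b):
    """True if the [Elo - margin, Elo + margin] ranges of a and b intersect."""
    _, elo_a, err_a = a
    _, elo_b, err_b = b
    return (elo_a - err_a) <= (elo_b + err_b) and (elo_b - err_b) <= (elo_a + err_a)

top = ratings_90_1[0]
overlaps = {row[0]: intervals_overlap(top, row) for row in ratings_90_1[1:]}

for name, overlap in overlaps.items():
    verdict = "overlaps" if overlap else "does not overlap"
    print(f"{top[0]} {verdict} with {name}")
```

None of the other Komodo intervals reaches the lower end of the 4471.02 interval, which matches the impression in the post. That version also has the fewest games in the list (2530), so the question about a data or machine problem remains open.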
melajara
Posts: 213
Joined: Thu Dec 16, 2010 4:39 pm

Re: Long TC matches with Houdini 3 Beta

Post by melajara »

So everybody here has Houdini 3 for testing while it is not yet available to the general public?

This would make sense to me, as it is a sure way to pinpoint glitches, bugs, missed optimizations, etc. on diverse machine architectures and time controls.

Anyway, I understand Robert is in no hurry to release version 3 of his engine, as Houdini 2 has no competition yet, Komodo 5 MP still being vaporware ;-)
Per ardua ad astra
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Long TC matches with Houdini 3 Beta

Post by Laskos »

lkaufman wrote:Based on the "Grand unified list (Ordo)", and excluding the self-play match between Houdinis since that is known to exaggerate rating differences, I get +47.5 over H2 and +45 over the higher-rated (!) Houdini 1.5. If we assume that H2 is really +15 over H1.5, then this would work out to about +40 over H2 and +55 over H1.5, with of course a large margin of error. We'll have to judge by the 40/40 and 40/20 rating lists of CCRL and CEGT (since no slower testing is done quickly enough and with large enough samples to be useful). If you really do show a 40+ elo gain over the top Houdini version on these lists, I would say "well done". Even in a year, it's not easy to improve a top engine by forty elo points; Komodo has done so, but Rybka, Stockfish, and Critter have not with their latest versions.
Well, I suppose the grand unified list contains mainly short TC results. It would be better to take CCRL 40/40 and CEGT 40/20; then it will be a 50-60 point improvement at long TC, if I am not wrong. Even larger if the result against H2.0 is considered. We will see IPON in two weeks for blitz.

Kai
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: Long TC matches with Houdini 3 Beta

Post by MM »

Laskos wrote:Seems like a 50-60 point improvement at long TC, which is impressive. Also, in tactical mode it beats everything on test suites.
Pity that the tactical mode weakens the engine overall, otherwise it would be the default.

+50/60 Elo? Maybe, but I would be very cautious about it.
I think it's too soon to estimate an improvement. Note that many opening lines in the matches are pretty long, and that probably favors one side or the other (Houdini usually adapts to a wider range of middlegame positions better than other engines).
Note also that 360 games is still not a large enough sample to draw conclusions from.

On the other hand, considering that H3 is probably not tactically stronger than H2, or at least not much stronger, if H3 really is that strong overall at long time controls it would mean that Robert Houdart has done superb work on the positional play.

I would be curious to see some results of H3 at Chess960, in which the influence of the opening book is zero.


Best Regards
MM
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: Long TC matches with Houdini 3 Beta

Post by MM »

Uri Blass wrote:I think it would be interesting if Robert repeated the same matches against the same opponents at a significantly faster time control (I suggest 6+2 or 3+1).
Yes.
MM