Long TC matches with Houdini 3 Beta

Houdini · Post by **Houdini** » Fri Sep 28, 2012 12:18 am

The 3 matches have now finished.

Houdini 3 - Komodo 5 match: +43 -19 =58
72-48 (+68 Elo ± 42 Elo).

Houdini 3 - Stockfish 2.3.1 match: +44 -16 =60
74-46 (+82 Elo ± 42 Elo).

Houdini 3 - Houdini 2.0c match: +48 -15 =57
76.5-43.5 (+94 Elo ± 42 Elo).

Download All the Games (single zip containing the 3 matches).

Over-all result: Houdini 3 scored 61.8% (+81 Elo ± 24 Elo) against the average of Houdini 2.0c, Komodo 5 and Stockfish 2.3.1.
The individual match results have very large confidence intervals; please keep these in mind when discussing the results.

Cheers,
Robert
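For readers who want to check the arithmetic, here is a minimal sketch of how a +W -L =D result maps to a score percentage, an Elo difference and a rough 95% margin. It assumes the standard logistic Elo curve and a normal approximation on the mean score. The post does not say which model or interval method produced its figures, so the function names and the method here are illustrative only.

```python
import math

def elo_from_score(p):
    """Elo difference implied by a score fraction p under the logistic model."""
    return 400.0 * math.log10(p / (1.0 - p))

def match_elo(wins, losses, draws, z=1.96):
    """Return (score fraction, Elo estimate, approximate 95% half-width in Elo)."""
    n = wins + losses + draws
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score p.
    var = (wins * (1 - p) ** 2 + losses * p ** 2 + draws * (0.5 - p) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    lo = elo_from_score(p - z * se)
    hi = elo_from_score(p + z * se)
    return p, elo_from_score(p), (hi - lo) / 2.0

# (wins, losses, draws) from Houdini 3's side, as posted.
matches = {
    "Komodo 5": (43, 19, 58),
    "Stockfish 2.3.1": (44, 16, 60),
    "Houdini 2.0c": (48, 15, 57),
}

for name, wld in matches.items():
    p, elo, margin = match_elo(*wld)
    print(f"{name}: {p:.1%}, {elo:+.0f} Elo, +/- {margin:.0f}")

# Pool all 360 games for the overall figure.
total = tuple(sum(col) for col in zip(*matches.values()))
p_all, elo_all, margin_all = match_elo(*total)
print(f"Overall: {p_all:.1%}, {elo_all:+.0f} Elo, +/- {margin_all:.0f}")
```

Under these assumptions the pooled score reproduces the posted 61.8%. The Elo values and margins land within a few points of the posted ones rather than exactly on them (for example about +70 instead of +68 for the Komodo match), which is what one would expect if a slightly different rating model was used.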

carldaman · Post by **carldaman** » Fri Sep 28, 2012 12:21 am

Nice result, Robert. Also, ironic and interesting that the score % is ≈ 1/phi (61.8%). I wonder if there is any significance.

Regards,
CL

Laskos · Post by **Laskos** » Fri Sep 28, 2012 12:56 am

Houdini wrote:The 3 matches have now finished.

Houdini 3 - Komodo 5 match: +43 -19 =58
72-48 (+68 Elo ± 42 Elo).

Houdini 3 - Stockfish 2.3.1 match: +44 -16 =60
74-46 (+82 Elo ± 42 Elo).

Houdini 3 - Houdini 2.0c match: +48 -15 =57
76.5-43.5 (+94 Elo ± 42 Elo).

Download All the Games (single zip containing the 3 matches).

Over-all result: Houdini 3 scored 61.8% (+81 Elo ± 24 Elo) against the average of Houdini 2.0c, Komodo 5 and Stockfish 2.3.1.
The individual match results have very large confidence intervals; please keep these in mind when discussing the results.

Cheers,
Robert

Seems like a 50-60 point improvement at long TC, which is impressive. Also, in tactical mode it beats everything on test suites.

lkaufman · Post by **lkaufman** » Fri Sep 28, 2012 7:37 am

Based on the "Grand unified list (Ordo)", and excluding the self-play match between Houdinis since that is known to exaggerate rating differences, I get +47.5 over H2 and +45 over the higher-rated (!) Houdini 1.5. If we assume that H2 is really +15 over H1.5, then this would work out to about +40 over H2 and +55 over H1.5, with of course a large margin of error. We'll have to judge by the 40/40 and 40/20 rating lists of CCRL and CEGT (since no slower testing is done quickly enough and with large enough samples to be useful). If you really do show a 40+ elo gain over the top Houdini version on these lists, I would say "well done". Even in a year, it's not easy to improve a top engine by forty elo points; Komodo has done so, but Rybka, Stockfish, and Critter have not with their latest versions.
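The step from "+47.5 / +45" to "about +40 / +55" is not spelled out. One reading that roughly reproduces it is to average the direct estimate against H1.5 with the estimate obtained through H2 plus the assumed 15-point gap. The sketch below works under that assumption; the variable names are illustrative and the averaging rule is a guess, not something stated in the post.

```python
# Figures quoted in the post (Elo gains of Houdini 3, self-play excluded).
h3_over_h2 = 47.5
h3_over_h15 = 45.0
h2_over_h15 = 15.0  # the post's assumption about H2 vs H1.5

# Two estimates of the H3 - H1.5 gap: measured directly, and routed via H2.
direct = h3_over_h15
via_h2 = h3_over_h2 + h2_over_h15

# Assumed combination rule: a plain average of the two estimates.
h3_vs_h15 = (direct + via_h2) / 2.0
h3_vs_h2 = h3_vs_h15 - h2_over_h15

print(h3_vs_h15, h3_vs_h2)  # 53.75 38.75
```

Rounded, that gives roughly +55 over H1.5 and +40 over H2, in line with the numbers in the post.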

Uri Blass · Post by **Uri Blass** » Fri Sep 28, 2012 8:35 am

I think it would be interesting if Robert repeated the same matches against the same opponents at a significantly faster time control (I suggest 6+2 or 3+1).

Uri Blass · Post by **Uri Blass** » Fri Sep 28, 2012 8:49 am

Don wrote:
Houdini wrote:
Don wrote:Hi Robert,

The results so far are pretty impressive. It looks like you have a nice gain here.

I think you have to run some long tests like you are doing here in order to really know that you have improved the program. I have noticed that a lot of programs are coming out with new versions that have impressive ELO gains until they are tested at "real" time controls. It's almost certainly a by-product of the fact that you are forced to test this fast to resolve small ELO improvements. It's more and more difficult to get big ELO improvements from a single change.

What makes your results impressive, ignoring the large error margin of course, is that at long time controls the relative ELO difference between programs tends to close up significantly.

Don
Don, thank you, we both know how hard work every Elo point gain is.
Before this run my slowest test match with Houdini 3 was at 2'+2" so I'm very happy that at about 30 times longer TC the gain is still significant.
Hopefully you'll catch up with Komodo, it's more fun for everyone if there's a good competition at the top.

Robert
Actually, I would not want everyone to lay down and die. I want the competition and I doubt Komodo would be very strong if everything had stagnated 5 years ago.

So yes, we are trying to catch Houdini - our current dev version is almost certainly better than Houdini 1.5 but it's difficult to catch a moving target so please sit still for a minute or two.

Here are some results based on my distributed tester, where volunteers use their machines to help me test at much longer time controls (the time controls are adjusted to the hardware where the stated time control represents a very fast overclocked machine.) In these tests Komodo never plays other versions of itself.
60+1

Rank Name                       Elo      +      -    games   score   oppo.   draws 
   1 Komodo 4485.00 64 bit    3032.5    6.8    6.8    8867   56.7%  2972.4   42.6% 
   2 Komodo 4476.04 64 bit    3022.3    6.3    6.3   10206   54.6%  2982.3   44.1% 
   3 Komodo 4483.00 64 bit    3021.5    6.3    6.3   10181   54.7%  2979.3   44.0% 
   4 Komodo 4485.19 64 bit    3020.2    6.8    6.8    8848   54.2%  2983.2   44.8% 
   5 Komodo 4477.45 64 bit    3019.6    6.8    6.8    8714   54.8%  2977.5   44.5% 
   6 Komodo 4487.06 64 bit    3019.3    6.4    6.4    9859   54.1%  2983.2   43.9% 
   7 Houdini 1.5a x64         3018.0    3.7    3.7   30090   50.6%  3012.1   41.9% 
   8 Komodo 4481.02 64 bit    3017.7    6.9    6.9    8555   54.5%  2978.3   44.5% 
   9 Komodo 4482.02 64 bit    3017.3   11.0   11.0    3358   54.0%  2981.2   44.3% 
  10 Komodo 4481.00 64 bit    3017.0    5.8    5.8   12144   53.8%  2983.2   43.9% 
  11 Komodo 4477.08 64 bit    3015.6    9.0    9.0    5051   54.7%  2974.5   42.6% 
  12 Komodo 4479.00 64 bit    3014.6   11.0   11.0    3349   53.9%  2979.5   44.3% 
  13 Komodo 4477.15 64 bit    3013.8    5.4    5.4   13705   54.6%  2973.8   44.7% 
  14 Critter 1.4 64-bit SSE4  3000.0    3.0    3.0   44334   48.6%  3011.8   45.8% 
  15 Stockfish 2.2.2 JA       2931.7    3.0    3.0   44317   40.3%  3017.8   44.6% 


90+1

Rank Name                       Elo      +      -    games   score   oppo.   draws 
   1 Komodo 4471.02 64 bit    3060.9   12.9   12.9    2530   57.9%  2990.0   35.4% 
   2 Komodo 4467.01 64 bit    3027.6    8.7    8.7    5300   54.4%  2990.0   42.0% 
   3 Houdini 1.5a x64         3025.2    7.2    7.2    7884   49.7%  3027.6   39.1% 
   4 Komodo 4468.00 64 bit    3024.8    8.7    8.7    5298   54.1%  2990.1   42.4% 
   5 Komodo 4471.01 64 bit    3021.6    8.6    8.6    5321   53.7%  2990.0   43.5% 
   6 Komodo 5 64 bit dev      3020.7    8.7    8.7    5313   53.7%  2990.1   43.1% 
   7 Critter 1.4 64-bit SSE4  3000.0    7.1    7.1    7957   46.7%  3027.6   44.4% 
   8 Stockfish 2.2.2 JA       2945.0    7.1    7.1    7921   40.4%  3027.6   42.4% 


120+2

Rank Name                       Elo      +      -    games   score   oppo.   draws 
   1 Komodo 4467.01 64 bit    3036.1    8.5    8.5    5594   55.0%  2992.1   43.9% 
   2 Houdini 1.5a x64         3030.6    6.0    6.0   11500   50.4%  3027.7   42.1% 
   3 Komodo 4463.00 64 bit    3029.7    6.1    6.1   10939   54.3%  2992.3   44.6% 
   4 Komodo 4466.02 64 bit    3027.1    7.5    7.5    7127   53.9%  2992.2   45.0% 
   5 Komodo 5 64 bit dev      3021.9    6.1    6.1   10906   53.4%  2992.2   44.4% 
   6 Critter 1.4 64-bit SSE4  3000.0    5.9    5.9   11530   46.8%  3027.7   45.5% 
   7 Stockfish 2.2.2 JA       2946.2    5.9    5.9   11536   40.7%  3027.7   45.9% 

Based on these results, Komodo 4471.02 64 bit seems significantly stronger than the other versions, but for some reason you tested it only in the 90+1 list.

I wonder whether it really was a big improvement over the other versions of Komodo, or whether there is a mistake in the data or a problem with the machine that tested it.
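A quick way to put a number on "significantly stronger" is to compare the rating gap with the combined error bars of the two entries. The sketch below does this for the top two Komodo versions in the 90+1 list. It assumes the + and - columns are independent intervals of equal confidence, which is only an approximation for engines rated in the same pool; the helper name is hypothetical.

```python
import math

def gap_vs_margin(elo_a, err_a, elo_b, err_b):
    """Return (rating gap, combined error bar) treating the two errors as independent."""
    gap = elo_a - elo_b
    combined = math.sqrt(err_a ** 2 + err_b ** 2)
    return gap, combined

# 90+1 list: Komodo 4471.02 (3060.9 +/- 12.9) vs Komodo 4467.01 (3027.6 +/- 8.7).
gap, combined = gap_vs_margin(3060.9, 12.9, 3027.6, 8.7)
print(f"gap = {gap:.1f} Elo, combined margin = {combined:.1f} Elo")
```

The gap comes out at roughly twice the combined margin, so it is not obviously noise. Two things in the table still argue for caution: this version has the fewest games in the list (2530), and its draw rate (35.4%) is well below the 42-44% of the other Komodo entries. With many versions tested, one outlier is also more likely to appear by chance.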

melajara · Post by **melajara** » Fri Sep 28, 2012 9:00 am

So everybody here has Houdini 3 for testing when it is not available for the general public?

This would make sense to me as it is a sure way to pinpoint glitches, bugs, under optimizations etc, on diverse machine architectures and time controls.

Anyway I understand Robert is in no hurry to release his engine V3 as he has no competition with Houdini 2 yet, Komodo 5 MP still being vaporware

Laskos · Post by **Laskos** » Fri Sep 28, 2012 9:06 am

lkaufman wrote:Based on the "Grand unified list (Ordo)", and excluding the self-play match between Houdinis since that is known to exaggerate rating differences, I get +47.5 over H2 and +45 over the higher-rated (!) Houdini 1.5. If we assume that H2 is really +15 over H1.5, then this would work out to about +40 over H2 and +55 over H1.5, with of course a large margin of error. We'll have to judge by the 40/40 and 40/20 rating lists of CCRL and CEGT (since no slower testing is done quickly enough and with large enough samples to be useful). If you really do show a 40+ elo gain over the top Houdini version on these lists, I would say "well done". Even in a year, it's not easy to improve a top engine by forty elo points; Komodo has done so, but Rybka, Stockfish, and Critter have not with their latest versions.

Well, I suppose the grand unified list contains mainly short TC results. Better to take CCRL 40/40 and CEGT 40/20; then it will be a 50-60 point improvement at long TC, if I am not wrong. Even larger if the result against H2.0 is considered. We will see IPON in two weeks for blitz.

Kai

MM · Post by MM » Fri Sep 28, 2012 11:03 am

Laskos wrote:

Seems like a 50-60 point improvement at long TC, which is impressive. Also, in tactical mode it beats everything on test suites.

Pity that the tactical mode weakens the engine (overall), otherwise it would be by default.

+50/60 Elo? Maybe, but I would be very cautious about it.
I think it's too early to estimate the improvement: note that many opening lines in the matches are pretty long, which probably favors one side or the other (Houdini usually adapts to a wider range of middlegame positions better than other engines).
Note also that 360 games are still not a large enough sample to jump to conclusions.
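To give a feel for how sample size drives the error bar, here is a rough sketch of how many games would be needed for a given 95% margin. It assumes the logistic Elo curve, a linearised normal approximation, and a score and draw rate similar to the pooled 360 games (about 61.8% and 48.6%). The function and its defaults are illustrative, not taken from any rating tool.

```python
import math

def games_needed(target_elo, score=0.618, draw_rate=0.486, z=1.96):
    """Approximate number of games for a +/- target_elo interval (linearised)."""
    win = score - draw_rate / 2.0
    loss = 1.0 - win - draw_rate
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (win * (1 - score) ** 2
           + loss * score ** 2
           + draw_rate * (0.5 - score) ** 2)
    # Slope of the logistic Elo curve at this score, in Elo per unit of score.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return math.ceil((z * math.sqrt(var) * slope / target_elo) ** 2)

for target in (24, 10, 5):
    print(f"+/- {target} Elo: about {games_needed(target)} games")
```

Under these assumptions a margin of about ±24 Elo corresponds to a few hundred games, while ±10 Elo needs on the order of a couple of thousand, since the margin shrinks only with the square root of the number of games.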

On the other hand, considering that H3 is probably not tactically stronger than H2, or at least not much stronger, if H3 really is that strong overall at long time controls, it would mean that Robert Houdart has done superb work on positional play.

I would be curious to know some results of H3 at chess960, in which the influence of the opening book is zero.

Best Regards

MM · Post by MM » Fri Sep 28, 2012 11:09 am

Uri Blass wrote:I think it would be interesting if Robert repeated the same matches against the same opponents at a significantly faster time control (I suggest 6+2 or 3+1).

Yes.
