The 3 matches have now finished.
Houdini 3 - Komodo 5 match: +43 -19 =58
72-48 (+68 Elo ± 42 Elo).
Houdini 3 - Stockfish 2.3.1 match: +44 -16 =60
74-46 (+82 Elo ± 42 Elo).
Houdini 3 - Houdini 2.0c match: +48 -15 =57
76.5-43.5 (+94 Elo ± 42 Elo).
Download All the Games (single zip containing the 3 matches).
Over-all result: Houdini 3 scored 61.8% (+81 Elo ± 24 Elo) against the average of Houdini 2.0c, Komodo 5 and Stockfish 2.3.1.
The individual match results have very large confidence intervals, please consider these when discussing results.
Cheers,
Robert
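For readers who want to check the arithmetic behind figures like "72-48 (+68 Elo ± 42 Elo)", here is a minimal sketch. It assumes the standard logistic Elo model and a normal approximation for the 95% interval; the post does not say which tool or model produced its numbers, so small differences are expected (by this formula the three matches come out at roughly +70, +83 and +98 rather than +68, +82 and +94). The function names are illustrative only.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction under the logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

def match_summary(wins, losses, draws, z=1.96):
    """Return (score, elo, elo_low, elo_high) for a +W -L =D result.

    The interval is a normal approximation on the score fraction,
    using the per-game variance of the outcomes 1, 0.5 and 0.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    low, high = score - z * se, score + z * se
    return score, elo_from_score(score), elo_from_score(low), elo_from_score(high)

# Results as posted: (wins, losses, draws) from Houdini 3's point of view.
matches = {
    "Komodo 5": (43, 19, 58),
    "Stockfish 2.3.1": (44, 16, 60),
    "Houdini 2.0c": (48, 15, 57),
}

# Pool the three matches into one 360-game sample.
pooled = tuple(sum(col) for col in zip(*matches.values()))

for name, wld in list(matches.items()) + [("All three", pooled)]:
    score, elo, low, high = match_summary(*wld)
    print(f"{name:16s} {score:6.1%}  {elo:+6.1f} Elo  [{low:+.0f}, {high:+.0f}]")
```

The pooled interval is much narrower than any single 120-game interval, which is the point of the warning about the individual match results.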
Long TC matches with Houdini 3 Beta
Moderators: hgm, Rebel, chrisw
-
- Posts: 2283
- Joined: Sat Jun 02, 2012 2:13 am
Re: Long TC matches with Houdini 3 Beta
Nice result, Robert. Also, ironic and interesting that the win % =~ 1/phi (0.618); I wonder if there is any significance.
Regards,
CL
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Long TC matches with Houdini 3 Beta
Seems 50-60 points improvement at long TC, which is impressive. Also, in tactical mode it beats everything on test suites.
Houdini wrote:
The 3 matches have now finished.
Houdini 3 - Komodo 5 match: +43 -19 =58
72-48 (+68 Elo ± 42 Elo).
Houdini 3 - Stockfish 2.3.1 match: +44 -16 =60
74-46 (+82 Elo ± 42 Elo).
Houdini 3 - Houdini 2.0c match: +48 -15 =57
76.5-43.5 (+94 Elo ± 42 Elo).
Download All the Games (single zip containing the 3 matches).
Over-all result: Houdini 3 scored 61.8% (+81 Elo ± 24 Elo) against the average of Houdini 2.0c, Komodo 5 and Stockfish 2.3.1.
The individual match results have very large confidence intervals, please consider these when discussing results.
Cheers,
Robert
-
- Posts: 5966
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: Long TC matches with Houdini 3 Beta
Based on the "Grand unified list (Ordo)", and excluding the self-play match between Houdinis since that is known to exaggerate rating differences, I get +47.5 over H2 and +45 over the higher-rated (!) Houdini 1.5. If we assume that H2 is really +15 over H1.5, then this would work out to about +40 over H2 and +55 over H1.5, with of course a large margin of error. We'll have to judge by the 40/40 and 40/20 rating lists of CCRL and CEGT (since no slower testing is done quickly enough and with large enough samples to be useful). If you really do show a 40+ elo gain over the top Houdini version on these lists, I would say "well done". Even in a year, it's not easy to improve a top engine by forty elo points; Komodo has done so, but Rybka, Stockfish, and Critter have not with their latest versions.
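The post does not spell out how "+47.5 over H2" and "+45 over H1.5" turn into "about +40 over H2 and +55 over H1.5". One reading that reproduces those rounded figures is to convert both measurements to the same baseline using the assumed +15 gap and then average them. The sketch below shows that reading only; the averaging step and the function name are assumptions, not something stated in the post.

```python
def combine_estimates(h3_over_h2, h3_over_h15, h2_over_h15):
    """Average two estimates of Houdini 3's lead after aligning baselines.

    All inputs are Elo differences. Returns (lead over H1.5, lead over H2).
    """
    # Express both measurements as "Houdini 3 minus Houdini 1.5".
    via_h2 = h3_over_h2 + h2_over_h15
    direct = h3_over_h15
    lead_over_h15 = (via_h2 + direct) / 2.0
    # Shift back by the assumed H2 - H1.5 gap to get the lead over H2.
    lead_over_h2 = lead_over_h15 - h2_over_h15
    return lead_over_h15, lead_over_h2

# Figures from the post: +47.5 over H2, +45 over H1.5, H2 assumed +15 over H1.5.
lead_h15, lead_h2 = combine_estimates(47.5, 45.0, 15.0)
print(f"over H1.5: {lead_h15:+.2f}, over H2: {lead_h2:+.2f}")
```

With these inputs the averaged leads are 53.75 over H1.5 and 38.75 over H2, which round to the "about +55" and "about +40" quoted above.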
-
- Posts: 10410
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Long TC matches with Houdini 3 Beta
I think that it may be interesting if Robert repeats the same matches against the same opponents at a significantly faster time control (I suggest 6+2 or 3+1 time control).
-
- Posts: 10410
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Long TC matches with Houdini 3 Beta
It seems based on the results that Komodo 4471.02 64 bit is significantly stronger than other versions but for some reason you tested it only in the 90+1 list.
Don wrote:
Actually, I would not want everyone to lay down and die. I want the competition and I doubt Komodo would be very strong if everything had stagnated 5 years ago.
Houdini wrote:
Don, thank you, we both know how hard work every Elo point gain is.
Don wrote:
Hi Robert,
The results so far are pretty impressive. It looks like you have a nice gain here.
I think you have to run some long tests like you are doing here in order to really know that you have improved the program. I have noticed that a lot of programs are coming out with new versions that have impressive ELO gains until they are tested at "real" time controls. It's almost certainly a by-product of the fact that you are forced to test this fast to resolve small ELO improvements. It's more and more difficult to get big ELO improvements from a single change.
What makes your results impressive, ignoring the large error margin of course, is that at long time controls the relative ELO difference between programs tends to close up significantly.
Don
Before this run my slowest test match with Houdini 3 was at 2'+2" so I'm very happy that at about 30 times longer TC the gain is still significant.
Hopefully you'll catch up with Komodo, it's more fun for everyone if there's a good competition at the top.
Robert
So yes, we are trying to catch Houdini - our current dev version is almost certainly better than Houdini 1.5 but it's difficult to catch a moving target so please sit still for a minute or two.
Here are some results based on my distributed tester, where volunteers use their machines to help me test at much longer time controls (the time controls are adjusted to the hardware where the stated time control represents a very fast overclocked machine.) In these tests Komodo never plays other versions of itself.
Code:
60+1
Rank Name                        Elo     +     -  games  score   oppo.  draws
   1 Komodo 4485.00 64 bit    3032.5   6.8   6.8   8867  56.7%  2972.4  42.6%
   2 Komodo 4476.04 64 bit    3022.3   6.3   6.3  10206  54.6%  2982.3  44.1%
   3 Komodo 4483.00 64 bit    3021.5   6.3   6.3  10181  54.7%  2979.3  44.0%
   4 Komodo 4485.19 64 bit    3020.2   6.8   6.8   8848  54.2%  2983.2  44.8%
   5 Komodo 4477.45 64 bit    3019.6   6.8   6.8   8714  54.8%  2977.5  44.5%
   6 Komodo 4487.06 64 bit    3019.3   6.4   6.4   9859  54.1%  2983.2  43.9%
   7 Houdini 1.5a x64         3018.0   3.7   3.7  30090  50.6%  3012.1  41.9%
   8 Komodo 4481.02 64 bit    3017.7   6.9   6.9   8555  54.5%  2978.3  44.5%
   9 Komodo 4482.02 64 bit    3017.3  11.0  11.0   3358  54.0%  2981.2  44.3%
  10 Komodo 4481.00 64 bit    3017.0   5.8   5.8  12144  53.8%  2983.2  43.9%
  11 Komodo 4477.08 64 bit    3015.6   9.0   9.0   5051  54.7%  2974.5  42.6%
  12 Komodo 4479.00 64 bit    3014.6  11.0  11.0   3349  53.9%  2979.5  44.3%
  13 Komodo 4477.15 64 bit    3013.8   5.4   5.4  13705  54.6%  2973.8  44.7%
  14 Critter 1.4 64-bit SSE4  3000.0   3.0   3.0  44334  48.6%  3011.8  45.8%
  15 Stockfish 2.2.2 JA       2931.7   3.0   3.0  44317  40.3%  3017.8  44.6%

90+1
Rank Name                        Elo     +     -  games  score   oppo.  draws
   1 Komodo 4471.02 64 bit    3060.9  12.9  12.9   2530  57.9%  2990.0  35.4%
   2 Komodo 4467.01 64 bit    3027.6   8.7   8.7   5300  54.4%  2990.0  42.0%
   3 Houdini 1.5a x64         3025.2   7.2   7.2   7884  49.7%  3027.6  39.1%
   4 Komodo 4468.00 64 bit    3024.8   8.7   8.7   5298  54.1%  2990.1  42.4%
   5 Komodo 4471.01 64 bit    3021.6   8.6   8.6   5321  53.7%  2990.0  43.5%
   6 Komodo 5 64 bit dev      3020.7   8.7   8.7   5313  53.7%  2990.1  43.1%
   7 Critter 1.4 64-bit SSE4  3000.0   7.1   7.1   7957  46.7%  3027.6  44.4%
   8 Stockfish 2.2.2 JA       2945.0   7.1   7.1   7921  40.4%  3027.6  42.4%

120+2
Rank Name                        Elo     +     -  games  score   oppo.  draws
   1 Komodo 4467.01 64 bit    3036.1   8.5   8.5   5594  55.0%  2992.1  43.9%
   2 Houdini 1.5a x64         3030.6   6.0   6.0  11500  50.4%  3027.7  42.1%
   3 Komodo 4463.00 64 bit    3029.7   6.1   6.1  10939  54.3%  2992.3  44.6%
   4 Komodo 4466.02 64 bit    3027.1   7.5   7.5   7127  53.9%  2992.2  45.0%
   5 Komodo 5 64 bit dev      3021.9   6.1   6.1  10906  53.4%  2992.2  44.4%
   6 Critter 1.4 64-bit SSE4  3000.0   5.9   5.9  11530  46.8%  3027.7  45.5%
   7 Stockfish 2.2.2 JA       2946.2   5.9   5.9  11536  40.7%  3027.7  45.9%
I wonder whether it was really a big improvement relative to the other versions of Komodo, or whether there is some mistake in the data or some problem with the machine that tested it.
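A rough way to look at this question is to compare the rating gap with the combined error margin of the two entries. The sketch below assumes the "+" and "-" columns are symmetric error margins and treats the two ratings as independent, which is not strictly true for engines rated in the same pool, and it ignores the fact that the top entry is the best of several tested versions. It is a sanity check under those assumptions, not a significance test.

```python
import math

def gap_vs_margin(elo_a, err_a, elo_b, err_b):
    """Return (rating gap, combined margin) for two rating-list entries.

    The margins are combined in quadrature, which assumes the two
    error estimates are independent.
    """
    gap = elo_a - elo_b
    margin = math.hypot(err_a, err_b)
    return gap, margin

# 90+1 list: Komodo 4471.02 (3060.9 +/- 12.9) vs Komodo 4467.01 (3027.6 +/- 8.7).
gap, margin = gap_vs_margin(3060.9, 12.9, 3027.6, 8.7)
print(f"gap {gap:.1f} Elo, combined margin {margin:.1f} Elo")
```

The gap is larger than the combined margin, so sampling noise alone looks like an unlikely explanation. The same row also has the fewest games in the 90+1 list (2530) and a noticeably lower draw rate (35.4% against 39-44% for the other entries), which is consistent with the suspicion that its test conditions differed.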
-
- Posts: 213
- Joined: Thu Dec 16, 2010 4:39 pm
Re: Long TC matches with Houdini 3 Beta
So everybody here has Houdini 3 for testing when it is not available for the general public?
This would make sense to me as it is a sure way to pinpoint glitches, bugs, under optimizations etc, on diverse machine architectures and time controls.
Anyway, I understand Robert is in no hurry to release his engine V3, as Houdini 2 has no competition yet, Komodo 5 MP still being vaporware.
Per ardua ad astra
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Long TC matches with Houdini 3 Beta
Well, I suppose the grand unified list contains mainly short TC results; better to take CCRL 40/40 and CEGT 40/20, and then it will be a 50-60 point improvement at long TC, if I am not wrong. Even larger if the result against H2.0 is considered. We will see IPON in two weeks for blitz.
lkaufman wrote:
Based on the "Grand unified list (Ordo)", and excluding the self-play match between Houdinis since that is known to exaggerate rating differences, I get +47.5 over H2 and +45 over the higher-rated (!) Houdini 1.5. If we assume that H2 is really +15 over H1.5, then this would work out to about +40 over H2 and +55 over H1.5, with of course a large margin of error. We'll have to judge by the 40/40 and 40/20 rating lists of CCRL and CEGT (since no slower testing is done quickly enough and with large enough samples to be useful). If you really do show a 40+ elo gain over the top Houdini version on these lists, I would say "well done". Even in a year, it's not easy to improve a top engine by forty elo points; Komodo has done so, but Rybka, Stockfish, and Critter have not with their latest versions.
Kai
-
- Posts: 766
- Joined: Sun Oct 16, 2011 11:25 am
Re: Long TC matches with Houdini 3 Beta
Pity that the tactical mode weakens the engine (overall), otherwise it would be the default.
Laskos wrote:
Seems 50-60 points improvement at long TC, which is impressive. Also, in tactical mode it beats everything on test suites.
+50/60 Elo? Maybe, but I would be very prudent about it.
I think it's too soon to estimate an improvement; note that many opening lines of the matches are pretty long, and that probably favors one side or the other (usually Houdini adapts to a wider range of middlegame positions better than other engines).
Note also that 360 games are still not a large enough sample to jump to conclusions.
On the other hand, considering that H3 is probably not tactically stronger than H2, or at least not much stronger, if H3 really is so strong overall at long time controls, it would mean that Robert Houdart has done superb work on the positional play.
I would be curious to know some results of H3 at chess960, in which the influence of the opening book is zero.
Best Regards
MM
-
- Posts: 766
- Joined: Sun Oct 16, 2011 11:25 am
Re: Long TC matches with Houdini 3 Beta
Yes.
Uri Blass wrote:
I think that it may be interesting if Robert repeats the same matches against the same opponents at a significantly faster time control (I suggest 6+2 or 3+1 time control).
MM