Long TC matches with Houdini 3 Beta

geots · Post by **geots** » Fri Sep 28, 2012 2:26 pm

lkaufman wrote:
Vinvin wrote:
lkaufman wrote:
Vinvin wrote:
Houdini wrote:The second match just finished.

Final result of the Houdini 3 - Stockfish 2.3.1 match: +44 -16 =60
74-46 (+82 Elo ± 42 Elo).

Final result of the Houdini 3 - Houdini 2.0c match: +48 -15 =57
76.5-43.5 (+94 Elo ± 42 Elo).
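For readers who want to check figures like these, here is a minimal Python sketch of the usual logistic Elo conversion from a win/loss/draw record. The function names and the 1.96 normal-approximation factor for a roughly 95% interval are my own assumptions. The tool that produced the numbers in the post is not stated, so the output need not match them exactly.

```python
import math

def score_to_elo(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_from_record(wins, losses, draws, z=1.96):
    """Return (elo, low, high) for a match record.

    Assumption: the interval comes from a normal approximation on the
    per-game score and stays strictly inside (0, 1).
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result, where a game scores 1, 0.5 or 0.
    variance = (wins * (1.0 - score) ** 2
                + losses * score ** 2
                + draws * (0.5 - score) ** 2) / games
    half_width = z * math.sqrt(variance / games)
    return (score_to_elo(score),
            score_to_elo(score - half_width),
            score_to_elo(score + half_width))

matches = {
    "Houdini 3 vs Stockfish 2.3.1": (44, 16, 60),  # +44 -16 =60
    "Houdini 3 vs Houdini 2.0c": (48, 15, 57),     # +48 -15 =57
}
for label, (w, l, d) in matches.items():
    elo, low, high = elo_from_record(w, l, d)
    print(f"{label}: {elo:+.0f} Elo (interval {low:+.0f} to {high:+.0f})")
```

With only 120 games per match the interval is wide, which is worth keeping in mind when comparing the two results.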
I combined both matches and get a performance of approximately +66 above Houdini 2. That's very acceptable for a new version. Congratulations again!
It's better to disregard the Houdini-Houdini match, as self-play can seriously inflate rating gains, though this is not always true. In the case of Stockfish 2.3, a direct hyperspeed match I ran against 2.2.2 showed enormous gains, much of which went away against other opponents, and the rest seems to have disappeared at longer time controls. How do you calculate a rating gain in the absence of long time control ratings for these versions? Are you perhaps using 40/40 CCRL ratings, as the closest available? That would put the rating gain over the best Houdini version on that list (1.5) at 24 Elo based only on the SF match, if we assume 2.3.1 is no better than 2.2.2, as the IPON result suggests. But there are too many ifs here and too few games.
I took the unified list (around 1 min/move on a 2 GHz AMD): http://home.scarlet.be/vincentlejeune/r ... 120910.txt
From it I take SF 2.3.1 = SF 2.2.2 = 3178.75 and Houdini 2.0 64-bit = 3219.66.
H3b = (3219.66+94+3178.75+82)/2 = 3287 (+67 over H2)
So disregarding the self-play it would be +41 over H2 based on that list, which would be a very respectable gain for a year if it turns out to be so.
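The arithmetic above can be reproduced in a few lines of Python. This is only a sketch of the averaging Vinvin describes: the ratings are the ones he quotes from his unified list, the gains are the two match results, and the variable names are mine.

```python
# Ratings quoted from the unified list in the post.
sf_rating = 3178.75   # SF 2.3.1, assumed equal to SF 2.2.2
h2_rating = 3219.66   # Houdini 2.0 64-bit

# Elo gains reported for the two 120-game matches.
gain_vs_sf = 82
gain_vs_h2 = 94

# Performance rating against each opponent, then the plain average.
perf_vs_sf = sf_rating + gain_vs_sf
perf_vs_h2 = h2_rating + gain_vs_h2
combined = (perf_vs_sf + perf_vs_h2) / 2

print(f"combined performance: {combined:.0f}")
print(f"gain over H2, both matches: {combined - h2_rating:+.1f}")
print(f"gain over H2, SF match only: {perf_vs_sf - h2_rating:+.1f}")
```

The plain average gives both matches equal weight, which is reasonable here only because each match had the same number of games.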

This is just my advice, and it and a buck will get you a Coke. But I think you should make yourself scarce. Whether true or not, it looks like you are of the mind "any port in a storm". I have no idea how Houdini 3 and Komodo 5 or 6 will end up. But if Houdini slaughters and quarters Komodo, you are going to look awfully foolish. Best to say nothing and leave it to the pundits.
Whether you are actually worried is moot. You give the impression of being terribly worried, and that impression is what people see and what matters. Let it go.

Best,

george

Don · Post by **Don** » Fri Sep 28, 2012 2:28 pm

Uri Blass wrote:
Don wrote:
Houdini wrote:
Don wrote:Hi Robert,

The results so far are pretty impressive. It looks like you have a nice gain here.

I think you have to run some long tests like you are doing here in order to really know that you have improved the program. I have noticed that a lot of programs are coming out with new versions that have impressive ELO gains until they are tested at "real" time controls. It's almost certainly a by-product of the fact that you are forced to test this fast to resolve small ELO improvements. It's more and more difficult to get big ELO improvements from a single change.

What makes your results impressive, ignoring the large error margin of course, is that at long time controls the relative ELO difference between programs tends to close up significantly.

Don
Don, thank you, we both know how much hard work every Elo point gained is.
Before this run my slowest test match with Houdini 3 was at 2'+2" so I'm very happy that at about 30 times longer TC the gain is still significant.
Hopefully you'll catch up with Komodo, it's more fun for everyone if there's a good competition at the top.

Robert
Actually, I would not want everyone to lay down and die. I want the competition and I doubt Komodo would be very strong if everything had stagnated 5 years ago.

So yes, we are trying to catch Houdini - our current dev version is almost certainly better than Houdini 1.5 but it's difficult to catch a moving target so please sit still for a minute or two.

Here are some results based on my distributed tester, where volunteers use their machines to help me test at much longer time controls (the time controls are adjusted to the hardware where the stated time control represents a very fast overclocked machine.) In these tests Komodo never plays other versions of itself.
60+1

Rank Name                       Elo      +      -    games   score   oppo.   draws 
   1 Komodo 4485.00 64 bit    3032.5    6.8    6.8    8867   56.7%  2972.4   42.6% 
   2 Komodo 4476.04 64 bit    3022.3    6.3    6.3   10206   54.6%  2982.3   44.1% 
   3 Komodo 4483.00 64 bit    3021.5    6.3    6.3   10181   54.7%  2979.3   44.0% 
   4 Komodo 4485.19 64 bit    3020.2    6.8    6.8    8848   54.2%  2983.2   44.8% 
   5 Komodo 4477.45 64 bit    3019.6    6.8    6.8    8714   54.8%  2977.5   44.5% 
   6 Komodo 4487.06 64 bit    3019.3    6.4    6.4    9859   54.1%  2983.2   43.9% 
   7 Houdini 1.5a x64         3018.0    3.7    3.7   30090   50.6%  3012.1   41.9% 
   8 Komodo 4481.02 64 bit    3017.7    6.9    6.9    8555   54.5%  2978.3   44.5% 
   9 Komodo 4482.02 64 bit    3017.3   11.0   11.0    3358   54.0%  2981.2   44.3% 
  10 Komodo 4481.00 64 bit    3017.0    5.8    5.8   12144   53.8%  2983.2   43.9% 
  11 Komodo 4477.08 64 bit    3015.6    9.0    9.0    5051   54.7%  2974.5   42.6% 
  12 Komodo 4479.00 64 bit    3014.6   11.0   11.0    3349   53.9%  2979.5   44.3% 
  13 Komodo 4477.15 64 bit    3013.8    5.4    5.4   13705   54.6%  2973.8   44.7% 
  14 Critter 1.4 64-bit SSE4  3000.0    3.0    3.0   44334   48.6%  3011.8   45.8% 
  15 Stockfish 2.2.2 JA       2931.7    3.0    3.0   44317   40.3%  3017.8   44.6% 


90+1

Rank Name                       Elo      +      -    games   score   oppo.   draws 
   1 Komodo 4471.02 64 bit    3060.9   12.9   12.9    2530   57.9%  2990.0   35.4% 
   2 Komodo 4467.01 64 bit    3027.6    8.7    8.7    5300   54.4%  2990.0   42.0% 
   3 Houdini 1.5a x64         3025.2    7.2    7.2    7884   49.7%  3027.6   39.1% 
   4 Komodo 4468.00 64 bit    3024.8    8.7    8.7    5298   54.1%  2990.1   42.4% 
   5 Komodo 4471.01 64 bit    3021.6    8.6    8.6    5321   53.7%  2990.0   43.5% 
   6 Komodo 5 64 bit dev      3020.7    8.7    8.7    5313   53.7%  2990.1   43.1% 
   7 Critter 1.4 64-bit SSE4  3000.0    7.1    7.1    7957   46.7%  3027.6   44.4% 
   8 Stockfish 2.2.2 JA       2945.0    7.1    7.1    7921   40.4%  3027.6   42.4% 


120+2

Rank Name                       Elo      +      -    games   score   oppo.   draws 
   1 Komodo 4467.01 64 bit    3036.1    8.5    8.5    5594   55.0%  2992.1   43.9% 
   2 Houdini 1.5a x64         3030.6    6.0    6.0   11500   50.4%  3027.7   42.1% 
   3 Komodo 4463.00 64 bit    3029.7    6.1    6.1   10939   54.3%  2992.3   44.6% 
   4 Komodo 4466.02 64 bit    3027.1    7.5    7.5    7127   53.9%  2992.2   45.0% 
   5 Komodo 5 64 bit dev      3021.9    6.1    6.1   10906   53.4%  2992.2   44.4% 
   6 Critter 1.4 64-bit SSE4  3000.0    5.9    5.9   11530   46.8%  3027.7   45.5% 
   7 Stockfish 2.2.2 JA       2946.2    5.9    5.9   11536   40.7%  3027.7   45.9% 
It seems based on the results that Komodo 4471.02 64 bit is significantly stronger than other versions but for some reason you tested it only in the 90+1 list.

We stopped testing it long ago because we knew it was superior - by the same token we no longer test Komodo 3 either. But we decided to put it on one of the tests, even though we are no longer interested, just to verify that we have made the progress we think we have. We rarely go back and test versions that we lost interest in long ago, simply because it's a huge waste of resources.

I wonder if it was really a big improvement relative to other versions of komodo or maybe there is some mistake in the data or some problem in the machine that tested it.

geots · Post by **geots** » Fri Sep 28, 2012 2:38 pm

It doesn't matter if you enter this discussion to only talk about the weather in Bora Bora, it is a no-no and a grave error for one of the top members of a competing team to be seen anywhere near this discussion.

gts

Uri Blass · Post by **Uri Blass** » Fri Sep 28, 2012 2:54 pm

Don wrote:
We stopped testing it long ago because we knew it was superior - by the same token we no longer test Komodo 3 either. But we decided to put it on one of the tests, even though we are no longer interested, just to verify that we have made the progress we think we have. We rarely go back and test versions that we lost interest in long ago, simply because it's a huge waste of resources.

I do not understand.
I understand that you stopped testing Komodo 3 because it is obviously inferior, but this is not similar, because 4471.02 64 bit has a significantly higher rating based on your results.

I would understand if you do not test it because you know that there was a bug in the test and the result does not make sense (this is what Albert Silver suggests), but your post does not say so.
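A rough way to see why the 90+1 result stands out is to compare the rating intervals from that table. The Python sketch below is only an illustration: the `Rating` class and `intervals_overlap` helper are hypothetical names of mine, and non-overlapping intervals are a crude check, not a formal significance test.

```python
from typing import NamedTuple

class Rating(NamedTuple):
    name: str
    elo: float
    margin: float  # the "+" / "-" columns of the table

    @property
    def low(self) -> float:
        return self.elo - self.margin

    @property
    def high(self) -> float:
        return self.elo + self.margin

def intervals_overlap(a: Rating, b: Rating) -> bool:
    """True if the two rating intervals share at least one point."""
    return a.low <= b.high and b.low <= a.high

# Rows copied from the 90+1 table.
top = Rating("Komodo 4471.02 64 bit", 3060.9, 12.9)
second = Rating("Komodo 4467.01 64 bit", 3027.6, 8.7)
houdini = Rating("Houdini 1.5a x64", 3025.2, 7.2)

gap = top.low - second.high  # distance between the two intervals
print(intervals_overlap(top, second), round(gap, 1))
print(intervals_overlap(second, houdini))
```

The top entry's interval sits clear of the next one, while the entries below it all overlap, so the question of whether the result is real or an artifact of the test is a fair one.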

bupalo · Post by **bupalo** » Fri Sep 28, 2012 3:12 pm

To all the programmers: please stop all these engine matches and try to concentrate on positions. I analyse with all of the top 5 engines, and I can tell you that each one sometimes sees things the others don't.

Don · Post by **Don** » Fri Sep 28, 2012 4:00 pm

bupalo wrote: To all the programmers: please stop all these engine matches and try to concentrate on positions. I analyse with all of the top 5 engines, and I can tell you that each one sometimes sees things the others don't.

We are definitely interested in positions too, so every time you run across one of these you would be doing us a kindness by adding it to a database. When you have a few, post them for us with the move that you believe should have been played or should have been avoided, and we will look at them.

lkaufman · Post by **lkaufman** » Fri Sep 28, 2012 5:27 pm

Uri Blass wrote:
I do not understand.
I understand that you stopped testing Komodo 3 because it is obviously inferior, but this is not similar, because 4471.02 64 bit has a significantly higher rating based on your results.

I would understand if you do not test it because you know that there was a bug in the test and the result does not make sense (this is what Albert Silver suggests), but your post does not say so.

Disregard the high result for version 4471.02: it was due to a flaw in the tester relating to the 50-move rule, which version 4471.02 revealed. The 60+1 test was run with the corrected version of the tester.

carldaman · Post by **carldaman** » Sat Sep 29, 2012 4:38 am

Technically that would be 1/Phi.
Phi (cap P) = 1.618...

CL
