Komodo 4 on long time control

mwyoung · Post by **mwyoung** » Sat Dec 03, 2011 8:37 pm

Houdini wrote:
lkaufman wrote: I think you are wrong on this last point. I'm combining the blitz results of ccrl with the blitz results of cegt to get a larger sample, and then comparing the slow results of the two organizations to get a larger sample. So the margin of error should be divided by roughly the square root of two, bringing it back down to your original 20 elo estimate. Twenty is less than 25, so even if I accept your twenty value the chance that Houdini 1.5 scales better than 2.0 is about 99% based on this data.
No, you miss the point that you're using 4 ratings to compute the relative scaling. With individual rating errors of 20 Elo one simply cannot make any statistically sound conclusions about micro-differences of plus or minus 10 Elo. And that even ignores the fact that you're cherry-picking the rating lists to base your conclusions on...
From all the test results at my disposal (including some private rating list results with 6 CPU), there is no evidence that Houdini 2 scales any differently than Houdini 1.5.

Robert

This is easy to settle...

Don and Larry need to meet one of their deadlines and release Komodo 4 for testing.

All they are waiting for is a version of Komodo 4 that can overtake Houdini in the rating lists. With all the trash talk from them about Houdini's problems, why the delay?

mwyoung · Post by **mwyoung** » Sat Dec 03, 2011 8:54 pm

Houdini wrote:
lkaufman wrote: I think you are wrong on this last point. I'm combining the blitz results of ccrl with the blitz results of cegt to get a larger sample, and then comparing the slow results of the two organizations to get a larger sample. So the margin of error should be divided by roughly the square root of two, bringing it back down to your original 20 elo estimate. Twenty is less than 25, so even if I accept your twenty value the chance that Houdini 1.5 scales better than 2.0 is about 99% based on this data.
No, you miss the point that you're using 4 ratings to compute the relative scaling. With individual rating errors of 20 Elo one simply cannot make any statistically sound conclusions about micro-differences of plus or minus 10 Elo. And that even ignores the fact that you're cherry-picking the rating lists to base your conclusions on...
From all the test results at my disposal (including some private rating list results with 6 CPU), there is no evidence that Houdini 2 scales any differently than Houdini 1.5.

Robert

More data from CCRL was released today for Houdini 2.0 at 40/40.

Houdini 2.0 4cpu 3335 +/- 56
Houdini 1.5a 4cpu 3301 +/- 22

Houdini 2.0 4cpu results
103 games: 45 wins, 15 losses, 43 draws (41.7%), score: 64.6%

Results not supporting Don and Larry's scaling theory.

Houdini 2.0 scored 64% against Stockfish 2.1.1 at 40/4.

Houdini 2.0 scored 65.4% against Stockfish 2.1.1 at 40/40.

The data is on the CCRL site under the complete list tab.
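
A back-of-the-envelope check on the error-margin argument quoted above: a minimal sketch, assuming the published +/- margins are independent and can be combined by root-sum-of-squares. Ratings taken from the same list are correlated in practice, so the numbers are only rough. The function name `combined_margin` is made up for illustration.

```python
import math

def combined_margin(*margins):
    """Root-sum-of-squares of error margins, assuming they are independent."""
    return math.sqrt(sum(m * m for m in margins))

# A scaling comparison of two engines at two time controls uses four ratings.
# With +/- 20 Elo on each, the margin on the final "difference of differences"
# is twice the individual margin.
scaling_margin = combined_margin(20, 20, 20, 20)

# CCRL 40/40 figures quoted in this post.
gap = 3335 - 3301
gap_margin = combined_margin(56, 22)

print(f"scaling margin: +/- {scaling_margin:.0f} Elo")
print(f"Houdini 2.0 vs 1.5a: {gap:+d} Elo, +/- {gap_margin:.0f}")
```

Under these assumptions the 34 Elo gap is smaller than its combined margin of about 60, so more games are needed before reading much into it.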

gerold · Post by **gerold** » Sat Dec 03, 2011 8:57 pm

mwyoung wrote:
Houdini wrote:
lkaufman wrote: I think you are wrong on this last point. I'm combining the blitz results of ccrl with the blitz results of cegt to get a larger sample, and then comparing the slow results of the two organizations to get a larger sample. So the margin of error should be divided by roughly the square root of two, bringing it back down to your original 20 elo estimate. Twenty is less than 25, so even if I accept your twenty value the chance that Houdini 1.5 scales better than 2.0 is about 99% based on this data.
No, you miss the point that you're using 4 ratings to compute the relative scaling. With individual rating errors of 20 Elo one simply cannot make any statistically sound conclusions about micro-differences of plus or minus 10 Elo. And that even ignores the fact that you're cherry-picking the rating lists to base your conclusions on...
From all the test results at my disposal (including some private rating list results with 6 CPU), there is no evidence that Houdini 2 scales any differently than Houdini 1.5.

Robert

More data from CCRL was released today for Houdini 2.0 at 40/40.

Houdini 2.0 4cpu 3335 +/- 56
Houdini 1.5a 4cpu 3301 +/- 22

Houdini 2.0 4cpu results
103 games: 45 wins, 15 losses, 43 draws (41.7%), score: 64.6%

Results not supporting Don and Larry's scaling theory.

Thanks for the update. If Houdini 2 keeps showing improvement over version 1.5 with more games played, I may have to switch to version 2 for my testing.
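
For anyone who wants to check the 64.6% figure quoted above, here is a minimal sketch that recomputes the score from the win/loss/draw counts and converts it to an Elo difference with the standard logistic formula. The function names are illustrative, and the result is a performance figure against the particular opponents faced, not a list rating.

```python
import math

def score_fraction(wins, losses, draws):
    """Fraction of points scored, counting a draw as half a point."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games

def elo_diff(score):
    """Elo difference implied by an expected score (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

s = score_fraction(wins=45, losses=15, draws=43)
print(f"score: {s:.1%}")  # 64.6%
print(f"implied Elo edge over the opposition: {elo_diff(s):.0f}")
```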

beram · Post by **beram** » Sat Dec 03, 2011 9:15 pm

Don wrote:
Houdini wrote:
lkaufman wrote: We will almost surely release before mid-Dec. The delay has been in getting MP working, as Don gets sidetracked making program improvements. Today we made a nice gain that should put us too close to call with Houdini at 40/40 based on CCRL ratings, though we will probably not catch Houdini at blitz levels with this release. The CCRL and CEGT tests show us scaling much better than Houdini, and also show Houdini 2.0 scaling much worse than Houdini 1.5. In fact we may pass Houdini 2.0 on some lists without passing Houdini 1.5! This lower rating of 2.0 vs. 1.5 is why people assume that Houdini is static.
Most (in fact all except 2) rating lists show a definite improvement for Houdini 2, from single-core to 6 CPU, typically about 20 Elo points.
For example, take a look at the most recent result: the SWCR rating that was published today for Houdini. Houdini 2.0c shows +24 Elo and is 60 Elo ahead of Komodo 3.

SWCR is played at 40/10 with Ponder ON. This is not miles away from the 40/40 with ponder off. On the average the difference in search depth must be a single ply, there is no reason to expect any fundamental difference in performance. Why do you pretend that this would be the case?

The bottom-line is that you cherry-pick one or two rating list results - while dismissing all the other results as "blitz" - to make claims about the strength or scaling of Houdini - about which you know very little (as you admit yourself). Please stop the nonsense.

Robert
It has come to our attention that Komodo benefits much more from the SSE instructions than other programs, and that in the SWCR testing Frank uses the lowest common denominator, which is the Komodo binary with SSE off. Houdini automatically switches it on, so it is not a fair test at all.

Don, is a difference of 7 points that much more? (according to the latest 40/40 CCRL list)
Assuming that others profit by 3-4 points, then Komodo benefits by 3-4 Elo points more. WOW, isn't that amazing?

Komodo 3 64-bit SSE	3216	+22	−22	54.5%	−29.2	45.7%	604
Komodo 3 64-bit	3209	+25	−24	62.1%	−80.7	41.1%	542

lkaufman · Post by **lkaufman** » Sat Dec 03, 2011 9:59 pm

mwyoung wrote:
Don and Larry need to meet one of their deadlines and release Komodo 4 for testing.

All they are waiting for is a version of Komodo 4 that can overtake Houdini in the rating lists. With all the trash talk from them about Houdini's problems, why the delay?

Basically, our problem is that we have two simultaneous projects going on. One is to come out with an MP version of Komodo as of the date code was frozen for it, and the other is the continued improvement of the SP version, which will eventually migrate to the MP.

I can propose an experiment you or anyone else out there with some good hardware can do. Run a long match of Komodo 3 64-bit SSE against Houdini (1.5 or 2.0, take your pick) 32-bit at some fairly long time control, maybe something like 25 minutes plus 15 seconds increment (i.e. five times what IPON uses), preferably with Ponder off so you can run twice as many games at once. The point is that our latest Komodo version is roughly improved over Komodo 3 by the same amount that Houdini 64-bit outrates Houdini 32-bit. So whatever the result of such a match might be (I haven't run it so I don't know), it should be a good predictor for how our latest version will do against Houdini 64 bit at whatever time control you use. I know that Houdini 32-bit kills Komodo 3 64 bit at bullet chess (1' per game or anything comparably fast), so my expectation of a close result at long time controls rests squarely on my belief that Komodo scales much better than Houdini. I hope someone runs this test and posts the results here. But please, we need a few hundred games, not 50 or so. Of course if several people each run 50, that works fine!
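
To illustrate why a few hundred games are needed rather than 50, here is a rough sketch of how the 95% error margin shrinks with the number of games. It assumes two evenly matched engines and a 40% draw rate (both chosen only for illustration), and the function names are made up.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an expected score (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin_95(games, draw_rate=0.40, score=0.50):
    """Approximate 95% margin, in Elo, on the result of a match.

    Only meaningful while score + half_width stays below 1.0,
    i.e. for reasonably long matches.
    """
    # Per-game variance of the score when a draw counts as half a point.
    win_rate = score - draw_rate / 2.0
    variance = win_rate + draw_rate / 4.0 - score * score
    half_width = 1.96 * math.sqrt(variance / games)
    # Convert the upper end of the score interval to Elo.
    return elo_from_score(score + half_width) - elo_from_score(score)

for n in (50, 100, 200, 400, 800):
    print(f"{n:4d} games: +/- {elo_margin_95(n):.0f} Elo")
```

With these assumptions, 50 games leave a margin of roughly 75 Elo, while 400 games bring it down to roughly 26.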

lkaufman · Post by **lkaufman** » Sat Dec 03, 2011 10:01 pm

rvida wrote:
MM wrote: I see that Komodo and Rybka 4.1 are the engines that improve their level of play with longer TC (it's what I have been saying for ages about Komodo).
Or it might be the other way around - their level of play decreases with faster TC. There are rumors (on the Rybka forum) that R4.1 has particularly bad time management.

I'm not sure what the difference is. Either way, the point is that results at blitz underrate the performance at long time limits of the better scaling engine, whatever the reason.

rvida · Post by **rvida** » Sat Dec 03, 2011 10:14 pm

lkaufman wrote:
rvida wrote:
MM wrote: I see that Komodo and Rybka 4.1 are the engines that improve their level of play with longer TC (it's what I have been saying for ages about Komodo).
Or it might be the other way around - their level of play decreases with faster TC. There are rumors (on the Rybka forum) that R4.1 has particularly bad time management.
I'm not sure what the difference is. Either way, the point is that results at blitz underrate the performance at long time limits of the better scaling engine, whatever the reason.

My point was that maybe Komodo does not "scale better", but instead something is holding it back at very fast TC. For example, if you spend some 10 ms setting up the search, initializing tables or whatever, it might be a serious handicap in ultra-fast games but is negligible at long TC. Another reason might be overstepping the allocated time by a few ms at each move and running into time trouble. Engines that check more often for the stop condition generally perform better at fast TC. After how many nodes do you check the time?
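
The two effects described above can be put into rough numbers. This is a toy model, not Komodo's or any real engine's code: it assumes a fixed cost per node and a clock that is polled only every `check_every` nodes, and all names and figures are hypothetical.

```python
def stop_time_us(budget_us, node_cost_us, check_every):
    """Elapsed time when a search that polls the clock only every
    `check_every` nodes notices that its budget is used up."""
    elapsed = 0
    nodes = 0
    while True:
        nodes += 1
        elapsed += node_cost_us  # simulated work for one node
        if nodes % check_every == 0 and elapsed >= budget_us:
            return elapsed

def overhead_share(setup_ms, move_time_ms):
    """Share of the move time lost to a fixed per-move setup cost."""
    return setup_ms / move_time_ms

budget = 100_000  # 100 ms per move, at 1 microsecond per node
for interval in (1_024, 65_536):
    overshoot = stop_time_us(budget, 1, interval) - budget
    print(f"check every {interval} nodes: overshoot {overshoot / 1000:.1f} ms")

print(f"10 ms setup at 100 ms/move: {overhead_share(10, 100):.1%}")
print(f"10 ms setup at 60 s/move:   {overhead_share(10, 60_000):.3%}")
```

In this toy model the coarse polling interval overshoots by about 31 ms on a 100 ms move, which matters at bullet speeds and is negligible at 40/40.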

gerold · Post by **gerold** » Sat Dec 03, 2011 10:17 pm

lkaufman wrote:
mwyoung wrote:
Don and Larry need to meet one of their deadlines and release Komodo 4 for testing.

All they are waiting for is a version of Komodo 4 that can overtake Houdini in the rating lists. With all the trash talk from them about Houdini's problems, why the delay?
Basically, our problem is that we have two simultaneous projects going on. One is to come out with an MP version of Komodo as of the date code was frozen for it, and the other is the continued improvement of the SP version, which will eventually migrate to the MP.

I can propose an experiment you or anyone else out there with some good hardware can do. Run a long match of Komodo 3 64-bit SSE against Houdini (1.5 or 2.0, take your pick) 32-bit at some fairly long time control, maybe something like 25 minutes plus 15 seconds increment (i.e. five times what IPON uses), preferably with Ponder off so you can run twice as many games at once. The point is that our latest Komodo version is roughly improved over Komodo 3 by the same amount that Houdini 64-bit outrates Houdini 32-bit. So whatever the result of such a match might be (I haven't run it so I don't know), it should be a good predictor for how our latest version will do against Houdini 64 bit at whatever time control you use. I know that Houdini 32-bit kills Komodo 3 64 bit at bullet chess (1' per game or anything comparably fast), so my expectation of a close result at long time controls rests squarely on my belief that Komodo scales much better than Houdini. I hope someone runs this test and posts the results here. But please, we need a few hundred games, not 50 or so. Of course if several people each run 50, that works fine!

Running a match now with 5/6 TC, using Perfect 12 abk for both engines.
Houdini 1.5 w32 vs. Komodo 3 -32. Result so far: 9-0-9.
Houdini is killing Komodo after 18 games. Komodo has not won one game yet.

Best,
Gerold.

MM · Post by **MM** » Sat Dec 03, 2011 10:26 pm

gerold wrote:
lkaufman wrote:
mwyoung wrote:
Don and Larry need to meet one of their deadlines and release Komodo 4 for testing.

All they are waiting for is a version of Komodo 4 that can overtake Houdini in the rating lists. With all the trash talk from them about Houdini's problems, why the delay?
Basically, our problem is that we have two simultaneous projects going on. One is to come out with an MP version of Komodo as of the date code was frozen for it, and the other is the continued improvement of the SP version, which will eventually migrate to the MP.

I can propose an experiment you or anyone else out there with some good hardware can do. Run a long match of Komodo 3 64-bit SSE against Houdini (1.5 or 2.0, take your pick) 32-bit at some fairly long time control, maybe something like 25 minutes plus 15 seconds increment (i.e. five times what IPON uses), preferably with Ponder off so you can run twice as many games at once. The point is that our latest Komodo version is roughly improved over Komodo 3 by the same amount that Houdini 64-bit outrates Houdini 32-bit. So whatever the result of such a match might be (I haven't run it so I don't know), it should be a good predictor for how our latest version will do against Houdini 64 bit at whatever time control you use. I know that Houdini 32-bit kills Komodo 3 64 bit at bullet chess (1' per game or anything comparably fast), so my expectation of a close result at long time controls rests squarely on my belief that Komodo scales much better than Houdini. I hope someone runs this test and posts the results here. But please, we need a few hundred games, not 50 or so. Of course if several people each run 50, that works fine!
Running a match now with 5/6 TC, using Perfect 12 abk for both engines.
Houdini 1.5 w32 vs. Komodo 3 -32. Result so far: 9-0-9.
Houdini is killing Komodo after 18 games. Komodo has not won one game yet.

Best,
Gerold.

I understood the proposed match to be Komodo 3 64-bit SSE vs. Houdini 1.5a 32-bit.

tomgdrums · Post by **tomgdrums** » Sat Dec 03, 2011 10:41 pm

mwyoung wrote:
Houdini wrote:
lkaufman wrote: I think you are wrong on this last point. I'm combining the blitz results of ccrl with the blitz results of cegt to get a larger sample, and then comparing the slow results of the two organizations to get a larger sample. So the margin of error should be divided by roughly the square root of two, bringing it back down to your original 20 elo estimate. Twenty is less than 25, so even if I accept your twenty value the chance that Houdini 1.5 scales better than 2.0 is about 99% based on this data.
No, you miss the point that you're using 4 ratings to compute the relative scaling. With individual rating errors of 20 Elo one simply cannot make any statistically sound conclusions about micro-differences of plus or minus 10 Elo. And that even ignores the fact that you're cherry-picking the rating lists to base your conclusions on...
From all the test results at my disposal (including some private rating list results with 6 CPU), there is no evidence that Houdini 2 scales any differently than Houdini 1.5.

Robert
This is easy to settle...

Don and Larry need to meet one of their deadlines and release Komodo 4 for testing.

All they are waiting for is a version of Komodo 4 that can overtake Houdini in the rating lists. With all the trash talk from them about Houdini's problems, why the delay?

+10

Don and Larry do indeed talk a lot of smack in the forum.

They seem to have become obsessed with overtaking Houdini!! It is like watching the computer chess version of "Moby Dick"!!
