I decided to test the hypothesis that Komodo scales better than Houdini 3 with more time by running matches between the latest Komodo dev version and Houdini 3 at various time controls. Using my 20-core, my 16-core, and a fast quad machine, I was able to get in a lot of games in a few days. All matches were run with HT off, using LittleBlitzer, one game per core, with the time limit reduced by 25% for the quad games for approximate equivalence. Here are the results, from Komodo's perspective:
1' + 0.5": Lost by 31 elo (4,860 games).
2' + 1": Lost by 16 elo (5,000 games).
4' + 2": Lost by 7 elo (2,026 games).
40' + 20": Won by 13 elo (638 games).
Based on these results, Komodo gains about 8 elo relative to Houdini for each doubling of the time limit in the specified range. The "break even" point should be about 8' + 4".
These results are quite consistent with the rating lists, given that the version of Komodo tested here is about + 10 elo over Komodo 6.
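The "about 8 elo per doubling" figure can be checked with a quick least-squares fit of the four match results against log2 of the base time (a sketch using only the numbers above; the time controls are taken as base minutes 1, 2, 4, and 40, ignoring the increments):

```python
import math

# (base minutes, Komodo's elo difference vs Houdini) from the results above
results = [(1, -31), (2, -16), (4, -7), (40, 13)]

xs = [math.log2(t) for t, _ in results]   # doublings of the time limit
ys = [d for _, d in results]

# ordinary least-squares slope (elo per doubling) and intercept
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# base time at which the fitted line crosses zero
break_even = 2 ** (-intercept / slope)
print(f"{slope:.1f} elo per doubling, break-even near {break_even:.1f}' base time")
```

The fit gives roughly 7.8 elo per doubling, in line with the estimate; the zero crossing of this crude fit lands a bit above the quoted 8' + 4", since the four points are not perfectly linear.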
Scaling Study
Moderator: Ras
-
lkaufman
- Posts: 6297
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
-
Vinvin
- Posts: 5325
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: Scaling Study
That sounds nice for the future and for long analysis.
Please give:
1) average depth for all TC.
2) NPS for Houdini 3 on each machine.
Thanks,
Vincent
-
lkaufman
- Posts: 6297
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Scaling Study
Vinvin wrote: That sounds nice for the future and long analysis
Please, give :
1) average depth for all TC.
2) NPS for Houdini 3 on each machine.

Average depth for Houdini at the four time controls: 18.15, 19.63, 21.00, 26.28.
Same for Komodo: 17.09, 18.47, 19.98, 24.90.
Houdini average kilonodes per second: 20-core 2415, 16-core 2578, quad 3232. I gave the quad 3/4 of the time for each test so as to make it equivalent to the 20-core. Komodo's nodes per second were about 68% of Houdini's in each case.
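As a quick sanity check on the 3/4-time adjustment (a sketch using only the speeds quoted above): scaling the quad's time by 0.75 should make its effective node budget per unit of time control roughly match the 20-core machine's.

```python
# Houdini average speed (kilonodes/second) reported above
knps = {"20-core": 2415, "16-core": 2578, "quad": 3232}

# The quad got 3/4 of the time, so its effective node budget per
# "20-core second" of time control is speed * 0.75.
effective_quad = knps["quad"] * 0.75

# How closely does that match the 20-core machine?
ratio = effective_quad / knps["20-core"]
print(f"quad effective: {effective_quad:.0f} knps-equivalent, ratio {ratio:.3f}")
```

The adjusted quad comes out at 2424 knps-equivalent against the 20-core's 2415, a match within about 0.4%, so the "approximate equivalence" claim holds up on nodes searched.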
Larry
-
beram
- Posts: 1187
- Joined: Wed Jan 06, 2010 3:11 pm
Re: Scaling Study
Perhaps they are consistent on your PC, but not in the rating lists.
Btw, you always mentioned in June that in your tests Komodo 5.1 was already stronger than Houdini; that also wasn't to be found in the rating lists.
When we look at Komodo 5 and Houdini 3, for which we have the most comparison material in the lists, we see the following differences in elo between H3 and K5 (when the lists for Komodo 6 become more reliable, I will make a comparison later on):
CCRL 4/40    - 56
CCRL 40/40   - 51
CEGT 4/40    - 81
CEGT 40/20pb - 55
CEGT 40/120  - 59
-
Modern Times
- Posts: 3842
- Joined: Thu Jun 07, 2012 11:02 pm
Re: Scaling Study
Most rating lists don't have the large numbers of games that Larry has above.
I'm also confident in Larry's results now that he has HT off on his machines.
Those results are certainly very interesting, but it is just one opponent (Komodo vs Houdini), so it doesn't tell us much. If he repeated those same tests Komodo vs Stockfish, that would be fascinating, I think.
-
Don
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Scaling Study
Modern Times wrote: Most rating lists don't have the large numbers of games that Larry has above. I'm also confident in Larry's results now that he has HT off on his machines. Those results are certainly very interesting, but it is just one opponent, so it doesn't tell us much. If he repeated those same tests Komodo vs Stockfish, that would be fascinating I think.

Most of the rating lists, or any test for that matter, are unreliable in the sense that they are based on specific test conditions which may or may not apply evenly. Larry's test is trustworthy, but it applies specifically to the machines he used and his testing conditions, such as time control, hash settings, etc. Your mileage may vary.
In the past, our tests of foreign programs were biased, not purposely, but just because we didn't always understand how hardware and testing conditions impacted the results. For example, Komodo did very well when we overprovisioned our machines, something that had always proved reliable when testing only Komodo versions.
I think you are correct about the Houdini issue, it seems to be a fact that we do well against strong programs but not quite as well against weak programs. If we tested 100 programs we probably would find that Houdini beats the weaker ones more decisively than Komodo, even if Komodo beats Houdini under those same conditions. Robert Houdart himself has stated that his contempt stuff is more than just draw score contempt and is designed to beat weaker programs and I assume it is a successful heuristic.
Houdini 3 is still a stronger program at the time controls used by the testing groups, but I think it's pretty clear that we scale better; your mileage may vary, though, with different testing conditions, hardware, etc.
Don
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
-
michiguel
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Scaling Study
Modern Times wrote: Most rating lists don't have the large numbers of games that Larry has above. I'm also confident in Larry's results now that he has HT off on his machines. Those results are certainly very interesting, but it is just one opponent, so it doesn't tell us much. If he repeated those same tests Komodo vs Stockfish, that would be fascinating I think.

It is very relative: it could be that Komodo scales well, or that Houdini scales badly, or it could mean that Komodo is weak or Houdini strong at fast TC, **IF** the results converge at a constant number and the difference stalls.
It is certainly interesting.
Miguel
-
lkaufman
- Posts: 6297
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Scaling Study
Modern Times wrote: Most rating lists don't have the large numbers of games that Larry has above. I'm also confident in Larry's results now that he has HT off on his machines. Those results are certainly very interesting, but it is just one opponent, so it doesn't tell us much. If he repeated those same tests Komodo vs Stockfish, that would be fascinating I think.

Some of my older results were wrong not only due to the HT issue but also to issues relating to using LittleBlitzer on the big machines. I'm confident that I now understand all the issues involved and am testing in the right way now. Due to some of these issues, my older tests on the big machines were favoring Komodo over Houdini, so I wrongly thought we had reached parity at the faster levels. Now I'm getting consistent results and NPS ratios on all machines, so I fully trust results done this way regardless of the hardware. I'm only talking about Intel, though; there may still be Intel vs. AMD issues.
I may try the same with Stockfish, although I don't expect to see much scaling difference between Komodo and Stockfish. But I have a problem with Stockfish losing too many games on time in the LittleBlitzer tester.
-
Modern Times
- Posts: 3842
- Joined: Thu Jun 07, 2012 11:02 pm
Re: Scaling Study
lkaufman wrote: But I have a problem with Stockfish losing too many games on time in the LittleBlitzer tester.

Increase the Emergency Base Time UCI parameter to at least 200. Martin uses 1000 for TCEC.
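For reference, sent over the raw UCI protocol the suggested setting corresponds to the line below (a sketch, assuming the option name is spelled exactly as Stockfish reports it in its `uci` option list; most testing GUIs send this for you when you change the value in the engine settings dialog):

```
setoption name Emergency Base Time value 200
```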
-
lkaufman
- Posts: 6297
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Scaling Study
Modern Times wrote: Increase the Emergency Base Time UCI parameter to at least 200. Martin uses 1000 for TCEC.

Thanks. Is 200 the value used in the CCRL, CEGT, and LightSpeed lists? I don't want to use a higher value than normal, as it might be unfair to SF.