I decided to test the hypothesis that Komodo scales better than Houdini 3 with more time by running matches between the latest Komodo dev version and Houdini 3 at various time controls. Using my 20-core, my 16-core, and a fast quad machine, I was able to get in a lot of games in a few days. All matches were run with HT off, using LittleBlitzer, one game per core, with the time limit reduced by 25% for the quad games for approximate equivalence. Here are the results, from Komodo's perspective:
1' + 0.5": Lost by 31 elo (4,860 games).
2' + 1": Lost by 16 elo (5,000 games).
4' + 2": Lost by 7 elo (2,026 games).
40' + 20": Won by 13 elo (638 games).
Based on these results, Komodo gains about 8 elo relative to Houdini for each doubling of the time limit in the specified range. The "break even" point should be about 8' + 4".
These results are quite consistent with the rating lists, given that the version of Komodo tested here is about + 10 elo over Komodo 6.
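The "about 8 elo per doubling" figure can be checked with a quick least-squares fit of the four match results against log2 of the base time (a sketch using only the numbers above; the time controls are taken as base minutes 1, 2, 4, and 40, ignoring the increments):

```python
import math

# (base minutes, Komodo's elo difference vs Houdini) from the results above
results = [(1, -31), (2, -16), (4, -7), (40, 13)]

xs = [math.log2(t) for t, _ in results]   # doublings of the time limit
ys = [d for _, d in results]

# ordinary least-squares slope (elo per doubling) and intercept
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# base time at which the fitted line crosses zero
break_even = 2 ** (-intercept / slope)
print(f"{slope:.1f} elo per doubling, break-even near {break_even:.1f}' base time")
```

The fit gives roughly 7.8 elo per doubling, in line with the estimate; the zero crossing of this crude fit lands a bit above the quoted 8' + 4", since the four points are not perfectly linear.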
Scaling Study
Moderator: Ras
-
lkaufman
- Posts: 6297
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
-
Vinvin
- Posts: 5325
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: Scaling Study
That sounds nice for the future and for long analysis.
Please give:
1) average depth for all TC.
2) NPS for Houdini 3 on each machine.
Thanks,
Vincent
-
lkaufman
- Posts: 6297
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Scaling Study
Vinvin wrote: That sounds nice for the future and long analysis
Please, give :
1) average depth for all TC.
2) NPS for Houdini 3 on each machine.

Average depth for Houdini at the four time controls: 18.15, 19.63, 21.00, 26.28.
Same for Komodo: 17.09, 18.47, 19.98, 24.90.
Houdini average kilonodes per second: 20-core 2415, 16-core 2578, quad 3232. I gave the quad 3/4 of the time for each test so as to make it equivalent to the 20-core. Komodo's nodes per second were about 68% of Houdini's in each case.
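As a quick sanity check on the 3/4-time adjustment (a sketch using only the speeds quoted above): scaling the quad's time by 0.75 should make its effective node budget per unit of time control roughly match the 20-core machine's.

```python
# Houdini average speed (kilonodes/second) reported above
knps = {"20-core": 2415, "16-core": 2578, "quad": 3232}

# The quad got 3/4 of the time, so its effective node budget per
# "20-core second" of time control is speed * 0.75.
effective_quad = knps["quad"] * 0.75

# How closely does that match the 20-core machine?
ratio = effective_quad / knps["20-core"]
print(f"quad effective: {effective_quad:.0f} knps-equivalent, ratio {ratio:.3f}")
```

The adjusted quad comes out at 2424 knps-equivalent against the 20-core's 2415, a match within about 0.4%, so the "approximate equivalence" claim holds up on nodes searched.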
Larry
-
beram
- Posts: 1187
- Joined: Wed Jan 06, 2010 3:11 pm
Re: Scaling Study
Perhaps they are consistent on your PC, but not in the rating lists.
Btw, you always mentioned in June that in your tests Komodo 5.1 was already stronger than Houdini; that also wasn't to be found in the rating lists.
When we look at Komodo 5 and Houdini 3, for which we have the most comparison material in the lists, we see the following differences in elo between H3 and K5 (when the lists for Komodo 6 become more reliable, I will make a comparison later on):
CCRL 4/40    - 56
CCRL 40/40   - 51
CEGT 4/40    - 81
CEGT 40/20pb - 55
CEGT 40/120  - 59
-
Modern Times
- Posts: 3842
- Joined: Thu Jun 07, 2012 11:02 pm
Re: Scaling Study
Most rating lists don't have the large numbers of games that Larry has above.
I'm also confident in Larry's results now that he has HT off on his machines.
Those results are certainly very interesting, but it is just one opponent (Komodo vs Houdini), so it doesn't tell us much. If he repeated those same tests Komodo vs Stockfish, that would be fascinating, I think.
-
Don
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Scaling Study
Modern Times wrote: Most rating lists don't have the large numbers of games that Larry has above. I'm also confident in Larry's results now that he has HT off on his machines. Those results are certainly very interesting, but it is just one opponent, so it doesn't tell us much. If he repeated those same tests Komodo vs Stockfish, that would be fascinating I think.

Most of the rating lists, or any test for that matter, are unreliable in the sense that they are based on specific test conditions which may or may not apply evenly. Larry's test is trustworthy, but it applies specifically to the machines he used and his testing conditions, such as time control, hash settings, etc. Your mileage may vary.
In the past, our tests of foreign programs were biased, not purposely, but just because we didn't always understand how hardware and testing conditions impacted the results. For example, Komodo did very well when we overprovisioned our machines, something that had always proved reliable when testing only Komodo versions.
I think you are correct about the Houdini issue, it seems to be a fact that we do well against strong programs but not quite as well against weak programs. If we tested 100 programs we probably would find that Houdini beats the weaker ones more decisively than Komodo, even if Komodo beats Houdini under those same conditions. Robert Houdart himself has stated that his contempt stuff is more than just draw score contempt and is designed to beat weaker programs and I assume it is a successful heuristic.
Houdini 3 is still a stronger program at the time controls used by the testing groups, but I think it's pretty clear that we scale better; your mileage may vary, though, with different testing conditions, hardware, etc.
Don
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
-
michiguel
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Scaling Study
Modern Times wrote: Most rating lists don't have the large numbers of games that Larry has above. I'm also confident in Larry's results now that he has HT off on his machines. Those results are certainly very interesting, but it is just one opponent, so it doesn't tell us much. If he repeated those same tests Komodo vs Stockfish, that would be fascinating I think.

It is very relative: it could be that Komodo scales well, or that Houdini scales badly, or it could mean that Komodo is weak or Houdini strong at fast TC, **IF** the results converge at a constant number and the difference stalls.
It is certainly interesting.
Miguel
-
lkaufman
- Posts: 6297
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Scaling Study
Modern Times wrote: Most rating lists don't have the large numbers of games that Larry has above. I'm also confident in Larry's results now that he has HT off on his machines. Those results are certainly very interesting, but it is just one opponent, so it doesn't tell us much. If he repeated those same tests Komodo vs Stockfish, that would be fascinating I think.

Some of my older results were wrong not only due to the HT issue but also to issues relating to using LittleBlitzer on the big machines. I'm confident that I now understand all the issues involved and am testing in the right way now. Due to some of these issues, my older tests on the big machines were favoring Komodo over Houdini, so I wrongly thought we had reached parity at the faster levels. Now I'm getting consistent results and NPS ratios on all machines, so I fully trust results done this way regardless of the hardware. I'm only talking about Intel, though; there may still be Intel vs. AMD issues.
I may try the same with Stockfish, although I don't expect to see much scaling difference between Komodo and Stockfish. But I have a problem with Stockfish losing too many games on time in the LittleBlitzer tester.
-
Modern Times
- Posts: 3842
- Joined: Thu Jun 07, 2012 11:02 pm
Re: Scaling Study
lkaufman wrote: But I have a problem with Stockfish losing too many games on time in the LittleBlitzer tester.

Increase the Emergency Base Time UCI parameter to at least 200. Martin uses 1000 for TCEC.
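For reference, sent over the raw UCI protocol the suggested setting corresponds to the line below (a sketch, assuming the option name is spelled exactly as Stockfish reports it in its `uci` option list; most testing GUIs send this for you when you change the value in the engine settings dialog):

```
setoption name Emergency Base Time value 200
```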
-
lkaufman
- Posts: 6297
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Scaling Study
Modern Times wrote: Increase the Emergency Base Time UCI parameter to at least 200. Martin uses 1000 for TCEC.

Thanks. Is 200 the value used in the CCRL, CEGT, and LightSpeed lists? I don't want to use a higher value than normal, as it might be unfair to SF.