Number 1 engine on long time controls

Uri Blass · Post by **Uri Blass** » Thu Mar 01, 2012 2:30 pm

diep wrote:Small question: how many years did your slow laptop take to play 2000 games at those time controls?

if 00 == 6 min + 1 s, then level 05 == 192 minutes plus some increment?

level 00 = 6 min
level 01 = 12 min
level 02 = 24 min
level 03 = 48 min
level 04 = 96 min
level 05 = 192 min

Better use a big cluster with hundreds of cores for the testing if you want to finish playing 2000 games like that within a few weeks.

I understood that
level 0 = 6 seconds per game + 0.1 seconds increment per move.
It is not 6 minutes per game + 1 second per move.

2000 games at 6 seconds per game + 0.1 seconds per move take less than 30 seconds each, which means less than 1000 minutes for level 0.

2000 games at level 5 can be finished in some weeks.

Of course if you multiply the time control by 60 you need some years instead of some weeks.
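To make the arithmetic in this exchange concrete, here is a small sketch that tabulates the clock at each level (00 = 6 s + 0.1 s, doubling per level, as Don describes) and an upper bound on the time needed for 2000 games played one after another. The average length of 60 moves per side is a hypothetical assumption, and the bound assumes both sides use their whole clock.

```python
"""Clock per level and a rough upper bound on match duration."""

ASSUMED_MOVES_PER_SIDE = 60  # hypothetical average game length, not from the thread
GAMES = 2000


def level_clock(level: int) -> tuple[float, float]:
    """(base seconds, increment seconds): 6 s + 0.1 s at level 00, doubled per level."""
    factor = 2 ** level
    return 6.0 * factor, 0.1 * factor


def max_game_seconds(level: int, moves_per_side: int) -> float:
    """Upper bound for one game: both sides spend their full base time plus every increment."""
    base, inc = level_clock(level)
    return 2 * (base + inc * moves_per_side)


for level in range(9):
    base, inc = level_clock(level)
    hours = GAMES * max_game_seconds(level, ASSUMED_MOVES_PER_SIDE) / 3600
    print(f"level {level:02d}: {base:7.1f} s + {inc:5.2f} s/move, "
          f"{GAMES} sequential games <= {hours:7.1f} h")
```

Under these assumptions level 0 comes out at 800 minutes for 2000 games, in line with Uri's "less than 1000 minutes", and each level doubles the total, so playing several games in parallel is what makes the upper levels practical.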

Don · Post by **Don** » Thu Mar 01, 2012 2:34 pm

diep wrote:Small question: how many years did your slow laptop take to play 2000 games at those time controls?

To go through level 08 I figure it will take at least 4 months. However I changed things a bit with my new study because it's really a waste of CPU resources to play 00 vs 08, so now I am not allowing games between players more than 2 levels different. So 05 can play 04 and 03 but not 02. I had to restart the test of course.

Don wrote:
Kingghidorah wrote: If you were asking about the SP version, then I would pick Komodo at those time controls on modern i7 or AMD hardware. Also, I would pick Houdini 1.5 over Houdini 2.0 at long time controls.

Don, why do you say that about 1.5 over 2.0? Is it optimized better for longer time controls?

If you look at the lists you will notice that at the shorter time controls Houdini 2.0 is a big improvement, but at the longer time controls you see Houdini 1.5 catching up.

Anyone who pays attention can easily see that every program has different scaling characteristics. The Ippo clones are clearly dominant at fast time controls, but Stockfish starts to catch up at longer time controls. Our own internal testing makes this completely obvious: depending on the level, some programs are consistently stronger or weaker than others.

Here is an interesting study I'm doing, which shows Komodo scaling relative to Houdini 1.5:
Level 00 is 6 s + 0.1 s per move; each successive level doubles the time.

            Houdini 1.5 lead     Komodo gain vs.
            over Komodo (Elo)    previous level (Elo)
            -----------------    --------------------
 Level 00        +143.3
 Level 01         +97.0               +46.3
 Level 02         +74.6               +22.4
 Level 03         +52.8               +21.8
 Level 04         +39.5               +13.3
 Level 05         +27.0               +12.5
This is a long-running test on a slow laptop. Komodo gains several Elo relative to Houdini with each successive doubling. There is still significant error in 2000 games (each level is 2000 games), so it's hard to be precise, but after 3 more doublings Komodo will be winning against Houdini 1.5 if it picks up 10 more Elo each time. However, the amount it gains per doubling also appears to drop a bit with each doubling, so it's really difficult to predict the level at which Komodo becomes superior.

This is a development version of Komodo which is a little bit stronger than our release version.
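Don's remark about the error in 2000 games can be quantified. The sketch below computes a match's Elo difference and an approximate 95% margin using the standard logistic Elo curve and a delta-method error estimate; the win/draw/loss counts are hypothetical, not results from this test.

```python
import math


def elo_from_score(s: float) -> float:
    """Logistic Elo difference implied by a score fraction s in (0, 1)."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96) -> tuple[float, float]:
    """Return (elo, margin) for a match result.

    Each game scores 1, 0.5 or 0. The per-game variance is estimated from the
    observed results and propagated through the Elo curve (delta method).
    """
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    mean_sq = (wins + 0.25 * draws) / n
    se_score = math.sqrt((mean_sq - s * s) / n)
    slope = 400.0 / (math.log(10) * s * (1.0 - s))  # d(elo)/d(score) at s
    return elo_from_score(s), z * slope * se_score


# Hypothetical 2000-game match: 760 wins, 600 draws, 640 losses.
elo, margin = elo_with_margin(760, 600, 640)
print(f"{elo:+.1f} +/- {margin:.1f} Elo (95%)")
```

With those hypothetical counts the margin comes out at roughly ±13 Elo, about the size of the per-doubling gains in the table. Each gain is the difference of two such estimates, so its own error is larger still (about √2 times, if the two matches are independent).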


noctiferus · Post by **noctiferus** » Thu Mar 01, 2012 3:57 pm

Hi, Don.

I'm not taking any position in this discussion, up to now.
As a former professor of Probability, Statistics and Data Mining at my university, I thought a bit about your first test, and I had some doubts about your conclusions. They come mainly from two things. First, the use of point estimates without taking into account the confidence intervals of your predictions, which are fundamental for drawing correct statistical conclusions (I'm waiting for the licence for my preferred statistical software, expected next week, so that I can make my own analyses and share them with you). Second, the use of a linear or cubic model, as proposed here, for interpolation and extrapolation: this choice basically excludes the possibility of the gap reaching an asymptotic value, which is a relevant alternative and could be modeled, for example, with an exponential model. It seems a sort of "petitio principii".
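The point about model choice can be illustrated with the six numbers from Don's table (a sketch, not anyone's actual analysis). A constant gain per doubling always predicts a crossover, while a gain that shrinks geometrically, one simple way to get the asymptotic behaviour mentioned here in place of a full exponential fit, predicts a finite total future gain that may or may not exceed Houdini's current lead. The 10 Elo per doubling is Don's hypothetical figure; the geometric fit is an ordinary least-squares line through the logarithms of the gains.

```python
import math

# Houdini 1.5's lead over Komodo (Elo) at levels 00..05, from Don's table.
LEAD = [143.3, 97.0, 74.6, 52.8, 39.5, 27.0]
# Komodo's gain per doubling = drop in the lead from one level to the next.
GAINS = [a - b for a, b in zip(LEAD, LEAD[1:])]


def linear_crossover(lead_now: float, gain_per_doubling: float) -> int:
    """Doublings until the lead is gone if the gain per doubling stays constant."""
    return math.ceil(lead_now / gain_per_doubling)


def geometric_fit(values: list[float]) -> tuple[float, float]:
    """Least-squares line through log(values); returns (fitted last value, ratio r)."""
    xs = list(range(len(values)))
    ys = [math.log(v) for v in values]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return math.exp(y_mean + slope * (xs[-1] - x_mean)), math.exp(slope)


last_gain, ratio = geometric_fit(GAINS)
# Future gains g*r + g*r^2 + ... sum to g*r/(1-r) when 0 < r < 1.
future_total = last_gain * ratio / (1 - ratio)

print("constant +10 Elo/doubling:", linear_crossover(LEAD[-1], 10.0), "more doublings")
print(f"geometric decay: r = {ratio:.2f}, total future gain = {future_total:.1f} Elo "
      f"vs. current lead {LEAD[-1]:.1f}")
```

With these data the geometric model's total future gain lands within a few Elo of the remaining 27-point lead, so whether a crossover ever happens depends on differences well inside the error bars of 2000-game matches.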

Now I see you have changed the test methodology. Would you please tell me whether I have understood your planned experiment correctly and, if not, correct me?
At generic level i:
Round robin among engines:
Houdini at time level i
Houdini at time level i-1
Houdini at time level i-2
Komodo at time level i
Komodo at time level i-1
Komodo at time level i-2
Every match among engines: 2000 games.
Did I understand correctly?
Thanks for your attention
Enrico

lkaufman · Post by **lkaufman** » Thu Mar 01, 2012 4:21 pm

SzG wrote: Is this thread about Komodo being better than Houdini if enough cores and enough time are used on hardware no mortals have?

It seems the days bestowed on this debate could have been better used to develop Komodo so that it won't require stellar means to beat Houdini. Or is that hopeless?

It seems both from the study and from CEGT tests at 40/2 hours that the crossover level is somewhere around 40/2 hours with one core. If no mortals have time for 40/2 hours there would be no human chess tournaments. I expect that our next release will be competitive with Houdini at the CCRL control of 40/40'. Don't worry, development of Komodo is not delayed by this discussion!
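A rough way to place 40/40' and 40/2 hours on the doubling scale of Don's test is to compare the time available for the first 40 moves: level n gives 6·2^n + 40·0.1·2^n = 10·2^n seconds. The sketch below solves for n. It ignores hardware speed, so the rating lists' reference machines and Don's laptop are not directly comparable; treat the result as an order-of-magnitude guide.

```python
import math


def equivalent_level(moves: int, seconds: float) -> float:
    """Level n at which 6*2^n + moves*0.1*2^n seconds equals the given budget.

    Only compares the time available for the first `moves` moves; it ignores
    hardware speed and how engines distribute their time.
    """
    per_unit = 6.0 + 0.1 * moves  # seconds available for those moves at level 00
    return math.log2(seconds / per_unit)


print(f"40/40': level {equivalent_level(40, 40 * 60):.1f}")
print(f"40/2h:  level {equivalent_level(40, 2 * 3600):.1f}")
```

This puts 40/40' near level 7.9 and 40/2 hours near level 9.5, i.e. just beyond the top level (08) of Don's planned study.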

Master Om · Post by **Master Om** » Thu Mar 01, 2012 5:16 pm

Can you answer the question I asked Don above?

lkaufman · Post by **lkaufman** » Thu Mar 01, 2012 5:51 pm

If you want to find the best chance for a draw in a bad position, set the value of drawscore to some high value (maybe to 100 or to whatever is the highest value it will accept).

Master Om · Post by **Master Om** » Thu Mar 01, 2012 5:54 pm

Ok Thanks.

Don · Post by **Don** » Thu Mar 01, 2012 10:48 pm

noctiferus wrote:Hi, Don.

I'm not taking any position in this discussion, up to now.
As a former professor of Probability, Statistics and Data Mining at my university, I thought a bit about your first test, and I had some doubts about your conclusions. They come mainly from two things. First, the use of point estimates without taking into account the confidence intervals of your predictions, which are fundamental for drawing correct statistical conclusions (I'm waiting for the licence for my preferred statistical software, expected next week, so that I can make my own analyses and share them with you). Second, the use of a linear or cubic model, as proposed here, for interpolation and extrapolation: this choice basically excludes the possibility of the gap reaching an asymptotic value, which is a relevant alternative and could be modeled, for example, with an exponential model. It seems a sort of "petitio principii".

I welcome your input. Probability and statistics are not my forte, and if there is anything wrong with my methodology I'm willing to hear about it.

Now I see you have changed the test methodology. Would you please tell me whether I have understood your planned experiment correctly and, if not, correct me?
At generic level i:
Round robin among engines:
Houdini at time level i
Houdini at time level i-1
Houdini at time level i-2
Komodo at time level i
Komodo at time level i-1
Komodo at time level i-2
Every match among engines: 2000 games.
Did I understand correctly?
Thanks for your attention
Enrico

I am going to play Houdini vs Komodo at levels 00 through 08 where each level is a doubling in time. Level 00 is 6 seconds + 0.1 increment Fischer time control.

Since it's terribly inefficient to play 08 vs 01, I am confining all matches to within 2 levels. This means that 05 will play 04 and 03 but not 02; it will also play 06 and 07 but not 08. Komodo will play BOTH Houdini and other Komodos at all relevant levels, i.e. k4 will play these programs: k2, k3, k5, k6, h2, h3, h4, h5 and h6.
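The within-2-levels rule can be checked mechanically. This sketch enumerates engines k0..k8 (Komodo) and h0..h8 (Houdini), lists each one's opponents, and counts the distinct pairings; the naming follows Don's k4/h4 shorthand, and the helper names are illustrative.

```python
from itertools import combinations

LEVELS = range(9)       # levels 00..08
PROGRAMS = ("k", "h")   # k = Komodo, h = Houdini
MAX_GAP = 2             # no pairing more than 2 levels apart

ENGINES = [f"{p}{lvl}" for p in PROGRAMS for lvl in LEVELS]


def level(engine: str) -> int:
    """Level number encoded in a name such as 'k4'."""
    return int(engine[1:])


def opponents(engine: str) -> list[str]:
    """Every other engine (either program) within MAX_GAP levels."""
    return [e for e in ENGINES
            if e != engine and abs(level(e) - level(engine)) <= MAX_GAP]


PAIRINGS = [(a, b) for a, b in combinations(ENGINES, 2)
            if abs(level(a) - level(b)) <= MAX_GAP]

print("k4 plays:", ", ".join(opponents("k4")))
print("distinct pairings:", len(PAIRINGS))
```

It reproduces Don's list for k4 and gives 69 distinct pairings across the nine levels.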

I have not fixed the number of games in this test; I really intend to let it run as long as possible (several weeks or even months, if that is what it takes) and keep the levels balanced. If it's statistically desirable to define the exact test in advance, I will do so on your recommendation, but I would like each version to play at least 2000 games in total, which means about 200-250 games per pairing. The top 2 and bottom 2 levels will not have the same number of games, because no program plays more than 2 levels up or down. We could say 300 games per pairing, from 150 different starting positions.
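The relation between games per pairing and games per version follows from the same rule: a version at a middle level has 9 opponents, while levels 01/07 have 7 and levels 00/08 only 5. The sketch below assumes the 9-level, 2-level-gap setup described above and the 300-games-per-pairing figure Don floats.

```python
import math

LEVELS = 9
MAX_GAP = 2
GAMES_PER_PAIRING = 300  # Don's suggested figure (150 starting positions, 2 games each)


def opponent_count(level: int) -> int:
    """Opponents of one version: both programs at every level within MAX_GAP, minus itself."""
    in_range = sum(1 for other in range(LEVELS) if abs(other - level) <= MAX_GAP)
    return 2 * in_range - 1


for level in range(LEVELS):
    n = opponent_count(level)
    print(f"level {level:02d}: {n} opponents, {n * GAMES_PER_PAIRING} games")

# Games per pairing a middle level needs to reach 2000 games in total.
print("per pairing for 2000 games:", math.ceil(2000 / opponent_count(4)))
```

A middle level needs 223 games per pairing to reach 2000, matching the "about 200-250" estimate; at 300 per pairing the middle levels get 2700 games and the end levels 1500.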

I want to understand this fully. I know that programs that scale differently tend to move away from each other with depth, and if they scale the same they tend to approach each other. But when both are happening to some extent, it's hard to separate the two effects, except perhaps by comparing self-play games for each program.

The plan is that when I have significant data I will plot 2 lines using gnuplot, where the X-axis is the level (00 - 08) and the Y-axis is the rating, and then I can shift the line for one of the programs vertically to force the two lines to cross. Of course, if the scalability is very similar they won't cross but will appear to have nearly the same slope. This can also be treated statistically without the plot, but as they say, "a picture is worth a thousand words."
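Before the two lines can be plotted, each engine-level (k0..k8, h0..h8) needs a rating estimated from all its pairings. One standard way, not necessarily what Don will use, is a Bradley-Terry (logistic Elo) fit; the sketch below uses Hunter's MM iteration, counts draws as half points, and reports ratings relative to the pool average. The function name and input format are illustrative assumptions.

```python
import math


def fit_elo(scores: dict[tuple[str, str], tuple[float, int]],
            iterations: int = 2000) -> dict[str, float]:
    """Bradley-Terry fit via MM updates, returned on the Elo scale.

    scores[(a, b)] = (points a scored against b, games played), draws = 0.5.
    Every player needs some points and the pairing graph must be connected.
    Ratings are centred on 0; only differences are meaningful.
    """
    players = sorted({p for pair in scores for p in pair})
    points = {p: 0.0 for p in players}
    games: dict[str, dict[str, int]] = {p: {} for p in players}
    for (a, b), (pts, n) in scores.items():
        points[a] += pts
        points[b] += n - pts
        games[a][b] = games[a].get(b, 0) + n
        games[b][a] = games[b].get(a, 0) + n

    gamma = {p: 1.0 for p in players}  # strength; rating = 400 * log10(gamma)
    for _ in range(iterations):
        updated = {
            i: points[i] / sum(n / (gamma[i] + gamma[j]) for j, n in games[i].items())
            for i in players
        }
        # Rescale so the geometric mean is 1; the model is invariant to this.
        log_mean = sum(math.log(g) for g in updated.values()) / len(updated)
        gamma = {p: g / math.exp(log_mean) for p, g in updated.items()}

    return {p: 400.0 * math.log10(g) for p, g in gamma.items()}
```

Feeding it the pairing totals and then plotting the k and h ratings against level gives the two lines described above; shifting one line vertically only changes the arbitrary rating offset, not the slopes being compared.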

Don

noctiferus · Post by **noctiferus** » Fri Mar 02, 2012 12:09 pm

I'll think about it.
Txs for the answer.
Enrico

diep · Post by **diep** » Fri Mar 02, 2012 1:20 pm

Don wrote:
I am going to play Houdini vs Komodo at levels 00 through 08 where each level is a doubling in time. Level 00 is 6 seconds + 0.1 increment Fischer time control.

Time controls of 6 seconds a game give back zero information compared to alternative forms of testing.
