Komodo 4 on long time control


Moderator: Ras

lkaufman
Posts: 6279
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Komodo 4 on long time control

Post by lkaufman »

Houdini wrote:
lkaufman wrote:While I agree that differences between blitz results and tournament level results will not be enormous (as in 100 or more elo), I think your estimate of a limit of 10 is way too low. Without even comparing unrelated programs, just compare Houdini 1.5a and Houdini 2.0. All blitz tests on single core (CCRL, CEGT, IPON, and others) agree that Houdini 2.0 is stronger (CCRL by 12 elo, CEGT by 4 elo, others a bit higher, so the average is a bit over 10 elo). This is based on thousands of games. At 40/20 or 40/40 we find the opposite: H1.5 is stronger by 12 elo on CCRL and by 13 on CEGT, all with decent-size samples. There is no slow data on 4 cores for 2.0 on either CCRL or CEGT that meets the minimum number of games to avoid being greyed out. So we have a net swing just going from blitz to an average of 40/30 of about 25 elo, just for two successive versions of the same program!! I know that some of this 25 could be sample error, but even if half of it is bogus it would indicate that going from blitz to 40/2 could swing relative ratings by 25 elo just in this one case. Surely with unrelated programs the swing could be much greater. I think that roughly 50 elo is the maximum likely swing in relative ratings going from blitz to 40/2.
Your whole "scaling" story is statistically unsound. Individual errors on heterogeneous rating lists are easily 20 Elo points. Comparing two engines implies that the error on the comparison will easily be 30 points (1.4 * 20). Here you're actually comparing 4 rating results, so the uncertainty is easily 40 Elo (2 * 20).

Robert
I think you are wrong on this last point. I'm combining the blitz results of ccrl with the blitz results of cegt to get a larger sample, and then comparing the slow results of the two organizations to get a larger sample. So the margin of error should be divided by roughly the square root of two, bringing it back down to your original 20 elo estimate. Twenty is less than 25, so even if I accept your twenty value the chance that Houdini 1.5 scales better than 2.0 is about 99% based on this data.
We know from our own work on Komodo that it is very easy to introduce changes that help at bullet speed but which hurt at longer time controls. We try hard to avoid this, but perhaps we too could fall into this trap in some future version.
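The two error estimates in this exchange can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes the rating errors are independent and normally distributed, takes the 20 Elo per-rating error from the quoted post at face value, and uses made-up helper names (`combined_error`, `pooled_error`, `prob_positive`). Whether 20 Elo is read as one standard deviation or as a 95% margin changes the resulting confidence considerably.

```python
import math

def combined_error(errors):
    """Root-sum-square error of a sum or difference of independent ratings."""
    return math.sqrt(sum(e * e for e in errors))

def pooled_error(error, n_lists):
    """Error after averaging n_lists independent estimates of equal error."""
    return error / math.sqrt(n_lists)

def prob_positive(diff, sigma):
    """P(true difference > 0) for an observed diff with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(diff / (sigma * math.sqrt(2.0))))

PER_RATING = 20.0  # assumed error of one rating, from the quoted post
SWING = 25.0       # observed blitz-to-slow swing quoted above

two = combined_error([PER_RATING] * 2)   # one head-to-head comparison
four = combined_error([PER_RATING] * 4)  # swing built from four ratings
pooled = pooled_error(four, 2)           # same swing averaged over two lists

print(f"two ratings: {two:.1f}, four ratings: {four:.1f}, pooled: {pooled:.1f}")
# Assumption: a 95% margin of 20 Elo corresponds to a sigma of roughly 10 Elo.
print(f"P(swing > 0), 20 Elo as one sigma:  {prob_positive(SWING, 20.0):.3f}")
print(f"P(swing > 0), 20 Elo as 95% margin: {prob_positive(SWING, 10.0):.3f}")
```

Under these assumptions the two-rating and four-rating figures come out at about 28 and 40 Elo, matching the 1.4 * 20 and 2 * 20 quoted above, and the probability attached to a 25 Elo swing depends on how the 20 Elo figure is interpreted.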
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: Komodo 4 on long time control

Post by MM »

Uri Blass wrote:Here is the list of the top programs and I see only little difference between CCRL 40/4 and CCRL 40/40

Programs are ranked here by the average of their CCRL 40/40 and CCRL 40/4 ratings (the last number in every line is the standard rating advantage of the program, and usually the good programs have a negative standard rating advantage).

1) Houdini 1.5a 64-bit 4CPU 3332.5 (-63)
2) Critter 1.2 64-bit 4CPU 3282.5 (-47)
3) Rybka 4.1 64-bit 4CPU 3281 (-34)
4) Rybka 4 64-bit 4CPU 3274.5 (-33)
5) Stockfish 2.1.1 64-bit 4CPU 3263.5 (-65)
6) Houdini 1.5a 64-bit 3260 (-8)
7) Stockfish 2.0.1 64-bit 4CPU 3251.5 (-21)
8) Rybka 3 64-bit 4CPU 3245.5 (-33)
9) Stockfish 1.9.1 64-bit 4CPU 3234.5 (-27)
10) Stockfish 1.7.1 64-bit 4CPU 3224 (-24)
11) Stockfish 1.8 64-bit 4CPU 3223.5 (-19)
12) Rybka 4 64-bit 2CPU 3216 (-2)
13) Critter 0.90 64-bit 4CPU 3209.5 (-29)
14) Rybka 4.1 64-bit 3208.5 (+7)
15) Critter 1.2 64-bit 3205.5 (-5)
16) Komodo 3 64-bit 3202.5 (+13)
17) Rybka 4 64-bit 3198 (-2)
18) Rybka 3 64-bit 2CPU 3195 (-46)
19) Naum 4.2 64-bit 4CPU 3187 (-14)
20) Stockfish 2.1.1 64-bit 3184.5 (-9)
Hi,

thanks, but what do you mean by "standard rating advantage"?

Regards
MM
lkaufman
Posts: 6279
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Komodo 4 on long time control

Post by lkaufman »

Uri Blass wrote:Here is the list of the top programs and I see only little difference between CCRL 40/4 and CCRL 40/40

I see two positive numbers on your list, +13 for Komodo 3 and +7 for Rybka 4.1. All of the Houdinis, Critters, and Stockfishes have negative numbers, four of them more than 45 negative. You can see why this leads me to believe that Komodo scales better than most other programs.
Uri Blass
Posts: 11100
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Komodo 4 on long time control

Post by Uri Blass »

MM wrote:
Hi,

thanks, but what do you mean by "standard rating advantage"?

Regards
I mean the standard CCRL rating (40/40) minus the blitz CCRL rating (40/4).
Most top programs have a higher blitz CCRL rating.
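As a small illustration of that definition, the two underlying ratings can be recovered from a list entry, since each entry gives the average of the two ratings and their difference. The function name `split_ratings` is made up for this sketch; the sample figures are taken from the list quoted earlier in the thread.

```python
def split_ratings(average, advantage):
    """Return (standard, blitz) given their average and the standard-minus-blitz gap."""
    standard = average + advantage / 2.0
    blitz = average - advantage / 2.0
    return standard, blitz

# Entries from the thread: (name, average rating, standard rating advantage)
entries = [
    ("Houdini 1.5a 64-bit 4CPU", 3332.5, -63),
    ("Komodo 3 64-bit", 3202.5, 13),
]

for name, average, advantage in entries:
    standard, blitz = split_ratings(average, advantage)
    print(f"{name}: 40/40 = {standard:.0f}, 40/4 = {blitz:.0f}")
```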
Uri Blass
Posts: 11100
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Komodo 4 on long time control

Post by Uri Blass »

lkaufman wrote:
I see two positive numbers on your list, +13 for Komodo 3 and +7 for Rybka 4.1. All of the Houdinis, Critters, and Stockfishes have negative numbers, four of them more than 45 negative. You can see why this leads me to believe that Komodo scales better than most other programs.
I understand, but note that the 4 CPU programs have bigger negative numbers.

If you look only at the one-CPU programs in this list you get the following, which suggests only a small advantage for Komodo at longer time controls, of 10-20 elo relative to the other programs:

Houdini 1.5a 64-bit 3260(-8)
Rybka 4.1 64-bit 3208.5(+7)
Critter1.2 64-bit 3205.5(-5)
Komodo 3 64-bit 3202.5(+13)
Rybka 4 64-bit 3198(-2)
Stockfish 2.1.1 64-bit 3184.5(-9)
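One way to put a number on that estimate is to compare Komodo's advantage with the mean advantage of the other single-CPU entries. This is only a sketch of one possible summary; the helper `relative_advantage` is a made-up name and the input values are simply the figures listed above.

```python
# Standard rating advantage (40/40 minus 40/4) for the single-CPU entries above
single_cpu = {
    "Houdini 1.5a": -8,
    "Rybka 4.1": 7,
    "Critter 1.2": -5,
    "Komodo 3": 13,
    "Rybka 4": -2,
    "Stockfish 2.1.1": -9,
}

def relative_advantage(name, table):
    """Advantage of `name` minus the mean advantage of all other entries."""
    others = [value for key, value in table.items() if key != name]
    return table[name] - sum(others) / len(others)

komodo_gain = relative_advantage("Komodo 3", single_cpu)
print(f"Komodo 3 vs. mean of the others: {komodo_gain:+.1f} Elo")
```

With these six entries the result is roughly 16 Elo, which sits inside the 10-20 range mentioned above.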
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: Komodo 4 on long time control

Post by MM »

Uri Blass wrote:
MM wrote:
Hi,

thanks, but what do you mean by "standard rating advantage"?

Regards
I mean the standard CCRL rating (40/40) minus the blitz CCRL rating (40/4).
Most top programs have a higher blitz CCRL rating.
Thank you, now I understand, and I can agree with Mr Kaufman; or, to put it better, I see that Komodo and Rybka 4.1 are the engines that improve their level of play with longer TC (it's what I have been saying for ages about Komodo).

Regards
MM
rvida
Posts: 481
Joined: Thu Apr 16, 2009 12:00 pm
Location: Slovakia, EU

Re: Komodo 4 on long time control

Post by rvida »

MM wrote:I see that Komodo and Rybka 4.1 are the engines that improve their level of play with longer TC (it's what I have been saying for ages about Komodo).
Or it might be the other way around: their level of play decreases at faster TC. There are rumors (on the Rybka forum) that R4.1 has particularly bad time management.
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Komodo 4 on long time control

Post by Houdini »

lkaufman wrote: I think you are wrong on this last point. I'm combining the blitz results of ccrl with the blitz results of cegt to get a larger sample, and then comparing the slow results of the two organizations to get a larger sample. So the margin of error should be divided by roughly the square root of two, bringing it back down to your original 20 elo estimate. Twenty is less than 25, so even if I accept your twenty value the chance that Houdini 1.5 scales better than 2.0 is about 99% based on this data.
No, you miss the point that you're using 4 ratings to compute the relative scaling. With individual rating errors of 20 Elo one simply cannot make any statistically sound conclusions about micro-differences of plus or minus 10 Elo. And that even ignores the fact that you're cherry-picking the rating lists to base your conclusions on...
From all the test results at my disposal (including some private rating list results with 6 CPU), there is no evidence that Houdini 2 scales any differently than Houdini 1.5.

Robert
Houdini
Posts: 1471
Joined: Tue Mar 16, 2010 12:00 am

Re: Komodo 4 on long time control

Post by Houdini »

Uri Blass wrote:Houdini 1.5a 64-bit 3260(-8)
Rybka 4.1 64-bit 3208.5(+7)
Critter1.2 64-bit 3205.5(-5)
Komodo 3 64-bit 3202.5(+13)
Rybka 4 64-bit 3198(-2)
Stockfish 2.1.1 64-bit 3184.5(-9)
This very much demonstrates that all these engines "scale" exactly the same.
The tiny differences lie very comfortably within the statistical confidence interval of the rating list.

Robert
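One way to make this claim concrete is to compare each engine's advantage with the error expected on a difference of two ratings. The sketch assumes the 20 Elo per-rating error quoted earlier in the thread and independent errors; both are assumptions rather than figures published by the rating list, and `within_margin` is a made-up helper.

```python
import math

PER_RATING_ERROR = 20.0  # assumed, taken from the earlier post in this thread

# Each advantage is (40/40 rating - 40/4 rating), so its error combines two ratings.
margin = math.sqrt(2.0) * PER_RATING_ERROR

advantages = {
    "Houdini 1.5a": -8, "Rybka 4.1": 7, "Critter 1.2": -5,
    "Komodo 3": 13, "Rybka 4": -2, "Stockfish 2.1.1": -9,
}

def within_margin(value, margin):
    """True when |value| does not exceed the assumed error margin."""
    return abs(value) <= margin

for name, adv in advantages.items():
    print(f"{name}: {adv:+d} Elo, within +/-{margin:.0f}: {within_margin(adv, margin)}")
```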
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: Komodo 4 on long time control

Post by MM »

rvida wrote:
MM wrote:I see that Komodo and Rybka 4.1 are the engines that improve their level of play with longer TC (it's what I have been saying for ages about Komodo).
Or it might be the other way around: their level of play decreases at faster TC. There are rumors (on the Rybka forum) that R4.1 has particularly bad time management.
Hi Mr Vida,

yes, it is a question of point of view.
As regards Rybka 4.1, I used it for a long time. It has two options for setting the time management. I think the default is good, even if it is not the best, but I don't think it should make a big difference in elo.

Regards
MM