Komodo run - Ingo list revisited


Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Komodo run - Ingo list revisited.

Post by Don »

Milos wrote: I'm pretty sure RH has the same kind of setup, with an average opponent rating as in the Ingo list or CCRL, and after he's satisfied with the contempt-0 strength of the engine, he then optimizes contempt to give the highest rating and then normalizes that value to 1 ;).
I think that is likely.

As has been discussed here before, not knowing who you are playing can be a serious disadvantage. If a program such as Komodo or Houdini is playing an opponent 500 Elo weaker, has Black, and comes out of the opening with a score of -25, it is going to force a draw right away if possible. But that is a horrible strategy against an opponent 500 Elo weaker: you will probably win even if you are half a pawn down.

So what is to be done? Unless you know the Elo of the engine you are playing relative to your own, I don't think there is a general solution. Even if a given program is the best today, what happens in a few years when it's playing tournaments and is no longer that good?

Komodo has a contempt of only 7 ELO - the idea is not so much to "beat up" on weaker programs but just to offset the white advantage a bit and give Komodo some incentive not to take a quick draw. I think in Houdini it is more than just contempt: Robert's contempt involves some asymmetrical evaluation and is designed to take advantage of weaker programs, which makes sense only for a top program.
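
The mechanism described here can be sketched as an offset on the score the engine assigns to a draw. This is a minimal illustration of one common way to implement contempt, not Komodo's or Houdini's actual code. The function names and the 50-centipawn setting are hypothetical, the 7 is read as centipawns (following the correction later in the thread), and the -25 score is assumed to be in centipawns as well.

```python
def draw_score(contempt_cp: int, engine_to_move: bool) -> int:
    """Score of a drawn position in centipawns, from the side to move's view.

    With positive contempt the engine treats a draw as slightly bad for
    itself, and therefore as slightly good for its opponent.
    """
    return -contempt_cp if engine_to_move else contempt_cp


def takes_draw(best_non_draw_eval_cp: int, contempt_cp: int) -> bool:
    """True if the engine, to move, prefers a forced draw to its best alternative."""
    return draw_score(contempt_cp, engine_to_move=True) > best_non_draw_eval_cp


# Black comes out of the opening at -25 (assumed centipawns).
for contempt in (0, 7, 50):  # 50 is a hypothetical "large" setting
    print(contempt, takes_draw(-25, contempt))
```

Under these assumptions a small contempt such as 7 still accepts the draw at -25. It only tips the decision when the alternative is within a few centipawns of equality, which matches the stated goal of offsetting the white advantage a bit rather than beating up on weaker programs.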
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: Komodo run - Ingo list revisited.

Post by IWB »

Ajedrecista wrote:
... this development version of Komodo has earned around 24 Elo plus/minus uncertainties (around ± 14 Elo taking into account a difference between two normal distributions of 3036 ± 10 and 3060 ± 10, writing from memory) since version 5.1r2 or similar. Am I right?


Ajedrecista.
Not quite:

Excerpt of the full list:


Rank Name                        Elo    +    - Games Score Oppo. Draws
   2 K113300                    3062   11   11  3000   78%  2838   30%
   3 Komodo 6                   3042   10   10  3300   76%  2837   33%
   4 Komodo CCT                 3035    9    9  3750   74%  2850   34%
   5 Komodo 5.1                 3023   11   10  2850   74%  2839   34%
Of course a winning Komodo over the No1 is hammering down the first spot! ...
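
The ± 14 figure in the quoted question follows from combining two independent uncertainties in quadrature. The sketch below reproduces that arithmetic with the numbers from the question. The function name is hypothetical, and treating the two ratings as independent is an assumption, since ratings computed from one shared list are usually correlated.

```python
import math


def rating_difference(elo_a, err_a, elo_b, err_b):
    """Return (elo_a - elo_b, combined error), assuming independent errors."""
    return elo_a - elo_b, math.hypot(err_a, err_b)


# Numbers quoted from memory in the question: 3060 ± 10 versus 3036 ± 10.
diff, err = rating_difference(3060, 10, 3036, 10)
print(f"{diff:+d} ± {err:.1f} Elo")  # +24 ± 14.1 Elo
```

The same function can be applied to any two rows of the excerpt by passing their ratings and error bars.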
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Komodo run - Ingo list revisited.

Post by lkaufman »

Don wrote: Komodo has a contempt of only 7 ELO - the idea is not so much to "beat up" on weaker programs but just to offset the white advantage a bit and give Komodo some incentive not to take a quick draw.

Don meant to say 7 centipawns, not 7 elo.

Larry
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Komodo run - Ingo list revisited.

Post by Don »

lkaufman wrote:
Don wrote: Komodo has a contempt of only 7 ELO - the idea is not so much to "beat up" on weaker programs but just to offset the white advantage a bit and give Komodo some incentive not to take a quick draw.
Don meant to say 7 centipawns, not 7 elo.

Larry
Yes, that is what I meant.
Uri Blass
Posts: 10268
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Komodo run - Ingo list revisited

Post by Uri Blass »

I think a good result against weak opponents can also be explained by a lack of weaknesses, so that you do not lose against them.

I am not saying this is part of the explanation for Houdini's good results; I do not know.

For example, a program that prunes heavily may lose games against weak programs because it sometimes misses a tactic that the weak program sees. Pruning less may be counterproductive against strong programs but make the program better against weak programs in more positions.

I would like to know whether the only reason Komodo scores worse than Houdini against weak opponents is that it draws more games against them. If Komodo also loses more games against weak opponents than Houdini does, that suggests the problem is not only contempt.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Komodo run - Ingo list revisited

Post by Don »

Uri Blass wrote: I think a good result against weak opponents can also be explained by a lack of weaknesses, so that you do not lose against them.

I am not saying this is part of the explanation for Houdini's good results; I do not know.

For example, a program that prunes heavily may lose games against weak programs because it sometimes misses a tactic that the weak program sees. Pruning less may be counterproductive against strong programs but make the program better against weak programs in more positions.

I would like to know whether the only reason Komodo scores worse than Houdini against weak opponents is that it draws more games against them. If Komodo also loses more games against weak opponents than Houdini does, that suggests the problem is not only contempt.
Those are all interesting questions. Ingo has agreed to re-run this test with Houdini's contempt set to zero. I don't know what impact it will have, but my prediction is that Houdini will indeed score better against Komodo but will dip lower in absolute Elo.

This raises some issues about how testing should be done. I like Ingo's test because it is predictable: a straight multi-round robin, so there is no room for manipulation. But other lists are based on whatever the tester wants to test, and who you test against affects the results. So in addition to the normal statistical noise, which is a bear to deal with, you have this.

A possible solution is to run rating agencies with strict pairing rules, perhaps as a constant series of Swiss tournaments, which is a more natural schedule. In human tournaments Carlsen never plays "Don Dailey", but in these lists very strong programs often play very weak programs, and a Swiss tournament is a more natural pairing arrangement. But I feel that it has its issues too: if you run 1000 Swiss tournaments you will find that the pairings are not all that balanced either. A natural consequence of Swiss is that you still get a weak program playing a strong program in the early rounds, but it will often be the SAME weak program. I have done simulations. You can mitigate this problem somewhat by ignoring the ratings, randomizing the players, and pretending you know nothing about each player.
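
The "same weak program" effect can be seen in the first round alone. The sketch below is not the simulation mentioned in the post; it is a minimal illustration that assumes fixed ratings, an even number of players, and the usual top-half-versus-bottom-half pairing for round 1. The engine names, ratings, and function names are all hypothetical.

```python
import random


def first_round_pairs(ratings, randomize, rng):
    """Round-1 Swiss pairs: first half of the ordering against the second half.

    Assumes an even number of players. With randomize=True the ratings are
    ignored and the ordering is shuffled instead of seeded.
    """
    players = list(ratings)
    if randomize:
        rng.shuffle(players)
    else:
        players.sort(key=ratings.get, reverse=True)
    half = len(players) // 2
    return list(zip(players[:half], players[half:]))


def round_one_opponents(name, ratings, randomize, tournaments, seed=0):
    """Set of distinct round-1 opponents `name` meets over many tournaments."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(tournaments):
        for a, b in first_round_pairs(ratings, randomize, rng):
            if a == name:
                seen.add(b)
            elif b == name:
                seen.add(a)
    return seen


# Eight hypothetical engines, 50 Elo apart; E0 is the strongest.
ratings = {f"E{i}": 3100 - 50 * i for i in range(8)}
print(round_one_opponents("E0", ratings, randomize=False, tournaments=1000))
print(len(round_one_opponents("E0", ratings, randomize=True, tournaments=1000)))
```

With seeded pairing the top engine meets the same opponent in round 1 of every tournament, while the randomized ordering spreads its first-round games over the whole field.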

The big round robin is probably the only really structured way to handle this despite the fact that you get a huge number of mis-matches.

Don
bupalo
Posts: 82
Joined: Fri Mar 16, 2012 2:04 pm

Re: Komodo run - Ingo list revisited

Post by bupalo »

Hi Don, when will Komodo 7 be released? After nTCEC?
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Komodo run - Ingo list revisited

Post by Don »

bupalo wrote: Hi Don, when will Komodo 7 be released? After nTCEC?
We do not have a plan yet, but we don't like to release too soon and we like the ELO gain to at least be reasonable.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Komodo run - Ingo list revisited

Post by Milos »

Uri Blass wrote: I think a good result against weak opponents can also be explained by a lack of weaknesses, so that you do not lose against them.

I am not saying this is part of the explanation for Houdini's good results; I do not know.
Statistically you are wrong. When playing an opponent that is 400 Elo weaker, the chance of losing is almost negligible.
The effect of contempt is to convert a certain percentage of drawn games into decided ones. The decided games will again fall much more often on the side of the stronger engine.
So, a numerical example.
You played 100 games and the original result is 60+/37=/3-, i.e. 78.5% or 225 Elo.
The ratio of winning to losing chances is 20:1. Then you increased the contempt and got only 30 draws this time. 7 more games are decided, and let's say the weaker engine has now improved its chances to 6:1, so the final result is 66+/30=/4-, i.e. 81% or 252 Elo.

The thing is, unless you blow your contempt out of proportion, in the additionally decided games the stronger engine will always win many more games than the weaker one and will benefit from it.
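
The percentages and Elo figures in this example can be checked with the usual logistic Elo formula. The helper names below are hypothetical; the win/draw/loss numbers are the ones given in the post.

```python
import math


def score_fraction(wins, draws, losses):
    """Fraction of points scored, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)


def elo_from_score(score):
    """Elo difference implied by a score fraction (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))


for wins, draws, losses in ((60, 37, 3), (66, 30, 4)):
    s = score_fraction(wins, draws, losses)
    print(f"{wins}+/{draws}=/{losses}-: {s:.1%}, {elo_from_score(s):.0f} Elo")
```

This gives 78.5% and about 225 Elo for the first result, and 81% and about 252 Elo for the second, in agreement with the post.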

I hope you see the point.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Komodo run - Ingo list revisited

Post by Milos »

Another example, just to illustrate how it backfires against an equally strong engine.
So you play 100 games with 0 contempt and the result is 20+/60=/20-.
Winning chances in decided games are 1:1.
Now you increase the contempt to 1 and 12 more games are decided, but now the other engine has the better chances, i.e. your winning chances are now 1:3. So the final result is now 23+/48=/29- or 47%, i.e. you lose 21 Elo.
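
The example follows a simple pattern: some draws are converted into decided games, and those games split at a given win share. A small sketch of that model, with hypothetical function names, reproduces the equal-strength case and adds a break-even case for comparison. It restates the arithmetic of the post and does not model any real engine.

```python
import math


def elo_diff(wins, draws, losses):
    """Elo difference implied by a W/D/L record (logistic model)."""
    score = (wins + 0.5 * draws) / (wins + draws + losses)
    return 400.0 * math.log10(score / (1.0 - score))


def convert_draws(wins, draws, losses, converted, win_share):
    """Turn `converted` draws into decided games; `win_share` of them are wins."""
    extra_wins = round(converted * win_share)
    return wins + extra_wins, draws - converted, losses + (converted - extra_wins)


base = (20, 60, 20)            # equal engines at contempt 0
for share in (0.25, 0.5):      # 1:3 winning chances, then 1:1
    result = convert_draws(*base, converted=12, win_share=share)
    print(result, round(elo_diff(*result) - elo_diff(*base)))
```

Converting a draw (half a point) into a decided game only pays when the win share among the newly decided games is above 50%. At 1:3 the model loses about 21 Elo, and at 1:1 it breaks even.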

That is how SF and K6 are stronger than H3 in testing.