Komodo run - Ingo list revisited

Don · Post by **Don** » Fri Nov 08, 2013 2:52 pm

Milos wrote: I'm pretty sure RH has the same kind of setup, with an average opponent rating as in the Ingo list or CCRL, and after he's satisfied with the engine's strength at contempt 0, he then optimizes contempt to give the highest rating and then normalizes that value to 1.

I think that is likely.

As has been discussed here before, not knowing who you are playing can be a serious disadvantage. If a program such as Komodo or Houdini is playing an opponent 500 ELO weaker, has black, and comes out of the opening with a score of -25, it is going to force a draw right away if that is possible. But that is a horrible strategy against an opponent 500 ELO weaker. You will probably win even if you are half a pawn down.

So what is to be done? Unless you know the relative ELO of the engine you are playing compared to your own, I don't think there is a general solution. Even if a given program is the best today, what happens in a few years when it's playing tournaments and it is no longer that good?

Komodo has a contempt of only 7 ELO - the idea is not so much to "beat up" on weaker programs but just to offset the white advantage a bit and give Komodo some incentive not to take a quick draw. I think in Houdini it is more than just contempt: Robert's contempt involves some asymmetrical evaluation and is designed to take advantage of weaker programs, which makes sense only for a top program.
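A minimal sketch of how a small contempt value can enter an engine's draw scoring, reading the figure above as 7 centipawns (as clarified further down the thread). This is not Komodo's or Houdini's actual code; the function names and the simple symmetric scheme are assumptions made for illustration.

```python
# Hypothetical illustration of plain (symmetric) contempt, in centipawns.
CONTEMPT_CP = 7  # value quoted in the thread; everything else is assumed


def draw_score(engine_to_move: bool, contempt_cp: int = CONTEMPT_CP) -> int:
    """Score assigned to a drawn position, from the side to move's view.

    The engine treats a draw as slightly bad for itself (-contempt),
    which is the same as slightly good for the opponent (+contempt).
    """
    return -contempt_cp if engine_to_move else contempt_cp


def prefers_draw(current_eval_cp: int, contempt_cp: int = CONTEMPT_CP) -> bool:
    """True if the engine, to move, rates a forced draw above playing on."""
    return draw_score(True, contempt_cp) > current_eval_cp


print(prefers_draw(-25))      # True: -7 beats -25, so it still takes the draw
print(prefers_draw(-5))       # False: -5 beats -7, so it plays on
print(prefers_draw(-25, 30))  # False: a large contempt refuses the draw
```

With a contempt of 7 the engine only declines draws in roughly level positions, which matches the stated goal of avoiding quick draws rather than gambling against weaker opponents.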

IWB · Post by **IWB** » Fri Nov 08, 2013 3:33 pm

Ajedrecista wrote:
... this development version of Komodo has earned around 24 Elo plus/minus uncertainties (around ± 14 Elo taking into account a difference between two normal distributions of 3036 ± 10 and 3060 ± 10, writing from memory) since version 5.1r2 or similar. Am I right?

Ajedrecista.

Not quite:

Excerpt of the full list:

   2 K113300                    3062   11   11  3000   78%  2838   30% 
   3 Komodo 6                   3042   10   10  3300   76%  2837   33% 
   4 Komodo CCT                 3035    9    9  3750   74%  2850   34% 
   5 Komodo 5.1                 3023   11   10  2850   74%  2839   34%

Of course, a Komodo winning against the No. 1 is hammering down the first spot! ...
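The "± 14" in the quoted question comes from combining two ± 10 error bars in quadrature. A small sketch of that arithmetic follows, assuming the two rating errors are independent (in a real rating list they are correlated, so this is only a rough guide) and assuming the two columns after the rating in the excerpt are the plus/minus error margins. The function name is hypothetical.

```python
import math


def rating_difference(r1: float, err1: float, r2: float, err2: float):
    """Difference of two ratings and its combined error margin.

    Assumes independent errors, so the margins add in quadrature.
    """
    return r1 - r2, math.hypot(err1, err2)


# Numbers quoted from memory in the question: 3060 +/- 10 vs 3036 +/- 10.
diff, err = rating_difference(3060, 10, 3036, 10)
print(diff, round(err, 1))  # 24 14.1

# Numbers from the excerpt: K113300 (3062, 11) vs Komodo 5.1 (3023, 11).
diff, err = rating_difference(3062, 11, 3023, 11)
print(diff, round(err, 1))  # 39 15.6
```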

lkaufman · Post by **lkaufman** » Fri Nov 08, 2013 4:23 pm

Don wrote: Komodo has a contempt of only 7 ELO - the idea is not so much to "beat up" on weaker programs but just to offset the white advantage a bit and give Komodo some incentive not to take a quick draw.

Don meant to say 7 centipawns, not 7 elo.

Larry

Don · Post by **Don** » Fri Nov 08, 2013 4:33 pm

lkaufman wrote:
Don wrote: Komodo has a contempt of only 7 ELO - the idea is not so much to "beat up" on weaker programs but just to offset the white advantage a bit and give Komodo some incentive not to take a quick draw.
Don meant to say 7 centipawns, not 7 elo.

Larry

Yes, that is what I meant.

Uri Blass · Post by **Uri Blass** » Fri Nov 08, 2013 5:03 pm

I think that a good result against weak opponents can also be explained by a lack of weaknesses, so that you do not lose against weak opponents.

I am not saying that this is part of the explanation for Houdini's good results; I do not know.

For example, a program that does heavy pruning may lose games against weak programs because it sometimes misses a tactic that the weak program can see. Less pruning may be counterproductive against strong programs but make the program better relative to weak programs in more positions.

I would like to know whether the only reason that Komodo scores worse than Houdini against weak opponents is that it has more draws against them (or maybe Komodo also has more losses against weak opponents than Houdini, in which case the problem is not only contempt).

Don · Post by **Don** » Fri Nov 08, 2013 5:17 pm

Uri Blass wrote: I think that a good result against weak opponents can also be explained by a lack of weaknesses, so that you do not lose against weak opponents.

I am not saying that this is part of the explanation for Houdini's good results; I do not know.

For example, a program that does heavy pruning may lose games against weak programs because it sometimes misses a tactic that the weak program can see. Less pruning may be counterproductive against strong programs but make the program better relative to weak programs in more positions.

I would like to know whether the only reason that Komodo scores worse than Houdini against weak opponents is that it has more draws against them (or maybe Komodo also has more losses against weak opponents than Houdini, in which case the problem is not only contempt).

Those are all interesting questions. Ingo has agreed to re-run this test with Houdini's contempt set to zero. I don't know what impact it will have, but my prediction is that Houdini will indeed score better against Komodo but will dip lower in absolute ELO.

This raises some issues about how testing should be done. I like Ingo's test because it is predictable - a straight multi-round robin, and thus there is no room for manipulation. But other lists are based on whatever the tester wants to test - and who you test against affects the results. So in addition to the normal statistical noise, which is a bear to deal with, you have this.

A possible solution to this is to run rating agencies with strict pairing rules - perhaps as a constant series of Swiss tournaments - which is a more natural schedule. In human tournaments Carlsen never plays "Don Dailey", but in these lists very strong programs often play very weak programs, and a Swiss tournament is a more natural pairing arrangement. But I feel that it has its issues too, because if you run 1000 Swiss tournaments you will find that the pairings are not all that balanced either. A natural consequence of Swiss is that you will still get a weak program playing a strong program in early rounds, but it will often be the SAME weak program. I have done simulations. You can mitigate this problem somewhat by ignoring the rating, randomizing the players and pretending you know nothing about each player.
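The "same weak program" effect can be seen even in a toy model of round one alone. The sketch below is not the simulation mentioned above; it assumes a standard seeded first round (top half paired against the bottom half in rating order), an invented pool of eight engines, and hypothetical function names.

```python
import random


def first_round_pairs(players):
    """Pair the top half of the list against the bottom half, in order."""
    half = len(players) // 2
    return list(zip(players[:half], players[half:2 * half]))


def top_seed_opponents(ratings, n_tournaments, randomize, seed=0):
    """Collect every round-one opponent the highest-rated player meets."""
    rng = random.Random(seed)
    top = max(ratings, key=ratings.get)
    seen = set()
    for _ in range(n_tournaments):
        if randomize:
            order = list(ratings)
            rng.shuffle(order)  # pretend we know nothing about the players
        else:
            order = sorted(ratings, key=ratings.get, reverse=True)
        for a, b in first_round_pairs(order):
            if a == top:
                seen.add(b)
            elif b == top:
                seen.add(a)
    return seen


# Invented pool: E0 is strongest, each next engine is 50 points weaker.
pool = {f"E{i}": 3000 - 50 * i for i in range(8)}

print(top_seed_opponents(pool, 1000, randomize=False))  # {'E4'}
print(len(top_seed_opponents(pool, 1000, randomize=True)))
```

With seeded pairings the top engine meets the same opponent in every first round, while shuffling spreads its first-round games across the pool.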

The big round robin is probably the only really structured way to handle this despite the fact that you get a huge number of mis-matches.

Don

bupalo · Post by **bupalo** » Fri Nov 08, 2013 5:21 pm

Hi Don, when is Komodo 7 coming? After the nTCEC?

Don · Post by **Don** » Fri Nov 08, 2013 5:36 pm

bupalo wrote: Hi Don, when is Komodo 7 coming? After the nTCEC?

We do not have a plan yet, but we don't like to release too soon and we like the ELO gain to at least be reasonable.

Milos · Post by **Milos** » Fri Nov 08, 2013 5:55 pm

Uri Blass wrote: I think that a good result against weak opponents can also be explained by a lack of weaknesses, so that you do not lose against weak opponents.

I am not saying that this is part of the explanation for Houdini's good results; I do not know.

Statistically you are wrong. When playing an opponent that is 400 Elo weaker, the chance of losing is almost negligible.
The effect of contempt is to convert a certain percentage of drawn games into decided ones. The decided games will again fall much more on the side of the stronger engine.
So, a numerical example.
You played 100 games and the original result is 60+/37=/3-, i.e. 78.5% or 225 Elo.
The winning/losing ratio is 20:1. Then you increase the contempt and get only 30 draws this time. 7 more games are decided, and let's say the weaker engine now improves its chances so that the ratio in those games is 6:1. The final result is then 66+/30=/4-, i.e. 81% or 252 Elo.

The thing is, unless you blow your contempt out of proportion, the stronger engine will always win many more of the additionally decided games than the weaker one and will benefit from it.

I hope you see the point.
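The percentages and Elo figures in this example can be checked with the usual logistic Elo formula. A short sketch, assuming the standard conversion from expected score to rating difference; the function names are made up for illustration.

```python
import math


def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Fraction of points scored: a win counts 1, a draw 0.5."""
    return (wins + 0.5 * draws) / (wins + draws + losses)


def elo_diff(score: float) -> float:
    """Rating difference implied by a score fraction (logistic model)."""
    return 400 * math.log10(score / (1 - score))


before = score_fraction(60, 37, 3)  # original contempt
after = score_fraction(66, 30, 4)   # 7 draws converted at 6:1

print(before, round(elo_diff(before)))  # 0.785 225
print(after, round(elo_diff(after)))    # 0.81 252
```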

Milos · Post by **Milos** » Fri Nov 08, 2013 6:16 pm

Another example, just to illustrate how it backfires against an equally strong engine.
So you play 100 games with 0 contempt and the result is 20+/60=/20-.
Winning chances in decided games are 1:1.
Now you increase the contempt to 1, and 12 more games are decided, but in those games the other engine has the better chances, i.e. your winning chances are now 1:3. So the final result is now 23+/48=/29- or 47%, i.e. you lose 21 Elo.

That is how SF and K6 are stronger than H3 in testing.
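The backfire case can be modelled by converting draws into decided games at a chosen win/loss split. The sketch below uses hypothetical function names and the standard logistic Elo formula. It makes the break-even point visible: each converted draw was already worth half a point, so contempt only pays off if more than half of the newly decided games are wins.

```python
import math


def convert_draws(wins, draws, losses, extra_decided, new_wins):
    """Turn `extra_decided` draws into `new_wins` wins and the rest losses."""
    new_losses = extra_decided - new_wins
    return wins + new_wins, draws - extra_decided, losses + new_losses


def elo(wins, draws, losses):
    """Rating difference implied by a W/D/L record (logistic model)."""
    score = (wins + 0.5 * draws) / (wins + draws + losses)
    return 400 * math.log10(score / (1 - score))


base = (20, 60, 20)
# 12 draws become decided at 1:3, i.e. 3 wins and 9 losses.
after = convert_draws(*base, extra_decided=12, new_wins=3)

print(after)                           # (23, 48, 29)
print(round(elo(*after) - elo(*base)))  # -21

# Points gained or lost = new wins minus half of the converted draws.
print(3 - 12 / 2)                      # -3.0
```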
