Komodo 8 results summary

lkaufman
Posts: 3760
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Komodo 8 results summary

Post by lkaufman » Sun Sep 07, 2014 5:09 pm

I compiled a list of all the ratings so far on lists that have both Komodo 7(a) and Komodo 8 as well as Stockfish 5, all with at least 400 games against many opponents. I included results reported but not yet included in the actual rating lists. For each list I define Komodo 7(a) as the reference engine with a zero rating. Here is what I have so far:

On 4 CPUs, only CCRL 40/4, which has K8 at 53, SF5 at 46, H4 at 17.

On 1 CPU, IPON has K8 at 41, SF5 at 37, H4 at 26.
Frank Q. has K8 at 33, SF5 at minus 2, no rating for H4, and SF Dev. (Aug 3) at 28.
CEGT 40/4 has K8 at 60, SF5 at 46, H4 at 22.
CEGT 40/20 has K8 at 42, SF5 at 32, H4 at 10.
CEGT 5' + 3" has K8 at 48, SF5 at 35, H4 at 23.

Remarkably good agreement overall; every list has K8 first, SF5 second, H4 third. The average of all the single-core results is plus 45 for K8 (I predicted just 40), with K8 15 above SF5 and 22 above H4. On the only quad test, the gain over K7 is 53 (I predicted 50 on quad), with K8 7 above SF5 and 36 above H4. Combining all the results has K8 at 47, with a 13 Elo lead over SF5 and a 25 Elo lead over H4. We also have a 5 Elo lead over recent SF Dev. in the only list above that includes it.
So it's pretty clear that Komodo is number one as far as official releases go. If the latest SF Dev. version is about +15 over SF5 (that's my best guess from available data), it would make it virtually a tie for the lead based on the above.
If I have omitted any lists that meet the above criteria please let me know.
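For anyone who wants to redo the arithmetic, here is a minimal sketch that recomputes the averages from the figures listed above. It assumes plain unweighted means and, for a lead, only the lists that rate both engines; the post does not say how its averages were formed, so the names and the method here are illustrative assumptions only.

```python
from statistics import mean

# Ratings relative to Komodo 7(a) = 0, copied from the list above.
# None marks a list that has no rating for that engine.
single_core = {
    "IPON":        {"K8": 41, "SF5": 37, "H4": 26},
    "Frank Q.":    {"K8": 33, "SF5": -2, "H4": None},
    "CEGT 40/4":   {"K8": 60, "SF5": 46, "H4": 22},
    "CEGT 40/20":  {"K8": 42, "SF5": 32, "H4": 10},
    "CEGT 5'+3\"": {"K8": 48, "SF5": 35, "H4": 23},
}
quad = {"CCRL 40/4": {"K8": 53, "SF5": 46, "H4": 17}}
all_lists = {**single_core, **quad}


def average(lists, engine):
    """Unweighted mean rating of one engine over the lists that rate it."""
    return mean(r[engine] for r in lists.values() if r[engine] is not None)


def lead(lists, engine, rival):
    """Unweighted mean of (engine - rival) over lists that rate both."""
    return mean(
        r[engine] - r[rival]
        for r in lists.values()
        if r[engine] is not None and r[rival] is not None
    )


if __name__ == "__main__":
    for label, lists in (("single core", single_core), ("all lists", all_lists)):
        print(label, "K8 average:", round(average(lists, "K8"), 1))
        for rival in ("SF5", "H4"):
            print(label, "K8 lead over", rival, ":", round(lead(lists, "K8", rival), 1))
```

With unweighted means, the single-core K8 average and the K8 lead over SF5 round to the 45 and 15 quoted above. Since the averaging method is not stated, the other figures need not match this sketch exactly.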

JJJ
Posts: 1287
Joined: Sat Apr 19, 2014 11:47 am

Re: Komodo 8 results summary

Post by JJJ » Sun Sep 07, 2014 5:42 pm

That's a very nice engine update. I'm really happy for you, and it's going to be really nice to see in the next TCEC. And I'm sure that, in the meantime, Komodo will get better again.

nabildanial
Posts: 104
Joined: Thu Jun 05, 2014 3:29 am
Location: Malaysia

Re: Komodo 8 results summary

Post by nabildanial » Sun Sep 07, 2014 6:18 pm

Congratulations to Larry Kaufman and Mark Lefler for the remarkable gain of strength of Komodo 8. I have yet to buy it, but after these results, the thought of having it is so tempting.

I have a few questions for the Komodo team. I think it would be nice for a commercial program like Komodo to implement some non-strength-related features, such as tactical mode (à la Houdini) and Chess960.

Do you have any plan to implement tactical mode and/or Chess960 in Komodo? If so, when? I just thought that it would be nice to have Chess960 support in Komodo, as it might improve the development, especially in the opening, as well as being a great tool for an FRC fan like me.

I know that previous versions of Komodo (I don't know about 8) have the option to turn null move and LMR on or off to improve tactics, but it would be really nice to have a special tactical mode in Komodo.

Based on most of the rating lists available, Stockfish seems to have a slight advantage against Komodo, but lacks the ability to grind out results against inferior programs. It is the Houdini situation all over again. Perhaps Komodo has a better contempt setting for rating lists, or Stockfish needs to implement one as a default setting. As the TCEC is just around the corner, do you think Komodo needs a change to its contempt setting, to be able to grind out results vs. Stockfish?

Thanks...

Leto
Posts: 2028
Joined: Thu May 04, 2006 1:40 am
Location: Dune

Re: Komodo 8 results summary

Post by Leto » Sun Sep 07, 2014 6:19 pm

Congratulations to the Komodo team, it seems there's a new king up to four cores.

Modern Times
Posts: 2421
Joined: Thu Jun 07, 2012 9:02 pm

Re: Komodo 8 results summary

Post by Modern Times » Sun Sep 07, 2014 6:48 pm

Nice summary Larry. And you didn't mention the "AMD" word even once ! :D

Dr.Wael Deeb
Posts: 9635
Joined: Wed Mar 08, 2006 7:44 pm
Location: Amman,Jordan

Re: Komodo 8 results summary

Post by Dr.Wael Deeb » Sun Sep 07, 2014 7:01 pm

Modern Times wrote: Nice summary Larry. And you didn't mention the "AMD" word even once ! :D
:lol:

:wink:
No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward….

mjlef
Posts: 1429
Joined: Thu Mar 30, 2006 12:08 pm

Re: Komodo 8 results summary

Post by mjlef » Sun Sep 07, 2014 7:02 pm

nabildanial wrote: Congratulations to Larry Kaufman and Mark Lefler for the remarkable gain of strength of Komodo 8. I have yet to buy it, but after these results, the thought of having it is so tempting.

I have a few questions for the Komodo team. I think it would be nice for a commercial program like Komodo to implement some non-strength-related features, such as tactical mode (à la Houdini) and Chess960.

Do you have any plan to implement tactical mode and/or Chess960 in Komodo? If so, when? I just thought that it would be nice to have Chess960 support in Komodo, as it might improve the development, especially in the opening, as well as being a great tool for an FRC fan like me.

I know that previous versions of Komodo (I don't know about 8) have the option to turn null move and LMR on or off to improve tactics, but it would be really nice to have a special tactical mode in Komodo.

Based on most of the rating lists available, Stockfish seems to have a slight advantage against Komodo, but lacks the ability to grind out results against inferior programs. It is the Houdini situation all over again. Perhaps Komodo has a better contempt setting for rating lists, or Stockfish needs to implement one as a default setting. As the TCEC is just around the corner, do you think Komodo needs a change to its contempt setting, to be able to grind out results vs. Stockfish?

Thanks...
Komodo 8 retains the same UCI parameters as Komodo 7, so you can indeed turn off LMR and null move, for example. I have not yet started looking at a "tactical version". I looked into 960, but there was not enough time to add and test it before the Komodo 8 release. I have a list of things to work on, given time, but we tend to spend most of our development time on making the core program better at standard chess. We always listen to customer requests, though, and they do influence what we work on.

Mark

lkaufman
Posts: 3760
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: Komodo 8 results summary

Post by lkaufman » Sun Sep 07, 2014 7:31 pm

Modern Times wrote: Nice summary Larry. And you didn't mention the "AMD" word even once ! :D
I always claimed that AMD was just an advantage for Houdini, but now that we have clearly passed Houdini on all hardware and have only SF as a rival, it's no longer of concern to me.

lkaufman
Posts: 3760
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: Komodo 8 results summary

Post by lkaufman » Sun Sep 07, 2014 7:36 pm

nabildanial wrote: Congratulations to Larry Kaufman and Mark Lefler for the remarkable gain of strength of Komodo 8. I have yet to buy it, but after these results, the thought of having it is so tempting.

I have a few questions for the Komodo team. I think it would be nice for a commercial program like Komodo to implement some non-strength-related features, such as tactical mode (à la Houdini) and Chess960.

Do you have any plan to implement tactical mode and/or Chess960 in Komodo? If so, when? I just thought that it would be nice to have Chess960 support in Komodo, as it might improve the development, especially in the opening, as well as being a great tool for an FRC fan like me.

I know that previous versions of Komodo (I don't know about 8) have the option to turn null move and LMR on or off to improve tactics, but it would be really nice to have a special tactical mode in Komodo.

Based on most of the rating lists available, Stockfish seems to have a slight advantage against Komodo, but lacks the ability to grind out results against inferior programs. It is the Houdini situation all over again. Perhaps Komodo has a better contempt setting for rating lists, or Stockfish needs to implement one as a default setting. As the TCEC is just around the corner, do you think Komodo needs a change to its contempt setting, to be able to grind out results vs. Stockfish?

Thanks...
Yes, contempt helps us a bit vs. weaker engines and hurts a bit vs. SF, but the effect is pretty small. For straight-up matches with SF I would set it to zero. In TCEC we leave contempt on until the semi-final or even the final, then we use zero.
Maybe we'll have 960 support in the next release, no promise though. As for tactical mode, we could even add an adjustable parameter that makes Komodo as tactical as you want, and it would only take a few minutes' work by Mark. But it might not be the best way to implement tactical mode. No reason we couldn't have this for the next release.
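For readers unfamiliar with contempt, here is a rough sketch of the general idea only: the engine scores a draw as slightly negative for itself, so it declines draws it would otherwise accept. This is a generic illustration and is not Komodo's actual implementation; the function names and the centipawn values are hypothetical.

```python
def draw_value(contempt_cp: int) -> int:
    """Score (centipawns) the engine assigns to a draw, from its own point of view.

    With contempt 0 a draw is worth exactly 0; with positive contempt the
    engine treats a draw as slightly worse than an equal position.
    """
    return -contempt_cp


def prefers_draw(best_non_draw_eval_cp: int, contempt_cp: int) -> bool:
    """True if taking an available draw scores at least as well as playing on."""
    return draw_value(contempt_cp) >= best_non_draw_eval_cp


if __name__ == "__main__":
    # Hypothetical position: the best non-drawing line is evaluated at -5 cp.
    for contempt in (0, 10):
        choice = "takes the draw" if prefers_draw(-5, contempt) else "plays on"
        print(f"contempt {contempt:>2} cp: engine {choice}")
```

Against a weaker opponent, playing on from a slightly worse position tends to pay off; against an equal opponent it can cost half points, which fits the advice above to use zero for straight-up matches with SF.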

Frank Quisinsky
Posts: 4852
Joined: Wed Nov 18, 2009 6:16 pm
Location: Trier, Germany

Re: Komodo 8 results summary

Post by Frank Quisinsky » Sun Sep 07, 2014 9:30 pm

Hi Larry,

Just this one point:

Engines based on the same ...
will produce similar results!
That is nothing new; we always see the same thing in testing.

I'm not sure about the "Rybka / Naum" case, but all the stats I have done on this topic show the same thing.

Suppose you have to play against three human students of Dvoretsky.
You dislike Dvoretsky's style; maybe it is your nemesis (just an example).

or ...

You have to play against three human students of Shirov.
You like Shirov's style because your results are better against players with tactical skills.

Now you produce a rating list ...

1. From everything I have read, Equinox is based on IPP too.
2. From everything I have read, the newest Fritz / Pandix is based on IPP too.
3. We know the work the Houdini programmer used for v1.
4. We know that newer Critter versions are more similar to IPP too.
5. We know the work the Rybka programmer used for v1.

If you test an engine against this group of opponents and K8 likes the IPP style, K8's Elo can come out 60 points higher.

1. Rybka
2. Naum

Two programs again with the same skills.

I am looking here:
http://users.telenet.be/chesslogik//images/6xyk.jpg
I try to use completely different engines, and for me the effect is nice to see in each stat too.

You can produce 10,000 games for each of 16 programs, but the result will be unclear if too many of these programs have the same basis.

K8 can be 40 or 50 or 60 Elo better than K7a ... it is not a question of the number of games, but more a question of the number of engines with the same basic strengths!

---

You know K8's results from my tourney with 1,000 games.
Now Fizbo has replaced Zappa ... Zappa is very old and Fizbo is new. I will test newer engines.

Have a look here (results vs. Zappa):
Komodo 8 x64 : 50 (+ 0,= 12,- 38), 12.0 %
Stockfish 03.08.14 BMI2 x64 : 50 (+ 0,= 22,- 28), 22.0 %

That is 5 points more for K8 when Zappa is in the group of opponents.
With Zappa eliminated, SF is now 3 Elo stronger in my continuous tourney.

That means I switched only 1 of 21 engines, and the effect of such a change is very easy to see.
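As a worked check of the "5 points" figure, here is a small sketch that converts the win/draw/loss lines above into points. It assumes each line is written from Zappa's side (games, then Zappa's wins, draws and losses, then Zappa's percentage), which is what makes the 12.0 % and 22.0 % come out; the function and variable names are only illustrative.

```python
def points(wins: int, draws: int) -> float:
    """Points scored: 1 per win, 0.5 per draw."""
    return wins + 0.5 * draws


def score_percent(wins: int, draws: int, losses: int) -> float:
    """Score as a percentage of the games played."""
    games = wins + draws + losses
    return 100.0 * points(wins, draws) / games


# (wins, draws, losses) for Zappa against each opponent, as listed above.
zappa_vs = {
    "Komodo 8 x64": (0, 12, 38),
    "Stockfish 03.08.14 BMI2 x64": (0, 22, 28),
}

# Points each opponent took from Zappa: games played minus Zappa's points.
opponent_points = {
    name: sum(wdl) - points(wdl[0], wdl[1]) for name, wdl in zappa_vs.items()
}

if __name__ == "__main__":
    for name, wdl in zappa_vs.items():
        print(f"Zappa vs {name}: {score_percent(*wdl):.1f} %")
    gap = opponent_points["Komodo 8 x64"] - opponent_points["Stockfish 03.08.14 BMI2 x64"]
    print("Extra points for K8 from the Zappa games:", gap)
```

Under that reading, Komodo 8 took 44 of 50 points from Zappa and Stockfish took 39, which is the 5-point gap mentioned above.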

And now ...
4x the same engine in one tourney!
WOW!!

I am happy that you calculated the average of all the results. It's clear that K8 is stronger than K7a, and that is nice to see. But unfortunately, most other testers also use a lot of IPPs.

I think we should all look at the results in more detail. Most things have nothing to do with the "ErrBar myth" or, better, "Lightning theory".

On the other hand ...
We don't have enough opponents for programs like SF or Komodo. In such a situation we are working with "clones", but the results will not be the best.

Best
Frank

PS: Early in the game, right after the opening moves, K8 is around 100 Elo stronger than K7. I posted my stats in another part of this forum. That was a mistake; they should have been posted here ... OK, for that stat the "ErrorBar myth" is right, but it is clear to see that K8 has improved most right after the opening book moves, with many pieces on the board, and is stronger in tactics.

:-(
I like computer chess!
