Some Komodo 9 results

lkaufman
Posts: 5966
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Some Komodo 9 results

Post by lkaufman »

The following results are for Komodo 1416.00, which will become Komodo 9 except possibly for cosmetic changes and/or MP changes which would not affect single-core play.
At 2' plus 1" in direct matches vs. Komodo 8, K9 is plus 52 Elo on a single core after 842 games (more to follow); plus 60 Elo on 2 cores after almost 4,000 games; and plus 61 Elo on 4 cores after just over 3,100 games.
Ingo Bauer has run his IPON test on version 1416.00 (5' + 3", single-core test, 3,300 games). The result is a gain of 50 Elo points over Komodo 8, and a lead of 14 Elo over Stockfish 6 (the previous number 1) on his Top 16 (round robin) list. Since no further changes are planned that would affect single-core play, Ingo will post these results as Komodo 9 results once it is released.
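For readers who want to judge how much weight a figure like "plus 52 Elo after 842 games" can carry, here is a minimal Python sketch of the usual logistic Elo model with a normal-approximation error margin. It is not the tool used for the matches above. The function names are made up for illustration, and the draw ratio is a pure assumption, since the post does not give win/draw/loss counts.

```python
import math


def elo_to_score(elo_diff: float) -> float:
    """Expected score (0..1) for a player rated elo_diff above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def score_to_elo(score: float) -> float:
    """Inverse of elo_to_score; score must lie strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(elo_diff: float, games: int, draw_ratio: float, z: float = 1.96):
    """Approximate confidence interval (low, high) for a measured Elo difference.

    draw_ratio is an ASSUMED fraction of drawn games; it is not in the post.
    z = 1.96 corresponds to roughly 95% under a normal approximation.
    """
    s = elo_to_score(elo_diff)
    win = s - draw_ratio / 2.0          # fraction of games won
    loss = 1.0 - s - draw_ratio / 2.0   # fraction of games lost
    if win < 0.0 or loss < 0.0:
        raise ValueError("draw_ratio is not compatible with this score")
    # Per-game variance of the result (1, 0.5 or 0) around the mean score s.
    var = win * (1.0 - s) ** 2 + draw_ratio * (0.5 - s) ** 2 + loss * s ** 2
    stderr = math.sqrt(var / games)
    return score_to_elo(s - z * stderr), score_to_elo(s + z * stderr)


if __name__ == "__main__":
    # 52 Elo and 842 games are from the post; the 50% draw ratio is hypothetical.
    low, high = elo_interval(52.0, 842, draw_ratio=0.5)
    print(f"+52 Elo after 842 games: roughly {low:.0f} to {high:.0f} Elo")
```

Under this model the interval narrows roughly with the square root of the number of games, which is why the longer 2-core and 4-core matches pin the gain down more tightly than the 842-game single-core match.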
Komodo rules!
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: Some Komodo 9 results

Post by JJJ »

That is an excellent result! Congratulations on your great work!
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: Some Komodo 9 results

Post by JJJ »

And I told you to aim for at least 50 Elo. You answered that it was maybe too much and that you were aiming for 40. But now you get at least 50 on 1 CPU and 60 on more! It's great.
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Re: Some Komodo 9 results

Post by JJJ »

Is Komodo 9 tested with drawscore 0 or -7?
Frank Quisinsky
Posts: 6811
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Some Komodo 9 results

Post by Frank Quisinsky »

Hi Larry,

The result against more participants, and with more modern hardware than Ingo used, will only be "around" the same. Furthermore, Ingo's list contains many similar engines, while various others are missing.

I think that if Ingo gets +50 Elo, the real result against a more varied field of engines is +35-40. The same was true for Komodo 8 ...

30 to Stockfish 6 BMI2 ...
But the current Stockfish dev. version is around 15-20 Elo stronger.
To be sure here, let us wait for my final result.

In my view, at the end of the test Stockfish dev. should be around 5-10 Elo stronger.

Another, much more interesting point ...
With more pieces on the board Komodo 8 is stronger than Stockfish 6, because Stockfish produces too many drawn games. In the late middlegame and endgame Stockfish 6 is clearly stronger than Komodo 8. It will be interesting to compare both programs after my test run. I am sure that the good stats Komodo 8 has at the beginning of games cannot be topped.

Elo alone isn't important; the strengths and weaknesses matter more for stronger players, such as you, for example :-)

BTW: Stockfish has the same problem Komodo has. Elo strength goes up with more participants (it is not important how many games you play).

I wrote today on my conditions page:

"FCP Rating List: I created two versions. I think that a rating list including all played games isn't accurate enough. That can't be right, because more games do not mean more accurate ratings. The reason is simple: many versions of the same engines are included. It would be comparable to adding all the derivative engines to a single rating system. I know that some readers like that, and for that reason I created two versions. In my opinion it is much more important to have many different participants (playing styles) for really good ratings, even with a smaller number of games. All Elo calculation tools seem to forget such a simple and logical rule, because the ErrBar is the same for different numbers of opponents at the same number of games."

And I have had the same result and opinion about it for many years.

The biggest problem in comparing Komodo with Stockfish is that both programs play in another world. That means we need more and stronger opponents to test the Elo differences in detail. For that reason, none of the results comparing Komodo with Stockfish are exact. But we can find out strengths and weaknesses.

Good example:
Thinker 5.4d Inert x64 ...

Have a look at my page for details (the test run for the 6-year-old WB engine Thinker is still running).

It is clearly stronger vs. newer, modern engines than vs. older available engines. The situation with Sjeng is completely different. Better results vs. more modern engines.

And we are speaking here of Thinker and Sjeng only. That's what I mean: a strong list with many different engines will give us better results, but the main problem in testing Komodo or Stockfish is that we don't have many strong opponents. Various programmers are on the way, and perhaps in two years ... I am sure ... we will have more engines that are clearly stronger.

I hope so ...
It is important that we support younger talents rather than look only at our own development. Computer chess can become very boring if we tell others, please use Komodo and Stockfish only. It is absolutely clear that most people here are looking for the best material. But I often think that most people here will never see all the nice strengths other engines have ... if the focus is on Komodo and Stockfish only.

About two months ago I gave a Grandmaster this advice: for your opening analysis you have to use Spark ... The Grandmaster is often very happy with Spark and uses it in combination with Houdini (he bought Houdini).

That is the reason I am not a big fan of private test versions. In the time spent testing such things, other programmers need more help, and the focus goes in a bad direction.

Best
Frank
lkaufman
Posts: 5966
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Some Komodo 9 results

Post by lkaufman »

JJJ wrote: Komodo 9 is tested with drawscore 0 or -7 ?
As usual, all the tests mentioned were run in default mode, meaning a drawscore of -7. Using zero would produce slightly better results vs. SF but slightly worse results vs. everyone else; the effect is pretty small, though. It was a big deal with Houdini because Houdini modified piece values for contempt; we don't do that.
Results were surprisingly good at these longer time controls, better than at bullet or faster. It seems our changes scale well.
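To make the drawscore setting concrete, here is a minimal Python sketch of how a fixed draw score is commonly applied in a negamax-style search. This is a generic illustration, not Komodo's actual code. The function name and the side labels are hypothetical, and the value -7 is assumed here to be in centipawns, which the post does not state.

```python
def draw_value(side_to_move: str, engine_side: str, drawscore_cp: int = -7) -> int:
    """Score of a drawn position from the point of view of the side to move.

    drawscore_cp is how the engine values a draw for itself (assumed centipawns).
    In a negamax search every score is relative to the side to move, so the
    opponent sees the same draw with the opposite sign.
    """
    if side_to_move == engine_side:
        return drawscore_cp
    return -drawscore_cp


if __name__ == "__main__":
    # With the default of -7 the engine slightly dislikes draws for itself.
    print(draw_value("white", "white"))   # engine to move
    print(draw_value("black", "white"))   # opponent to move
```

With a negative drawscore the engine avoids repetitions and dead-equal lines a little more readily, which tends to help against weaker opponents and can cost a little against equal or stronger ones, consistent with the trade-off described above.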
Komodo rules!
lkaufman
Posts: 5966
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Some Komodo 9 results

Post by lkaufman »

I would expect Komodo 9 to have a larger lead over Stockfish 6 on your list than it does on Ingo's list, because the wider range of strength of the opponents on your list should favor the engine that makes fewer draws against weaker opponents, which you imply would be Komodo. Do you agree? Of course we'll find out soon enough.

Best,
Larry
Komodo rules!
Frank Quisinsky
Posts: 6811
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Some Komodo 9 results

Post by Frank Quisinsky »

Hi Larry,

Normally yes!
But have a look at my results for Komodo 8 and at the results of others at the time Komodo 8 was released. I have around +30; all others, with fewer participants and more clones, have +40.

I believe that Komodo loses a bit in comparison with Stockfish in endgames. Many clearly weaker opponents with strong endgame play can easily hold a draw vs. clearly stronger opponents. Stockfish gains its advantage here before the endgame starts and can score more points vs. weaker opponents.

I am very interested in the question:
Where can Komodo gain +40-50 Elo? I hope in the transition into the endgame ... in that case it will be a hard fight for first place between Komodo and Stockfish if both have many participants.

Indeed we will see it later ...
Very interesting event ...

My wish is that Komodo 9 takes first place, because it is better if a two-man team holds first place than a team with 200 helpers. Simply put, others will not lose interest in developing an engine if a two-man team leads by good example.

Have a nice evening!

Best
Frank
lkaufman
Posts: 5966
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Some Komodo 9 results

Post by lkaufman »

Update on the single-core K9 vs. K8 match at 2'+1": +54 Elo after almost 4,800 games.
Komodo rules!
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: Some Komodo 9 results

Post by Dr.Wael Deeb »

lkaufman wrote:
I would expect Komodo 9 to have a larger lead over Stockfish 6 on your list than it does on Ingo's list, because the wider range of strength of the opponents on your list should favor the engine that makes fewer draws against weaker opponents, which you imply would be Komodo. Do you agree? Of course we'll find out soon enough.

Best,
Larry
That's exactly what happens in my private rating list....
No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward….