A fairly serious bug was found by one of our alert beta-testers in the beta version of Komodo 5.1 MP. It has been fixed in time for Monday's release. Here are new results with the bug-fixed version, all against Stockfish 3.0 at 3'+2", 256 MB hash, Noomen test set, SSE4 machines.
On 4 cores, score now is 60-49 (+34 elo)
On 12 cores, score now is 50-28 (+100 elo).
This difference is so large that, even given the small samples, the likelihood of superiority for the second result must be quite high, so, much to our surprise, it seems we scale better than SF with more cores. Normally, in view of the small samples, I would combine the results and quote the average elo difference, but given the huge disparity between the two results this might not be appropriate. Both results are better than with the previous version, the 12-core one dramatically so.
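For reference, a minimal sketch of the usual logistic score-to-elo conversion behind the two figures above (the draw split does not matter for the point estimate, only total points and games):

```python
import math

def elo_from_score(score_fraction):
    """Elo difference implied by a match score fraction under the logistic model."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

print(round(elo_from_score(60 / 109)))   # 4 cores, 60 of 109 points: about +35 (quoted as +34)
print(round(elo_from_score(50 / 78)))    # 12 cores, 50 of 78 points: about +101 (quoted as +100)
```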
Komodo bugfix results
Moderator: Ras
-
lkaufman
- Posts: 6284
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
-
michiguel
- Posts: 6401
- Joined: Thu Mar 09, 2006 8:30 pm
- Location: Chicago, Illinois, USA
Re: Komodo bugfix results
Hi Larry,
It seems that if I do not make a comment about statistical significance in one thread, I end up making it in another. So, here I go. I do not think you can draw those conclusions from the number of games you played (particularly about the scaling from 4 to 12 cores). Some people can help you with BayesElo; I can help you with Ordo, if you are using it, to get an idea of the errors.
Miguel
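For a rough idea of what those error bars look like without firing up BayesElo or Ordo, here is a minimal sketch that turns W/D/L counts into an elo estimate with an approximate 95% interval; the draw split below is made up, since only the point totals were reported:

```python
import math

def elo(s):
    return 400.0 * math.log10(s / (1.0 - s))

def elo_with_interval(wins, draws, losses, z=1.96):
    """Elo point estimate plus an approximate 95% interval from W/D/L counts."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n              # score fraction
    var = (wins + 0.25 * draws) / n - s * s   # per-game variance of the score
    se = math.sqrt(var / n)                   # standard error of the mean score
    return elo(s), elo(s - z * se), elo(s + z * se)

# Hypothetical draw split for the 4-core match (60-49 in points, draws not reported):
print(elo_with_interval(wins=38, draws=44, losses=27))
```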
-
lkaufman
- Posts: 6284
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Komodo bugfix results
I don't know how to calculate the likelihood that on 12 cores Komodo is stronger relative to Stockfish than on four cores, but my sense is that it is around 90% from the data given. If someone knows how to do this calculation, please do so here. If it is around 90%, it would be reasonable to say that Komodo "probably" scales better than Stockfish with more cores. I also had a better result on 12 cores than on 4 for the buggy version, so there is some extra reason to believe this conclusion. By the way, I have no commercial reason to make this claim, as very few buyers have more than 6 cores.
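Since Larry asks how the calculation might be done, here is one minimal way to sketch it: sample the two score fractions from their posteriors and count how often the 12-core one is ahead. Treating each match as a binomial over game points ignores draws (which makes the estimate conservative) and the half-point granularity; with the counts above it lands at roughly 0.9, in line with Larry's guess.

```python
import random

def score_posterior(points, games, n_samples, rng):
    """Beta(points+1, games-points+1) posterior samples for the score fraction,
    treating the match as a binomial over game points (draws ignored)."""
    return [rng.betavariate(points + 1, games - points + 1) for _ in range(n_samples)]

rng = random.Random(1)
n = 200_000
p4 = score_posterior(60, 109, n, rng)     # 4-core match: 60 of 109 points
p12 = score_posterior(50, 78, n, rng)     # 12-core match: 50 of 78 points
prob = sum(a > b for a, b in zip(p12, p4)) / n
print(f"P(12-core score fraction > 4-core score fraction) ~ {prob:.2f}")
```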
-
Mike S.
- Posts: 1480
- Joined: Thu Mar 09, 2006 5:33 am
Re: Komodo bugfix results
The 12-core data is indeed of academic interest only. Nevertheless, certainly more than a dozen power users will be interested, as it seems to indicate the effectiveness of the multiprocessing implementation. It looks good!
It seems that if we count engine series only, Komodo 5.1 will be in second place. But the tough question now is: in typical MP use (e.g. 4 cores/4 threads, or even 8 threads on Intel), is it stronger than Houdini 1.5a?
As for me, this question is moot, though, as I am now very satisfied with Komodo CCT on my dual-core CPU. Thanks!
Regards, Mike
-
Ajedrecista
- Posts: 2177
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Komodo bugfix results.
Hello Larry:

I am glad to see that you are in good form again.
I agree with the other posters that the number of games is somewhat small to draw an early conclusion, so increasing the number of games played is a good start.
Regarding statistics... well, I am not the best here, of course! I wrote some Fortran 95 programmes that can be downloaded for free through the link in my signature. Just take the results with tons of care. Inside this pack, LOS_and_Elo_uncertainties_calculator is the programme you should run (it is valid only for matches between two engines).
Here is a thread that might be of help with your request:
Math Test 4 All
There are some interesting answers. Miguel explained there how to do a simulation with Ordo. I also did my clumsy math in a 'trial and error' mode. I hope that you find valuable answers in that topic.
------------------------
If you want to calculate the LLR (log-likelihood ratio) in the same way as the SF Testing Framework does, here is another programme of mine:
LLR_calculator_for_chess.rar (0.6 MB)
(This link will die 30 days after the last download.) It is no more than a copy of part of a Python file from FishTest; I give credit in both the Readme and the source code files. Once again, take the given results with care.
My tools only work on Windows (sorry to Linux and other OS users).
Good luck with the imminent release!
Regards from Spain.
Ajedrecista.
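For anyone who prefers to see the formula rather than run the .rar, here is a minimal sketch of the usual normal-approximation LLR used in FishTest-style SPRT testing (a re-derivation, not Ajedrecista's code; the W/D/L split for the 12-core match below is hypothetical, since only the point totals were posted):

```python
import math

def expected_score(elo_diff):
    """Expected score under the logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def llr(wins, draws, losses, elo0, elo1):
    """Approximate log-likelihood ratio of H1 (elo = elo1) vs H0 (elo = elo0)."""
    if wins == 0 or draws == 0 or losses == 0:
        return 0.0
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n             # observed score fraction
    m2 = (wins + 0.25 * draws) / n           # second moment of the per-game score
    var_s = (m2 - s * s) / n                 # variance of the mean score
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return (s1 - s0) * (2.0 * s - s0 - s1) / (2.0 * var_s)

# Hypothetical split of the 12-core match (50-28 in points), testing elo0=0 vs elo1=50:
print(llr(wins=35, draws=30, losses=13, elo0=0.0, elo1=50.0))
```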
-
Werewolf
- Posts: 2064
- Joined: Thu Sep 18, 2008 10:24 pm
Re: Komodo bugfix results
I just want to say that as a user who has 16 and 12 core machines, I very much appreciate your efforts to work on this equipment.
My hardware is fairly tied up, but if I can help run a quick test for you, let me know.
-
lkaufman
- Posts: 6284
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Komodo bugfix results
I completed 200 games against Stockfish 3 on four cores for the release version, and the result is 108.5 to 91.5, which is +30 elo for Komodo. The difference between this and the +100 elo result on 12 cores after 78 games must be highly significant now.
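As a quick sanity check on "highly significant", one can compare the two score fractions directly; draws are not reported, so the binomial variance below is an upper bound, and with a typical draw rate the z value would come out noticeably higher:

```python
import math

def score_se(score_fraction, games):
    """Standard error of the score fraction using the binomial bound (no draw information)."""
    return math.sqrt(score_fraction * (1.0 - score_fraction) / games)

s4, n4 = 108.5 / 200, 200     # release version on 4 cores: 108.5-91.5
s12, n12 = 50 / 78, 78        # 12-core result: 50-28
z = (s12 - s4) / math.hypot(score_se(s4, n4), score_se(s12, n12))
print(f"z = {z:.2f}")
```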
-
Laskos
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Komodo bugfix results
You have a good sense with your 90%. A rule-of-thumb calculation (besides being crude, I would really need the number of draws) is as follows:
2SD for first match is 480/sqrt(109) points ~= 46 points
2SD for second match is 480/sqrt(78) points ~= 54 points
2SD for their difference is ~= sqrt(46^2+54^2) ~= 70 points
You have a 66-point difference, which is a bit smaller than the 2SD error of 70 points, so the LOS is a bit below 95%, say 90%, as your excellent sense told you. Maybe Miguel can show that in Ordo, or you can feed BayesElo the PGN to get the LOS matrix.
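Kai's rule of thumb, spelled out in a few lines (the 480/sqrt(N) figure for 2SD in elo is his stated shortcut, which already folds in a typical draw rate):

```python
import math

def two_sd_elo(games):
    """Kai's shortcut: roughly two standard deviations, in elo, for a match of this length."""
    return 480.0 / math.sqrt(games)

sd4 = two_sd_elo(109)                  # about 46 elo for the 4-core match
sd12 = two_sd_elo(78)                  # about 54 elo for the 12-core match
combined = math.hypot(sd4, sd12)       # roughly 70 elo for the difference of the two results
print(sd4, sd12, combined, 100 - 34)   # observed difference: 66 elo
```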
-
yanquis1972
- Posts: 1766
- Joined: Wed Jun 03, 2009 12:14 am
Re: Komodo bugfix results
Larry, I am not sure if you or Don (or either) would be the one to ask, but do you think this drastic result on many-core systems is a result of your access to such hardware? Do you have any theories as to how such a drastic improvement is possible? Would something like Stockfish's default split depth come into play?