Komodo bugfix results

Uri Blass · Post by **Uri Blass** » Mon Jun 17, 2013 10:29 pm

yanquis1972 wrote:larry, i am not sure if you or don (or either) would be the one to ask, but do you think this drastic result on manycore systems is a result of your access to such hardware? do you have any theories as to how such a drastic improvement is possible? would something like stockfishes default split depth come into play?

I have a theory that this drastic improvement is a result of the fact that Komodo is not optimized for a small number of cores, and that when the Komodo team fixes the problem, Komodo may be stronger with fewer cores while keeping the same playing strength with 12 cores.

My theory is that Komodo5.1 MP has a superlinear improvement, that
Komodo MP with 12 cores can beat Komodo with 1 core at a 12:1 time handicap, and that it is possible to use this to make Komodo stronger with fewer cores.

yanquis1972 · Post by **yanquis1972** » Mon Jun 17, 2013 10:41 pm

this would be fantastic news indeed. i do seem to recall someone saying that komodo was not scaling as well as expected, but that seems to not be a problem on manycore systems. if the issue could be fixed for dual & quad core, i believe it would immediately put komodo's strength nearly on par with houdini's.

lkaufman · Post by **lkaufman** » Tue Jun 18, 2013 12:04 am

yanquis1972 wrote:larry, i am not sure if you or don (or either) would be the one to ask, but do you think this drastic result on manycore systems is a result of your access to such hardware? do you have any theories as to how such a drastic improvement is possible? would something like stockfishes default split depth come into play?

It is not a result of our access to super hardware; we mainly tested for 4 cores. My best guess is that Stockfish just doesn't benefit very much going from 4 to 12 cores, for whatever reason, so we look good by comparison. Our MP is quite different from theirs, so it is quite plausible that their method is a bit better for 4 cores and ours for 12. Probably some of the 70 Elo difference in scaling is just sample error. One thing I can say is that I don't think we will have much trouble improving the MP performance in the future; at least that should be much easier than improving the underlying engine. And for Uri, I appreciate the thought, but no, we don't get close to linear (never mind superlinear) scaling, maybe just closer than some other engines.
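To put the "sample error" remark in concrete terms, here is a minimal sketch of how an Elo difference and a rough 95% interval can be estimated from a match result. The function names and the win/draw/loss counts are hypothetical and are not results from this thread; the interval uses a simple normal approximation and assumes the score stays strictly between 0 and 1.

```python
import math

def elo_from_score(score):
    """Convert a match score fraction (0 < score < 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high): the Elo estimate and an approximate 95% interval."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0 points per game).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    # Normal approximation on the score, then mapped to Elo.
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return elo_from_score(score), low, high

# Hypothetical counts for illustration only, NOT data from this thread.
elo, low, high = elo_with_margin(wins=40, draws=40, losses=20)
print(f"{elo:+.0f} Elo, 95% interval {low:+.0f} to {high:+.0f}")
```

With only a hundred games the interval is very wide, which is why a sizeable part of a measured scaling difference can plausibly be noise.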

Uri Blass · Post by **Uri Blass** » Tue Jun 18, 2013 5:06 am

lkaufman wrote:
yanquis1972 wrote:larry, i am not sure if you or don (or either) would be the one to ask, but do you think this drastic result on manycore systems is a result of your access to such hardware? do you have any theories as to how such a drastic improvement is possible? would something like stockfishes default split depth come into play?
It is not a result of our access to super hardware, we mainly tested for 4 cores. My best guess is that Stockfish just doesn't benefit very much going from 4 to 12 cores, for whatever reason, so we look good by comparison. Our MP is quite different from theirs, so it is quite plausible that their method is a bit better for 4 cores and ours for 12. Probably some of the 70 elo difference in scaling is just sample error. One thing I can say is that I don't think we will have much trouble improving the MP performance in the future; at least that should be much easier than improving the underlying engine. And for Uri, I appreciate the thought, but no, we don't get close to linear (never mind superlinear) scaling, maybe just closer than some other engines.

I simply remember the good result of Stockfish with 12 cores against Houdini (losing only 25-23 in TCEC) and thought that maybe at least part of the reason is simply that Stockfish is better than Houdini at gaining from more cores.

My thought was that if Stockfish has almost linear improvement from more cores, and Komodo is only slightly better with 4 cores and significantly better with 12 cores,
then Komodo may have superlinear improvement.

It is of course possible that my assumption that Stockfish has almost linear improvement from more cores is simply wrong, and that the main reason for Stockfish's good result in TCEC is the long time control rather than the number of cores.

Note that I understand that you did not test with 12 cores, so it is still possible to have superlinear improvement from 4 cores to 12 cores,
and if somebody with 12 cores can test 12 cores against 4 cores (ponder off) with a 3:1 time handicap, it may be productive.
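The logic of the proposed handicap test can be sketched as follows. If going from 4 to 12 cores gives an effective speedup `s`, then giving the 4-core side 3 times the clock leaves the 12-core side with an effective time ratio of `s / 3`. The function name and the `elo_per_doubling` value below are assumptions made for illustration, not measured figures; only the sign of the result matters for the superlinear question.

```python
import math

def handicap_elo(speedup, time_handicap, elo_per_doubling=60.0):
    """Expected Elo edge of the many-core side in a time-handicap match.

    speedup: effective speedup from the extra cores (hypothetical input).
    time_handicap: clock multiplier given to the side with fewer cores.
    elo_per_doubling: ASSUMED Elo gain per doubling of search time.
    """
    effective_ratio = speedup / time_handicap
    return elo_per_doubling * math.log2(effective_ratio)

# Hypothetical speedups from 4 to 12 cores, against a 3:1 time handicap.
for s in (2.0, 3.0, 3.5):
    print(f"speedup {s}: {handicap_elo(s, 3.0):+.1f} Elo for the 12-core side")
```

A positive value means the 12-core side should win the handicap match, which would indicate superlinear scaling from 4 to 12 cores; zero corresponds to exactly linear scaling, and a negative value to sublinear scaling.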

Don · Post by **Don** » Tue Jun 18, 2013 5:12 am

Uri Blass wrote:
lkaufman wrote:
yanquis1972 wrote:larry, i am not sure if you or don (or either) would be the one to ask, but do you think this drastic result on manycore systems is a result of your access to such hardware? do you have any theories as to how such a drastic improvement is possible? would something like stockfishes default split depth come into play?
It is not a result of our access to super hardware, we mainly tested for 4 cores. My best guess is that Stockfish just doesn't benefit very much going from 4 to 12 cores, for whatever reason, so we look good by comparison. Our MP is quite different from theirs, so it is quite plausible that their method is a bit better for 4 cores and ours for 12. Probably some of the 70 elo difference in scaling is just sample error. One thing I can say is that I don't think we will have much trouble improving the MP performance in the future; at least that should be much easier than improving the underlying engine. And for Uri, I appreciate the thought, but no, we don't get close to linear (never mind superlinear) scaling, maybe just closer than some other engines.
I simply remember the good result of stockfish 12 cores against houdini(losing only 25-23 in TCEC) and thought that maybe at least part of the reason is simply that stockfish is better than houdini in earning from more cores.

My thought was that if stockfish has almost linear improvement from more cores and komodo is only slightly better with 4 cores and significantly better with 12 cores,
than komodo may have super linear improvement.

It is of course possible that my assumption that stockfish has almost linear improvement from more cores is clearly wrong and the main reason for stockfish good result in TCEC is the long time control and the number of cores is not the factor that helped stockfish.

Note that I understand that you did not test with 12 cores so it is still possible to have super linear improvement from 4 cores to 12 cores
and if somebody with 12 cores can test 12 cores against 4 cores(ponder off) with 3:1 time handicap it may be productive.

It's not possible to have consistently super-linear improvement unless there is something really broken about your implementation. If you COULD get superlinear improvement, then you could simply set up additional threads even when using 1 core and see a benefit.
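The arithmetic behind this argument can be sketched in a few lines. The function name and the numbers are hypothetical, and the sketch ignores scheduling overhead: it only shows that if N threads on N cores were more than N times faster, then one core time-slicing the same N threads would beat the plain single-threaded search.

```python
def single_core_time_with_threads(serial_time, n_threads, speedup):
    """Wall time when ONE core time-slices n_threads that, on n_threads
    cores, would finish in serial_time / speedup (overhead ignored)."""
    parallel_time = serial_time / speedup
    # One core must execute every thread's share of the work in turn.
    return n_threads * parallel_time

serial = 120.0  # hypothetical seconds for the single-threaded search

# Hypothetical superlinear case: 12 threads give a 15x speedup.
print(single_core_time_with_threads(serial, 12, 15.0))  # less than 120

# Sublinear case: 12 threads give an 8x speedup.
print(single_core_time_with_threads(serial, 12, 8.0))   # more than 120
```

In the superlinear case the simulated multi-threaded search on one core finishes sooner than the serial search, which would mean the serial search itself could be improved.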

Don

Don · Post by **Don** » Tue Jun 18, 2013 5:20 am

Uri Blass wrote:
lkaufman wrote:
yanquis1972 wrote:larry, i am not sure if you or don (or either) would be the one to ask, but do you think this drastic result on manycore systems is a result of your access to such hardware? do you have any theories as to how such a drastic improvement is possible? would something like stockfishes default split depth come into play?
It is not a result of our access to super hardware, we mainly tested for 4 cores. My best guess is that Stockfish just doesn't benefit very much going from 4 to 12 cores, for whatever reason, so we look good by comparison. Our MP is quite different from theirs, so it is quite plausible that their method is a bit better for 4 cores and ours for 12. Probably some of the 70 elo difference in scaling is just sample error. One thing I can say is that I don't think we will have much trouble improving the MP performance in the future; at least that should be much easier than improving the underlying engine. And for Uri, I appreciate the thought, but no, we don't get close to linear (never mind superlinear) scaling, maybe just closer than some other engines.
I simply remember the good result of stockfish 12 cores against houdini(losing only 25-23 in TCEC) and thought that maybe at least part of the reason is simply that stockfish is better than houdini in earning from more cores.

My thought was that if stockfish has almost linear improvement from more cores and komodo is only slightly better with 4 cores and significantly better with 12 cores,
than komodo may have super linear improvement.

It is of course possible that my assumption that stockfish has almost linear improvement from more cores is clearly wrong and the main reason for stockfish good result in TCEC is the long time control and the number of cores is not the factor that helped stockfish.

Note that I understand that you did not test with 12 cores so it is still possible to have super linear improvement from 4 cores to 12 cores
and if somebody with 12 cores can test 12 cores against 4 cores(ponder off) with 3:1 time handicap it may be productive.

Now that we have MP I am a lot more interested in these MP issues. I know of at least 1 tester who believes strongly that Critter and Stockfish scale poorly (with more cores) compared to Houdini, Sjeng, Zappa and Rondo. But I don't really understand this. I know that in SOME TESTS we do scale better than Critter on 4 cores. But I do not know if that would necessarily apply at all time controls.

The scaling with cores issue is MUCH harder to get a handle on because testing goes so much slower, in fact about an order of magnitude slower.

Don

lkaufman · Post by **lkaufman** » Tue Jun 18, 2013 5:31 am

Uri Blass wrote:
lkaufman wrote:
yanquis1972 wrote:larry, i am not sure if you or don (or either) would be the one to ask, but do you think this drastic result on manycore systems is a result of your access to such hardware? do you have any theories as to how such a drastic improvement is possible? would something like stockfishes default split depth come into play?
It is not a result of our access to super hardware, we mainly tested for 4 cores. My best guess is that Stockfish just doesn't benefit very much going from 4 to 12 cores, for whatever reason, so we look good by comparison. Our MP is quite different from theirs, so it is quite plausible that their method is a bit better for 4 cores and ours for 12. Probably some of the 70 elo difference in scaling is just sample error. One thing I can say is that I don't think we will have much trouble improving the MP performance in the future; at least that should be much easier than improving the underlying engine. And for Uri, I appreciate the thought, but no, we don't get close to linear (never mind superlinear) scaling, maybe just closer than some other engines.
I simply remember the good result of stockfish 12 cores against houdini(losing only 25-23 in TCEC) and thought that maybe at least part of the reason is simply that stockfish is better than houdini in earning from more cores.

My thought was that if stockfish has almost linear improvement from more cores and komodo is only slightly better with 4 cores and significantly better with 12 cores,
than komodo may have super linear improvement.

It is of course possible that my assumption that stockfish has almost linear improvement from more cores is clearly wrong and the main reason for stockfish good result in TCEC is the long time control and the number of cores is not the factor that helped stockfish.

Note that I understand that you did not test with 12 cores so it is still possible to have super linear improvement from 4 cores to 12 cores
and if somebody with 12 cores can test 12 cores against 4 cores(ponder off) with 3:1 time handicap it may be productive.

We did do a little testing on 8 cores and were pleased but not shocked at the gain over 4 cores. My opinion is that the good result you mention for Stockfish against Houdini was due to Stockfish scaling better with more time than Houdini, not to scaling better with more cores. There is ample evidence for the former but none that I know of for the latter.

Uri Blass · Post by **Uri Blass** » Tue Jun 18, 2013 5:32 am

Don wrote:
Uri Blass wrote:
lkaufman wrote:
yanquis1972 wrote:larry, i am not sure if you or don (or either) would be the one to ask, but do you think this drastic result on manycore systems is a result of your access to such hardware? do you have any theories as to how such a drastic improvement is possible? would something like stockfishes default split depth come into play?
It is not a result of our access to super hardware, we mainly tested for 4 cores. My best guess is that Stockfish just doesn't benefit very much going from 4 to 12 cores, for whatever reason, so we look good by comparison. Our MP is quite different from theirs, so it is quite plausible that their method is a bit better for 4 cores and ours for 12. Probably some of the 70 elo difference in scaling is just sample error. One thing I can say is that I don't think we will have much trouble improving the MP performance in the future; at least that should be much easier than improving the underlying engine. And for Uri, I appreciate the thought, but no, we don't get close to linear (never mind superlinear) scaling, maybe just closer than some other engines.
I simply remember the good result of stockfish 12 cores against houdini(losing only 25-23 in TCEC) and thought that maybe at least part of the reason is simply that stockfish is better than houdini in earning from more cores.

My thought was that if stockfish has almost linear improvement from more cores and komodo is only slightly better with 4 cores and significantly better with 12 cores,
than komodo may have super linear improvement.

It is of course possible that my assumption that stockfish has almost linear improvement from more cores is clearly wrong and the main reason for stockfish good result in TCEC is the long time control and the number of cores is not the factor that helped stockfish.

Note that I understand that you did not test with 12 cores so it is still possible to have super linear improvement from 4 cores to 12 cores
and if somebody with 12 cores can test 12 cores against 4 cores(ponder off) with 3:1 time handicap it may be productive.
It's not possible to have consistently super-linear improvement unless there is something really broken about your implementation. If you COULD get superlinear improvement then you simple set up additional threads even when using 1 core to see a benefit.

Don

I do not think it is clear that nothing is broken in your implementation, and of course you can improve Komodo with fewer cores if you have superlinear improvement (even if not from 1 core to 12 cores but only from 4 cores to 12 cores).
