yanquis1972 wrote: larry, i am not sure if you or don (or either) would be the one to ask, but do you think this drastic result on manycore systems is a result of your access to such hardware? do you have any theories as to how such a drastic improvement is possible? would something like stockfish's default split depth come into play?
I have a theory that this drastic improvement is a result of the fact that Komodo is not optimized for a small number of cores, and that when the Komodo team fixes the problem, Komodo may become stronger with fewer cores while keeping the same playing strength with 12 cores.
My theory is that Komodo 5.1 MP has a super-linear improvement: Komodo MP with 12 cores can beat Komodo with 1 core at a 12:1 time handicap, and it should be possible to use this to make Komodo stronger with fewer cores.
this would be fantastic news indeed. i do seem to recall someone saying that komodo was not scaling as well as expected, but that seems to not be a problem on manycore systems. if the issue could be fixed for dual & quad core, i believe it would immediately put komodo's strength nearly on par with houdini's.
It is not a result of our access to super hardware; we mainly tested for 4 cores. My best guess is that Stockfish just doesn't benefit very much going from 4 to 12 cores, for whatever reason, so we look good by comparison. Our MP is quite different from theirs, so it is quite plausible that their method is a bit better for 4 cores and ours for 12. Probably some of the 70 Elo difference in scaling is just sample error. One thing I can say is that I don't think we will have much trouble improving the MP performance in the future; at least that should be much easier than improving the underlying engine. And for Uri, I appreciate the thought, but no, we don't get close to linear (never mind superlinear) scaling, maybe just closer than some other engines.
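To put the "sample error" remark in numbers, here is a minimal sketch that turns a match result into an Elo difference with a rough 95% interval. The win/draw/loss counts are hypothetical (the thread does not give the actual game counts), and the interval uses a plain normal approximation, so treat it as an illustration rather than the method any tester here used.

```python
import math

def elo_from_score(score: float) -> float:
    """Convert a score fraction (0 < score < 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) for a match, using a normal approximation.

    Only meaningful when score +/- margin stays strictly inside (0, 1).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - margin),
            elo_from_score(score + margin))

# Hypothetical 100-game match: +40 =40 -20 (a 60% score).
elo, low, high = elo_with_margin(40, 40, 20)
print(f"{elo:+.0f} Elo, 95% interval {low:+.0f} to {high:+.0f}")
```

With only 100 games, a result of about +70 Elo comes with an interval several tens of Elo wide on either side, which is why part of a 70 Elo gap can easily be noise.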
I simply remembered Stockfish's good result with 12 cores against Houdini (losing only 25-23 in TCEC) and thought that at least part of the reason might be that Stockfish gains more than Houdini from additional cores.
My thought was that if Stockfish has almost linear improvement from more cores, and Komodo is only slightly better with 4 cores but significantly better with 12 cores, then Komodo may have super-linear improvement.
It is of course possible that my assumption that Stockfish has almost linear improvement from more cores is simply wrong, and that the main reason for Stockfish's good result in TCEC is the long time control rather than the number of cores.
Note that I understand you did not test with 12 cores, so it is still possible that there is super-linear improvement from 4 cores to 12 cores. If somebody with 12 cores can test 12 cores against 4 cores (ponder off) with a 3:1 time handicap, it may be productive.
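The proposed handicap test can be read as follows: if the 12-core side scores clearly above 50% while the 4-core side gets three times the clock, then tripling the cores was worth more than tripling the time. A minimal sketch of that reading, with a hypothetical function name and a simple normal-approximation margin (both are assumptions, not something specified in the thread):

```python
import math

def handicap_verdict(wins: int, draws: int, losses: int, z: float = 1.96) -> str:
    """Classify a cores-vs-time handicap match.

    Counts are from the point of view of the many-core side, which plays
    against a side with fewer cores but proportionally more time.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    margin = z * math.sqrt(var / n)
    if score - margin > 0.5:
        return "super-linear"   # extra cores beat the equivalent extra time
    if score + margin < 0.5:
        return "sub-linear"     # extra time beat the extra cores
    return "inconclusive"       # 50% is still inside the error margin

# Hypothetical results for 12 cores vs 4 cores at a 3:1 time handicap.
print(handicap_verdict(10, 40, 50))
print(handicap_verdict(30, 40, 30))
```

The "inconclusive" branch matters in practice: a short match that lands near 50% says nothing either way.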
It's not possible to have consistently super-linear improvement unless there is something really broken about your implementation. If you COULD get super-linear improvement, then you could simply set up additional threads even when using 1 core to see a benefit.
Don
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
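The time-slicing argument above can be checked with a little arithmetic. This is an idealized sketch: it assumes one core can emulate several workers with zero switching overhead and the same total work, and all the numbers are hypothetical.

```python
def emulated_single_core_time(parallel_time: float, workers: int) -> float:
    """Wall time for ONE core to emulate `workers` threads by time-slicing.

    Idealized: no context-switch cost, same total work as the real
    parallel run, so the single core needs workers * parallel_time.
    """
    return parallel_time * workers

serial_time = 120.0      # hypothetical: seconds for the normal 1-thread search
workers = 12
claimed_speedup = 15.0   # hypothetical super-linear claim (more than 12x)

parallel_time = serial_time / claimed_speedup
emulated = emulated_single_core_time(parallel_time, workers)

# If the claim held consistently, one core running 12 threads would
# finish sooner than the normal 1-thread search on that same core.
print(emulated, emulated < serial_time)
```

Under these assumptions the emulated run beats the serial one, which means the serial search was leaving something on the table. That is the sense in which a consistent super-linear gain points to a weakness in the single-thread search.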
Now that we have MP I am a lot more interested in these MP issues. I know of at least 1 tester who believes strongly that Critter and Stockfish scale poorly (with more cores) compared to Houdini, Sjeng, Zappa and Rondo. But I don't really understand this. I know that in SOME TESTS we do scale better than Critter on 4 cores. But I do not know if that would necessarily apply at all time controls.
The scaling with cores issue is MUCH harder to get a handle on because testing goes so much slower, in fact about an order of magnitude slower.
Don
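One plausible reason for the "order of magnitude" figure (an assumption here, the post does not spell it out) is that multi-threaded engines use up the cores that would otherwise run many single-threaded games side by side. A rough sketch with a hypothetical 12-core test machine:

```python
def concurrent_games(machine_cores: int, threads_per_engine: int) -> int:
    """Games that fit on the machine at once.

    Assumes ponder off, so only one engine per game is searching at any
    moment and each game needs `threads_per_engine` cores.
    """
    return max(1, machine_cores // threads_per_engine)

machine_cores = 12  # hypothetical test box

single_thread_games = concurrent_games(machine_cores, 1)
twelve_thread_games = concurrent_games(machine_cores, 12)

# Ratio of games finished per unit of time at the same time control.
slowdown = single_thread_games / twelve_thread_games
print(single_thread_games, twelve_thread_games, slowdown)
```

At the same time control, the 12-thread test collects games about twelve times more slowly than the 1-thread test, before any extra games needed to shrink the error margin are counted.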
We did do a little testing on 8 cores and were pleased but not shocked at the gain over 4 cores. My opinion is that the good result you mention for Stockfish against Houdini was due to Stockfish scaling better with more time than Houdini, not to scaling better with more cores. There is ample evidence for the former but none that I know of for the latter.
I do not think it is clear that there is nothing broken about your implementation, and of course you can improve Komodo with fewer cores if you have super-linear improvement (even if not from 1 core to 12 cores but only from 4 cores to 12 cores).