question on performance of DTS

Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: question on performance of DTS

Post by Zach Wegner »

Daniel Shawul wrote: I think you got it wrong. The numbers mentioned are time to complete a task (e.g. fixed depth = 20), not NPS scaling.
To the OP: Neither Stockfish nor Crafty does DTS. I know ZCT uses it.

Indeed it does, but it's rather broken. I couldn't get it to run stably on 8 cores.

Also, the speedup really sucks, at least compared to what it should be. One thing I did was comment out all the statistics gathering. I have no idea what I was thinking back then: the statistics were shared and volatile, so there were tons of cache invalidations. Commenting those out increased NPS by something like 30% on four cores, IIRC.
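
To illustrate the problem with shared, volatile statistics, here is a minimal sketch of one common alternative: per-thread counters padded to a cache line and summed only when needed. This is not ZCT's actual code. The names `thread_stats`, `count_node`, `total_nodes`, `MAX_THREADS`, and the 64-byte `CACHE_LINE` are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */
#define MAX_THREADS 16  /* hypothetical upper bound on search threads */

/* One counter block per search thread, padded to a full cache line so
   that two threads never write to the same line (no false sharing). */
struct thread_stats {
    uint64_t nodes;
    char pad[CACHE_LINE - sizeof(uint64_t)];
};

_Static_assert(sizeof(struct thread_stats) == CACHE_LINE,
               "each block must fill exactly one cache line");

/* Align the array so that block i starts on a cache-line boundary. */
static _Alignas(CACHE_LINE) struct thread_stats stats[MAX_THREADS];

/* Called only by thread `tid`: a plain increment on private data,
   with no volatile access and no write to a line another core owns. */
static inline void count_node(int tid) {
    assert(tid >= 0 && tid < MAX_THREADS);
    stats[tid].nodes++;
}

/* Sum the per-thread counters. Intended to be called after the search
   threads have stopped; reading while they still run would need
   atomics to be well-defined in C11. */
static uint64_t total_nodes(int nthreads) {
    assert(nthreads >= 0 && nthreads <= MAX_THREADS);
    uint64_t sum = 0;
    for (int i = 0; i < nthreads; i++)
        sum += stats[i].nodes;
    return sum;
}
```

Each search thread would call `count_node(tid)` with its own index, and the reporting code would call `total_nodes(n)` once per output line rather than touching a shared counter on every node.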

So ZCT should probably be useful as a reference, but measuring its speedup isn't very worthwhile at all. If you're going to write a DTS search, look at it, but think carefully about each design decision, and don't necessarily follow ZCT...
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: question on performance of DTS

Post by Zach Wegner »

bob wrote:
mcostalba wrote:
liuzy wrote: I found this table on Bob's website.
+---------------+-----+-----+-----+-----+------+
| # processors  |  1  |  2  |  4  |  8  |  16  |
+---------------+-----+-----+-----+-----+------+
| speedup       | 1.0 | 2.0 | 3.7 | 6.6 | 11.1 |
+---------------+-----+-----+-----+-----+------+

Where can I find such data for Stockfish?
Nowhere :-)

Nobody has ever built up such a table for SF, as far as I know.


BTW, although I concede that such a table, based on nodes/sec on given hardware, has some validity, I also think it could be misleading because it does not reflect a corresponding Elo increase.
So if a program runs 1.7x faster on 2 processors, that won't affect the Elo the same way as running on a single CPU that is 1.7x faster? None of my speedup data is about NPS. It is all about time to a specific depth, which is a real performance measurement that does predict Elo accurately.
I have to agree with Marco here. The trees searched by the two versions are not the same, so you can't assume that a speedup of X is equivalent to making the SP version X times faster. With modern programs that LMR a lot, this difference can be pretty big. If you do a test at fixed depth between SP and MP, I'd bet that MP wins.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: question on performance of DTS

Post by mcostalba »

bob wrote: You are _very_ naive to make that statement about something being naive. :)
I have already said that I was mistakenly thinking of NPS instead of time-to-depth.

You are beating a dead horse... enjoy! :-)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: question on performance of DTS

Post by bob »

Zach Wegner wrote:
bob wrote:
mcostalba wrote:
liuzy wrote: I found this table on Bob's website.
+---------------+-----+-----+-----+-----+------+
| # processors  |  1  |  2  |  4  |  8  |  16  |
+---------------+-----+-----+-----+-----+------+
| speedup       | 1.0 | 2.0 | 3.7 | 6.6 | 11.1 |
+---------------+-----+-----+-----+-----+------+

Where can I find such data for Stockfish?
Nowhere :-)

Nobody has ever built up such a table for SF, as far as I know.


BTW, although I concede that such a table, based on nodes/sec on given hardware, has some validity, I also think it could be misleading because it does not reflect a corresponding Elo increase.
So if a program runs 1.7x faster on 2 processors, that won't affect the Elo the same way as running on a single CPU that is 1.7x faster? None of my speedup data is about NPS. It is all about time to a specific depth, which is a real performance measurement that does predict Elo accurately.
I have to agree with Marco here. The trees searched by the two versions are not the same, so you can't assume that a speedup of X is equivalent to making the SP version X times faster. With modern programs that LMR a lot, this difference can be pretty big. If you do a test at fixed depth between SP and MP, I'd bet that MP wins.
I don't think so, and I don't see any reason to suspect this would be true. The same thing happens in a serial search (notably hash grafting between branches). I have done a lot of SMP testing on our cluster on occasion, and in general the Elo tracks pretty well what you would expect for hardware that is about 1.7-1.8x faster. I have made the usual cluster runs, then rerun the same test but told Crafty to use 2 processors (this on our dual-cpu per node cluster...

The trees are not the same, but overall the depth is the key. Simple things like history counters, killer moves, hash replacement strategy, etc., all change the shape of the tree in random and strange ways that can't be predicted. Yet this does not seem to affect Elo at all.
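
For anyone wanting to build the same kind of table for another engine, the time-to-depth measure used above can be sketched as follows. It assumes you have timed a search to a fixed depth with 1 and with n processors; the function names and the `table_*` arrays are made up for this example, and the array values are simply the figures from the table quoted earlier in the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Figures copied from the quoted table: processors and speedup. */
static const int    table_procs[]   = {1, 2, 4, 8, 16};
static const double table_speedup[] = {1.0, 2.0, 3.7, 6.6, 11.1};

/* Time-to-depth speedup: time to finish a fixed-depth search on one
   processor divided by the time for the same depth on n processors.
   This says nothing about NPS. */
static double speedup(double t1, double tn) {
    assert(t1 > 0.0 && tn > 0.0);
    return t1 / tn;
}

/* Parallel efficiency: the fraction of ideal linear speedup achieved. */
static double efficiency(double s, int nproc) {
    assert(nproc > 0);
    return s / (double)nproc;
}

/* Print the efficiency implied by each column of the quoted table. */
static void print_table_efficiency(void) {
    int n = (int)(sizeof table_procs / sizeof table_procs[0]);
    for (int i = 0; i < n; i++)
        printf("%2d processors: speedup %4.1f, efficiency %.2f\n",
               table_procs[i], table_speedup[i],
               efficiency(table_speedup[i], table_procs[i]));
}
```

By this measure the 16-processor column of the table corresponds to an efficiency of roughly 0.69 (11.1 / 16). In practice you would average `speedup()` over many positions, since the time to depth for a single position varies a lot from run to run in a parallel search.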