Hmmm, this is interesting - statistical noise for you.
Default SF code:
- Max: 129285 NPS - +0%
- Min: 92192 NPS - +0%
- Avg: 121421 NPS - +0%
Gerd #1 code:
- Max: 129196 NPS - -0.069%
- Min: 119863 NPS - +30.015%
- Avg: 125903 NPS - +3.691%
Gerd #2 minimalist code:
- Max: 129999 NPS - +0.552%
- Min: 110383 NPS - +19.732%
- Avg: 125726 NPS - +3.546%
Gerd #3 (almost) branchless code:
- Max: 129936 NPS - +0.504%
- Min: 103873 NPS - +12.67%
- Avg: 123752 NPS - +1.92%
Tomorrow I'll submit Gerd #1 for addition to SF. Thanks!
Matthew:out
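For reference, each percentage is the change against the default SF code for the same statistic: (new - base) / base * 100. The small C++ helper below reproduces the figures. It is hypothetical and not part of SF. For example, the Gerd #1 Min of 119863 NPS against the default 92192 NPS gives about +30.015%:

```cpp
#include <cassert>

// Relative change of a measured value against the baseline, in percent:
// (measured - base) / base * 100. Works for NPS or any other positive metric.
double percent_change(double base, double measured) {
    assert(base > 0.0);  // a zero or negative baseline makes the ratio meaningless
    return (measured - base) / base * 100.0;
}
```

Likewise, percent_change(129285, 129196) gives about -0.069%, which matches the Gerd #1 Max line.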
Leading Zero Count Question
-
ZirconiumX
- Posts: 1334
- Joined: Sun Jul 17, 2011 11:14 am
Re: Leading Zero Count Question
Some believe in the almighty dollar.
I believe in the almighty printf statement.
-
Gerd Isenberg
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Leading Zero Count Question
Thanks for the assembly lesson and the link. Nice, easy-to-understand mnemonics. I like the third routine best (except for the leading beqlr): smallest Max-Min difference. Are the routines called with empty sets? Returning zero in that case is wrong anyway. How many runs did you make to measure Max, Min and Avg?
And please don't call the routines Gerd
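On the empty-set question: a software msb() has no meaningful answer for an empty bitboard. Returning 0 is also ambiguous, because msb(1) is 0 as well. Below is a minimal portable sketch that makes b != 0 an explicit precondition and narrows the search window by halves. It is hypothetical: it is not the SF source and not any of the assembly routines discussed above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable msb(): index (0..63) of the most significant set bit.
// Precondition: b != 0. An empty set has no valid answer, and returning 0
// would be indistinguishable from msb(1) == 0.
int msb(uint64_t b) {
    assert(b != 0);
    int result = 0;
    // Halve the search window each step: 32, 16, 8, 4, 2, 1 bits.
    if (b > 0xFFFFFFFFULL) { b >>= 32; result += 32; }
    if (b > 0xFFFFULL)     { b >>= 16; result += 16; }
    if (b > 0xFFULL)       { b >>= 8;  result += 8;  }
    if (b > 0xFULL)        { b >>= 4;  result += 4;  }
    if (b > 0x3ULL)        { b >>= 2;  result += 2;  }
    if (b > 0x1ULL)        {           result += 1;  }
    return result;
}
```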
-
mcostalba
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Leading Zero Count Question
I'd think what counts in this case is the max value, because the min could be due to some background noise, and the max values are almost the same. The point is that this routine is not very performance-critical: it is used only rarely, and only in non-critical paths, so I don't see a compelling reason to add platform-specific code here. Trailing zero count (bsf), on the other hand, is very critical, which is why there are both software versions and versions that use intrinsics.
ZirconiumX wrote:
Hmmm, this is interesting - statistical noise for you.
Default SF code:
- Max: 129285 NPS - +0%
- Min: 92192 NPS - +0%
- Avg: 121421 NPS - +0%
Gerd #1 code:
- Max: 129196 NPS - -0.069%
- Min: 119863 NPS - +30.015%
- Avg: 125903 NPS - +3.691%
Gerd #2 minimalist code:
- Max: 129999 NPS - +0.552%
- Min: 110383 NPS - +19.732%
- Avg: 125726 NPS - +3.546%
Gerd #3 (almost) branchless code:
- Max: 129936 NPS - +0.504%
- Min: 103873 NPS - +12.67%
- Avg: 123752 NPS - +1.92%
Tomorrow I'll submit Gerd #1 for addition to SF. Thanks!
Matthew:out
Anyhow, a nice and well-thought-out analysis: good job, Matthew.
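Marco's split between software and intrinsic versions of the trailing zero scan might look roughly like the sketch below. The hypothetical lsb() uses the GCC/Clang builtin __builtin_ctzll where it is available. Otherwise it falls back to a portable De Bruijn multiplication using the 64-bit De Bruijn sequence 0x03f79d71b4cb0a89. The names are illustrative and are not SF's.

```cpp
#include <cassert>
#include <cstdint>

namespace {
const uint64_t kDeBruijn64 = 0x03f79d71b4cb0a89ULL;

// 64-entry lookup built once from the De Bruijn constant: every single-bit
// value times the constant yields a distinct top-6-bit pattern.
struct DeBruijnTable {
    int index[64];
    DeBruijnTable() {
        for (int i = 0; i < 64; ++i)
            index[((1ULL << i) * kDeBruijn64) >> 58] = i;
    }
};
const DeBruijnTable kTable;
}  // namespace

// Portable software lsb: isolate the lowest set bit (b & -b), multiply by
// the De Bruijn constant, and use the top 6 bits as a table index.
int lsb_software(uint64_t b) {
    assert(b != 0);
    return kTable.index[((b & (0 - b)) * kDeBruijn64) >> 58];
}

// Intrinsic version where the compiler offers one, software fallback otherwise.
int lsb(uint64_t b) {
    assert(b != 0);  // __builtin_ctzll(0) is undefined
#if defined(__GNUC__)
    return __builtin_ctzll(b);
#else
    return lsb_software(b);
#endif
}
```

Because lsb_software() stays compiled on every platform, it is easy to check it against the builtin on machines that have both.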
-
ZirconiumX
- Posts: 1334
- Joined: Sun Jul 17, 2011 11:14 am
Re: Leading Zero Count Question
The test routine was ./stockfish bench > /dev/null, repeated five times for each method.
Gerd Isenberg wrote:
Thanks for the assembly lesson and link. Nice and simple to understand mnemonics. I like the third routine best, except the leading beqlr, smallest Max-Min difference. Are the routines called with empty sets? Returning zero in that case is wrong anyway. How many runs did you make to measure Max, Min and Avg?
And please don't call the routines Gerd
Replacement #1 was the noisiest, going from 119 knps to 127 knps to 129 knps, then 124 knps and finally 128 knps.
Replacement #2 was the most consistent: it went from 110 knps to 129 knps, stayed there for another 2 runs, then dropped slightly to 128 knps.
Replacement #3 was hit hardest by branch prediction being reset. It started at 103 knps, the lowest of the three replacements, but was quite consistent afterward, going to 128 knps, then 129 knps, staying there for another run, before dropping slightly to 126 knps.
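The five runs per method can be summarized mechanically. The hypothetical helper below takes the knps readings listed above. Those readings appear to be truncated, so their averages come out slightly below the NPS table: for example, 125.4 knps vs 125903 NPS for replacement #1. The optional skip_first flag, which is an assumption and not part of the original method, drops the cold first run that seems to suffer from untrained branch prediction.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct RunStats { double max, min, avg; };

// Summarize one method's bench results (NPS or knps, any unit).
// skip_first discards the first (cold) run before computing the statistics.
RunStats summarize(std::vector<double> runs, bool skip_first = false) {
    if (skip_first && runs.size() > 1) runs.erase(runs.begin());
    assert(!runs.empty());
    RunStats s;
    s.max = *std::max_element(runs.begin(), runs.end());
    s.min = *std::min_element(runs.begin(), runs.end());
    s.avg = std::accumulate(runs.begin(), runs.end(), 0.0) / runs.size();
    return s;
}
```

With skip_first, replacement #3 ({103, 128, 129, 129, 126}) averages 128 knps instead of 123 knps, which shows how much a single cold run moves the average compared with the max.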
As for Marco's point, IIRC msb() is used in the hash table, which in my opinion makes it quite important.
I did try to be scientific, but I probably failed.
Matthew:out
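One plausible way msb() enters hash table code is sizing: the number of clusters is rounded down to a power of two, so the index becomes a simple mask. This is an assumption that has not been checked against the SF source, and cluster_count() and msb_simple() are hypothetical. If this is the usage, it runs only when the hash size is (re)set, which is in line with Marco's point that it is not performance-critical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical msb: index of the most significant set bit (b != 0).
// A plain loop is enough for a path that runs once per resize.
int msb_simple(uint64_t b) {
    assert(b != 0);
    int r = 0;
    while (b >>= 1) ++r;
    return r;
}

// Hypothetical: largest power-of-two number of clusters fitting in mb_size MB.
// A power-of-two count lets the table index be computed as key & (count - 1).
uint64_t cluster_count(uint64_t mb_size, std::size_t cluster_bytes) {
    uint64_t fit = (mb_size << 20) / cluster_bytes;
    assert(fit != 0);
    return uint64_t(1) << msb_simple(fit);
}
```

For example, 100 MB with 64-byte clusters fits 1638400 clusters, which rounds down to 1048576 (2^20).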
-
ZirconiumX
- Posts: 1334
- Joined: Sun Jul 17, 2011 11:14 am
Re: Leading Zero Count Question
IMO lzcnt is a feature that should be implemented in SF for platforms with slow or no bitscan reverse, such as AMD K8/K10 or ARM (or even Crays).
Just my 2 pennies.
Matthew:out
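With a leading zero count available, msb() is simply 63 minus the number of leading zeros. The sketch below assumes a GCC/Clang toolchain for the builtin. Depending on target flags, the builtin can compile to lzcnt on x86-64 or CLZ on ARM. A portable loop serves as the fallback. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Portable leading zero count; defined for 0 (returns 64), like lzcnt.
int lzcnt_portable(uint64_t b) {
    int n = 0;
    for (uint64_t mask = uint64_t(1) << 63; mask && !(b & mask); mask >>= 1) ++n;
    return n;
}

// Leading zero count for a non-empty set, using the compiler builtin when present.
int lzcnt64(uint64_t b) {
    assert(b != 0);  // __builtin_clzll(0) is undefined
#if defined(__GNUC__)
    return __builtin_clzll(b);
#else
    return lzcnt_portable(b);
#endif
}

// msb expressed through the leading zero count: bit index = 63 - zeros.
int msb_via_lzcnt(uint64_t b) {
    return 63 - lzcnt64(b);
}
```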