The first generation of Ryzen processors is extremely slow at executing the BMI2 instruction set. Does anyone know if this has been corrected in Ryzen 2 chips?
- Steve
Ryzen 2 and BMI2?
-
- Posts: 1222
- Joined: Wed Mar 08, 2006 8:28 pm
- Location: Florida, USA
Ryzen 2 and BMI2?
http://www.chessprogramming.net - Maverick Chess Engine
-
- Posts: 1243
- Joined: Sat Dec 13, 2008 7:00 pm
Re: Ryzen 2 and BMI2?
Ryzen 1 was really fast at BMI2, it was just slow at a single instruction, i.e. PEXT.
I wouldn't expect this to change. Nothing uses PEXT, aside from some chess engine movegens.
-
- Posts: 5569
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Ryzen 2 and BMI2?
Gian-Carlo Pascutto wrote: ↑Tue May 15, 2018 9:04 am
Ryzen 1 was really fast at BMI2, it was just slow at a single instruction, i.e. PEXT.
I wouldn't expect this to change. Nothing uses PEXT, aside from some chess engine movegens.

Please make Sjeng use PEXT
-
- Posts: 300
- Joined: Mon Apr 30, 2018 11:51 pm
Re: Ryzen 2 and BMI2?
I wanted to use PEXT for a branchless UTF-8 parser, but unfortunately, the instruction is too slow for it to be a win over straight-up code. (I know others have tried and come to the same conclusion.)
-
- Posts: 1243
- Joined: Sat Dec 13, 2008 7:00 pm
-
- Posts: 300
- Joined: Mon Apr 30, 2018 11:51 pm
Re: Ryzen 2 and BMI2?
No, you'd have to use an intrinsic or inline assembler. The former is fairly portable across compilers; at least MSVC, GCC, Clang and ICC all tend to support the Intel intrinsic style (_pext_u64 in this case) with some coaxing.
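As a rough illustration of the intrinsic route, here is a minimal sketch in C. It assumes a GCC/Clang-style `__BMI2__` macro to detect BMI2 at compile time (MSVC would need its own check), and the names `pext64` and `pext64_soft` are made up for this example. When BMI2 is not enabled, it falls back to a plain bit loop that computes the same result.

```c
#include <stdint.h>

/* Assumption: __BMI2__ is defined by GCC/Clang when compiling with -mbmi2
   (or a -march that implies it). _pext_u64 only exists on x86-64. */
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>   /* declares _pext_u64 */
#define HAVE_HW_PEXT 1
#endif

/* Software fallback: gather the bits of src selected by mask into the
   low bits of the result, lowest mask bit first (same contract as PEXT). */
static uint64_t pext64_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    uint64_t out_bit = 1;
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set bit */
        if (src & lowest)
            result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1;                      /* clear lowest set bit */
    }
    return result;
}

/* Hypothetical wrapper: hardware PEXT when available, bit loop otherwise. */
static uint64_t pext64(uint64_t src, uint64_t mask)
{
#ifdef HAVE_HW_PEXT
    return _pext_u64(src, mask);
#else
    return pext64_soft(src, mask);
#endif
}
```

For example, `pext64(0xABCD, 0x0F0F)` picks the low nibble of each byte and packs them together, giving `0xBD`. Note that a compile-time check like this says nothing about speed: on a CPU where PEXT is microcoded, a build with BMI2 enabled would still take the slow hardware path.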
-
- Posts: 1568
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: Ryzen 2 and BMI2?
Gian-Carlo Pascutto wrote: ↑Tue May 15, 2018 9:04 am
Ryzen 1 was really fast at BMI2, it was just slow at a single instruction, i.e. PEXT.
I wouldn't expect this to change. Nothing uses PEXT, aside from some chess engine movegens.

PEXT and its counterpart PDEP are both incredibly slow on AMD Zen hardware because AMD was lazy and implemented these instructions in microcode instead of logic.
On Intel processors you can make very good use of PEXT in your evaluation function, for instance to index pawn patterns (or any other pattern) very quickly. In the pawn evaluator I'm currently working on I use PEXT throughout; with PEXT it runs about twice as fast as anything I can get without it. Unfortunately this doesn't work on AMD processors: on AMD the old-fashioned way of calculating indices is the only solution.
I'm pretty sure that AMD didn't fix this for Zen+ either. Maybe they will fix it next year when Zen2 arrives, who knows? Until this is fixed I won't consider buying an AMD processor, because it is unusable for the things I want to do; I'd rather wait for Intel Cascade Lake, which arrives by the end of the year.
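To make the pattern-indexing idea above concrete, here is a small sketch in C. It is not the poster's evaluator: the mask (squares f2, g2, h2, f3, g3, h3), the square numbering (a1 = bit 0, h8 = bit 63) and all function names are assumptions chosen for illustration. A software PEXT stands in for the instruction so the sketch runs anywhere; on BMI2 hardware the `_pext_u64` intrinsic would take its place. The second function shows one possible shift-and-mask equivalent for this particular mask.

```c
#include <stdint.h>

/* Hypothetical mask: squares f2, g2, h2 (bits 13..15) and f3, g3, h3
   (bits 21..23), assuming a1 = bit 0 and h8 = bit 63. */
#define SHELTER_MASK ((UINT64_C(7) << 13) | (UINT64_C(7) << 21))

/* Software stand-in for PEXT: packs the bits of src selected by mask
   into the low bits of the result, lowest mask bit first. */
static uint64_t pext64_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    uint64_t out_bit = 1;
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set bit */
        if (src & lowest)
            result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1;                      /* clear lowest set bit */
    }
    return result;
}

/* PEXT-style index: the 6 masked squares become a value in 0..63 that
   can index a precomputed table of pattern scores. */
static unsigned shelter_index_pext(uint64_t pawns)
{
    return (unsigned)pext64_soft(pawns, SHELTER_MASK);
}

/* Shift-and-mask index for the same mask: works here because each rank
   contributes three adjacent bits. Irregular masks need more steps. */
static unsigned shelter_index_shift(uint64_t pawns)
{
    unsigned rank2 = (unsigned)((pawns >> 13) & 7);
    unsigned rank3 = (unsigned)((pawns >> 21) & 7);
    return rank2 | (rank3 << 3);
}
```

With pawns on f2 and h3, both functions return 33 (bit 0 for f2, bit 5 for h3). The appeal of PEXT is that the extraction stays a single operation however scattered the mask bits are, whereas the manual version grows with the shape of the pattern.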
-
- Posts: 1243
- Joined: Sat Dec 13, 2008 7:00 pm
Re: Ryzen 2 and BMI2?
Right but that's not doable in a benchmark that also has to run on ARM and Power etc and has to be "fair", i.e. what Roland was referring to.
The versions in SPEC don't even use BSF/LZCNT/POPCNT for the same reasons. Although it wouldn't surprise me if Intel C++ generates them anyway, as long as you use the SPEC sources.
-
- Posts: 300
- Joined: Mon Apr 30, 2018 11:51 pm
Re: Ryzen 2 and BMI2?
Obviously an Intel-specific instruction will not be applicable to PowerPC, indeed.
FWIW, bsf maps fairly well to the ffs() call in POSIX.
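A short sketch of that mapping in C, for anyone who wants to see it spelled out. The helper names are made up, and the 64-bit version assumes `int` is 32 bits wide, since POSIX `ffs()` takes an `int`. The main differences from BSF are that `ffs()` counts from 1 and returns 0 for a zero input, whereas BSF counts from 0 and leaves its destination undefined for zero.

```c
#include <stdint.h>
#include <strings.h>   /* POSIX ffs() */

/* 0-based index of the lowest set bit, or -1 if x is zero.
   ffs() is 1-based and returns 0 for zero input, hence the "- 1". */
static int lowest_set_bit32(uint32_t x)
{
    return ffs((int)x) - 1;
}

/* 64-bit variant built from two 32-bit ffs() calls.
   Assumption: int is 32 bits wide on the target platform. */
static int lowest_set_bit64(uint64_t bb)
{
    uint32_t lo = (uint32_t)bb;
    if (lo != 0)
        return lowest_set_bit32(lo);
    uint32_t hi = (uint32_t)(bb >> 32);
    if (hi != 0)
        return 32 + lowest_set_bit32(hi);
    return -1;   /* empty bitboard */
}
```

Whether a given compiler turns `ffs()` into a single BSF or TZCNT instruction depends on the compiler and flags, so this is portable source rather than a guarantee about the generated code.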
-
- Posts: 5569
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Ryzen 2 and BMI2?
Gian-Carlo Pascutto wrote: ↑Tue May 15, 2018 10:29 pm
Right but that's not doable in a benchmark that also has to run on ARM and Power etc and has to be "fair", i.e. what Roland was referring to.
The versions in SPEC don't even use BSF/LZCNT/POPCNT because of the same reasons. Although it wouldn't surprise me if Intel C++ generates them anyway, as long as you use the SPEC sources.

Did Crafty as included in SPEC CPU2000 not use any of those on platforms where they were available? (Probably not, I guess...)
Apparently SPEC CPU2017 not only includes Deep Sjeng but also Leela. Nice.