Ryzen 2 and BMI2?

Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Ryzen 2 and BMI2?

Post by Gian-Carlo Pascutto »

syzygy wrote: Wed May 16, 2018 1:08 am Did Crafty as included in SPEC CPU2000 not use any of those on platforms where they were available? (Probably not, I guess...)
You have to ask Bob (I don't have a SPEC2000 license) but it's hard to imagine that non-Intel and non-AMD SPEC members wouldn't object to that. Using generic intrinsics like those of GCC (__builtin_ffs or whatever it's called) doesn't work either, because it needs to be compilable by pretty much every ages-old proprietary compiler out there too.
Apparently SPEC CPU2017 not only includes Deep Sjeng but also Leela. Nice :)
Yeah. I regret that SPEC CPU2017 allows parallelism in "speed" benchmarks though (even if only xz uses it). But it's an impossible situation given increasing core counts and how boost speeds influence these benchmarks.
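
To illustrate the portability point: source that must build with any compiler cannot call a GCC builtin unconditionally. A minimal sketch of the usual workaround follows, assuming C++14 and the hypothetical helper names lowest_bit_index and lowest_bit_index_portable (neither is taken from Crafty or from any SPEC source). The builtin is used only behind a preprocessor guard, and a plain loop covers every other compiler.

```cpp
#include <cstdint>

// Portable fallback: index (0..63) of the least significant set bit.
// Precondition: b != 0. Hypothetical helper, not from Crafty or SPEC.
constexpr int lowest_bit_index_portable(uint64_t b)
{
	int index = 0;
	while ((b & 1) == 0)
	{
		b >>= 1;
		++index;
	}
	return index;
}

// Uses the compiler builtin where it exists, the loop everywhere else.
// Precondition: b != 0 (the builtin is undefined for zero).
constexpr int lowest_bit_index(uint64_t b)
{
#if defined(__GNUC__) || defined(__clang__)
	return __builtin_ctzll(b);              // GCC/Clang only
#else
	return lowest_bit_index_portable(b);    // any standard compiler
#endif
}
```

Both paths return the same index, so the guard changes only the speed, not the result.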
yurikvelo
Posts: 710
Joined: Sat Dec 06, 2014 1:53 pm

Re: Ryzen 2 and BMI2?

Post by yurikvelo »

Still true for Zen 2.
BMI2 compiles are much slower than POPCNT
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Ryzen 2 and BMI2?

Post by Joost Buijs »

yurikvelo wrote: Mon May 18, 2020 9:59 am Still true for Zen 2.
BMI2 compiles are much slower than POPCNT
I'm still waiting for the Intel 10980XE that I want to use for a new workstation; in the meantime I bought an AMD 3970X because I didn't want to wait any longer. It's a nice processor as long as you don't overclock it with precision boost (otherwise it runs extremely hot), but PEXT and PDEP are unusable, maybe even worse than on Zen 1. I tried to emulate PEXT in software and that runs faster than the native CPU instruction. AVX2 on the AMD is slow too, and it lacks AVX-512.

Maybe scatter-gather is not so important for a chess engine, but there are other applications in which it is very useful.

I will keep the AMD 3970X for bulk applications, but as soon as the 10980XE is readily available I will use that one for a new workstation.
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Ryzen 2 and BMI2?

Post by Gian-Carlo Pascutto »

I tried to emulate PEXT in software and that runs faster than the native CPU instruction.
Do you have a fast implementation that you'd want to make public domain?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Ryzen 2 and BMI2?

Post by bob »

A. You are correct. No Intel asm or Intel-specific stuff was allowed.

B. The reason I stopped being in SpecInt was stupidity. They decided they wanted to move Crafty to the parallel benchmarks. I told them "bad idea" and explained the non-determinism problem. They said "no problem." I replied "node counts will not match, so nobody can verify their test results are correct." They said "no problem." A month or two later, I received a call. "Crafty doesn't produce node counts that match for each run with the same data." I replied "go back and look at all the emails we swapped about this." I got a short "ahhh... that is what you were talking about." Sheesh.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Ryzen 2 and BMI2?

Post by Joost Buijs »

Gian-Carlo Pascutto wrote: Fri May 29, 2020 10:40 pm
I tried to emulate PEXT in software and that runs faster than the native CPU instruction.
Do you have a fast implementation that you'd want to make public domain?

The software implementation is very basic and unusably slow too; the only thing that strikes me is that it runs somewhat faster than the native CPU implementation (at least for the problem I used it on, a pattern evaluation routine).

Code:

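#include <cstdint>

typedef uint64_t bb_t;	// bitboard type; assumed here to be a 64-bit unsigned integer

// PEXT emulation: pack the bits of src selected by mask into the low bits of the result
inline bb_t PEXT(uint64_t src, uint64_t mask)
{
	uint64_t result = 0;

	// bit walks the output positions; each pass consumes the lowest set bit of mask
	for (uint64_t bit = 1; mask != 0; bit += bit)
	{
		// (0 - mask) negates in unsigned arithmetic, which avoids signed overflow
		if (src & mask & (0 - mask))
			result |= bit;

		mask &= mask - 1;	// clear the lowest set bit of mask
	}

	return result;
}

// PDEP emulation: scatter the low bits of src to the positions of the set bits in mask
inline bb_t PDEP(uint64_t src, uint64_t mask)
{
	uint64_t result = 0;

	for (uint64_t bit = 1; mask != 0; bit += bit)
	{
		if (src & bit)
			result |= mask & (0 - mask);	// deposit at the lowest set bit of mask

		mask &= mask - 1;	// clear the lowest set bit of mask
	}

	return result;
}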

Maybe there are ways to make something better with AVX2, but that's not general purpose either.

I really hope that AMD will fix these instructions someday, otherwise the Zen2 is a nice processor, unfortunately it has some weaknesses.
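
A quick way to sanity-check routines like these is the identity PDEP(PEXT(x, m), m) == (x & m). The sketch below restates the two serial loops under the hypothetical names pext_soft and pdep_soft, so that it is self-contained, and adds a round-trip helper. It assumes C++14 (for loops inside constexpr functions); the sample values in the note after it are illustrative and not from the original post.

```cpp
#include <cstdint>

// Software PEXT: gather the bits of src selected by mask into the low bits.
constexpr uint64_t pext_soft(uint64_t src, uint64_t mask)
{
	uint64_t result = 0;
	for (uint64_t bit = 1; mask != 0; bit += bit)
	{
		if (src & mask & (0 - mask))   // is the lowest mask bit set in src?
			result |= bit;
		mask &= mask - 1;              // clear the lowest set bit of mask
	}
	return result;
}

// Software PDEP: scatter the low bits of src to the set positions of mask.
constexpr uint64_t pdep_soft(uint64_t src, uint64_t mask)
{
	uint64_t result = 0;
	for (uint64_t bit = 1; mask != 0; bit += bit)
	{
		if (src & bit)
			result |= mask & (0 - mask);   // deposit at the lowest mask bit
		mask &= mask - 1;
	}
	return result;
}

// Extracting and then depositing with the same mask must reproduce the
// masked input.
constexpr bool round_trip_ok(uint64_t x, uint64_t mask)
{
	return pdep_soft(pext_soft(x, mask), mask) == (x & mask);
}
```

For example, with src = 0xB2 and mask = 0xF0, pext_soft returns 0x0B, and pdep_soft(0x0B, 0xF0) returns 0xB0, which equals 0xB2 & 0xF0.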
Ozymandias
Posts: 1532
Joined: Sun Oct 25, 2009 2:30 am

Re: Ryzen 2 and BMI2?

Post by Ozymandias »

Joost Buijs wrote: Sat May 30, 2020 8:36 am Zen2 is a nice processor, unfortunately it has some weaknesses.
Biggest weak spot so far: price.
  • Black Friday 2018: Ryzen 7 1700 for 165.99€ at Amazon.
  • Black Friday 2019: Ryzen 7 2700 for 149.99€ at Amazon.
  • Black Friday 2020: Ryzen 7 3700x for a similar price? If so, weakness removed.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Ryzen 2 and BMI2?

Post by Gerd Isenberg »

Gian-Carlo Pascutto wrote: Fri May 29, 2020 10:40 pm
I tried to emulate PEXT in software and that runs faster than the native CPU instruction.
Do you have a fast implementation that you'd want to make public domain?
https://www.chessprogramming.org/BMI2

The serial implementations of PEXT and PDEP look quite similar.
yurikvelo
Posts: 710
Joined: Sat Dec 06, 2014 1:53 pm

Re: Ryzen 2 and BMI2?

Post by yurikvelo »

Ozymandias wrote: Sat May 30, 2020 9:59 am
  • Ryzen 7 1700 = 166€ = 4.8B transistors
  • Ryzen 7 2700 = 150€ = 4.9B transistors
  • Ryzen 7 3700x = ???€ = 19.2B transistors
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Ryzen 2 and BMI2?

Post by Joost Buijs »

Gerd Isenberg wrote: Sat May 30, 2020 10:00 am
Gian-Carlo Pascutto wrote: Fri May 29, 2020 10:40 pm
I tried to emulate PEXT in software and that runs faster than the native CPU instruction.
Do you have a fast implementation that you'd want to make public domain?
https://www.chessprogramming.org/BMI2

The serial implementations of PEXT and PDEP look quite similar.

Very well possible, but I'm sure I didn't get it from CPW because I never look there.

About five years ago I got this specific algorithm from somebody who has no connection with computer chess at all and claimed to be the original author, so I wonder what its origins are.

I only meant to say that the PEXT implementation of the Zen 2 is so bad that even software emulation runs faster. It's a pity because on the AMD I have to replace PEXT with a series of masks and shifts, which performs clearly worse.
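
One way such a mask-and-shift replacement can look for a fixed mask is sketched below; it is not necessarily the sequence used in the engine discussed here. The mask is split once into contiguous runs of set bits, and each run then costs one AND, one shift and one OR. Step, Plan, make_plan and extract are hypothetical names, and the code assumes C++14.

```cpp
#include <cstdint>

// One step: isolate a contiguous run of mask bits, then shift it down
// to its place in the packed result.
struct Step
{
	uint64_t run;    // contiguous group of set bits taken from the mask
	int      shift;  // right shift that moves the run to its output position
};

struct Plan
{
	Step steps[32];  // a 64-bit mask has at most 32 separate runs
	int  count;
};

// Precompute the steps for a fixed mask (done once, outside the hot path).
constexpr Plan make_plan(uint64_t mask)
{
	Plan plan = {};
	int out = 0;                        // next free bit position in the output
	for (int pos = 0; pos < 64; )
	{
		if (((mask >> pos) & 1) == 0) { ++pos; continue; }
		int start = pos;
		uint64_t run = 0;
		while (pos < 64 && ((mask >> pos) & 1) != 0)
		{
			run |= uint64_t(1) << pos;
			++pos;
		}
		plan.steps[plan.count].run = run;
		plan.steps[plan.count].shift = start - out;
		++plan.count;
		out += pos - start;             // the run filled this many output bits
	}
	return plan;
}

// Apply the plan: one AND, one shift and one OR per run of mask bits.
constexpr uint64_t extract(const Plan& plan, uint64_t src)
{
	uint64_t result = 0;
	for (int i = 0; i < plan.count; ++i)
		result |= (src & plan.steps[i].run) >> plan.steps[i].shift;
	return result;
}
```

The cost grows with the number of runs in the mask, so this only pays off for masks with a few contiguous groups; a mask with many scattered bits needs one step per bit, which fits the observation that it performs worse than a fast hardware PEXT.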
Last edited by Joost Buijs on Sat May 30, 2020 5:00 pm, edited 1 time in total.