As the upcoming 13th-generation Intel CPUs are doing away with AVX-512 altogether, and the upcoming AMD Ryzen chips (both to be released about the same time) are set to include the AVX-512 instruction set, I am wondering what the difference is in strength of analysis (3 sec, 5 min, overnight, etc.) of a move as concerns engines like Dragon / Komodo / Stockfish.
I seem to recall Larry Kaufman (a year or so ago) thinking the difference between 256 and 512 was not very much, but I do not think I have ever seen anyone try to quantify what the differences might be. I know 'more is better', but I am not sure how that corresponds to real-world chess analysis.
Curious as I am about to get a new computer...and expect to run engines on it quite often, for analysis of lines (primarily) or games. I am currently looking at a setup with a Ryzen 5950X CPU.
AVX v AVX 2 vs AVX 512 - Engine Analysis
Moderator: Ras
-
- Posts: 646
- Joined: Mon Jun 20, 2022 4:08 am
- Full name: Brian D. Smith
-
- Posts: 476
- Joined: Sun Mar 17, 2019 12:00 pm
- Full name: Henk Drost
Re: AVX v AVX 2 vs AVX 512 - Engine Analysis
Use whatever's fastest.
-
- Posts: 5685
- Joined: Wed Sep 05, 2018 2:16 am
- Location: Moving
- Full name: Jorge Picado
-
- Posts: 3169
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: AVX v AVX 2 vs AVX 512 - Engine Analysis
We won't be able to tell until benchmarks are out with optimized engines, because in the past there were downclocking issues with broader bit-widths, and we don't know for sure whether the AVX-512 instruction set is supported only via 256-bit AVX2 units or via a real 512-bit wide vector unit; time will tell. Sopel et al. could estimate the max NPS speedup possible with AVX-512, maybe 1.25x to 1.5x, depending on engine and implementation.
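To put such an NPS speedup into perspective, here is a minimal sketch converting it into a rough Elo estimate. It assumes the commonly cited rule of thumb of roughly 70 Elo per doubling of search speed at short time controls; that figure (and its applicability here) is an assumption, not something established in this thread.

```python
import math

def elo_gain_from_speedup(speedup, elo_per_doubling=70):
    """Rough Elo gain from an NPS speedup, assuming a fixed Elo gain
    per doubling of search speed (a common rule-of-thumb value)."""
    return elo_per_doubling * math.log2(speedup)

# The 1.25x-1.5x AVX-512 estimates from the post:
for s in (1.25, 1.5):
    print(f"{s:.2f}x NPS -> ~{elo_gain_from_speedup(s):.0f} Elo")
# 1.25x -> ~23 Elo, 1.5x -> ~41 Elo
```

So even at the optimistic end of the estimate, the gain would be noticeable in engine-vs-engine testing but modest compared to, say, doubling the core count.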
--
Srdja
-
- Posts: 195
- Joined: Thu Feb 04, 2021 10:24 pm
- Full name: Arnold Magnum
Re: AVX v AVX 2 vs AVX 512 - Engine Analysis
The fastest is usually your self-compiled version.
Or use a simple MacBook Pro M1 Max.
-
- Posts: 3169
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: AVX v AVX 2 vs AVX 512 - Engine Analysis
Anandtech on AMD Zen 4 Ryzen and AVX-512:
https://www.anandtech.com/show/17552/am ... ng-sept-27
As the article mentions, NNUE inference might profit from the new instructions, but the SIMD width did not double. [...]
Papermaster also confirmed for the first time that Zen 4 – including the Ryzen 7000 series – will support AVX-512 instructions. AVX-512 is a bit of a mess of standards, so besides the foundation (AVX-512F) instructions, it’s still not entirely clear which subsets of AVX-512 AMD will support. But Papermaster did explicitly mention Vector Neural Network Instructions (VNNI) as among the additional subsets supported.
Critically, however, AMD is diverging from Intel in one important aspect: whereas Intel built a true, 512-bit wide SIMD machine for executing AVX-512 instructions, AMD did not. Instead, AMD will be executing these instructions over two cycles. This means AMD’s implementation still benefits from all of the additional instructions, register file space, and other technical improvements that came as part of AVX-512, but they won’t gain the innate doubling in SIMD throughput.
In discussing the rationale for AMD’s decision, Papermaster cited the extreme power requirements for a true 512-bit SIMD block as the biggest impetus for keeping AMD’s SIMD design at 256-bits. As we’ve already seen in Intel chips with AVX-512 support, the massive throughput of a 512-bit SIMD combined with its high density results in a hard spike in power consumption when using it, requiring Intel’s chips to downclock on AVX-512 workloads (sometimes severely) in order to keep power and thermals in check. Using a narrower 256-bit SIMD means that AMD won’t need to light up nearly as many transistors at once, which will in turn make it easier to keep clockspeeds and power consumption more consistent. At the same time, I don’t think AMD minds that the die space requirements for a 256-bit SIMD are significantly less than a 512-bit SIMD; a full 512-bit SIMD is a lot of transistors to build, and a lot of transistors to fire up during heavy workloads.
[...]
--
Srdja
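The "double-pumped" design described in the quote can be sketched with a back-of-envelope throughput model. The numbers below (four 256-bit execution units, a VNNI-style dot product doing 4 int8 MACs per 32-bit accumulator lane) follow the quoted description; real sustained throughput also depends on which units can execute multiplies, port scheduling, and clocks, so treat this only as an illustration of why the raw peak does not double.

```python
def macs_per_instr(vector_bits):
    """int8 MACs per VNNI-style dot-product instruction:
    4 int8 multiply-accumulates per 32-bit accumulator lane."""
    return (vector_bits // 32) * 4

units_256 = 4  # Zen 4: four 256-bit SIMD execution units (per the quote)

# 256-bit path: up to four 256-bit instructions issued per cycle
avx2_peak = units_256 * macs_per_instr(256)

# Double-pumped AVX-512: each 512-bit op occupies a pair of 256-bit
# units, so at most two 512-bit instructions per cycle
avx512_peak = (units_256 // 2) * macs_per_instr(512)

print(avx2_peak, avx512_peak)  # 128 128: raw peak MACs/cycle unchanged
```

The AVX-512 path can still win in practice through fewer instructions, more registers, and new operations like VNNI, just not through a doubled datapath.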
-
- Posts: 646
- Joined: Mon Jun 20, 2022 4:08 am
- Full name: Brian D. Smith
Re: AVX v AVX 2 vs AVX 512 - Engine Analysis
smatovic wrote: ↑Tue Aug 30, 2022 9:21 am Anandtech on AMD Zen 4 Ryzen and AVX-512: [...]
Even then...isn't letting a non-AVX-512 build run a couple of seconds more on a position essentially the same as running AVX-512? I ask because I've never seen any data about how much 'faster' 512 really is.
In the end, for analysis, you are not in a 'race' situation...so the extra seconds really do not matter, just the engine software.
-
- Posts: 3169
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: AVX v AVX 2 vs AVX 512 - Engine Analysis
CornfedForever wrote: ↑Tue Aug 30, 2022 2:46 pm Even then...isn't letting non-AVX run a couple of seconds more on a position, essentially the same as AVX-512? I ask because I've never seen any data about how much 'faster' 512 really is. [...]
Hmm, if you run your chess analysis for only a couple of seconds, it really does not matter whether AVX-512 gives an NPS speedup of maybe 1.25x or not.
--
Srdja
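The "just let it run a bit longer" point can be made concrete: at a fixed node budget, a slower build simply needs proportionally more wall time, assuming NPS scales linearly and nothing else changes. A minimal sketch using the 1.25x figure from the post:

```python
def equivalent_time(t_fast_seconds, speedup):
    """Wall time a slower build needs to search the same number of
    nodes as a faster build running for t_fast_seconds, assuming
    nodes searched = NPS * time and NPS differs by `speedup`."""
    return t_fast_seconds * speedup

# With a 1.25x AVX-512 speedup:
print(equivalent_time(3, 1.25))    # 3.75  (3 s of AVX-512 ~ 3.75 s without)
print(equivalent_time(300, 1.25))  # 375.0 (5 min -> 6.25 min)
```

For interactive analysis the difference is fractions of a second to a couple of minutes; only in fixed-time engine matches does the speedup translate directly into strength.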
-
- Posts: 3169
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: AVX v AVX 2 vs AVX 512 - Engine Analysis
Just for the files:
AMD Zen 3 supports AVX2 via a 256-bit vector unit; I don't know how many pipelines (i.e. for Hyper-Threading/SMT) per core.
AMD Zen 4 supports AVX-512 via 2x 256-bit vector units:
https://en.wikipedia.org/wiki/Zen_4
Zen 4 is the first AMD microarchitecture to support AVX-512 instruction set extension. Most 512-bit vector instructions are split in two and executed by the 256-bit SIMD execution units internally. The two halves execute in parallel on a pair of execution units and are still tracked as a single micro-OP (except for stores), which means the execution latency isn't doubled compared to 256-bit vector instructions. There are four 256-bit execution units, which gives a maximum throughput of two 512-bit vector instructions per clock cycle, e.g. one multiplication and one addition. The maximum number of instructions per clock cycle is doubled for vectors of 256 bits or less. Load and store units are also 256 bits each, retaining the throughput of up to two 256-bit loads or one store per cycle that was supported by Zen 3. This translates to up to one 512-bit load per cycle or one 512-bit store per two cycles.[10][12][13]
With four 256-bit vector units per core, probably for Hyper-Threading/SMT.
AMD Zen 5 has an additional 512-bit datapath:
https://en.wikipedia.org/wiki/Zen_5
Zen 4 introduced AVX-512 instructions. AVX-512 capabilities have been expanded with Zen 5 with a doubling of the floating point pipe width to a native 512-bit floating point datapath. The AVX-512 datapath is configurable depending on the product. Ryzen 9000 series desktop processors and EPYC 9005 server processors feature the full 512-bit datapath but Ryzen AI 300 mobile processors feature a 256-bit datapath in order to reduce power consumption. AVX-512 instruction has been extended to VNNI/VEX instructions. Additionally, there is greater bfloat16 throughput which is beneficial for AI workloads.
I don't know how many pipelines in total.
And Stockfish 16.1 gains more from a wider vector unit than SF 14.1 did; Amdahl's Law probably steps in here, because of the bigger net size:
https://ipmanchess.yolasite.com/amd--in ... ckfish.php
viewtopic.php?p=967471#p967471
--
Srdja
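The generations listed above can be condensed into a small summary table, following the quoted Wikipedia passages (desktop parts; mobile Zen 5 differs, as noted):

```python
# SIMD support per AMD generation, per the quoted sources:
# whether AVX-512 instructions are supported, and whether 512-bit ops
# are double-pumped over 256-bit units or use a native 512-bit datapath.
zen = {
    "Zen 3": {"avx512": False, "datapath_bits": 256},
    "Zen 4": {"avx512": True,  "datapath_bits": 256},  # double-pumped
    "Zen 5": {"avx512": True,  "datapath_bits": 512},  # native (Ryzen 9000)
}

for name, info in zen.items():
    pumped = info["avx512"] and info["datapath_bits"] < 512
    note = "double-pumped over 256-bit units" if pumped else ""
    print(f"{name}: AVX-512={info['avx512']} {note}")
```

This matches the benchmark observation above: engines with bigger NNUE nets should see the largest relative gain on Zen 5's native 512-bit datapath.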