GPU rumors 2021

smatovic · Post by **smatovic** » Fri Apr 16, 2021 9:02 am

###############################################################################
### GPU ###
###############################################################################
- Nvidia Lovelace
Either the gamer arch split of Nvidia for 2022, or an in between step before
Hopper arch, maybe TSMC 5nm. Doubled amount of cores expected. SM design with
FP32/FP16/INT32 and TensorCores partitioning unknown.

- Nvidia Hopper
Either the server arch gpu of Nvidia for 2022, or the follow-up arch of
Lovelace, maybe MCM, multi-chip-modules, design.

- AMD RDNA3
Planned for 2021/2022, maybe TSMC 7nm or 5nm, doubled amount of cores expected,
maybe ~20% performance/watt increase again, maybe MCM, multi-chip-modules,
design.

- Intel Xe
Intel Xe arch is currently present as Xe MAX for notebooks and Iris Xe for
the entry-level desktop. Xe-HPG, high-performance-gaming, (external fab?) has
yet to follow.

###############################################################################
### CPU ###
###############################################################################
- Nvidia Grace
ARM based server-class CPU with NVLink4 planned for 2023, optimized for
CPU-GPU data throughput.

- AMD Zen4
Planned for 2022, TSMC 5nm, maybe ~25% IPC increase, maybe AVX-512 support for
NNUE inference.

- Intel Xeon Sapphire Rapids
New CPU design planned for 2021, Intel 10nm (comparable to TSMC 7nm?), DDR5 or
HBM, PCIe5, with AMX (Advanced Matrix eXtensions) for matrix-multiplications
used for neural-networks on CPU.

- ARMv9 SVE2
ARM released SVE2 (Scalable Vector Extension 2) with up to 2048 bits vector
width as a replacement for NEON (128b), SVE with 512b was already used in the
A64FX CPU of the Fugaku super-computer, so we might expect broader bit width
for NNUE inference in upcoming ARM based silicon.
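
To put the vector widths above into numbers, here is a rough sketch of how
many lanes fit into one vector register and how many registers a layer would
span. The widths (128b, 512b, 2048b) are from the post; the int16 element size
and the 256-element layer are illustrative assumptions only, not taken from
any particular engine or chip.

```python
# Vector widths in bits as named in the post.
VECTOR_BITS = {
    "NEON": 128,
    "SVE (A64FX)": 512,
    "SVE2 (max width)": 2048,
}

# Assumptions for illustration only: int16 elements, 256-element layer.
ELEMENT_BITS = 16
LAYER_ELEMENTS = 256


def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements that fit into one vector register."""
    if vector_bits % element_bits != 0:
        raise ValueError("vector width must be a multiple of the element size")
    return vector_bits // element_bits


def vectors_needed(n_elements: int, vector_bits: int, element_bits: int) -> int:
    """Number of vector registers needed to hold n_elements (rounded up)."""
    per_vector = lanes(vector_bits, element_bits)
    return -(-n_elements // per_vector)  # ceiling division


for name, bits in VECTOR_BITS.items():
    print(name, lanes(bits, ELEMENT_BITS),
          vectors_needed(LAYER_ELEMENTS, bits, ELEMENT_BITS))
```

Fewer registers per layer means fewer instructions per update, but real
silicon may implement a narrower width than the architectural maximum.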

###############################################################################
### MISC ###
###############################################################################
- Alps super-computer
Planned for 2023, 20 Exaflops in AI, with Nvidia Grace CPU + Nvidia GPU.

- TensorCores
The TensorCores of Nvidia RTX series are used in gaming for DLSS, some kind
of neural-network based image sampling, it is unknown if this technique will
prevail on the market imo, AMD added Matrix-Cores to its server-class CDNA
arch, Intel will add XMX cores to its Xe-HPC server class gpus, so we have
here a split of gamer-gpu and server-gpu architecture, unknown what move Nvidia
will make, maybe they will split their gamer and server brands too.

- CPU-GPU coherent memory
With Nvidia moving into the CPU realm we have a tight coupling of CPU-GPU arch
for HPC incoming. IBM dropped NVLink support in their POWER10 series, so all
HPC-GPU vendors will come up with a solution for coherent memory between CPU
and GPU, maybe an open standard like CXL over PCIe, maybe something proprietary
like NVLink and Infinity Fabric, unknown if and how this descends to the gamer
gpu market.

--
Srdja

smatovic · Post by **smatovic** » Fri Apr 16, 2021 9:50 am

Ah, I missed one rumor:

- Nvidia RTX 30xx Super series refresh on TSMC 7nm in 2021?

https://www.thefpsreview.com/2020/10/09 ... m-process/

Dunno how this chip shortage thingy will play out in all of this.

--
Srdja

smatovic · Post by **smatovic** » Fri Apr 16, 2021 5:06 pm

Missed one about Apple silicon for the serious Pro models in 2021/2022...

- up to 128 core GPU (8 core on M1 with > 2 TFLOPS peak)
- up to 32 high-performance-core CPU

https://www.tomshardware.com/news/apple ... ics-report

--
Srdja

smatovic · Post by **smatovic** » Thu Apr 29, 2021 10:01 am

One big player is missing in the above, IBM, will they come up with their own gpu
arch or revive their PowerXCell? I doubt that, with POWER10 they went another
path, up to 16 core SMT8 design CPU, basically 128 cores each with own ALU,
FPU, branch prediction, load/store and 128b SIMD unit. Instead of offloading tasks
to an external GPU they put the horsepower back into the CPU with up to 1 TB/s IO per
socket to external memory controllers with up to 16 sockets in total, also
notable are 4 MMA (matrix math assist) units per core for NN inference stuff.

https://en.wikipedia.org/wiki/POWER10

--
Srdja

Milos · Post by **Milos** » Thu Apr 29, 2021 12:39 pm

smatovic wrote: ↑Thu Apr 29, 2021 10:01 am One big player is missing in the above, IBM, will they come up with their own gpu
arch or revive their PowerXCell? I doubt that, with POWER10 they went another
path, up to 16 core SMT8 design CPU, basically 128 cores each with own ALU,
FPU, branch prediction, load/store and 128b SIMD unit. Instead of offloading tasks
to an external GPU they put the horsepower back into the CPU with up to 1 TB/s IO per
socket to external memory controllers with up to 16 sockets in total, also
notable are 4 MMA (matrix math assist) units per core for NN inference stuff.

https://en.wikipedia.org/wiki/POWER10

--
Srdja

Hehe, so wrong.
Can't disclose details but you can check this one:
https://www.ibm.com/blogs/research/2021 ... n-scaling/
It will be part of next IBM products.
Power is yesterday's news, Z systems are the ones to look at (like the upcoming z16).

smatovic · Post by **smatovic** » Sat Aug 28, 2021 4:11 pm

Intel announced the Xe-HPG series 'Arc Alchemist' for Q1 2022, built on a TSMC N6 process, with Matrix Engines onboard which could be used to boost Lc0's NN.

https://fudzilla.com/news/graphics/5340 ... hemist-gpu

--
Srdja

smatovic · Post by **smatovic** » Sat Aug 28, 2021 5:32 pm

Milos wrote: ↑Thu Apr 29, 2021 12:39 pm
smatovic wrote: ↑Thu Apr 29, 2021 10:01 am One big player is missing in the above, IBM, will they come up with their own gpu
arch or revive their PowerXCell? I doubt that, with POWER10 they went another
path, up to 16 core SMT8 design CPU, basically 128 cores each with own ALU,
FPU, branch prediction, load/store and 128b SIMD unit. Instead of offloading tasks
to an external GPU they put the horsepower back into the CPU with up to 1 TB/s IO per
socket to external memory controllers with up to 16 sockets in total, also
notable are 4 MMA (matrix math assist) units per core for NN inference stuff.

https://en.wikipedia.org/wiki/POWER10

--
Srdja
Hehe, so wrong.
Can't disclose details but you can check this one:
https://www.ibm.com/blogs/research/2021 ... n-scaling/
It will be part of next IBM products.
Power is yesterday's news, Z systems are the ones to look at (like the upcoming z16).

Must be the IBM Telum processor:

https://www.anandtech.com/show/16901/ho ... ire-rapids

https://www.ibm.com/blogs/systems/ibm-t ... -linuxone/

6 TFLOPs per chip (precision? vector/matrix?) with 2 chips per socket, with 4-way sockets up to 32-chip configurations.
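
As a back-of-the-envelope check, the figures above can be multiplied out. This
is only a naive upper bound: it assumes the per-chip number scales linearly
across chips, which the sources do not claim, and the precision behind the
6 TFLOPs figure is still an open question at this point in the thread. The
function names are made up for this sketch.

```python
TFLOPS_PER_CHIP = 6    # from the post; precision and vector/matrix split unknown
CHIPS_PER_SOCKET = 2   # from the post
MAX_CHIPS = 32         # largest configuration named in the post


def sockets_for(chips: int, chips_per_socket: int = CHIPS_PER_SOCKET) -> int:
    """Sockets needed to hold the given number of chips (rounded up)."""
    return -(-chips // chips_per_socket)  # ceiling division


def naive_peak_tflops(chips: int, per_chip: float = TFLOPS_PER_CHIP) -> float:
    """Aggregate peak, assuming perfect linear scaling (an assumption)."""
    return chips * per_chip


print("sockets:", sockets_for(MAX_CHIPS))
print("naive peak TFLOPS:", naive_peak_tflops(MAX_CHIPS))
```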

--
Srdja

towforce · Post by **towforce** » Sat Aug 28, 2021 8:01 pm

smatovic wrote: ↑Sat Aug 28, 2021 5:32 pmMust be the IBM Telum processor:

{snip}

https://www.ibm.com/blogs/systems/ibm-t ... -linuxone/

6 TFLOPs per chip (precision? vector/matrix?) with 2 chips per socket, with 4-way sockets up to 32-chip configurations.

--
Srdja

That is ASTONISHING on a single chip. The first computer to achieve that speed overall wasn't built until 2001 - link.

I know that's not comparing like with like (different precisions, ability to do traditional Linpack test etc), but it's still impressive in a world in which Moore's Law has come to an end!

Much of the work of supercomputers is fluid dynamics (nuclear explosions, aerodynamics, weather etc), but NNs are a serious threat to this income, because they do the job so well with many orders of magnitude less computing. The age of the supercomputer isn't over yet, but I personally wouldn't be investing.

Milos · Post by **Milos** » Mon Aug 30, 2021 12:07 pm

smatovic wrote: ↑Sat Aug 28, 2021 5:32 pm
Milos wrote: ↑Thu Apr 29, 2021 12:39 pm
smatovic wrote: ↑Thu Apr 29, 2021 10:01 am One big player is missing in the above, IBM, will they come up with their own gpu
arch or revive their PowerXCell? I doubt that, with POWER10 they went another
path, up to 16 core SMT8 design CPU, basically 128 cores each with own ALU,
FPU, branch prediction, load/store and 128b SIMD unit. Instead of offloading tasks
to an external GPU they put the horsepower back into the CPU with up to 1 TB/s IO per
socket to external memory controllers with up to 16 sockets in total, also
notable are 4 MMA (matrix math assist) units per core for NN inference stuff.

https://en.wikipedia.org/wiki/POWER10

--
Srdja
Hehe, so wrong.
Can't disclose details but you can check this one:
https://www.ibm.com/blogs/research/2021 ... n-scaling/
It will be part of next IBM products.
Power is yesterday's news, Z systems are the ones to look at (like the upcoming z16).
Must be the IBM Telum processor:

https://www.anandtech.com/show/16901/ho ... ire-rapids

https://www.ibm.com/blogs/systems/ibm-t ... -linuxone/

6 TFLOPs per chip (precision? vector/matrix?) with 2 chips per socket, with 4-way sockets up to 32-chip configurations.

--
Srdja

Yes it's IBM Telum, this is now official PR.
Btw. my work was in one of those slides in the HotChips presentation.
All matmul operations are FP16 (IBM's format) with FP32 accumulation.
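
For readers wondering why the accumulator is wider than the inputs, here is a
minimal pure-Python sketch of the idea. Big assumption: it uses IEEE 754 half
precision as a stand-in for IBM's own 16-bit format, whose layout is not given
in this thread, and it emulates the rounding in software, so it says nothing
about how the hardware actually does it. The helper names are made up for
this sketch.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value.

    Assumption: IEEE half precision stands in for IBM's 16-bit format.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def matmul(a, b, round_acc=to_fp32):
    """Multiply two matrices given as lists of rows.

    Inputs are rounded to fp16; the running sum is rounded with round_acc
    after every addition (emulated in double precision, then rounded).
    """
    inner, cols = len(b), len(b[0])
    result = []
    for row in a:
        out_row = []
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc = round_acc(acc + to_fp16(row[k]) * to_fp16(b[k][j]))
            out_row.append(acc)
        result.append(out_row)
    return result


ones_row = [[1.0] * 4096]                # 1 x 4096 matrix
ones_col = [[1.0] for _ in range(4096)]  # 4096 x 1 matrix

wide = matmul(ones_row, ones_col)                        # fp32 accumulator
narrow = matmul(ones_row, ones_col, round_acc=to_fp16)   # fp16 accumulator
print(wide, narrow)
```

With these inputs the fp32 accumulator reaches 4096.0, while the fp16
accumulator stalls at 2048.0, because 2049 is not representable in half
precision and rounds back down.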

towforce · Post by **towforce** » Mon Aug 30, 2021 12:42 pm

Milos wrote: ↑Mon Aug 30, 2021 12:07 pmYes it's IBM Telum, this is now official PR.
Btw. my work was in one of those slides in the HotChips presentation.
All matmul operations are FP16 (IBM's format) with FP32 accumulation.

I should wait until Telum launches, but in case I forget or something, let me congratulate you now on being a part of what promises to be an outstanding product!
