Milos wrote: ↑Mon Aug 30, 2021 12:07 pm
Yes it's IBM Telum, this is now official PR.
Btw. my work was in one of the slides in the Hot Chips presentation.
All matmul operations are FP16 (IBM's format) with FP32 accumulation.
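A minimal numpy sketch (my own illustration, not IBM code; Telum uses IBM's own 16-bit format, IEEE half precision is used here as a stand-in) of what FP16 inputs with FP32 accumulation means numerically:

```python
import numpy as np

# Inputs rounded to 16-bit floats
a = np.linspace(0.0, 1.0, 8, dtype=np.float16)
b = np.linspace(1.0, 2.0, 8, dtype=np.float16)

# FP32 accumulation: widen each FP16 operand before the multiply-add
acc = np.float32(0.0)
for x, y in zip(a, b):
    acc += np.float32(x) * np.float32(y)

# Pure-FP16 accumulation for comparison (loses precision on long sums)
acc16 = np.float16(0.0)
for x, y in zip(a, b):
    acc16 = np.float16(acc16 + np.float16(x * y))

print(acc, acc16)
```

The wide accumulator is what keeps long dot products from drifting; with only 8 terms the two results are close, but over thousands of MACs the FP16-only sum degrades.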
Milos, looking back now at GPGPU, ML, NNs, and cloud computing, what's your personal opinion: was there a chance for IBM to continue the PowerXCell in 2009 and compete with Nvidia?
I'm heavily sceptical. The problem is that the PowerXCell architecture was nothing but, for its time, advanced SIMD. Yes, it used some nice tricks for memory access and caching, and the instruction-set extension was nice (basically equivalent to AVX2 on x86, which would appear 6 years later). However, there were no real MVM (matrix-vector multiplication) capabilities, no systolic array of MAC units, nothing that would really have enabled it to deliver decent ML throughput.
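For context, the core operation that lacked dedicated hardware is just the multiply-accumulate inner loop of a matrix-vector product — a sketch of my own in plain Python:

```python
import numpy as np

def mvm(A, x):
    """Naive matrix-vector multiply: each output element is a chain of
    MAC (multiply-accumulate) operations. A systolic array of MAC units
    retires many of these per cycle; a plain SIMD unit only vectorizes
    the inner products one stripe at a time."""
    m, n = A.shape
    y = np.zeros(m, dtype=A.dtype)
    for i in range(m):
        acc = 0.0
        for j in range(n):
            acc += A[i, j] * x[j]   # one MAC per (i, j)
        y[i] = acc
    return y

A = np.arange(6.0).reshape(2, 3)
x = np.array([1.0, 2.0, 3.0])
print(mvm(A, x))  # matches A @ x
```

The point is the sheer MAC count: an m-by-n MVM needs m*n of them, which is why throughput hinges on how many MAC units fire per cycle rather than on general-purpose vector width.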
Plus, it was recession time (which Sam Palmisano started through as CEO and which was later finished off under the first female CEO, who basically ruined IBM), when the semiconductor part of the company (IBM Microelectronics) began a noticeable downsizing (which finally ended in 2015 with the division being completely dismantled and sold to GF). So the political/strategic climate was bad.
A year ago, it was nearly impossible to buy a GeForce GPU for its intended retail price. Now, the company has the opposite problem. From a report:
Nvidia CEO Jensen Huang said during the company's Q2 2023 earnings call yesterday that the company is dealing with "excess inventory" of RTX 3000-series GPUs ahead of its next-gen RTX 4000 series release later this year. To deal with this, according to Huang, Nvidia will reduce the number of GPUs it sells to manufacturers of graphics cards and laptops so that those manufacturers can clear out their existing inventory. Huang also says Nvidia has "instituted programs to price position our current products to prepare for next-generation products."
When translated from C-suite to English, this means the company will be cutting the prices of current-generation GPUs to make more room for next-generation ones. Those price cuts should theoretically be passed along to consumers somehow, though that will be up to Nvidia's partners. Nvidia announced earlier this month that it would be missing its quarterly projections by $1.4 billion, mainly due to decreased demand for its gaming GPUs. Huang said that "sell-through" of GPUs, or the number of cards being sold to users, had still "increased 70 percent since pre-COVID," though the company still expects year-over-year revenue from GPUs to decline next quarter.
The RTX 4000 series release might happen in September, AMD's RDNA3 is planned for this fall (Nov?), and Intel's Arc series (entry/midrange) is already available.
Release planned for December 13th: TSMC 5nm/6nm, multi-chip modules, L3 cache, and AMD has added dedicated AI accelerators for tensor ops, which can be used for upscaling video games or for running CNNs as in Lc0. Nvidia has had Tensor Cores since Volta/Turing, Intel added Matrix Engines in Arc Alchemist, and now AMD is in as well with RDNA3.
Hey Milos, AFAIK you are working under the hood on these things. Now we see GPUs with dedicated MMAC (matrix-multiply-accumulate) units, mobile SoCs have a neural engine, and Intel Xeon will contain AMX. What would you prefer in CPU design: multiple, broader general-purpose vector units, or additional dedicated MMAC units? What will prevail in the market, in your opinion?
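To make the trade-off concrete, here's a rough sketch (my own framing, not from any vendor's ISA) of the two instruction shapes: a vector unit retires one fused multiply-add over a stripe of elements, while an MMAC unit retires a whole tile update C += A @ B in one go:

```python
import numpy as np

def vector_fma(acc, a, b):
    # General-purpose vector unit: one elementwise FMA over a stripe
    return acc + a * b

def mmac_tile(C, A, B):
    # Dedicated MMAC unit: a single op updates an entire tile
    return C + A @ B

# The same 4x4 matmul, expressed both ways
A = np.arange(16.0).reshape(4, 4)
B = np.eye(4)

# MMAC style: one tile op
C_mmac = mmac_tile(np.zeros((4, 4)), A, B)

# Vector style: k rank-1 updates, each built from vector FMAs
C_vec = np.zeros((4, 4))
for k in range(4):
    for i in range(4):
        C_vec[i] = vector_fma(C_vec[i], A[i, k], B[k])

print(np.allclose(C_vec, C_mmac))  # both compute A @ B
```

The vector path needs on the order of N^2 FMA instructions for an NxN tile that the MMAC path handles in one, which is the whole throughput argument for dedicated units; the counter-argument is that vector units stay useful for everything that isn't a matmul.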
Considering that Nvidia's data center sales now exceed sales of client-oriented products by 2.4 times, the green company can officially be called a data center company, or rather an AI company, as the company itself has preferred in recent years.
GPGPU was at first driven by gamer GPU sales; now that might reverse. Transistor shrinking was for some time driven by mobile SoCs; now the AI sector might take over, or the like.
Let me finish this thread with some (historical) alternative architectures:
There was, for example, the IBM PowerXCell 8i, used in the IBM Roadrunner super-computer from 2008, the first heterogeneous petaFLOP machine; a smaller version ran in the PlayStation 3: https://en.wikipedia.org/wiki/Cell_(pro ... erXCell_8i
There is still the NEC SX-Aurora (>=2017), a vector processor on a PCIe card, a descendant of the NEC SX super-computer series used e.g. in the Earth Simulator super-computer: https://en.wikipedia.org/wiki/NEC_SX-Aurora_TSUBASA