Hmm, Nvidia announces the GB10 Superchip in a desktop for $3,000
Project DIGITS features the new NVIDIA GB10 Grace Blackwell Superchip, offering a petaflop of AI computing performance for prototyping, fine-tuning and running large AI models.
GB10 features an NVIDIA Blackwell GPU with latest-generation CUDA® cores and fifth-generation Tensor Cores, connected via NVLink®-C2C chip-to-chip interconnect to a high-performance NVIDIA Grace™ CPU, which includes 20 power-efficient cores built with the Arm architecture. MediaTek, a market leader in Arm-based SoC designs, collaborated on the design of GB10, contributing to its best-in-class power efficiency, performance and connectivity.
The GB10 Superchip enables Project DIGITS to deliver powerful performance using only a standard electrical outlet. Each Project DIGITS features 128GB of unified, coherent memory and up to 4TB of NVMe storage. With the supercomputer, developers can run up to 200-billion-parameter large language models to supercharge AI innovation. In addition, using NVIDIA ConnectX® networking, two Project DIGITS AI supercomputers can be linked to run up to 405-billion-parameter models.
With the Grace Blackwell architecture, enterprises and researchers can prototype, fine-tune and test models on local Project DIGITS systems running Linux-based NVIDIA DGX OS, and then deploy them seamlessly on NVIDIA DGX Cloud™, accelerated cloud instances or data center infrastructure.
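As a quick sanity check on those figures (my own back-of-envelope arithmetic, not from the announcement): if the models are stored at 4-bit precision, each parameter takes half a byte, and the quoted model sizes line up with the quoted memory capacities.

```python
# Back-of-envelope: assumes 4-bit (FP4) weights, i.e. 0.5 bytes per
# parameter, and ignores activation/KV-cache overhead. This is my own
# check, not NVIDIA's stated methodology.
GB = 1e9

for params, mem_gb in [(200e9, 128), (405e9, 256)]:  # 256 GB = two linked units
    weights_gb = params * 0.5 / GB  # 4 bits = half a byte per parameter
    print(f"{params / 1e9:.0f}B params -> {weights_gb:.1f} GB of weights, "
          f"{mem_gb} GB of unified memory available")
# 200B params -> 100.0 GB of weights, 128 GB of unified memory available
# 405B params -> 202.5 GB of weights, 256 GB of unified memory available
```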
Perspective: the new workstation in the linked article will offer a petaflop for $3,000. The world's fastest supercomputer in 2008-2009 was the IBM Roadrunner (link), which clocked in at a petaflop.
To be fair to the big iron, this new device's petaflop will be at FP4 precision (sign: 1 bit, exponent: 2 bits, mantissa: 1 bit). At just 4 bits per operand, I would seriously consider using a hardware lookup table for multiplication; there would be just 2^4 × 2^4 = 256 possible operand combinations to tabulate.
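To make the lookup-table idea concrete, here is a minimal Python sketch. The E2M1 decode follows the bit layout above (it yields the value set 0, 0.5, 1, 1.5, 2, 3, 4, 6 plus a sign bit); the function and table names are my own illustration, not anything NVIDIA has described.

```python
def fp4_decode(code: int) -> float:
    """Decode a 4-bit E2M1 code: 1 sign bit, 2 exponent bits, 1 mantissa bit."""
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:                               # subnormal: 0 or 0.5
        return sign * 0.5 * man
    return sign * (1.0 + 0.5 * man) * 2.0 ** (exp - 1)

# The 256-entry product table: one slot per pair of 4-bit operands.
MUL_LUT = [fp4_decode(a) * fp4_decode(b) for a in range(16) for b in range(16)]

def fp4_mul(a: int, b: int) -> float:
    """Multiply two FP4 codes by table lookup instead of arithmetic."""
    return MUL_LUT[(a << 4) | b]

print(fp4_mul(0b0011, 0b1100))  # 1.5 * -2.0 = -3.0
```

In actual hardware the table could be shrunk further: handle the sign bits with a single XOR and look up only the two 3-bit magnitudes, leaving 8 × 8 = 64 entries.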
Want to attract exceptional people? Be exceptional.
Yes, interesting; Lc0 benchmarks will tell. It will probably do better with a BT-series network than a T-series one, because Transformers can utilize the new Tensor Cores better than CNNs. With the step from the RTX 20xx series to the 30xx series we had good-looking numbers on paper, but it did not translate into a big jump in NPS for Lc0, for multiple reasons.
Jouni wrote: Wed Jan 08, 2025 11:13 am
Have you looked at TCEC? The GPU setup of 8x NVIDIA RTX 4090 is quite useless for Lc0. It loses almost all games with Black to SF!
I don't think this follows at all.
My Ford was upgraded from a 2 litre engine to a 3 litre engine and is now much faster, but it is still slower than a Ferrari.
So what? The upgrade is still worth it.
It was a great shame about BT5, but hopefully they'll make a breakthrough.
One of the important aspects of Intel's Panther Lake is that it will be produced on the company's 18A process technology (1.8nm-class), a make-or-break production node.
...the new CEO mentioned that the 18A process is essential to Intel; if it doesn't succeed, Intel plans to outsource its foundry business.