syzygy wrote: ↑Tue Dec 08, 2020 12:29 am
The Apple ARM chips were underhyped. They beat all expectations.
Do you think one or two x86 die shrinks will give 10x better performance per Watt?
If you buy Apple, you obviously pay for the brand name, but you also pay for a build quality that you won't find in an Asus gaming laptop. Apple laptops aren't much more expensive than Dell XPS laptops.
I won't be buying an Apple M1 laptop though, as I wouldn't be able to run Linux on it. But an Apple M1 Mac mini would make an interesting toy.
Even Apple apologists don't claim more than the 3-4x perf-per-watt advantage that you get when comparing a mobile M1 designed for low power consumption with a stock desktop x86 CPU designed for maximum performance. 10x is fantasy.
And this way of comparing perf-per-watt is completely wrong when it comes to architectural comparisons.
This is like when Nvidia compared an undervolted and underclocked RTX 3080 to an RTX 2080 at iso-performance, putting them at very different points of their efficiency curves, to claim a 2x perf-per-watt advantage for Ampere over Turing. The real improvement from architectural changes and the process node was somewhere around 20% (GPUs going wider makes it hard to normalize).
The proper measure of performance per watt is performance at iso-wattage. A laptop chip or a server chip, for example, isn't designed around a target level of performance with the wattage adjusted afterwards. It is designed around a target wattage, and the goal is to extract the highest possible performance within that power limit, which often involves finding ways to reduce power consumption.
This is even more relevant for CPUs, where normalization is easy: just look at single-core performance. Comparison at iso-wattage is much more resilient to the distortions of perf/watt introduced by the efficiency curve. Increasing clocks increases power consumption by itself, but it also increases the required voltage, which increases power consumption again. Chasing the last few percent of performance makes power consumption explode.
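A minimal sketch of that efficiency-curve effect, using the usual dynamic power approximation P ≈ C·V²·f; the clock/voltage pairs below are made up for illustration, not measured from any real chip:

```python
# Why the last few percent of clock speed are so expensive in power:
# dynamic power scales roughly as P ~ C * V^2 * f, and higher clocks
# need higher voltage, so power grows much faster than performance.
# All operating points below are illustrative, not measurements.
operating_points = [
    # (clock in GHz, required core voltage in V) -- hypothetical values
    (3.0, 0.85),
    (4.0, 1.00),
    (4.5, 1.15),
    (5.0, 1.35),
]

base_clock, base_volt = operating_points[0]
for clock, volt in operating_points:
    perf = clock / base_clock                                # perf ~ clock, same IPC
    power = (clock / base_clock) * (volt / base_volt) ** 2   # P ~ f * V^2
    print(f"{clock:.1f} GHz: perf x{perf:.2f}, power x{power:.2f}, perf/W x{perf/power:.2f}")
```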
A Zen 3 core with clocks and voltage reduced to draw similar power to an M1 core ends up with 20-25% less performance (a 25-33% perf advantage for the M1). That is a sizable advantage for Apple, but not a mind-blowing difference that would condemn x86 CPUs to irrelevance. A lot of Apple's advantage is the fruit of hiring a lot of very highly skilled engineers and giving them the transistor budget to do something great (Apple designs are not focused on minimizing die area); that is not by itself proof that ARM is significantly better than x86.
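The two percentages are just the two directions of the same ratio; a quick check of the conversion:

```python
# "Zen 3 has 20-25% less performance" vs "M1 has a 25-33% advantage":
# if Zen 3 delivers a fraction r of the M1's performance at iso-power,
# the M1's advantage is 1/r - 1.
for r in (0.80, 0.75):   # Zen 3 at 80% and 75% of the M1's performance
    print(f"Zen 3 at {r:.0%} of M1 -> M1 ahead by {1/r - 1:.0%}")
# -> 25% and 33%
```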
It would be tempting to correct for Apple's 5nm vs 7nm node advantage by applying TSMC's claimed 30% power reduction to the Zen 3 core, which would give a similar perf/watt, but that would be a mistake because it breaks the iso-power comparison. A better comparison is the performance of a Zen 3 core allowed a third more power than an M1 core; the Zen 3 core still loses, but by a thinner margin than without the process-node correction.
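A rough sketch of how that power-budget correction falls out of the node claim; the exact multiplier depends on how the marketing figure is read, and the numbers below are just that claim, not measurements:

```python
# Translating "5nm uses ~30% less power than 7nm at the same performance"
# into an iso-performance power budget for the 7nm Zen 3 core.
# Both readings of the marketing claim are shown; neither is a measurement.
m1_power = 1.0  # normalized M1 core power

# Reading 1: 5nm power = 0.7 * 7nm power -> 7nm budget = m1_power / 0.7
print(f"reading 1: {m1_power / 0.7:.2f}x the M1 core's power")
# Reading 2: 7nm power = 1.3 * 5nm power -> 7nm budget = 1.3 * m1_power
print(f"reading 2: {1.3 * m1_power:.2f}x the M1 core's power (roughly 'a third more' as above)")
```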
You'll notice that Apple's own Icestorm cores have better perf/watt than their Firestorm cores, yet the hype comes from the big Firestorm cores. This is also a reminder that the absolute level of performance reached is, for many workloads, very important. Perf/watt alone is not the deciding factor for a responsiveness-oriented workload, as opposed to an instance-parallelizable, throughput-oriented workload like many server tasks.
If one design peaks at 1.00 perf at 5W but can't reach more performance at 15W, and another design gets 0.9 perf at 5W but peaks at 1.1 perf at 15W, the first design is the better choice if you have a 5W budget, but the second one is better if you have a 15W budget.
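The same toy example in code, with the same made-up numbers; the point is just that the winner depends on the power budget:

```python
# Toy comparison: the better design depends entirely on the power budget.
# The (watts, perf) points are the made-up numbers from the paragraph above.
def best_perf(curve, budget_w):
    """Highest performance a design reaches within a given power budget."""
    return max(p for w, p in curve if w <= budget_w)

design_a = [(5, 1.00), (15, 1.00)]   # peaks early, no gain above 5 W
design_b = [(5, 0.90), (15, 1.10)]   # slower at 5 W, keeps scaling to 15 W

for budget in (5, 15):
    a, b = best_perf(design_a, budget), best_perf(design_b, budget)
    print(f"{budget} W budget: A={a:.2f}, B={b:.2f} -> design {'A' if a >= b else 'B'} wins")
```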