Can Stockfish make use of present/future NPUs?

smatovic
Posts: 2873
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Can Stockfish make use of present/future NPUs?

Post by smatovic »

I read the Wikipedia article on NPUs (neural processing units):

https://en.wikipedia.org/wiki/AI_accelerator

Apple has ~38 TOPS in the M4, Intel has ~48 TOPS and AMD ~50 TOPS in their new mobile processors, ARM has SVE2 for mat-mul, and Microsoft Copilot+ now requires a 40 TOPS NPU:
"Microsoft requires an NPU with performance rated at 40 trillion operations per second (TOPS), a high-level performance figure that Microsoft, Qualcomm, Apple, and others use for NPU performance comparisons. Right now, that requirement can only be met by a single chip in the Windows PC ecosystem, one that isn't even quite available yet: Qualcomm's Snapdragon X Elite and X Plus, launching in the new Surface and a number of PCs from the likes of Dell, Lenovo, HP, Asus, Acer, and other major PC OEMs in the next couple of months. All of those chips have NPUs capable of 45 TOPS, just a shade more than Microsoft's minimum requirement."
https://arstechnica.com/gadgets/2024/05 ... l-and-amd/

What is the SF devs' take? Can Stockfish make use of a 40+ TOPS NPU for NNUE, or maybe switch to a CNN architecture?

If Microsoft is the one driving this, there must be a unified way to program these NPUs?

--
Srdja
Werewolf
Posts: 1888
Joined: Thu Sep 18, 2008 10:24 pm

Re: Can Stockfish make use of present/future NPUs?

Post by Werewolf »

smatovic wrote: Sun Jun 16, 2024 5:57 pm
[...]
What is the SF devs' take? Can Stockfish make use of a 40+ TOPS NPU for NNUE, or maybe switch to a CNN architecture?

If this is possible, can’t we just go to the top of the tree with Nvidia, who have hundreds of TOPS?
smatovic

Re: Can Stockfish make use of present/future NPUs?

Post by smatovic »

smatovic wrote: Sun Jun 16, 2024 5:57 pm
"ARM has SVE2 for mat-mul"

Typo, I meant SME2.

--
Srdja
smatovic

Re: Can Stockfish make use of present/future NPUs?

Post by smatovic »

Werewolf wrote: Sun Jun 16, 2024 9:52 pm
"If this is possible, can’t we just go to the top of the tree with Nvidia who have hundreds of TOPS?"

That is why Lc0 sends its evaluations to the GPU in batches:

https://www.chessprogramming.org/GPU#Ho ... _Latencies

I assume the offload latencies for an NPU integrated into the CPU are much lower?

So maybe a merger of Lc0 and SF, running on the CPU's integrated NPU.

Alpha-beta (AB) search + CNN eval on CPU+NPU, instead of AB search + NNUE eval on CPU+SIMD?

Something like this.

Or maybe you could run a bigger NNUE on the NPU, I don't know.
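
To make the latency question a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a very simple model: one round trip to the accelerator costs a fixed offload latency plus a per-position compute time, so throughput = batch / (latency + batch * compute time). The function name and all numbers are hypothetical placeholders, not measurements of any real GPU or NPU.

```python
def evals_per_second(batch_size, offload_latency_s, compute_time_per_eval_s):
    """Positions evaluated per second under a simple latency model.

    One round trip costs a fixed offload latency plus the compute time
    for every position in the batch (assumed to scale linearly).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    round_trip_s = offload_latency_s + batch_size * compute_time_per_eval_s
    return batch_size / round_trip_s


# Hypothetical placeholder numbers, NOT measurements of real hardware.
HIGH_LATENCY_S = 10e-6  # assumed per-call latency of a discrete accelerator
LOW_LATENCY_S = 1e-6    # assumed per-call latency of an on-die NPU
COMPUTE_S = 1e-6        # assumed compute time per position

for name, latency in (("high latency", HIGH_LATENCY_S),
                      ("low latency", LOW_LATENCY_S)):
    for batch in (1, 256):
        rate = evals_per_second(batch, latency, COMPUTE_S)
        print(f"{name:>12}, batch {batch:>3}: {rate:,.0f} evals/s")
```

In this toy model the fixed latency dominates at batch size 1 and is amortized away at large batches. Since an AB search tends to ask for one evaluation at a time, the batch-1 rows are probably the interesting ones for an SF-style engine.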

--
Srdja
smatovic

Re: Can Stockfish make use of present/future NPUs?

Post by smatovic »

What is an NPU: the new AI chips explained
https://www.techradar.com/computing/cpu/what-is-an-npu

--
Srdja
smatovic

Re: Can Stockfish make use of present/future NPUs? Lc0?

Post by smatovic »

What do the Lc0 guys say? Could the Lc0 CNN run on an NPU?

The Nvidia RTX 2080 has ~80 TOPS (Tensor FP16)*, while these NPUs have 40+ TOPS (INT8?).

* https://en.wikipedia.org/wiki/List_of_N ... _20_series
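
As a rough way to read these TOPS figures, the Python sketch below turns an advertised throughput into an upper bound on network evaluations per second. The function name, the cost per forward pass, and the utilization parameter are all hypothetical assumptions for illustration; FP16 and INT8 figures are not directly comparable, and the bound ignores latency and memory bandwidth entirely.

```python
def peak_evals_per_second(tops, ops_per_eval, utilization=1.0):
    """Upper bound on network evaluations per second.

    tops:         advertised throughput in tera-operations per second
    ops_per_eval: operations needed for one forward pass (hypothetical)
    utilization:  fraction of peak actually achieved, between 0 and 1
    """
    if ops_per_eval <= 0:
        raise ValueError("ops_per_eval must be positive")
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return tops * 1e12 * utilization / ops_per_eval


# Hypothetical cost of one forward pass; a placeholder, not a real
# figure for any Lc0 or NNUE network.
OPS_PER_EVAL = 1e9

for label, tops in (("RTX 2080 (~80 TOPS, Tensor FP16)", 80),
                    ("NPU (40 TOPS, INT8?)", 40)):
    bound = peak_evals_per_second(tops, OPS_PER_EVAL)
    print(f"{label}: at most {bound:,.0f} evals/s")
```

Under these assumptions the bound scales linearly with TOPS, so a 40 TOPS NPU would top out at half the rate of an 80 TOPS GPU for the same network. Real numbers would depend on how well the network maps onto the hardware.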

--
Srdja