Can Stockfish make use of present/future NPUs?

smatovic · Post by **smatovic** » Sun Jun 16, 2024 5:57 pm

I read the article about NPUs (neural processing units):

https://en.wikipedia.org/wiki/AI_accelerator

Apple has ~38 TOPS in the M4, Intel has ~48 and AMD ~50 TOPS in their new mobile processors, ARM has SVE2 for mat-mul, and Microsoft Copilot+ now requires a 40 TOPS NPU:

Microsoft requires an NPU with performance rated at 40 trillion operations per second (TOPS), a high-level performance figure that Microsoft, Qualcomm, Apple, and others use for NPU performance comparisons. Right now, that requirement can only be met by a single chip in the Windows PC ecosystem, one that isn't even quite available yet: Qualcomm's Snapdragon X Elite and X Plus, launching in the new Surface and a number of PCs from the likes of Dell, Lenovo, HP, Asus, Acer, and other major PC OEMs in the next couple of months. All of those chips have NPUs capable of 45 TOPS, just a shade more than Microsoft's minimum requirement.

https://arstechnica.com/gadgets/2024/05 ... l-and-amd/

What is the SF devs' take? Can Stockfish make use of a 40+ TOPS NPU for NNUE, or maybe switch to a CNN architecture?

If Microsoft is the driver, there must be a unified way to program these?

--
Srdja

Werewolf · Post by **Werewolf** » Sun Jun 16, 2024 9:52 pm

smatovic wrote: ↑Sun Jun 16, 2024 5:57 pm I read the article about NPUs (neural processing units):

https://en.wikipedia.org/wiki/AI_accelerator

Apple has ~38 TOPS in the M4, Intel has ~48 and AMD ~50 TOPS in their new mobile processors, ARM has SVE2 for mat-mul, and Microsoft Copilot+ now requires a 40 TOPS NPU:

Microsoft requires an NPU with performance rated at 40 trillion operations per second (TOPS), a high-level performance figure that Microsoft, Qualcomm, Apple, and others use for NPU performance comparisons. Right now, that requirement can only be met by a single chip in the Windows PC ecosystem, one that isn't even quite available yet: Qualcomm's Snapdragon X Elite and X Plus, launching in the new Surface and a number of PCs from the likes of Dell, Lenovo, HP, Asus, Acer, and other major PC OEMs in the next couple of months. All of those chips have NPUs capable of 45 TOPS, just a shade more than Microsoft's minimum requirement.
https://arstechnica.com/gadgets/2024/05 ... l-and-amd/

What is the SF devs' take? Can Stockfish make use of a 40+ TOPS NPU for NNUE, or maybe switch to a CNN architecture?

If Microsoft is the driver, there must be a unified way to program these?

--
Srdja

If this is possible, can’t we just go to the top of the tree with Nvidia who have hundreds of TOPS?

smatovic · Post by **smatovic** » Mon Jun 17, 2024 3:07 am

smatovic wrote: ↑Sun Jun 16, 2024 5:57 pm ARM has SVE2 for mat-mul

Typo, I meant SME2.

--
Srdja

smatovic · Post by **smatovic** » Mon Jun 17, 2024 3:14 am

Werewolf wrote: ↑Sun Jun 16, 2024 9:52 pm If this is possible, can’t we just go to the top of the tree with Nvidia who have hundreds of TOPS?

That is why there is Lc0, which runs batches on the GPU:

https://www.chessprogramming.org/GPU#Ho ... _Latencies

I assume the offload latencies for an NPU inside the CPU are much lower?
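To make the latency question concrete, here is a back-of-the-envelope model, not a measurement. It assumes each offload call pays a fixed latency plus compute time at the device's peak rate. The function name, both latencies, the 200 TOPS GPU figure (standing in for "hundreds of TOPS") and the ops-per-evaluation value are made-up placeholders; only the 40 TOPS NPU figure comes from this thread.

```python
def evals_per_second(batch_size, offload_latency_s, ops_per_eval, device_tops):
    """Upper bound on evaluations per second for one offload call per batch.

    Each call costs a fixed offload latency plus the time to compute the
    whole batch at the device's peak rate (100% utilisation is assumed).
    """
    compute_s = batch_size * ops_per_eval / (device_tops * 1e12)
    return batch_size / (offload_latency_s + compute_s)


# All values below are illustrative placeholders, not benchmarks.
OPS_PER_EVAL = 2e8                           # hypothetical cost of one network evaluation
GPU = {"tops": 200.0, "latency_s": 20e-6}    # assumed discrete GPU behind PCIe
NPU = {"tops": 40.0, "latency_s": 2e-6}      # assumed on-die NPU with lower latency

for batch in (1, 16, 256):
    gpu = evals_per_second(batch, GPU["latency_s"], OPS_PER_EVAL, GPU["tops"])
    npu = evals_per_second(batch, NPU["latency_s"], OPS_PER_EVAL, NPU["tops"])
    print(f"batch={batch:4d}  GPU={gpu:12.0f}/s  NPU={npu:12.0f}/s")
```

Under these assumed numbers the low-latency device wins at batch size 1 and the higher-throughput device wins once batches get large, which is why a one-position-at-a-time AB search and a batched search like Lc0's stress the hardware differently.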

So maybe a merger of Lc0 + SF running on NPU in CPU.

AB search + CNN eval on CPU+NPU, instead of AB search + NNUE eval on CPU+SIMD?

Something like this.
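As a rough illustration of that split, here is a toy alpha-beta (negamax) search where the evaluation is just a function passed in, so it could in principle be backed by NNUE on SIMD or by a network on an NPU. The tree encoding and all names are hypothetical and have nothing to do with Stockfish's or Lc0's actual code.

```python
import math


def alpha_beta(node, alpha, beta, evaluate):
    """Negamax alpha-beta over a toy tree.

    `node` is either a list of child nodes or a leaf (any non-list value).
    `evaluate(leaf)` returns the score from the side to move's point of view.
    """
    if not isinstance(node, list):
        return evaluate(node)          # this call is what would be offloaded
    best = alpha
    for child in node:
        score = -alpha_beta(child, -beta, -best, evaluate)
        if score > best:
            best = score
        if best >= beta:
            break                      # beta cutoff: remaining children are skipped
    return best


evaluated = []                         # records which leaves were actually scored


def counting_eval(leaf):
    """Stand-in evaluator: the leaf value is its own score."""
    evaluated.append(leaf)
    return leaf


toy_tree = [[3, 5], [2, 9]]            # two moves, each with two replies
result = alpha_beta(toy_tree, -math.inf, math.inf, counting_eval)
print(result, evaluated)               # prints: 3 [3, 5, 2]
```

The search asks for one leaf score at a time and decides what to look at next based on the answer, so the evaluator's per-call latency matters more here than its peak throughput.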

Or maybe you can run a bigger NNUE on the NPU, I don't know.

--
Srdja

smatovic · Post by **smatovic** » Mon Jun 17, 2024 3:28 am

What is an NPU: the new AI chips explained
https://www.techradar.com/computing/cpu/what-is-an-npu

--
Srdja

smatovic · Post by **smatovic** » Thu Sep 12, 2024 10:02 am

What do the Lc0 guys say? Lc0 CNN on an NPU?

Nvidia RTX 2080 has ~80 TOPS (Tensor FP16)*, NPU has 40+ TOPS (INT8?).

* https://en.wikipedia.org/wiki/List_of_N ... _20_series
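For a feel of what such TOPS ratings could mean per evaluation, here is a small sketch that counts operations for a stack of dense layers and divides the peak rate by that count. It assumes the common convention that one multiply-accumulate counts as two operations, and the layer sizes are hypothetical, not those of any real Stockfish or Lc0 network. Real throughput will be lower than this peak bound.

```python
def dense_layer_ops(inputs, outputs):
    """Operations for one dense layer, counting each multiply-accumulate as 2 ops."""
    return 2 * inputs * outputs


def network_ops(layer_sizes):
    """Total ops for dense layers given sizes like [in, hidden, ..., out]."""
    return sum(dense_layer_ops(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


def peak_evals_per_second(tops, ops_per_eval):
    """Theoretical ceiling: peak ops per second divided by ops per evaluation."""
    return tops * 1e12 / ops_per_eval


HYPOTHETICAL_LAYERS = [1024, 32, 32, 1]   # placeholder sizes for illustration only
ops = network_ops(HYPOTHETICAL_LAYERS)

for name, tops in (("RTX 2080 (~80, FP16)", 80.0), ("NPU (40+, INT8?)", 40.0)):
    print(f"{name}: {ops} ops/eval, ceiling {peak_evals_per_second(tops, ops):.3e} evals/s")
```

The two ratings are quoted for different precisions (FP16 vs. INT8), so the ratio printed here is only a rough comparison.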

--
Srdja

Hai · Post by **Hai** » Tue Jun 03, 2025 1:16 am

smatovic wrote: ↑Thu Sep 12, 2024 10:02 am What do the Lc0 guys say? Lc0 CNN on an NPU?

Nvidia RTX 2080 has ~80 TOPS (Tensor FP16)*, NPU has 40+ TOPS (INT8?).

* https://en.wikipedia.org/wiki/List_of_N ... _20_series

--
Srdja

Lc0 on an NPU is a great idea.

One person on Discord mentioned that this is already the case.
We will probably see some tests and results soon.
