Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Post by smatovic »

Some further info and spec is out:

https://fuse.wikichip.org/news/3600/the ... re-rapids/

https://software.intel.com/content/www/ ... rence.html

In short, you can define tiles in registers and run matrix math via the TMUL on
them, currently dot-product for BF16 and INT8 are defined...

https://en.wikichip.org/wiki/x86/amx#Instructions

As mentioned in another thread, this stuff could be interesting for NNs with
lower latencies => AB search + NN eval both on CPU+AVX+AMX.
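To make the TMUL operation concrete, here is a plain-C scalar reference for what the INT8 tile dot-product (TDPBSSD in the WikiChip instruction list) computes: each step multiplies a group of 4 signed bytes from A against 4 signed bytes from B and accumulates into an INT32 element of C. This is a sketch of the math only, ignoring the hardware's interleaved B-tile layout and the real tile limits (up to 16 rows x 64 bytes per tile); the dimensions and function name are illustrative.

```c
#include <stdint.h>

/* Scalar reference for the AMX INT8 tile dot-product semantics:
 * C (int32, M x N) += A (int8, M x 4*K4) * B (int8, 4*K4 x N),
 * accumulating products of signed bytes into int32.
 * K4 is the number of 4-byte groups along the inner dimension. */
static void tdpbssd_ref(int M, int N, int K4,
                        const int8_t *A, const int8_t *B, int32_t *C)
{
    for (int m = 0; m < M; m++)
        for (int n = 0; n < N; n++)
            for (int k = 0; k < K4; k++)
                for (int b = 0; b < 4; b++)
                    C[m * N + n] += (int32_t)A[m * K4 * 4 + k * 4 + b]
                                  * (int32_t)B[(k * 4 + b) * N + n];
}
```

On hardware, one TMUL instruction does this for a whole tile pair at once, which is what makes it interesting for NN inference inside an engine.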


--
Srdja
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Post by Daniel Shawul »

The NNUE stuff could probably benefit a lot from it.
But I have my doubts about how far you can go with that approach with such a tiny net.
At some point, you have to increase the net size to improve further, which then necessitates the use of MCTS over AB ..
I don't have the energy to implement NNUE in my own engine right now, but it would be interesting to
see it implemented in an engine other than Stockfish.
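The reason NNUE's first layer maps well onto fast integer math is its incrementally updated accumulator: the layer's output is the sum of weight columns for all active (piece, square) features, so a move only adds or removes a few columns instead of recomputing the whole layer. A minimal sketch of that idea (not Stockfish's actual code; the hidden size, types, and weight layout here are illustrative assumptions):

```c
#include <stdint.h>

enum { HIDDEN = 256 };  /* illustrative hidden-layer size */

/* Running sum of first-layer weight columns for all active features. */
typedef struct { int16_t v[HIDDEN]; } Accumulator;

/* weights[f] is the hypothetical first-layer column for feature f. */
void feature_added(Accumulator *acc, int16_t weights[][HIDDEN], int f)
{
    for (int i = 0; i < HIDDEN; i++)
        acc->v[i] += weights[f][i];
}

void feature_removed(Accumulator *acc, int16_t weights[][HIDDEN], int f)
{
    for (int i = 0; i < HIDDEN; i++)
        acc->v[i] -= weights[f][i];
}
```

The later, smaller dense layers are then ordinary INT8 matrix-vector products, which is the part that AVX today, or TMUL tiles in the future, would accelerate.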
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Post by Daniel Shawul »

Indeed that could definitely help some.
I actually do it for the endgame when the number of pieces <= 14.
With many more divisions than just opening/midgame/endgame, though, there is going to be a lot of redundant
knowledge encoded in each net.
Handwritten evals seem replaceable with NN on the CPU now, but how far you can go with them remains to be seen.
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Post by smatovic »

Sorry, I deleted the post you referred to... here it is:

A simple trick to increase the accumulated NN size could be to split the NN into opening, middlegame and endgame nets, or into up to 30 distinct NNs indexed by piece count, for example.
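One concrete way to index such nets, purely as an illustration (the bucket layout is a hypothetical choice, not from any engine): one net per piece count from 3 up to 32, with bare-kings positions sharing the lowest bucket.

```c
/* Map a position's piece count (kings included, so 2..32) to one of
 * 30 hypothetical nets, as in the "30 distinct NNs indexed by piece
 * count" idea above. */
static inline int net_index(int piece_count)
{
    if (piece_count < 3)  piece_count = 3;   /* bare kings share net 0 */
    if (piece_count > 32) piece_count = 32;
    return piece_count - 3;                  /* yields 0 .. 29 */
}
```

Each net would then be trained only on positions that fall into its bucket, which is where the redundancy concern from the reply below comes in.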

--
Srdja
Daniel Shawul wrote: Sun Jul 05, 2020 3:37 pm Indeed that could definitely help some.
I actually do it for the endgame when the number of pieces <= 14.
With many more divisions than just opening/midgame/endgame, though, there is going to be a lot of redundant
knowledge encoded in each net.
Handwritten evals seem replaceable with NN on the CPU now, but how far you can go with them remains to be seen.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Post by Milos »

smatovic wrote: Sun Jul 05, 2020 12:22 pm Some further info and spec is out:

https://fuse.wikichip.org/news/3600/the ... re-rapids/

https://software.intel.com/content/www/ ... rence.html

In short, you can define tiles in registers and run matrix math via the TMUL on
them, currently dot-product for BF16 and INT8 are defined...

https://en.wikichip.org/wiki/x86/amx#Instructions

As mentioned in another thread, this stuff could be interesting for NNs with
lower latencies => AB search + NN eval both on CPU+AVX+AMX.


--
Srdja
Here is much more useful info:
https://software.intel.com/sites/landin ... VX512_BF16
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Post by smatovic »

Milos wrote: Mon Jul 06, 2020 2:09 am
smatovic wrote: Sun Jul 05, 2020 12:22 pm Some further info and spec is out:

https://fuse.wikichip.org/news/3600/the ... re-rapids/

https://software.intel.com/content/www/ ... rence.html

In short, you can define tiles in registers and run matrix math via the TMUL on
them, currently dot-product for BF16 and INT8 are defined...

https://en.wikichip.org/wiki/x86/amx#Instructions

As mentioned in another thread, this stuff could be interesting for NNs with
lower latencies => AB search + NN eval both on CPU+AVX+AMX.


--
Srdja
Here is much more useful info:
https://software.intel.com/sites/landin ... VX512_BF16
AVX512_BF16 != AMX-BF16

--
Srdja
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Post by Tony P. »

Thanks for sharing the exciting news! Now I've started dreaming of competitions held on a single CPU server again to end the debate about the hardware bias. It will be interesting to see how long it will take anyone to beat SF (both classical and EUNN) on its home turf.