Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Post by smatovic »

Some further info and spec is out:

https://fuse.wikichip.org/news/3600/the ... re-rapids/

https://software.intel.com/content/www/ ... rence.html

In short, you can define tiles in registers and run matrix math via the TMUL on
them, currently dot-product for BF16 and INT8 are defined...

https://en.wikichip.org/wiki/x86/amx#Instructions

As mentioned in another thread, this stuff could be interesting for NNs with
lower latencies => AB search + NN eval both on CPU+AVX+AMX.
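To make the TMUL operation concrete, here is a plain-C scalar reference for what the INT8 tile dot-product (TDPBSSD in the WikiChip instruction list) computes: each step multiplies a group of 4 signed bytes from A against 4 signed bytes from B and accumulates into an INT32 element of C. This is a sketch of the math only, ignoring the hardware's interleaved B-tile layout and the real tile limits (up to 16 rows x 64 bytes per tile); the dimensions and function name are illustrative.

```c
#include <stdint.h>

/* Scalar reference for the AMX INT8 tile dot-product semantics:
 * C (int32, M x N) += A (int8, M x 4*K4) * B (int8, 4*K4 x N),
 * accumulating products of signed bytes into int32.
 * K4 is the number of 4-byte groups along the inner dimension. */
static void tdpbssd_ref(int M, int N, int K4,
                        const int8_t *A, const int8_t *B, int32_t *C)
{
    for (int m = 0; m < M; m++)
        for (int n = 0; n < N; n++)
            for (int k = 0; k < K4; k++)
                for (int b = 0; b < 4; b++)
                    C[m * N + n] += (int32_t)A[m * K4 * 4 + k * 4 + b]
                                  * (int32_t)B[(k * 4 + b) * N + n];
}
```

On hardware, one TMUL instruction does this for a whole tile pair at once, which is what makes it interesting for NN inference inside an engine.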


--
Srdja
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Post by Daniel Shawul »

The NNUE stuff could probably benefit a lot from it.
But I have my doubts about how far you can go with that approach with such a tiny net.
At some point, you have to increase the net size to improve further, which then necessitates the use of MCTS over AB ..
I don't have the energy to implement NNUE in my own engine right now, but it would be interesting to
see it implemented in an engine other than Stockfish.
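The reason NNUE's first layer maps well onto fast integer math is its incrementally updated accumulator: the layer's output is the sum of weight columns for all active (piece, square) features, so a move only adds or removes a few columns instead of recomputing the whole layer. A minimal sketch of that idea (not Stockfish's actual code; the hidden size, types, and weight layout here are illustrative assumptions):

```c
#include <stdint.h>

enum { HIDDEN = 256 };  /* illustrative hidden-layer size */

/* Running sum of first-layer weight columns for all active features. */
typedef struct { int16_t v[HIDDEN]; } Accumulator;

/* weights[f] is the hypothetical first-layer column for feature f. */
void feature_added(Accumulator *acc, int16_t weights[][HIDDEN], int f)
{
    for (int i = 0; i < HIDDEN; i++)
        acc->v[i] += weights[f][i];
}

void feature_removed(Accumulator *acc, int16_t weights[][HIDDEN], int f)
{
    for (int i = 0; i < HIDDEN; i++)
        acc->v[i] -= weights[f][i];
}
```

The later, smaller dense layers are then ordinary INT8 matrix-vector products, which is the part that AVX today, or TMUL tiles in the future, would accelerate.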
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Post by Daniel Shawul »

Indeed that could definitely help some.
I actually do it for the endgame when the number of pieces <= 14.
With many more divisions than just opening/midgame/endgame, though, there is going to be a lot of redundant
knowledge encoded in each net.
Handwritten evals seem replaceable with NN on the CPU now, but how far you can go with them remains to be seen.
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Post by smatovic »

Sorry, I deleted the post you referred to... here it is:

A simple trick to increase the accumulated NN size could be to split the NN into opening, middlegame and endgame nets, or into up to 30 distinct NNs indexed by piece count, for example.
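One concrete way to index such nets, purely as an illustration (the bucket layout is a hypothetical choice, not from any engine): one net per piece count from 3 up to 32, with bare-kings positions sharing the lowest bucket.

```c
/* Map a position's piece count (kings included, so 2..32) to one of
 * 30 hypothetical nets, as in the "30 distinct NNs indexed by piece
 * count" idea above. */
static inline int net_index(int piece_count)
{
    if (piece_count < 3)  piece_count = 3;   /* bare kings share net 0 */
    if (piece_count > 32) piece_count = 32;
    return piece_count - 3;                  /* yields 0 .. 29 */
}
```

Each net would then be trained only on positions that fall into its bucket, which is where the redundancy concern from the reply below comes in.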

--
Srdja
Daniel Shawul wrote: Sun Jul 05, 2020 3:37 pm Indeed that could definitely help some.
I actually do it for the endgame when the number of pieces <= 14.
With many more divisions than just opening/midgame/endgame, though, there is going to be a lot of redundant
knowledge encoded in each net.
Handwritten evals seem replaceable with NN on the CPU now, but how far you can go with them remains to be seen.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Post by Milos »

smatovic wrote: Sun Jul 05, 2020 12:22 pm Some further info and spec is out:

https://fuse.wikichip.org/news/3600/the ... re-rapids/

https://software.intel.com/content/www/ ... rence.html

In short, you can define tiles in registers and run matrix math via the TMUL on
them, currently dot-product for BF16 and INT8 are defined...

https://en.wikichip.org/wiki/x86/amx#Instructions

As mentioned in another thread, this stuff could be interesting for NNs with
lower latencies => AB search + NN eval both on CPU+AVX+AMX.


--
Srdja
Here is much more useful info:
https://software.intel.com/sites/landin ... VX512_BF16
smatovic
Posts: 2645
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Post by smatovic »

Milos wrote: Mon Jul 06, 2020 2:09 am
smatovic wrote: Sun Jul 05, 2020 12:22 pm Some further info and spec is out:

https://fuse.wikichip.org/news/3600/the ... re-rapids/

https://software.intel.com/content/www/ ... rence.html

In short, you can define tiles in registers and run matrix math via the TMUL on
them, currently dot-product for BF16 and INT8 are defined...

https://en.wikichip.org/wiki/x86/amx#Instructions

As mentioned in another thread, this stuff could be interesting for NNs with
lower latencies => AB search + NN eval both on CPU+AVX+AMX.


--
Srdja
Here is much more useful info:
https://software.intel.com/sites/landin ... VX512_BF16
AVX512_BF16 != AMX-BF16

--
Srdja
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Post by Tony P. »

Thanks for sharing the exciting news! Now I've started dreaming of competitions held on a single CPU server again to end the debate about the hardware bias. It will be interesting to see how long it will take anyone to beat SF (both classical and EUNN) on its home turf.