Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)

Some further info and spec is out:
https://fuse.wikichip.org/news/3600/the ... re-rapids/
https://software.intel.com/content/www/ ... rence.html
In short, you can define tiles in registers and run matrix math via the TMUL on
them, currently dot-product for BF16 and INT8 are defined...
https://en.wikichip.org/wiki/x86/amx#Instructions
As mentioned in another thread, this stuff could be interesting for NNs with
lower latencies => AB search + NN eval both on CPU+AVX+AMX.
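For illustration, a rough sketch of an INT8 tile multiply with the AMX intrinsics from immintrin.h (assumes GCC/Clang with -mamx-tile -mamx-int8 and an OS that enables tile state, e.g. via arch_prctl on Linux; untested, just to show the shape of the API):

#include <immintrin.h>
#include <cstdint>

// 64-byte tile configuration, laid out as in the Intel AMX spec.
struct alignas(64) TileConfig {
    uint8_t  palette_id;
    uint8_t  start_row;
    uint8_t  reserved[14];
    uint16_t colsb[16];   // bytes per row for each tile register
    uint8_t  rows[16];    // rows for each tile register
};

// C[16x16] (int32) += A[16x64] (int8) * B (int8, already in the VNNI-packed
// layout that TDPBSSD expects).
void amx_int8_matmul(const int8_t* A, const int8_t* B, int32_t* C) {
    TileConfig cfg{};
    cfg.palette_id = 1;
    cfg.rows[0] = 16; cfg.colsb[0] = 16 * sizeof(int32_t);  // tmm0: accumulator
    cfg.rows[1] = 16; cfg.colsb[1] = 64;                    // tmm1: A tile
    cfg.rows[2] = 16; cfg.colsb[2] = 64;                    // tmm2: B tile (VNNI)
    _tile_loadconfig(&cfg);

    _tile_zero(0);
    _tile_loadd(1, A, 64);            // stride: 64 bytes per row
    _tile_loadd(2, B, 64);
    _tile_dpbssd(0, 1, 2);            // signed INT8 dot-product accumulate (TMUL)
    _tile_stored(0, C, 16 * sizeof(int32_t));
    _tile_release();
}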
--
Srdja
Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)
The NNUE stuff could probably benefit a lot from it.
But I have my doubts about how far you can go with that approach with such a tiny net.
At some point you have to increase the net size to improve further, which then necessitates the use of MCTS over AB...
I don't have the energy to implement NNUE in my own engine right now, but it would be interesting to
see it implemented in an engine other than Stockfish.
Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)
Indeed that could definitely help some.
I actually do it for the endgame when the number of pieces <= 14.
With many divisions other than opening/midgame/endgame though, there is going to be lots of redundant
knowledge encoded in each net.
Handwritten evals seem replaceable with NN on the CPU now, but how far you can go with them remains to be seen.
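For illustration, a minimal sketch of that kind of per-phase net selection, with an endgame net for <= 14 pieces (the Net type, the middlegame threshold and the function names are hypothetical, not taken from any engine):

// Hypothetical per-phase net selection; thresholds other than the <= 14 endgame
// cut are illustrative only.
#include <array>

struct Net { /* weights of one phase-specific network */ };

enum Phase { OPENING = 0, MIDDLEGAME = 1, ENDGAME = 2 };

const Net& select_net(const std::array<Net, 3>& nets, int piece_count) {
    if (piece_count <= 14) return nets[ENDGAME];      // endgame net for <= 14 pieces
    if (piece_count <= 24) return nets[MIDDLEGAME];   // illustrative middlegame cut
    return nets[OPENING];
}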
Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)
Daniel Shawul wrote: ↑Sun Jul 05, 2020 3:37 pm
Indeed that could definitely help some.
I actually do it for the endgame when the number of pieces <= 14.
With many divisions other than opening/midgame/endgame though, there is going to be lots of redundant
knowledge encoded in each net.
Handwritten evals seem replaceable with NN on the CPU now, but how far you can go with them remains to be seen.

Sorry, I deleted the post you referred to... here it is:

A simple trick to increase the accumulated NN size could be to split the NN into opening, middlegame and endgame nets, or even up to 30 distinct NNs indexed by piece count, for example.
--
Srdja
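A sketch of that finer-grained variant, indexing a table of nets directly by piece count (again, names and sizes are purely illustrative):

// Hypothetical direct indexing by piece count: one small net per material level,
// giving up to ~30 distinct nets for positions with 2..32 pieces.
#include <array>

struct Net { /* weights of one piece-count-specific network */ };

const Net& net_for_position(const std::array<Net, 33>& nets, int piece_count) {
    return nets[piece_count];   // piece_count in [2, 32]
}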
Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)
smatovic wrote: ↑Sun Jul 05, 2020 12:22 pm
Some further info and spec is out:
https://fuse.wikichip.org/news/3600/the ... re-rapids/
https://software.intel.com/content/www/ ... rence.html
In short, you can define tiles in registers and run matrix math via the TMUL on
them, currently dot-product for BF16 and INT8 are defined...
https://en.wikichip.org/wiki/x86/amx#Instructions
As mentioned in another thread, this stuff could be interesting for NNs with
lower latencies => AB search + NN eval both on CPU+AVX+AMX.
--
Srdja

Here is much more useful info:
https://software.intel.com/sites/landin ... VX512_BF16
Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)
Milos wrote: ↑Mon Jul 06, 2020 2:09 am
Here is much more useful info:
smatovic wrote: ↑Sun Jul 05, 2020 12:22 pm
Some further info and spec is out:
https://fuse.wikichip.org/news/3600/the ... re-rapids/
https://software.intel.com/content/www/ ... rence.html
In short, you can define tiles in registers and run matrix math via the TMUL on
them, currently dot-product for BF16 and INT8 are defined...
https://en.wikichip.org/wiki/x86/amx#Instructions
As mentioned in another thread, this stuff could be interesting for NNs with
lower latencies => AB search + NN eval both on CPU+AVX+AMX.
--
Srdja
https://software.intel.com/sites/landin ... VX512_BF16

AVX512_BF16 != AMX-BF16
--
Srdja
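Roughly, in intrinsics terms, the difference looks like this (AVX512_BF16 via -mavx512bf16, AMX-BF16 via -mamx-bf16; sketch only, not tested on real hardware):

#include <immintrin.h>

// AVX512_BF16: pairwise BF16 dot products on 512-bit vector registers.
__m512 avx512_bf16_dot(__m512 acc, __m512bh a, __m512bh b) {
    return _mm512_dpbf16_ps(acc, a, b);   // acc += a * b (BF16 pairs -> FP32)
}

// AMX-BF16: the same kind of dot product, but over whole tile registers via TMUL.
// Assumes tiles tmm0..tmm2 were configured and loaded beforehand (see the earlier AMX sketch).
void amx_bf16_dot() {
    _tile_dpbf16ps(0, 1, 2);              // tmm0 (FP32) += tmm1 (BF16) x tmm2 (BF16)
}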
Re: Intel AMX with TMUL on Xeon Sapphire Rapids (2021?)
Thanks for sharing the exciting news! Now I've started dreaming of competitions held on a single CPU server again to end the debate about the hardware bias. It will be interesting to see how long it will take anyone to beat SF (both classical and EUNN) on its home turf.