The Next Big Thing in Computer Chess?

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: The Next Big Thing in Computer Chess?

Post by Dann Corbit »

syzygy wrote: Fri May 19, 2023 10:13 pm
alvinypeng wrote: Wed May 17, 2023 2:51 pm My original statement only claimed using a Lc0-type net in AB was possible, and nothing about the strength of such an engine. However, I do believe it would be at least at superhuman level, and maybe even be able to beat SF8. Whether or not that qualifies as playing "well" depends on your definition of "well" I guess.
I told you the reason why it isn't done. I don't know why you are arguing.
There is already a new architecture (available for the zillion-dollar data center cards) that allows transparent access to memory for both the CPU and GPU. IOW, the CPU can directly address video RAM, and that access is managed the same way as regular RAM. Eventually, this will become available on consumer cards. At that point we won't have to copy anything back and forth to the GPU, and latency will not be the same kind of issue.

Nobody knows exactly when this will happen, of course.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
syzygy
Posts: 5693
Joined: Tue Feb 28, 2012 11:56 pm

Re: The Next Big Thing in Computer Chess?

Post by syzygy »

Dann Corbit wrote: Sat May 20, 2023 7:31 am
syzygy wrote: Fri May 19, 2023 10:13 pm
alvinypeng wrote: Wed May 17, 2023 2:51 pm My original statement only claimed using a Lc0-type net in AB was possible, and nothing about the strength of such an engine. However, I do believe it would be at least at superhuman level, and maybe even be able to beat SF8. Whether or not that qualifies as playing "well" depends on your definition of "well" I guess.
I told you the reason why it isn't done. I don't know why you are arguing.
There is already a new architecture (available for the zillion-dollar data center cards) that allows transparent access to memory for both the CPU and GPU. IOW, the CPU can directly address video RAM, and that access is managed the same way as regular RAM. Eventually, this will become available on consumer cards. At that point we won't have to copy anything back and forth to the GPU, and latency will not be the same kind of issue.

Nobody knows exactly when this will happen, of course.
Even then I wonder if it will be suitable for AB search (although it will obviously help).

As far as I understand, Lc0 lets a GPU evaluate a whole batch of positions at a time because this is what the GPU is good at (default batch size seems to be 256). A single-threaded AB engine would only need to evaluate a single position at a time. A multi-threaded AB engine could theoretically evaluate a bunch of positions at the same time, but it would mean threads have to wait for other threads to be ready to get their position evaluated. Perhaps this can be lived with, e.g. by spawning 512 search threads and running the evaluation whenever 256 threads have a position to evaluate, but this will result in a quite different AB engine from what we're used to.
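
Roughly, I imagine the coordination looking something like the sketch below (a toy Python sketch of the batching pattern, not code from any real engine; the class name, batch size and dummy batched_eval are invented, and a real implementation would also need a timeout to flush partial batches):

```python
import threading

BATCH_SIZE = 4  # tiny for the demo; on a real GPU you'd want something like 256

class BatchEvaluator:
    """Collects one position per search thread and evaluates them in one
    batch once enough threads are waiting. batched_eval is a dummy
    stand-in for a single GPU inference call over the whole batch."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.cond = threading.Condition()
        self.pending = []  # list of [position, result]; result stays None until evaluated

    def evaluate(self, position):
        with self.cond:
            slot = [position, None]
            self.pending.append(slot)
            if len(self.pending) >= self.batch_size:
                # This thread completes the batch: evaluate it and wake the others.
                batch, self.pending = self.pending, []
                for s, score in zip(batch, self.batched_eval([p for p, _ in batch])):
                    s[1] = score
                self.cond.notify_all()
            else:
                self.cond.wait_for(lambda: slot[1] is not None)
            return slot[1]

    def batched_eval(self, positions):
        return [hash(p) % 100 for p in positions]  # pretend this ran on the GPU

evaluator = BatchEvaluator(BATCH_SIZE)

def search_thread(tid):
    for ply in range(3):  # each "search thread" requests a few evaluations
        evaluator.evaluate(f"pos-{tid}-{ply}")

threads = [threading.Thread(target=search_thread, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("all evaluations done")
```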

I think the question is not so much why does SF not use Lc0 networks but rather why does Lc0 not use AB search. (The answer may still be GPU-CPU communication latency. Perhaps hardware that removes this latency would allow Lc0 to run AB.)
alvinypeng
Posts: 36
Joined: Thu Mar 03, 2022 7:29 am
Full name: Alvin Peng

Re: The Next Big Thing in Computer Chess?

Post by alvinypeng »

syzygy wrote: Tue May 23, 2023 10:17 pm A multi-threaded AB engine could theoretically evaluate a bunch of positions at the same time, but it would mean threads have to wait for other threads to be ready to get their position evaluated. Perhaps this can be lived with, e.g. by spawning 512 search threads and running the evaluation whenever 256 threads have a position to evaluate, but this will result in a quite different AB engine from what we're used to.
Apparently this has been implemented in Scorpio.
Daniel Shawul wrote: Sun Nov 07, 2021 10:02 pm
j.t. wrote: Sat Nov 06, 2021 12:11 am To make any use of the GPU you need to collect a pretty large number of positions that you want to evaluate. The longer you postpone evaluating positions, the more you lose the advantages of alpha-beta. Together with the extreme pruning engines do today, the relatively efficient NNUE algorithm, the overhead that comes from using a GPU, and last but not least the complexity of GPU code, I assume that people just think it's not worth it.

But I'll be honest and say that I've never tried this myself (I thought about it, but I've never got familiar enough with GPUs to actually try it), so maybe I misjudged the potential of GPUs.
This is wrong! Modern AB engines use many threads, e.g. 256 threads with lazy SMP, so you can batch together eval requests from different threads.
Each thread evaluates exactly one position, and you then effectively have a batch size of 256, which should be enough to fill up a GPU with a decent net size (not NNUE though, since that is too tiny). I already do this in Scorpio and it "works", but it is darn slow even with a tiny 6x64 net ...
syzygy wrote: Tue May 23, 2023 10:17 pm I think the question is not so much why does SF not use Lc0 networks but rather why does Lc0 not use AB search. (The answer may still be GPU-CPU communication latency. Perhaps hardware that removes this latency would allow Lc0 to run AB.)
A deep neural network (like a Lc0 network) acts like a shallow search. And a deep AB search has a bunch of shallow searches at the ends. So why not replace the shallow searches at the ends with a deep neural network? After all, a deep neural network optimized with gradient descent might yield better results than a bunch of human-written search code.

I think the reason deep neural networks haven't replaced shallow searches is that they are very inefficient - over 90% of the connections in many neural networks are useless and can be pruned away with little to no accuracy loss (as claimed in the "Lottery Ticket Hypothesis" paper). That means we are doing a lot more computation than necessary to evaluate these deep neural networks. Lc0 deals with this inefficiency through brute force, requiring power-hungry GPUs. Lc0 is close to Stockfish in terms of strength, but if you look at the power consumption of both engines, I'm pretty sure Lc0 uses way more power.
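
To illustrate the kind of pruning I mean, here is a toy numpy sketch of simple magnitude pruning (the real Lottery Ticket procedure prunes iteratively with retraining, and the layer size here is made up):

```python
import numpy as np

# Zero out the 90% of weights with the smallest absolute value.
rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512)).astype(np.float32)

threshold = np.quantile(np.abs(W), 0.90)  # keep only the largest 10% of weights
mask = np.abs(W) >= threshold
W_pruned = W * mask

print(f"weights kept: {mask.mean():.1%}")  # ~10%
```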

GPUs are good at dense-dense matrix multiplications. But if you have a sparse neural network with sparse matrix multiplications, perhaps you could evaluate a deep neural network on CPU quickly, thus eliminating any GPU-CPU latency. One big hurdle for this idea is that the neural network would have to be extremely sparse in order to be evaluated on CPU quickly, and I'm not sure how one would train such a sparse network.
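
As a rough illustration of why sparsity could make CPU evaluation cheaper, here is a toy scipy sketch (the layer size and 95% sparsity are made-up numbers):

```python
import numpy as np
from scipy.sparse import random as sparse_random

# A layer with 95% of its weights zero, stored in CSR format. The sparse
# mat-vec only touches the stored nonzeros, so it does roughly 5% of the
# multiply-adds of the dense version.
W_sparse = sparse_random(4096, 4096, density=0.05, format="csr", random_state=0)
W_dense = W_sparse.toarray()

rng = np.random.default_rng(0)
x = rng.normal(size=4096)
assert np.allclose(W_sparse @ x, W_dense @ x)  # same result, far fewer operations
print("stored nonzeros:", W_sparse.nnz, "of", 4096 * 4096)
```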
chrisw
Posts: 4624
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: The Next Big Thing in Computer Chess?

Post by chrisw »

alvinypeng wrote: Wed May 24, 2023 3:41 am
syzygy wrote: Tue May 23, 2023 10:17 pm A multi-threaded AB engine could theoretically evaluate a bunch of positions at the same time, but it would mean threads have to wait for other threads to be ready to get their position evaluated. Perhaps this can be lived with, e.g. by spawning 512 search threads and running the evaluation whenever 256 threads have a position to evaluate, but this will result in a quite different AB engine from what we're used to.
Apparently this has been implemented in Scorpio.
Daniel Shawul wrote: Sun Nov 07, 2021 10:02 pm
j.t. wrote: Sat Nov 06, 2021 12:11 am To make any use of the GPU you need to collect a pretty large number of positions that you want to evaluate. The longer you postpone evaluating positions, the more you lose the advantages of alpha-beta. Together with the extreme pruning engines do today, the relatively efficient NNUE algorithm, the overhead that comes from using a GPU, and last but not least the complexity of GPU code, I assume that people just think it's not worth it.

But I'll be honest and say that I've never tried this myself (I thought about it, but I've never got familiar enough with GPUs to actually try it), so maybe I misjudged the potential of GPUs.
This is wrong! Modern AB engines use many threads, e.g. 256 threads with lazy SMP, so you can batch together eval requests from different threads.
Each thread evaluates exactly one position, and you then effectively have a batch size of 256, which should be enough to fill up a GPU with a decent net size (not NNUE though, since that is too tiny). I already do this in Scorpio and it "works", but it is darn slow even with a tiny 6x64 net ...
syzygy wrote: Tue May 23, 2023 10:17 pm I think the question is not so much why does SF not use Lc0 networks but rather why does Lc0 not use AB search. (The answer may still be GPU-CPU communication latency. Perhaps hardware that removes this latency would allow Lc0 to run AB.)
A deep neural network (like a Lc0 network) acts like a shallow search. And a deep AB search has a bunch of shallow searches at the ends. So why not replace the shallow searches at the ends with a deep neural network?
This idea is often cited, but overlooks something important.
Let's say "shallow search" begins at depth 7. The shallow searches of AB are not all the same: some will prune at d6 and contain only one node, while others prune at different depths or even extend, and those kinds of "shallow search" will contain many nodes.
The proposal to replace them with an NN effectively substitutes something that takes the same time/effort for each "shallow search", i.e. some sort of constant-effort search as opposed to the variable AB search, and suddenly you have lost a major advantage of AB (not all sub-trees after d7 are the same).


After all, a deep neural network optimized with gradient descent might yield better results than a bunch of human-written search code.

I think the reason deep neural networks haven't replaced shallow searches is that they are very inefficient - over 90% of the connections in many neural networks are useless and can be pruned away with little to no accuracy loss (as claimed in the "Lottery Ticket Hypothesis" paper). That means we are doing a lot more computation than necessary to evaluate these deep neural networks. Lc0 deals with this inefficiency through brute force, requiring power-hungry GPUs. Lc0 is close to Stockfish in terms of strength, but if you look at the power consumption of both engines, I'm pretty sure Lc0 uses way more power.

GPUs are good at dense-dense matrix multiplications. But if you have a sparse neural network with sparse matrix multiplications, perhaps you could evaluate a deep neural network on CPU quickly, thus eliminating any GPU-CPU latency. One big hurdle for this idea is that the neural network would have to be extremely sparse in order to be evaluated on CPU quickly, and I'm not sure how one would train such a sparse network.
alvinypeng
Posts: 36
Joined: Thu Mar 03, 2022 7:29 am
Full name: Alvin Peng

Re: The Next Big Thing in Computer Chess?

Post by alvinypeng »

chrisw wrote: Wed May 24, 2023 8:08 am This idea is often cited, but overlooks something important.
Let's say "shallow search" begins at depth 7. The shallow searches of AB are not all the same: some will prune at d6 and contain only one node, while others prune at different depths or even extend, and those kinds of "shallow search" will contain many nodes.
The proposal to replace them with an NN effectively substitutes something that takes the same time/effort for each "shallow search", i.e. some sort of constant-effort search as opposed to the variable AB search, and suddenly you have lost a major advantage of AB (not all sub-trees after d7 are the same).
You are right. But I imagine this "variable effort shallow search" advantage becomes less and less of a factor the deeper you search? It seems like this boils down to knowledge vs search, where a deep NN evaluation has a lot of knowledge but little search.
chrisw
Posts: 4624
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: The Next Big Thing in Computer Chess?

Post by chrisw »

alvinypeng wrote: Wed May 24, 2023 2:16 pm
chrisw wrote: Wed May 24, 2023 8:08 am This idea is often cited, but overlooks something important.
Let's say "shallow search" begins at depth 7. The shallow searches of AB are not all the same: some will prune at d6 and contain only one node, while others prune at different depths or even extend, and those kinds of "shallow search" will contain many nodes.
The proposal to replace them with an NN effectively substitutes something that takes the same time/effort for each "shallow search", i.e. some sort of constant-effort search as opposed to the variable AB search, and suddenly you have lost a major advantage of AB (not all sub-trees after d7 are the same).
You are right. But I imagine this "variable effort shallow search" advantage becomes less and less of a factor the deeper you search? It seems like this boils down to knowledge vs search, where a deep NN evaluation has a lot of knowledge but little search.
Doubt it makes much difference. At whatever point you replace an AB sub-tree with a lookup, NN or otherwise, you're going to lose all the AB smarts that would otherwise have been applied. I guess it could be possible to add a secondary network to decide whether or not to invoke the lookup at each node.
towforce
Posts: 12344
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: The Next Big Thing in Computer Chess?

Post by towforce »

alvinypeng wrote: Wed May 24, 2023 3:41 am GPUs are good at dense-dense matrix multiplications. But if you have a sparse neural network with sparse matrix multiplications, perhaps you could evaluate a deep neural network on CPU quickly, thus eliminating any GPU-CPU latency. One big hurdle for this idea is that the neural network would have to be extremely sparse in order to be evaluated on CPU quickly, and I'm not sure how one would train such a sparse network.

Bringing in some knowledge from another domain (it might be irrelevant, I don't know): in the field of Integer Programming and Mixed Integer Programming (IP and MIP), the problem often comes down to a large sparse matrix. An expert told me that the reason professional programs (CPLEX, Gurobi, CBC, etc.) are so much faster than open-source ones is that so much more time has gone into developing techniques for collapsing sparse matrices. I wonder if anyone has considered applying such techniques to NNs? Being smaller would save time, power, memory, and the cost of buying GPUs.

I wasn't aware that NNs tend to be sparse. I'm guessing that one issue would be that IP and MIP programs would tend to be working with matrices that are a regular symmetrical shape, whereas the shape of a sparse NN would be more... "scribble". :)

To be honest, I'm confused by the idea that NNs are sparse: are you telling me that most of the weights in LC0's NN are zero?
Human chess is partly about tactics and strategy, but mostly about memory
towforce
Posts: 12344
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: The Next Big Thing in Computer Chess?

Post by towforce »

One more idea (though this is so obvious that somebody MUST have tried it): an NN is basically a blob of arithmetic operations - so how about converting the NN to a single mathematical expression and then using a CAS (computer algebra system) to simplify that expression?

Of course, the simplified expression may end up being not much smaller than the original.
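
For a toy example of what I mean (a sympy sketch; the tiny 2-2-1 ReLU net and its weights are invented):

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
relu = lambda z: sp.Max(z, 0)

# A tiny 2-2-1 ReLU net written out as a single expression.
h1 = relu(sp.Rational(7, 10) * x1 - sp.Rational(6, 5) * x2 + sp.Rational(3, 10))
h2 = relu(-sp.Rational(2, 5) * x1 + sp.Rational(9, 10) * x2 - sp.Rational(1, 10))
out = sp.Rational(3, 2) * h1 - 2 * h2 + sp.Rational(1, 4)

# The Max() terms (the nonlinearities) block any real algebraic collapse,
# so the "simplified" expression comes out about as large as the original.
print(sp.simplify(out))
```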
Human chess is partly about tactics and strategy, but mostly about memory
syzygy
Posts: 5693
Joined: Tue Feb 28, 2012 11:56 pm

Re: The Next Big Thing in Computer Chess?

Post by syzygy »

alvinypeng wrote: Wed May 24, 2023 3:41 am A deep neural network (like a Lc0 network) acts like a shallow search. And a deep AB search has a bunch of shallow searches at the ends. So why not replace the shallow searches at the ends with a deep neural network? After all, a deep neural network optimized with gradient descent might yield better results than a bunch of human-written search code.
Because of how AB works, as already discussed. "Replacing shallow searches at the ends" is just another way of saying you call the deep NN to evaluate the position in leaf nodes.

But here I am (and have been) assuming you use the GPU or some other kind of massively parallel hardware to do the evaluation.

If you mean why not use the CPU to evaluate Lc0-type NNs, then the answer is that this is too slow to be competitive.
I suppose an AB search plus Lc0-type evaluation entirely run on CPU would play stronger than Lc0 entirely run on CPU, but it would be a lot weaker than SF on the same hardware. (I would be happy to be proven wrong.)

GPUs (and the DeepMind papers) are what enabled Lc0, and the AlphaZero/Lc0 approach is what allows us to utilise GPUs for chess.
I think the reason why deep neural networks haven't replaced shallow searches is because they are very inefficient - over 90% of connections in many neural networks are useless and can be pruned away with little to no accuracy loss (as claimed in "The Lottery Ticket Hypothesis" paper).
If an Lc0-type network can be reduced 90% in size without loss of quality, then that is an easy way to improve Lc0. But this sounds like low-hanging fruit which, if it existed at all, would most likely have been picked by the Lc0 developers long ago. (If not, then it is worth a try.)

If you compare Elo per operations/second, then SF on a CPU is far more "efficient" than Lc0 on a GPU.
But GPU-type hardware is much simpler and can be made to go much faster. CPUs have more or less hit their peak, GPUs are still improving.
(It is an interesting question if AI would be a big thing right now if single-threaded CPU speeds had not hit a wall.)
GPUs are good at dense-dense matrix multiplications. But if you have a sparse neural network with sparse matrix multiplications, perhaps you could evaluate a deep neural network on CPU quickly, thus eliminating any GPU-CPU latency. One big hurdle for this idea is that the neural network would have to be extremely sparse in order to be evaluated on CPU quickly, and I'm not sure how one would train such a sparse network.
Agreed (and I have no idea how to create such a sparse network either).

It seems unlikely that NNUE is the best we can do on CPU, but I don't know how to do it better.
syzygy
Posts: 5693
Joined: Tue Feb 28, 2012 11:56 pm

Re: The Next Big Thing in Computer Chess?

Post by syzygy »

towforce wrote: Wed May 24, 2023 8:51 pm To be honest, I'm confused by the idea that NNs are sparse: are you telling me that most of the weights in LC0's NN are zero?
He is saying that if someone could train a strong Lc0-type network that is sparse, then one could run it somewhat efficiently on a CPU. But this might be completely unfeasible (I have no idea).

The NNUE networks, or at least some of them, are relatively sparse, and some implementations use/have used that to speed up the evaluation.
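
For example, if most of the entries feeding a layer are zero (sparse input features, or sparse weights/activations), you only do the multiply-adds for the nonzero ones and skip the rest. A toy numpy sketch of that basic trick (sizes and sparsity are made up; this is not any engine's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
in_size, out_size = 1024, 32
W = rng.normal(size=(in_size, out_size)).astype(np.float32)
b = np.zeros(out_size, dtype=np.float32)

# Input vector with only ~3% nonzero entries.
x = np.zeros(in_size, dtype=np.float32)
x[rng.choice(in_size, size=30, replace=False)] = 1.0

nz = np.flatnonzero(x)                             # indices of the nonzero inputs
y_sparse = b + (x[nz, None] * W[nz]).sum(axis=0)   # only ~3% of the multiply-adds
assert np.allclose(y_sparse, x @ W + b)            # matches the full dense product
```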