Page 1 of 1

Perceptron for reduction / extension / other things

Posted: Sat Jan 04, 2020 9:13 pm
by xr_a_y
Sugar-NN include an idea to use a perceptron to tweak LMR reduction "live" (see https://github.com/official-stockfish/S ... eptron_new).

I wonder if the method can be more generic ; I am not sure to like the involved coefficients in the given method and the dependency on accuracy (and the way of computing it).

Starting with a simple perceptron, what search parameters can benefit from a reinforcement learning thing based on this ? And what will be the live training data ?

For example, following the initial idea, maybe reduction (or at least a last correction part of it) can be predicted this way based on some classic inputs (being in check, improving, tt move being a capture, tt move singular, ...). For reduction available training data may be the result reduced search score being < alpha or not. If not, a search without reduction is relaunch so we don't want that too often.

Any ideas on the subject ?

Re: Perceptron for reduction / extension / other things

Posted: Sat Jan 04, 2020 10:07 pm
by jdart
I think there is nothing wrong with it in theory. You can also use a black-box optimizer. CLOP (https://www.remi-coulom.fr/CLOP/) is one framework that can be used for this for example. I believe Stockfish and other programs have used SPSA (referenced in the CLOP paper), and I have tried a couple of other techniques. The problem is that you are really trying to find the optimum set of values (frequently it's a collection of related values) on a surface that is almost flat, and on top of that the value you are optimizing (results from games) is "noisy." So this is the worst case for optimization. You find a direction of change that looks like an improvement, and you go there, but you could be following noise, and also even in the best case you are only going to get a few ELO movement in the results after a lot of games. In theory you can get convergence but it is going to be very, very slow (that was the problem with CLOP when I was using it) and you could still find a local optimum, not the global one.

--Jon

Re: Perceptron for reduction / extension / other things

Posted: Sun Jan 12, 2020 4:33 am
by DustyMonkey
Using a neural net for anything devolves into some manner of alpha/leela methodology.

In this case we are talking about having a policy network as found in the original alpha papers (for the first go engine, I believe)

The policy network would generate a weight for how important a particular move (or the position it leads to) is.

Runtime-wise

The biggest bang for your buck on the reductions would be near the top of the search tree.
Generating extensions near the top of the tree comes with a significant search space explosion.
Evaluating a network near the top of the tree has minor amortized cost.
Evaluating a network at every node has significant cost.
Both existing PVS()-based extension and reduction methods have low runtime costs.

Thus, I suggest such a policy network shouldnt produce extensions, only reductions, and that it would probably be best to only evaluate such a network near the top of the tree as an addition to the existing PVS()-based strategies.

Re: Perceptron for reduction / extension / other things

Posted: Sun Jan 12, 2020 9:20 am
by xr_a_y
DustyMonkey wrote:
Sun Jan 12, 2020 4:33 am
Using a neural net for anything devolves into some manner of alpha/leela methodology.

In this case we are talking about having a policy network as found in the original alpha papers (for the first go engine, I believe)

The policy network would generate a weight for how important a particular move (or the position it leads to) is.

Runtime-wise

The biggest bang for your buck on the reductions would be near the top of the search tree.
Generating extensions near the top of the tree comes with a significant search space explosion.
Evaluating a network near the top of the tree has minor amortized cost.
Evaluating a network at every node has significant cost.
Both existing PVS()-based extension and reduction methods have low runtime costs.

Thus, I suggest such a policy network shouldnt produce extensions, only reductions, and that it would probably be best to only evaluate such a network near the top of the tree as an addition to the existing PVS()-based strategies.
By evaluate, do you mean both train and predict ?

What would be for you a good training data to train about reduction success or not ?