Training the trainer: how is it done for Stockfish?

Posted: Fri Mar 01, 2019 9:25 pm
by mphuget
Hello all,

Usually, when developing our own chess engines, we get this answer: use engine-vs-engine PGN files, especially if the engine is Stockfish (Marco Costalba's answer https://www.reddit.com/r/MachineLearnin ... olutional/). Now I am wondering how Stockfish itself is trained to increase its strength. Is it simply that two versions of Stockfish play each other and only the best one is kept after many games?

Thanks for your comments,
mph

Re: Training the trainer: how is it done for Stockfish?

Posted: Fri Mar 01, 2019 9:37 pm
by nionita
Stockfish is not trained. When the Stockfish developers have a new idea (e.g. a search or evaluation improvement), they implement it and test it by playing many games against the current best version. If the new Stockfish is better, it becomes the current best.

This is the way classic AB engines are improved.
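That promotion loop is, at heart, a sequential statistical test. Below is a toy sketch of such a gate in Python; the function name `sprt_gate`, the win-rate bounds, and the ignore-draws simplification are all my own invention here (the real fishtest SPRT models draws and Elo bounds, not raw win probability):

```python
import math
import random

def sprt_gate(play_game, p0=0.50, p1=0.52, alpha=0.05, beta=0.05,
              max_games=100_000):
    """Toy sequential probability ratio test on the win rate of a new
    engine version versus the current best (draws ignored for simplicity).
    Returns True if the new version is accepted as stronger."""
    lower = math.log(beta / (1 - alpha))   # cross this: reject the new version
    upper = math.log((1 - beta) / alpha)   # cross this: accept the new version
    llr = 0.0
    for _ in range(max_games):
        result = play_game()               # 1 = new version wins, 0 = loses
        if result == 1:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return True
        if llr <= lower:
            return False
    return False                           # inconclusive: keep the old version

# Toy stand-in for a real match: the new version wins 54% of decisive games.
random.seed(1)
accepted = sprt_gate(lambda: 1 if random.random() < 0.54 else 0)
```

The appeal of a sequential test over a fixed number of games is that clearly bad (or clearly good) patches are resolved after comparatively few games, while only near-equal versions consume the full budget.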

Re: Training the trainer: how is it done for Stockfish?

Posted: Sat Mar 02, 2019 8:21 am
by mcostalba
I had forgotten that I wrote that post on reddit :-)

The above explanation is correct; let me just add that the closest thing to training we do (but it is not training) is automatic tuning, which is used for parameter tweaks, i.e. to find the best values for a set of parameters. This is done automatically by means of a tuning algorithm called SPSA (http://www.talkchess.com/forum3/viewtop ... =0&t=40662).

I have to add that tuning does not always succeed: in many cases the resulting set of parameters fails the verification test at the end of tuning... but sometimes it works.
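As a rough illustration of the mechanism (not of fishtest's actual implementation; the gain constants and the toy quadratic objective are made up), SPSA's selling point is that it estimates a gradient from just two noisy measurements per iteration, no matter how many parameters are being tuned:

```python
import random

def spsa_minimize(loss, theta, iterations=1000, a=0.1, c=0.1):
    """Simultaneous Perturbation Stochastic Approximation: nudge all
    parameters at once in a random +/-1 direction and use the difference
    of two loss measurements as a gradient estimate."""
    for k in range(1, iterations + 1):
        ak = a / k ** 0.602                  # commonly used gain schedules
        ck = c / k ** 0.101
        delta = [random.choice((-1, 1)) for _ in theta]
        plus  = [t + ck * d for t, d in zip(theta, delta)]
        minus = [t - ck * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)      # two measurements per step
        theta = [t - ak * diff / (2 * ck * d) for t, d in zip(theta, delta)]
    return theta

# Toy objective: in real engine tuning, loss() would be (minus) the score
# of a match played with the perturbed parameter set, which is why each
# "measurement" is so expensive and why two per step is attractive.
random.seed(0)
best = spsa_minimize(lambda v: sum(x * x for x in v), [2.0, -3.0])
```

In engine tuning the loss is a match result, hence very noisy; that noise is exactly why a final verification test against the unmodified version is still needed, as described above.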

Re: Training the trainer: how is it done for Stockfish?

Posted: Sat Mar 02, 2019 10:39 am
by Steppenwolf
Simple question from a non-programmer of chess engines:

Is it possible to combine some code of Lc0 with SF in one engine (which could be easier to handle than LeelaFish) to get all the advantages both systems (AB and NN, tactical and strategic behavior) deliver? SF can run on the CPU and Lc0 on the GPU simultaneously...?

Re: Training the trainer: how is it done for Stockfish?

Posted: Sat Mar 02, 2019 12:23 pm
by brianr
Steppenwolf wrote: Sat Mar 02, 2019 10:39 am Simple question from a non-programmer of chess engines:

Is it possible to combine some code of Lc0 with SF in one engine (which could be easier to handle than LeelaFish) to get all the advantages both systems (AB and NN, tactical and strategic behavior) deliver? SF can run on the CPU and Lc0 on the GPU simultaneously...?
Several people are already trying (I am not one of them). As you can appreciate, writing a competitive chess engine is not a trivial exercise (although relatively straightforward these days compared to, say, 20 years ago). Combining two very different engine architectures is even more difficult. Incidentally, Leela uses both the CPU (typically 2 CPUs) and GPU(s).

Re: Training the trainer: how is it done for Stockfish?

Posted: Sat Mar 02, 2019 5:07 pm
by syzygy
brianr wrote: Sat Mar 02, 2019 12:23 pm Several people are already trying (I am not one of them). As you can appreciate, writing a competitive chess engine is not a trivial exercise (although relatively straightforward these days compared to, say, 20 years ago). Combining two very different engine architectures is even more difficult. Incidentally, Leela uses both the CPU (typically 2 CPUs) and GPU(s).
A simple approach would be to use LC0 for the first part of the game, until enough material has left the board that SF can basically see to the end, and then switch to SF. Certainly in the endgame it seems to make no sense to rely on LC0 if optimal match play is the goal. (But in the opening and early middle game SF apparently has no clue ;-))

Are there many examples of LC0 making serious mistakes early in the game?
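For what it's worth, the handover itself is cheap to arrange: a driver could decide from the position alone which engine gets the move. A toy dispatcher based on counting men in the FEN string (the function `pick_engine` and the threshold of 10 men are invented here purely for illustration):

```python
def pick_engine(fen, piece_threshold=10):
    """Hypothetical hybrid driver: count the men on the board from the
    FEN piece-placement field and hand the position to LC0 early in the
    game, to SF once enough material is gone."""
    board_part = fen.split()[0]                   # first FEN field
    men = sum(ch.isalpha() for ch in board_part)  # every letter is a piece
    return "stockfish" if men <= piece_threshold else "lc0"

startpos = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(pick_engine(startpos))                          # → lc0
print(pick_engine("8/5k2/8/8/3K4/3P4/8/8 w - - 0 1")) # → stockfish
```

The hard part, of course, is not the dispatch but making the two engines' evaluations and time management cooperate, which is the difficulty brianr points out above.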

Re: Training the trainer: how is it done for Stockfish?

Posted: Sun Mar 03, 2019 2:04 pm
by mcostalba
As a programmer I am inclined to favor elegant solutions rather than patchwork.

The basic question is why AlphaZero (and hence LC0) opted for a Monte Carlo tree search instead of an alpha-beta scheme. This is the question that should be answered to understand the topic, before diving into some Frankenstein approach.

I have no knowledge to answer that question.

Re: Training the trainer: how is it done for Stockfish?

Posted: Sun Mar 03, 2019 2:32 pm
by Daniel Shawul
Very simple: MCTS is the only viable approach given the 80,000 nps they got using 4 TPUs.
I actually have an alpha-beta implementation that uses neural networks, and it gets at most 100,000 nps using 6x64 nets on a Volta.
That nps is equivalent to maybe a 90's single-core CPU's nps, so it is not competitive at all.

Re: Training the trainer: how is it done for Stockfish?

Posted: Sun Mar 03, 2019 3:00 pm
by chrisw
mcostalba wrote: Sun Mar 03, 2019 2:04 pm As a programmer I am inclined to favor elegant solutions rather than patchwork.

The basic question is why AlphaZero (and hence LC0) opted for a Monte Carlo tree search instead of an alpha-beta scheme. This is the question that should be answered to understand the topic, before diving into some Frankenstein approach.

I have no knowledge to answer that question.
I would guess that it was already answered when the AZ paper wrote of MCTS averaging out evaluation errors, whereas AB will “find” evaluation errors, hence the importance of an “accurate” evaluation function. The NN delivers highly inaccurate evaluations.

Maybe I should rephrase that. Conventional AB evaluations are inaccurate, but with a high degree of relative precision between them. An AB eval probably will indeed tell you that two almost identical positions, one with a knight developed to f3 and the other not, differ by exactly 0.1 pawns or whatever. But its accuracy about whether the position is winning or losing is often hopelessly bad, because it is materialistic and so on.

NN evaluations are generally kind of accurate, in that almost by definition, they have a balanced and holistic view of the position. But the relative precision between positions is hopelessly bad. NNs need the averaging effect of the pseudo-MCTS search.
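The averaging-versus-finding contrast is easy to see on a toy tree. In the sketch below (the tree and its numbers are invented; this illustrates the two backup rules, not either engine's actual search), leaf values are scores from the side to move at that leaf, and a parent backs up its children's negated values, negamax-style:

```python
def value(node, backup):
    """Back up a toy game tree: a node is either a leaf score (float,
    from the side to move's view) or a list of child nodes."""
    if isinstance(node, (int, float)):
        return node
    return backup([-value(child, backup) for child in node])

def best_move(moves, backup):
    """Score each root move with the given backup rule; return the
    index of the preferred move and all the scores."""
    scores = [-value(m, backup) for m in moves]
    return scores.index(max(scores)), scores

mean = lambda xs: sum(xs) / len(xs)

# Move A looks great on average but has one crushing refutation;
# move B is quietly safe.
move_a = [-0.9, 0.8]   # opponent replies worth -0.9 and +0.8 to us
move_b = [-0.1]

print(best_move([move_a, move_b], max))   # minimax backup: picks B (index 1)
print(best_move([move_a, move_b], mean))  # averaging backup: picks A (index 0)
```

Minimax propagates the single refutation of move A straight to the root, which is exactly why it rewards precise evaluations and punishes noisy ones; averaging blurs the refutation away, which suits a noisy NN evaluation but can miss sharp tactics.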

Re: Training the trainer: how is it done for Stockfish?

Posted: Sun Mar 03, 2019 3:23 pm
by Daniel Shawul
MinMax-ing MCTS works better than simple averaging for me, even using lczero's nets... NNs are much more non-linear functions than hand-written evals, which is why they tend to need slightly more averaging towards the leaves. But that is more of an observation they made than their main reason to choose MCTS.