Training the trainer: how is it done for Stockfish?

mphuget
Posts: 13
Joined: Fri Mar 01, 2019 12:46 pm
Full name: Marc-Philippe HUGET

Training the trainer: how is it done for Stockfish?

Post by mphuget »

Hello all,

Usually, when developing our own chess engine, we get this answer: use engine vs. engine PGN files, especially if they come from Stockfish (Marco Costalba's answer https://www.reddit.com/r/MachineLearnin ... olutional/). Now I am wondering how Stockfish itself is trained to increase its strength. Is it simply that two versions of Stockfish play each other and only the better one is kept after many games?

Thanks for your comments,
mph
nionita
Posts: 175
Joined: Fri Oct 22, 2010 9:47 pm
Location: Austria

Re: Training the trainer: how is it done for Stockfish?

Post by nionita »

Stockfish is not trained. When Stockfish developers have a new idea (e.g. a search or evaluation improvement), they implement it and test it by playing many games against the current best version. If the new Stockfish is better, it becomes the current best.
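
As a rough illustration of the "play many games and keep the better version" idea, the sketch below converts a match result into an Elo difference with the standard logistic formula. The function names and the `min_elo` acceptance margin are assumptions made for this example; a real test setup would also need a statistical stopping rule, which is left out here.

```python
import math

def elo_difference(wins: int, draws: int, losses: int) -> float:
    """Estimate the Elo edge of the new version over the baseline."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    # Match score of the new version: a win counts 1, a draw 0.5.
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("a 0% or 100% score gives an unbounded Elo estimate")
    # Invert the logistic expected-score formula.
    return -400.0 * math.log10(1.0 / score - 1.0)

def keep_new_version(wins: int, draws: int, losses: int, min_elo: float = 0.0) -> bool:
    """Hypothetical acceptance rule: keep the patch if its estimate beats min_elo."""
    return elo_difference(wins, draws, losses) > min_elo
```

For example, `elo_difference(550, 0, 450)` corresponds to a 55% score, which works out to an edge of roughly 35 Elo.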

This is the way classic AB engines are improved.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Training the trainer: how is it done for Stockfish?

Post by mcostalba »

I had forgotten I wrote that post on reddit :-)

The above explanation is correct. Let me just add that the closest thing to training we do (though it is not training) is automatic tuning, which is used for parameter tweaks, i.e. to find the best values for a set of parameters. This is done automatically by means of a tuning algorithm called SPSA (http://www.talkchess.com/forum3/viewtop ... =0&t=40662).

I have to add that tuning does not always succeed: in many cases the resulting set of parameters fails to pass the verification test at the end of tuning... but sometimes it works.
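
To give an idea of what SPSA does, here is a minimal sketch on a toy problem. It is not the tuning code used for Stockfish: the noisy quadratic `loss` stands in for what would really be match results between two perturbed parameter sets, and the gain constants `a`, `c` and `A` are assumptions chosen for this toy case. The exponents 0.602 and 0.101 are the values commonly recommended for SPSA.

```python
import random

def spsa_minimize(loss, theta, iterations=2000, a=0.5, c=0.1, A=100.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimize a noisy loss with simultaneous perturbation (SPSA)."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # decaying step size
        c_k = c / (k + 1) ** gamma       # decaying perturbation size
        # Perturb ALL parameters at once with random +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Only two loss evaluations per step, whatever the parameter count.
        diff = loss(plus) - loss(minus)
        theta = [t - a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]
    return theta

def make_toy_loss(target, noise=0.05, seed=1):
    """Hypothetical stand-in for a match: squared distance plus Gaussian noise."""
    rng = random.Random(seed)
    def loss(params):
        exact = sum((p - t) ** 2 for p, t in zip(params, target))
        return exact + rng.gauss(0.0, noise)
    return loss

tuned = spsa_minimize(make_toy_loss([3.0, -1.0]), [0.0, 0.0])
```

Because the result is noisy, the tuned values would normally be checked in a separate match, which is the verification step mentioned above.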
Steppenwolf
Posts: 75
Joined: Thu Jan 31, 2019 4:54 pm
Full name: Sven Steppenwolf

Re: Training the trainer: how is it done for Stockfish?

Post by Steppenwolf »

A simple question from a non-programmer of chess engines:

Is it possible to combine some code of Lc0 with SF in one engine (which could be easier to handle than LeelaFish) to get all the advantages that both systems deliver (AB and NN, tactical and strategic behavior)? SF can run on the CPU and Lc0 on the GPU simultaneously...?
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: Training the trainer: how is it done for Stockfish?

Post by brianr »

Steppenwolf wrote: Sat Mar 02, 2019 10:39 am A simple question from a non-programmer of chess engines:

Is it possible to combine some code of Lc0 with SF in one engine (which could be easier to handle than LeelaFish) to get all the advantages that both systems deliver (AB and NN, tactical and strategic behavior)? SF can run on the CPU and Lc0 on the GPU simultaneously...?
Several people are already trying (I am not one of them). As you can appreciate, writing a competitive chess engine is not a trivial exercise (although relatively straightforward these days compared to say 20 years ago). Combining two very different engine architectures is even more difficult. Incidentally, Leela uses both CPU (typically 2 CPUs) and GPU(s).
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Training the trainer: how is it done for Stockfish?

Post by syzygy »

brianr wrote: Sat Mar 02, 2019 12:23 pm Several people are already trying (I am not one of them). As you can appreciate, writing a competitive chess engine is not a trivial exercise (although relatively straightforward these days compared to say 20 years ago). Combining two very different engine architectures is even more difficult. Incidentally, Leela uses both CPU (typically 2 CPUs) and GPU(s).
A simple approach would be to use LC0 for the first part of the game until sufficient material has gone from the board that SF is able to basically see to the end and then switch to SF. Certainly in the endgame it seems to make no sense to rely on LC0 if optimal match play is the goal. (But in the opening and early middle game SF apparently has no clue ;-))
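
A minimal sketch of such a switch, assuming the decision is made purely on the material left on the board. The piece values, the threshold of 30 points and the engine labels are all assumptions for illustration; the actual engine processes are not started here.

```python
# Conventional piece values in pawns; kings are ignored.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}

def material_on_board(fen: str) -> int:
    """Sum the material of both sides from the piece-placement field of a FEN."""
    placement = fen.split()[0]
    # Digits, slashes and kings are not in the table and count as 0.
    return sum(PIECE_VALUES.get(ch.lower(), 0) for ch in placement)

def choose_engine(fen: str, switch_threshold: int = 30) -> str:
    """Hypothetical rule: use the NN engine until material drops to the threshold."""
    if material_on_board(fen) <= switch_threshold:
        return "stockfish"
    return "lc0"

START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
ROOK_ENDING = "8/5pk1/6p1/8/8/6P1/r4PK1/1R6 w - - 0 1"
engines = [choose_engine(START), choose_engine(ROOK_ENDING)]
```

Material count is only the crudest possible criterion; whether SF can really "see to the end" depends on the position, not just on how many pieces are left.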

Are there many examples of LC0 making serious mistakes early in the game?
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Training the trainer: how is it done for Stockfish?

Post by mcostalba »

As a programmer I am inclined to support elegant solutions, more than patchworks.

The basic question is why AlphaZero (and hence LC0) opted for a Monte Carlo search instead of an Alpha/Beta scheme. This is the question that should be answered to have an understanding of the topic, before diving into some Frankenstein approach.

I have no knowledge to answer that question.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Training the trainer: how is it done for Stockfish?

Post by Daniel Shawul »

Very simple. MCTS is the only viable approach given the 80000 nps they got using 4 TPUs.
I actually have an alpha-beta implementation that uses neural networks and gets at most 100000 nps using 6x64 nets on the Volta.
That nps is roughly what a single-core CPU from the 90s would reach, so it is not competitive at all.
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: Training the trainer: how is it done for Stockfish?

Post by chrisw »

mcostalba wrote: Sun Mar 03, 2019 2:04 pm As a programmer I am inclined to support elegant solutions, more than patchworks.

The basic question is why AlphaZero (and hence LC0) opted for a Monte Carlo search instead of an Alpha/Beta scheme. This is the question that should be answered to have an understanding of the topic, before diving into some Frankenstein approach.

I have no knowledge to answer that question.
I would guess that it was already answered when the AZ paper wrote of MCTS averaging out evaluation errors, whereas AB will “find” evaluation errors, hence the importance of an “accurate” evaluation function. The NN delivers highly inaccurate evaluations.

Maybe I should rephrase that. Conventional AB evaluation functions deliver inaccurate evaluations, but with a high degree of relative precision between them. Such an evaluation probably will indeed tell you that two almost identical positions, one with the knight developed to f3 and the other not, differ by exactly 0.1 pawns or whatever. But its accuracy on whether the position is winning or losing is often hopelessly bad, because it is materialistic and so on.

NN evaluations are generally kind of accurate, in that almost by definition, they have a balanced and holistic view of the position. But the relative precision between positions is hopelessly bad. NNs need the averaging effect of the pseudo-MCTS search.
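
The averaging argument can be illustrated with a toy experiment, under strong assumptions: every leaf has a true value of 0 and each evaluation adds independent Gaussian noise. Averaging the leaves shrinks the error, while taking the maximum (which is what a maximizing node in AB does) picks out the largest positive error. Real evaluation errors are correlated, so this is only a sketch of the effect.

```python
import random
import statistics

def backup_errors(num_leaves=64, noise=1.0, trials=2000, seed=0):
    """Return (mean-backup error, max-backup error), averaged over trials."""
    rng = random.Random(seed)
    mean_errors, max_errors = [], []
    for _ in range(trials):
        # The true value of every leaf is 0, so any non-zero backup is pure error.
        evals = [rng.gauss(0.0, noise) for _ in range(num_leaves)]
        mean_errors.append(abs(statistics.fmean(evals)))
        max_errors.append(abs(max(evals)))
    return statistics.fmean(mean_errors), statistics.fmean(max_errors)

mean_err, max_err = backup_errors()
```

With 64 leaves the mean backup ends up far closer to the true value than the max backup, which is the sense in which AB "finds" evaluation errors.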
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Training the trainer: how is it done for Stockfish?

Post by Daniel Shawul »

MinMax-ing MCTS works better than simple averaging for me even using lczero's nets...
NNs are much more non-linear functions than hand-written evals, which is why they tend to need slightly more averaging towards the leaves. But that is more an observation they made than their main reason for choosing MCTS.
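
The difference between the two backup rules can be sketched on a tiny hand-made tree. This is an illustration only, not code from any engine: the `mix` parameter and the nested-list tree format are assumptions, and the averaging is uniform over children, whereas an MCTS would weight children by visit counts.

```python
from statistics import fmean

def backup(node, mix=1.0):
    """Value of `node` for the side to move.

    A leaf is a number (evaluation from the side to move's point of view);
    an internal node is a list of child nodes.
    mix=1.0 gives a pure minimax backup, mix=0.0 a pure average.
    """
    if not isinstance(node, list):
        return float(node)
    # Negamax convention: a child's value is negated for the parent.
    child_values = [-backup(child, mix) for child in node]
    return mix * max(child_values) + (1.0 - mix) * fmean(child_values)

# Two moves at the root, each with two replies; leaves are from the root side's view.
tree = [[0.2, -0.5], [0.9, 0.1]]
minimax_value = backup(tree, mix=1.0)   # 0.1
average_value = backup(tree, mix=0.0)   # about 0.175
```

Values of `mix` between 0 and 1 blend the two rules, which is one way to get some averaging near the leaves without giving up minimax higher in the tree.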