Training the trainer: how is it done for Stockfish?

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

grahamj
Posts: 37
Joined: Thu Oct 11, 2018 12:26 pm
Full name: Graham Jones

Re: Training the trainer: how is it done for Stockfish?

Post by grahamj » Sun Mar 03, 2019 4:04 pm

On why AlphaZero (and hence LC0) opted for a Monte Carlo search...

There are no strong chess engines which use alpha-beta pruning alone. All the strong engines use other techniques to do a huge amount of pruning. These other techniques are mostly chess-specific, so no good for DeepMind's purposes. Also, I don't think that you could simply take the pruning techniques that work well for SF and expect them to work well for LC0 (or LC1).

"Monte Carlo search" or "MCTS" is a misnomer for the deterministic tree search algorithm used by A0 and LC0. PUCT, i.e. predictor + upper confidence bounds applied to trees, is a better name. The predictor is LC0's policy vector, which, as Matthew Lai said, is hugely important to performance. You can see some of LC0's search trees (with explanations) via this link to the Leela forum.
https://groups.google.com/forum/#!searc ... Kaf1wWEwAJ
Example attached. Note the branching factor of only 1.28. All those branchless lookaheads near the tips are being chosen entirely by the policy vector - and of course, when you only consider one move at each depth, the distinction between average and minimax disappears.
puct.png
puct.png (146.05 KiB) Viewed 1925 times
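[Editor's note: the PUCT selection rule described above can be sketched in a few lines. This is an illustrative toy, not LC0's actual code; the `c_puct` constant and the child representation are assumptions, and LC0's real formula has extra refinements (FPU reduction, etc.).]

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the index of the child maximizing Q + U (the PUCT rule).

    Each child is a tuple (prior, visits, total_value): Q is the mean
    backed-up value of the child, U is the exploration bonus driven by
    the policy prior.  A dominant prior makes U large for an unvisited
    child, which is how the policy vector alone can steer those
    branchless lookaheads near the tips.
    """
    parent_visits = sum(v for _, v, _ in children)
    best, best_score = None, -float("inf")
    for i, (prior, visits, total_value) in enumerate(children):
        q = total_value / visits if visits else 0.0
        u = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
        if q + u > best_score:
            best, best_score = i, q + u
    return best
```

With a strongly peaked policy, the high-prior child is selected again and again, which is consistent with the tiny branching factor (1.28) seen in the attached tree.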
Graham Jones, www.indriid.com

Daniel Shawul
Posts: 3749
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

Re: Training the trainer: how is it done for Stockfish?

Post by Daniel Shawul » Sun Mar 03, 2019 4:29 pm

grahamj wrote:
Sun Mar 03, 2019 4:04 pm
On why AlphaZero (and hence LC0) opted for a Monte Carlo search...

There are no strong chess engines which use alpha-beta pruning alone. All the strong engines use other techniques to do a huge amount of pruning. These other techniques are mostly chess-specific, so no good for DeepMind's purposes. Also, I don't think that you could simply take the pruning techniques that work well for SF and expect them to work well for LC0 (or LC1).
I think in this discussion by alpha-beta we mean to include all the modern pruning heuristics (nullmove + lmr) etc.

"Monte Carlo search" or "MCTS" is a misnomer for the deterministic tree search algorithm used by A0 and LC0. PUCT, i.e. predictor + upper confidence bounds applied to trees, is a better name. The predictor is LC0's policy vector, which, as Matthew Lai said, is hugely important to performance. You can see some of LC0's search trees (with explanations) via this link to the Leela forum.
https://groups.google.com/forum/#!searc ... Kaf1wWEwAJ
Example attached. Note the branching factor of only 1.28. All those branchless lookaheads near the tips are being chosen entirely by the policy vector - and of course, when you only consider one move at each depth, the distinction between average and minimax disappears.
That is a nice tool btw!
If you plot for 8000 nodes (instead of 800) or more, you can see that the upper parts of the tree get more moves, so there is a lot to gain by fine-tuning your backup operator. The reason why averaging is best at deeper depths, where only one or a few moves are visited, is to reduce the uncertainty in evaluation. If you just have one move it doesn't matter; if you have two, it does matter whether you average or minimax. Actually, what I do is average for 800 nodes or less, and minimax above that.
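[Editor's note: a blended backup operator of the kind Daniel describes might look like the sketch below. The `Node` shape and the 800-visit threshold are illustrative assumptions, not his actual Scorpio code, and negamax sign-flipping between plies is omitted for clarity.]

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    value: float = 0.0          # raw (noisy) leaf evaluation
    visits: int = 0
    children: List["Node"] = field(default_factory=list)

def backup(node: Node, small_tree_limit: int = 800) -> float:
    """Back up child values: average small subtrees, minimax large ones.

    Averaging smooths out noise in the evaluation; once a subtree has
    accumulated enough visits for its values to be trusted, switch to
    taking the best child, as minimax would.  Side-to-move sign
    handling is omitted to keep the sketch short.
    """
    if not node.children:
        return node.value
    values = [backup(c, small_tree_limit) for c in node.children]
    if node.visits <= small_tree_limit:
        return sum(values) / len(values)   # MCTS-style average
    return max(values)                     # minimax-style max
```

For example, a node with leaf children valued 0.2 and 0.8 backs up 0.5 while its subtree is small, and 0.8 once it exceeds the threshold.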

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 7:17 pm

Re: Training the trainer: how is it done for Stockfish?

Post by mcostalba » Mon Mar 04, 2019 6:35 pm

Regarding:

"Very simple. MCTS is the only viable approach given the 80000 nps they got using 4 TPUs."

Well, A/B fully replaced any other scheme when computers were much slower than 80 Knps, and MCTS never won out over A/B in any period of chess engine history... so maybe not so simple :-)

Regarding:

"it was already answered when AZ paper wrote of MCTS averaging out evaluation errors. Whereas AB will “find” evaluation errors, hence the importance of an “accurate” evaluation function. The NN delivers highly inaccurate evaluations."

I don't understand that line, and your further explanation unfortunately does not clarify it for me. For instance, saying "NN evaluations are generally kind of accurate ... But the relative precision between positions is hopelessly bad" sounds impossible to me: if position A = 1.6 and position B = 1.2 and the evaluation is accurate enough, then A > B is also true... and the validity of that comparison is more than enough for A/B to work great. Indeed, it is the only thing that matters.

If in the AZ paper only that ambiguous line is dedicated to explaining this basic design decision... maybe it's not clear even to them... they just chose it out of implementation effectiveness.

chrisw
Posts: 2013
Joined: Tue Apr 03, 2012 2:28 pm

Re: Training the trainer: how is it done for Stockfish?

Post by chrisw » Mon Mar 04, 2019 9:53 pm

mcostalba wrote:
Mon Mar 04, 2019 6:35 pm
Regarding:

"Very simple. MCTS is the only viable approach given the 80000 nps they got using 4 TPUs."

Well, A/B fully replaced any other scheme when computers were much slower than 80 Knps, and MCTS never won out over A/B in any period of chess engine history... so maybe not so simple :-)

Regarding:

"it was already answered when AZ paper wrote of MCTS averaging out evaluation errors. Whereas AB will “find” evaluation errors, hence the importance of an “accurate” evaluation function. The NN delivers highly inaccurate evaluations."

I don't understand that line, and your further explanation unfortunately does not clarify it for me. For instance, saying "NN evaluations are generally kind of accurate ... But the relative precision between positions is hopelessly bad" sounds impossible to me: if position A = 1.6 and position B = 1.2 and the evaluation is accurate enough, then A > B is also true... and the validity of that comparison is more than enough for A/B to work great. Indeed, it is the only thing that matters.

If in the AZ paper only that ambiguous line is dedicated to explaining this basic design decision... maybe it's not clear even to them... they just chose it out of implementation effectiveness.
okay, rather than stumble around for words, let’s try another view.

Zero NN evaluations make a reasonable effort to evaluate a chess position on the scale 0 to 1. Because holistic. The evaluation is noisy and inaccurate. Because early stage of technology. Averaged out over a tree region, accuracy emerges. Because maths/statistics.
Try using a noisy, inaccurate evaluation in AB and, given the slightest opportunity, AB will pick out the inaccuracy and probably lose because of it. Therefore an averaging search algorithm is required. Hence the answer to why AZ used MCTS, although, as you pointed out, that is quite probably a retro-rationalisation.

AB evaluations make a really excellent effort to evaluate a chess position, just as long as certain features are NOT present. The evaluation is usually correctly placed on the scale 0-1 but sometimes, often enough, it gets placed at an entirely wrong point on the scale. Because not holistic enough. As we know, this didn't use to matter. Now it does. The question is, can AB (or SF, basically) wrap sufficient new concepts into its evaluation? Well, you first have to define what, conceptually, Zero “knows”, how it knows it, and whether it can be duplicated. If sufficient duplication is possible without too much time penalty, then game on.
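[Editor's note: the statistical claim above, that averaging cancels zero-mean noise while a max backup systematically latches onto it, can be checked with a toy simulation. All numbers here are made up; zero-mean Gaussian noise stands in for NN evaluation error.]

```python
import random

random.seed(1)
true_value = 0.5          # every sibling move is really worth 0.5
trials = 10_000
bias_max = bias_mean = 0.0
for _ in range(trials):
    # 20 sibling positions, each scored with independent zero-mean noise
    evals = [true_value + random.gauss(0, 0.1) for _ in range(20)]
    bias_max  += max(evals) - true_value             # minimax-style backup
    bias_mean += sum(evals) / len(evals) - true_value  # MCTS-style backup
print(f"max-backup bias:  {bias_max / trials:+.3f}")   # large and positive
print(f"mean-backup bias: {bias_mean / trials:+.3f}")  # near zero
```

The max backup is biased upward by roughly the expected maximum of the noise (here on the order of +0.19), while the mean backup stays near zero: exactly the "AB will pick out the inaccuracy" behaviour described.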

Michel
Posts: 2040
Joined: Sun Sep 28, 2008 11:50 pm

Re: Training the trainer: how is it done for Stockfish?

Post by Michel » Tue Mar 05, 2019 2:49 pm

chrisw wrote:
Mon Mar 04, 2019 9:53 pm
mcostalba wrote:
Mon Mar 04, 2019 6:35 pm
Regarding:

"Very simple. MCTS is the only viable approach given the 80000 nps they got using 4 TPUs."

Well, A/B fully replaced any other scheme when computers were much slower than 80 Knps, and MCTS never won out over A/B in any period of chess engine history... so maybe not so simple :-)

Regarding:

"it was already answered when AZ paper wrote of MCTS averaging out evaluation errors. Whereas AB will “find” evaluation errors, hence the importance of an “accurate” evaluation function. The NN delivers highly inaccurate evaluations."

I don't understand that line, and your further explanation unfortunately does not clarify it for me. For instance, saying "NN evaluations are generally kind of accurate ... But the relative precision between positions is hopelessly bad" sounds impossible to me: if position A = 1.6 and position B = 1.2 and the evaluation is accurate enough, then A > B is also true... and the validity of that comparison is more than enough for A/B to work great. Indeed, it is the only thing that matters.

If in the AZ paper only that ambiguous line is dedicated to explaining this basic design decision... maybe it's not clear even to them... they just chose it out of implementation effectiveness.
okay, rather than stumble around for words, let’s try another view.

Zero NN evaluations make a reasonable effort to evaluate a chess position on the scale 0 to 1. Because holistic. The evaluation is noisy and inaccurate. Because early stage of technology. Averaged out over a tree region, accuracy emerges. Because maths/statistics.
Try using a noisy, inaccurate evaluation in AB and, given the slightest opportunity, AB will pick out the inaccuracy and probably lose because of it. Therefore an averaging search algorithm is required. Hence the answer to why AZ used MCTS, although, as you pointed out, that is quite probably a retro-rationalisation.

AB evaluations make a really excellent effort to evaluate a chess position, just as long as certain features are NOT present.
So _your theory_ seems to be that the distribution of a traditional evaluation is narrower than that of a NN evaluation but has heavier tails (= outliers). (I am just trying to understand, in somewhat exact terms, what you are trying to say.)

This would be trivially testable. But without such tests being available: since a NN can easily emulate a traditional evaluation, your theory seems highly suspect to me. I don't see where the trade-off of accuracy versus outliers would come from.

Also I don't see why MCTS would work better with one type of evaluation and A/B would work better with the other type of evaluation. It might be true but it would require proof (theoretical or experimental).

The evaluation is usually correctly placed on the scale 0-1 but sometimes, often enough, it gets placed at an entirely wrong point on the scale. Because not holistic enough. As we know, this didn't use to matter. Now it does. The question is, can AB (or SF, basically) wrap sufficient new concepts into its evaluation? Well, you first have to define what, conceptually, Zero “knows”, how it knows it, and whether it can be duplicated. If sufficient duplication is possible without too much time penalty, then game on.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

chrisw
Posts: 2013
Joined: Tue Apr 03, 2012 2:28 pm

Re: Training the trainer: how is it done for Stockfish?

Post by chrisw » Tue Mar 05, 2019 4:08 pm

Michel wrote:
Tue Mar 05, 2019 2:49 pm
chrisw wrote:
Mon Mar 04, 2019 9:53 pm
mcostalba wrote:
Mon Mar 04, 2019 6:35 pm
Regarding:

"Very simple. MCTS is the only viable approach given the 80000 nps they got using 4 TPUs."

Well, A/B fully replaced any other scheme when computers were much slower than 80 Knps, and MCTS never won out over A/B in any period of chess engine history... so maybe not so simple :-)

Regarding:

"it was already answered when AZ paper wrote of MCTS averaging out evaluation errors. Whereas AB will “find” evaluation errors, hence the importance of an “accurate” evaluation function. The NN delivers highly inaccurate evaluations."

I don't understand that line, and your further explanation unfortunately does not clarify it for me. For instance, saying "NN evaluations are generally kind of accurate ... But the relative precision between positions is hopelessly bad" sounds impossible to me: if position A = 1.6 and position B = 1.2 and the evaluation is accurate enough, then A > B is also true... and the validity of that comparison is more than enough for A/B to work great. Indeed, it is the only thing that matters.

If in the AZ paper only that ambiguous line is dedicated to explaining this basic design decision... maybe it's not clear even to them... they just chose it out of implementation effectiveness.
okay, rather than stumble around for words, let’s try another view.

Zero NN evaluations make a reasonable effort to evaluate a chess position on the scale 0 to 1. Because holistic. The evaluation is noisy and inaccurate. Because early stage of technology. Averaged out over a tree region, accuracy emerges. Because maths/statistics.
Try using a noisy, inaccurate evaluation in AB and, given the slightest opportunity, AB will pick out the inaccuracy and probably lose because of it. Therefore an averaging search algorithm is required. Hence the answer to why AZ used MCTS, although, as you pointed out, that is quite probably a retro-rationalisation.

AB evaluations make a really excellent effort to evaluate a chess position, just as long as certain features are NOT present.
So _your theory_ seems to be that the distribution of a traditional evaluation is narrower than that of a NN evaluation but has heavier tails (=outliers)
Nope.

(I am just trying to understand, in somewhat exact terms, what you are trying to say.)
It's a truism on the internet that when somebody doesn't get what you're saying, they're not going to get it via a re-explanation.

This would be trivially testable. But without such tests being available: since a NN can easily emulate a traditional evaluation, your theory seems highly suspect to me.
You mean my "re-phrased" theory?!

I don't see where the trade off of accuracy versus outliers would come from.

Also I don't see why MCTS would work better with one type of evaluation and A/B would work better with the other type of evaluation. It might be true but it would require proof (theoretical or experimental).
Your "not seeing" proves only that you "don't see".
It's not my job to make you see. Sorry. Ideas are for thinking about, not invalidating. Bye.

The evaluation is usually correctly placed on the scale 0-1 but sometimes, often enough, it gets placed at an entirely wrong point on the scale. Because not holistic enough. As we know, this didn't use to matter. Now it does. The question is, can AB (or SF, basically) wrap sufficient new concepts into its evaluation? Well, you first have to define what, conceptually, Zero “knows”, how it knows it, and whether it can be duplicated. If sufficient duplication is possible without too much time penalty, then game on.

chrisw
Posts: 2013
Joined: Tue Apr 03, 2012 2:28 pm

Re: Training the trainer: how is it done for Stockfish?

Post by chrisw » Tue Mar 05, 2019 6:52 pm

grahamj wrote:
Sun Mar 03, 2019 4:04 pm
On why AlphaZero (and hence LC0) opted for a Monte Carlo search...

There are no strong chess engines which use alpha-beta pruning alone. All the strong engines use other techniques to do a huge amount of pruning. These other techniques are mostly chess-specific, so no good for DeepMind's purposes. Also, I don't think that you could simply take the pruning techniques that work well for SF and expect them to work well for LC0 (or LC1).

"Monte Carlo search" or "MCTS" is a misnomer for the deterministic tree search algorithm used by A0 and LC0. PUCT, i.e. predictor + upper confidence bounds applied to trees, is a better name. The predictor is LC0's policy vector, which, as Matthew Lai said, is hugely important to performance. You can see some of LC0's search trees (with explanations) via this link to the Leela forum.
https://groups.google.com/forum/#!searc ... Kaf1wWEwAJ
Example attached. Note the branching factor of only 1.28. All those branchless lookaheads near the tips are being chosen entirely by the policy vector - and of course, when you only consider one move at each depth, the distinction between average and minimax disappears.
Sure, but when the leaf eval gets backed up into the tree again, and the node is no longer an orphan, it's averaged with its siblings. Whereas, at this equivalent point, minimax takes the max, not the average.
One high-scoring error, and minimax takes the error. One high-scoring error, and pseudo-MCTS averages it with the other siblings. One search is error-tolerant, the other isn't; hence, if the evaluation is error-prone, which is the case with an NN eval at present, MCTS is the way to go. Stockfish would "error" in its search on a buggy eval, but this would get spotted by developers and fixed. NN developers can't fix errors, because they don't know why they happen, or what they are, or have a fixing tool.
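[Editor's note: the point in miniature, with made-up numbers on LC0's 0-1 scale: one wildly optimistic leaf among five siblings dominates a max backup but is diluted by an average.]

```python
# Five sibling leaf evaluations; the third is a gross evaluation error,
# while the true value of every move is about 0.4.
siblings = [0.40, 0.38, 0.95, 0.41, 0.39]

minimax_backup = max(siblings)               # 0.95: the error is propagated
mcts_backup = sum(siblings) / len(siblings)  # ~0.506: the error is diluted
```

Of course, with enough visits PUCT also concentrates search on the suspect child and would eventually discount it; the contrast above is about what each backup reports under a tight visit budget.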


jp
Posts: 755
Joined: Mon Apr 23, 2018 5:54 am

Re: Training the trainer: how is it done for Stockfish?

Post by jp » Wed Mar 06, 2019 10:12 am

chrisw wrote:
Sun Mar 03, 2019 2:00 pm
I would guess that it was already answered when AZ paper wrote of MCTS averaging out evaluation errors. Whereas AB will “find” evaluation errors, hence the importance of an “accurate” evaluation function. The NN delivers highly inaccurate evaluations.

NN evaluations are generally kind of accurate, in that, almost by definition, they have a balanced and holistic view of the position. But the relative precision between positions is hopelessly bad. NNs need the averaging effect of the pseudo-MCTS search.
Do you mean hopelessly bad relative precision in both magnitude and sign (i.e. ordering)?

[I wrote this while reading p.1, so it's possible this is made clear in posts on p.2.]

JollyJoker
Posts: 3
Joined: Sat Feb 16, 2019 7:43 pm
Full name: Lennart Qvarnström

Re: Training the trainer: how is it done for Stockfish?

Post by JollyJoker » Fri Mar 15, 2019 2:29 pm

syzygy wrote:
Sat Mar 02, 2019 4:07 pm
A simple approach would be to use LC0 for the first part of the game until sufficient material has gone from the board that SF is able to basically see to the end and then switch to SF. Certainly in the endgame it seems to make no sense to rely on LC0 if optimal match play is the goal. (But in the opening and early middle game SF apparently has no clue ;-))

Are there many examples of LC0 making serious mistakes early in the game?
This last question is interesting but I assume no one knows enough to give a good answer.

Is there any reason whatsoever to believe Leela would be any more accurate early on than it is in the endgame? Is the talk about "endgame blunders" just an artifact of classical engines being so good in the endgame that we can actually see those?

Also, there seem to be various hybrid engines around. Are any of those using the approach syzygy mentioned: switching to a classical engine when that one should be able to handle the game better? Intuitively, that would make training the net much less resource-intensive; you have a far smaller game tree and the evals of another engine all the way. Maybe you could even limit it to just finding booms for the other engine, moves where the payoff was beyond the other engine's horizon?

Maybe I should start a thread for "Stupid NN/hybrid questions that may still have interesting answers" :D

Robert Pope
Posts: 510
Joined: Sat Mar 25, 2006 7:27 pm

Re: Training the trainer: how is it done for Stockfish?

Post by Robert Pope » Fri Mar 15, 2019 6:11 pm

JollyJoker wrote:
Fri Mar 15, 2019 2:29 pm
This last question is interesting but I assume no one knows enough to give a good answer.

Is there any reason whatsoever to believe Leela would be any more accurate early on than it is in the end game? Is the talk about "endgame blunders" just an artifact of classical engines being so good in the end game we can actually see those?
Yes, it is likely that Leela is much more accurate in the early/mid-game than it is in the endgame.

Leela uses a sort of score averaging in its search. Because of that, it can make errors when there is only one correct continuation that has to be followed exactly. Classical engines use exhaustive search to come up with the provably "correct" result of a search, subject to the limitations of its evaluation and pruning rules. So if there is one correct continuation, and it can be found within the search depth, a traditional engine is much less likely to overlook it and make an error than Leela.

When these positions arise in the middle game, Leela will often miss the correct continuation, just like it does in the endgame. The question then becomes "are positions that require extremely precise play equally likely to arise in the middle game as they are in the endgame?" I believe the answer to that is "no". I think that early in a chess game, there tend to be many workable strategies that can be followed, so positions requiring a single continuation of perfect play are not common. As you get closer to the endgame, more situations arise where the wrong move can make the difference between a draw and a loss. e.g. if you are one move too late moving your king toward an enemy pawn, you will lose a pawn race, and be down a queen.
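[Editor's note: the flip side of the error-dilution argument earlier in the thread, in the same toy terms with made-up numbers: when exactly one move saves the position, an average backup buries it while a max backup finds it, which is the mechanism behind the endgame weakness Robert describes.]

```python
# Four candidate moves; only the first holds the draw (value 0.5 on the
# 0-1 scale), the rest lose (values near 0.0).
moves = [0.50, 0.02, 0.03, 0.01]

minimax_backup = max(moves)            # 0.50: the forced line is found
mcts_backup = sum(moves) / len(moves)  # 0.14: the node looks lost
```

Again, with enough visits PUCT would pile onto the saving move and the average would creep toward 0.5; the problem is that under a fixed node budget the low average discourages exactly the exploration that would reveal the forced continuation.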

Post Reply