Incremental or non-incremental PST evaluation calcs

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

RoadWarrior
Posts: 73
Joined: Fri Jan 13, 2012 12:39 am
Location: London, England

Incremental or non-incremental PST evaluation calcs

Post by RoadWarrior »

I've been drawn back into the computer chess community by the Rybka controversy, and decided to resurrect my old Fortran/assembly chess engine and update it with modern techniques. My weapon of choice is now C#.

I put together a magic bitboard board representation, move generator, and move make/unmake. To my pleasant surprise, perft from the initial position not only produces the correct moves, but also runs at around 16M nodes per second on a single physical core. Modern hardware is just so impressive.

By "node", I mean generating the pseudo-legal moves, checking for move validity, then doing the move make/unmake. There is no bulk-counting or transposition table.

Flushed with success, I started to implement the evaluation function. Currently this is just the material score and interpolated PSTs using the opening/endgame trick popularised by Fruit and Toga.

Unfortunately, doing this evaluation on every leaf node in perft 6 reduced the rate from 16M to 4M nodes per second - ouch! Tonight I implemented an incremental PST evaluation, where the PST score is adjusted during every move make/unmake. This helped dramatically, bringing the rate back up to 12M.
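For context, the interpolation I do at eval time is essentially this (a sketch; the phase weights follow the common Fruit-style scheme, my actual constants may differ):

Code: Select all

// Tapered evaluation: blend the opening (MG) and endgame (EG) scores
// by game phase, which runs from 24 (all pieces on the board) down to
// 0 (pawn endgame). Weights: minor = 1, rook = 2, queen = 4.
const int MaxPhase = 24;

static int TaperedScore(int mgScore, int egScore, int phase)
{
    return (mgScore * phase + egScore * (MaxPhase - phase)) / MaxPhase;
}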

So I have 2 questions.

Do the majority of people do incremental (PST or other) evaluation, given this magnitude of performance difference, or am I missing something?

My PST calcs use opening/endgame interpolation. Is this rather expensive trick (managed array lookups?) really worthwhile from an Elo perspective, or is it better to stick with static PST values?
There are two types of people in the world: Avoid them both.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Incremental or non-incremental PST evaluation calcs

Post by bob »

RoadWarrior wrote:
So I have 2 questions.

Do the majority of people do incremental (PST or other) evaluation, given this magnitude of performance difference, or am I missing something?

My PST calcs use opening/endgame interpolation. Is this rather expensive trick (managed array lookups?) really worthwhile from an Elo perspective, or is it better to stick with static PST values?
PSTs are not very important. They offer some guidance, but they have to work in all positions. E.g. Crafty uses three different tables for kings, since a king might not want to centralize if there are just pawns on one wing. Most knowledge is computed at the tips. Incremental updating can begin to not pay off if you go deep: if the tips are too far from the root, the incremental cost can exceed the cost of simply computing the whole term at the endpoints.

I think for simplicity, eval at the tips is certainly the place to start, and you might well end up there forever as I did...
hgm
Posts: 27794
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Incremental or non-incremental PST evaluation calcs

Post by hgm »

RoadWarrior wrote:By "node", I mean generating the pseudo-legal moves, checking for move validity, then doing the move make/unmake. There is no bulk-counting or transposition table.
So these are not really nodes, but moves. An engine normally counts something as a node only when it does something there (evaluation, move generation). Moves leading to positions where you don't want to do anything are not made at all; futility pruning takes care of that. So your pure perft speed is really something like 500 knps (16M moves/s divided by an assumed 32 moves per position in horizon nodes). That is a bit slow compared to an engine, because you make all the moves, which an engine usually would not do.

This explains your huge drop in performance: you were just counting yourself rich by counting empty nodes. In a sense this is equivalent to division by zero.

I always thought everyone did PST incrementally; I cannot see any argument for ever doing it differently. But it is by far the cheapest eval term, so as soon as you start to do some serious evaluation (passers, king safety, mobility), even a very inefficient PST implementation would not slow you down that much. (But every little bit of speed helps...)

I don't see why game-phase interpolation should make it slow. You can use the lower and upper half of an int for storing the opening and end-game values, respectively, and do the incremental update with that. You can update the game-phase counter similarly (perhaps together with a material index). Then the only extra thing you have to do in each eval is take the weighted average between upper and lower half.

In fact you could even speed this up by storing endgameValue and openingValue minus endgameValue in the two halves; then you just have to multiply the latter by the game-phase counter (assumed to approach 0 in the end-game) and add it to the upper half. And even that you could do in the same multiply, if you are clever, by storing a 1 in the upper half of the game-phase counter.
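In C# terms, the basic version of the trick could look like this (a sketch, not from any particular engine; the +0x8000 handles the borrow a negative lower half causes):

Code: Select all

// One int holds both values: opening score in the lower 16 bits,
// endgame score in the upper 16 bits, so a single add/subtract in
// make/unmake updates both at once. Names are illustrative.
static int MakeScore(int mg, int eg) => (eg << 16) + mg;

// Unpacking: the +0x8000 corrects for the borrow that a negative
// opening value in the lower half causes in the upper half.
static short MgValue(int s) => (short)(ushort)s;
static short EgValue(int s) => (short)(ushort)((uint)(s + 0x8000) >> 16);

// The only per-eval extra: the weighted average of the two halves
// (phase runs from 24 in the opening down to 0 in the endgame).
static int Interpolate(int s, int phase)
    => (MgValue(s) * phase + EgValue(s) * (24 - phase)) / 24;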
RoadWarrior
Posts: 73
Joined: Fri Jan 13, 2012 12:39 am
Location: London, England

Re: Incremental or non-incremental PST evaluation calcs

Post by RoadWarrior »

bob wrote:Incremental updating can begin to not pay off if you go deep: if the tips are too far from the root, the incremental cost can exceed the cost of simply computing the whole term at the endpoints.
Is this correct? Given, say, a minimal branching factor of 3, the ratio between the total number of nodes and the number of leaf nodes barely alters at all between ply 4 and ply 15.

Once I've expanded my evaluation function to something decent (and added the search), I think it would be worth running a self-play tournament between the incremental and non-incremental versions to see if there's any significant difference. Given your experience, I suspect you're right. But I'd like to prove it for myself. :)
There are two types of people in the world: Avoid them both.
RoadWarrior
Posts: 73
Joined: Fri Jan 13, 2012 12:39 am
Location: London, England

Re: Incremental or non-incremental PST evaluation calcs

Post by RoadWarrior »

hgm wrote:An engine normally counts something as a node only when it does something there (evaluation, move generation). Moves leading to positions where you don't want to do anything are not made at all; futility pruning takes care of that.


I take your point about the difference between nodes and moves.
hgm wrote:You can use the lower and upper half of an int for storing the opening and end-game values, respectively, and do the incremental update with that. You can update the game-phase counter similarly (perhaps together with a material index). Then the only extra thing you have to do in each eval is take the weighted average between upper and lower half.


This is a cool trick, and one that I will try to implement tonight. Many thanks!
There are two types of people in the world: Avoid them both.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Incremental or non-incremental PST evaluation calcs

Post by bob »

RoadWarrior wrote:
bob wrote:Incremental updating can begin to not pay off if you go deep: if the tips are too far from the root, the incremental cost can exceed the cost of simply computing the whole term at the endpoints.
Is this correct? Given, say, a minimal branching factor of 3, the ratio between the total number of nodes and the number of leaf nodes barely alters at all between ply 4 and ply 15.

Once I've expanded my evaluation function to something decent (and added the search), I think it would be worth running a self-play tournament between the incremental and non-incremental versions to see if there's any significant difference. Given your experience, I suspect you're right. But I'd like to prove it for myself. :)
With today's programs, the trees are deep and narrow, which is exactly the circumstance that doesn't favor incremental calculations. The old-style wide/bushy trees were quite good for incremental updating, but I don't think it is nearly as clear an advantage today...

Worth actually quantifying at some point...
pocopito
Posts: 238
Joined: Tue Jul 12, 2011 1:31 pm

Re: Incremental or non-incremental PST evaluation calcs

Post by pocopito »

RoadWarrior wrote:This is a cool trick, and one that I will try to implement tonight. Many thanks!
Maybe I'm wrong, but I guess that's the way it's implemented in TSCP (just in case you want to check some code applying this idea).
hgm
Posts: 27794
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Incremental or non-incremental PST evaluation calcs

Post by hgm »

bob wrote:With today's programs, the trees are deep and narrow, which is exactly the circumstance that doesn't favor incremental calculations. The old-style wide/bushy trees were quite good for incremental updating, but I don't think it is nearly as clear an advantage today...

Worth actually quantifying at some point...
They are not that narrow: a branching ratio of 2 only doubles the incremental work per leaf from 1 to 2 updates, as every earlier update is shared between 2, 4, 8, ... deeper nodes. And for a tree that is not infinitely deep the total is only infinitesimally smaller than that factor of two, so depth has certainly nothing to do with it.

Only when the EBF is smaller than about 1.125 would you get a substantial difference in the workload of incremental updating in an 8-ply tree.
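To spell out the geometric series behind this (a back-of-the-envelope derivation for a uniform tree with branching factor b and depth D, not measured data):

Code: Select all

% Make/unmake updates per leaf: the update made at ply d is shared
% by all b^(D-d) leaves below it.
\frac{1}{b^{D}} \sum_{d=1}^{D} b^{d}
    = \sum_{k=0}^{D-1} b^{-k}
    = \frac{1 - b^{-D}}{1 - b^{-1}}
    \xrightarrow{\,D \to \infty\,} \frac{b}{b-1}

For b = 2 the limit is 2 updates per leaf; for b = 1.125 it is 9, well beyond what 8 plies can accumulate, which is where the finite depth finally starts to matter.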
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Incremental or non-incremental PST evaluation calcs

Post by Sven »

Hi Harm-Geert,

Basically I agree: depth seems not to be the main point here. But a lot of other things influence that trade-off. Incremental updating can be done either with "make/unmake" or with "copy & make"; in the latter case you would store both the game phase and the MG/EG PST scores per ply, saving the effort of unmaking but adding to the overall memory footprint. Furthermore, most of the moves actually made (at least near the tips) are captures, where the PST updating effort is even higher: you also have to subtract the PST score of the captured piece. Incremental updating also means subtracting old scores as well as adding new ones, while "from scratch" only involves additions.

Example pseudo-code (PST score update only, ignoring special cases; could be optimized):

Code: Select all

makeMove() {
    // A capture first removes the captured piece's PST contribution
    // (it disappears from the destination square).
    if (isCapture) {
        pstScore[MG] -= pst[capturedPiece][toSqr][MG];
        pstScore[EG] -= pst[capturedPiece][toSqr][EG];
    }
    // Then the moving piece: subtract its value on the from-square,
    // add its value on the to-square.
    pstScore[MG] += pst[movingPiece][toSqr][MG] - pst[movingPiece][fromSqr][MG];
    pstScore[EG] += pst[movingPiece][toSqr][EG] - pst[movingPiece][fromSqr][EG];
}
And you always need to update the game phase, too, for the MG/EG interpolation approach to be applicable at each node.
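For instance (a sketch with a hypothetical phaseWeight[] table; promotions would need a similar adjustment):

Code: Select all

// Game-phase update on a capture: the counter drops as material comes
// off the board. phaseWeight[] is hypothetical, e.g. pawn/king = 0,
// minor = 1, rook = 2, queen = 4, giving a maximum phase of 24.
if (isCapture) {
    gamePhase -= phaseWeight[capturedPiece];
}
// ...and unmakeMove() adds it back again.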

Finally, I agree in general with the statements about the influence of the EBF, and I understand and accept your point about the "sharing" of update operations done at higher tree levels. But most evaluation calls happen during QS, not during full-width search (and lazy eval does not save you, since PST scoring is usually not skipped by lazy eval due to its relative cheapness). So the relevant EBF for this trade-off is the one within QS, which is a lot smaller than the overall EBF of the whole search, and during QS most moves are captures.

Actually we might want to figure out the ratio of occurrences of "making a capture move" to "calling the evaluation function" to get a better idea of the trade-off being discussed, assuming that captures have the biggest impact.
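A crude way to measure that ratio (hypothetical counter and method names, C#-flavoured to match the thread):

Code: Select all

// Two global counters; increment one per capture in makeMove() and one
// per call in evaluate(), then inspect the ratio after a search.
static long captureMakes = 0, evalCalls = 0;

// in makeMove():   if (isCapture) captureMakes++;
// in evaluate():   evalCalls++;

// after the search:
Console.WriteLine($"captures per eval: {(double)captureMakes / evalCalls:F2}");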

Caching effects could favor the "from scratch" implementation even further, depending on actual data structures.

My conclusion of the above is:

1) I expect the "from scratch" method of PST score calculation to be slightly more efficient than the "incremental" method in the average case. Of course, that should actually be tested; a theoretical comparison is difficult. (Here I agree with Bob.)

2) That statement also seems to be almost independent of the overall EBF and the search depth, since I believe that what happens during QS has the biggest impact on the effort of the "incremental" method. So I am not sure whether today's deeper searches have changed anything regarding this trade-off. (Here I agree with you.)

Any other opinions are welcome, of course.

Sven
RoadWarrior
Posts: 73
Joined: Fri Jan 13, 2012 12:39 am
Location: London, England

Re: Incremental or non-incremental PST evaluation calcs

Post by RoadWarrior »

hgm wrote:You can use the lower and upper half of an int for storing the opening and end-game values, respectively, and do the incremental update with that. You can update the game-phase counter similarly (perhaps together with a material index). Then the only extra thing you have to do in each eval is take the weighted average between upper and lower half.


I've implemented this as a struct, as that's easier to maintain (for me) and more idiomatic in C#. This has improved the perft performance significantly, but I need to create my search function (and improve the evaluation) before I can run some self-play experiments between "scratch" and "incremental". When I do that, I will post the result of each experiment.
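Roughly along these lines (a sketch, not my exact code; names are illustrative, layout as hgm described with opening in the lower half):

Code: Select all

// Packed MG/EG score as a readonly struct: operator overloads keep the
// make/unmake code readable, while everything still compiles down to
// plain int additions.
public readonly struct Score
{
    private readonly int packed;
    private Score(int packed) => this.packed = packed;

    public static Score Make(int mg, int eg) => new Score((eg << 16) + mg);

    public static Score operator +(Score a, Score b) => new Score(a.packed + b.packed);
    public static Score operator -(Score a, Score b) => new Score(a.packed - b.packed);

    public int Mg => (short)(ushort)packed;
    public int Eg => (short)(ushort)((uint)(packed + 0x8000) >> 16);

    // phase: 24 = opening, 0 = endgame
    public int Tapered(int phase) => (Mg * phase + Eg * (24 - phase)) / 24;
}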
There are two types of people in the world: Avoid them both.