zd3nik wrote:I must have misunderstood your original description. I thought you implied the fixed offset ranges where each piece type is stored was used to determine piece type (in some cases). But I was indeed a bit skeptical about using that method. As you point out it would require two tests instead of one.
What I meant is that when you want to loop over all pieces of a specific type, you can just loop over that section. In loops you test for the end of the range only, and using the fixed end of the range for that is most efficient. If you want to do incompatibly different things with various piece types, you can then split it in loops over one type that only do the thing that applies to those, relieving you of the need to test inside the loop what piece type you have.
As a side note: by clever assignment of the numbers it can often be arranged that being inside a certain range coincides with certain bits being set, which can be eaisly tested for with just a single test. E.g. Bishops = 4-7, Rooks = 8-11, Queens = 12-15 can test for diagonal sliders as (pieceNr & 8) and for orthogonal sliders as (pieceNr & 4). If it would beimportant to also specifically test for Queens, you could move their codes to 28-31, so that (pieceNr & 16) will tell you if you have a Queen. This of course leaves many holes in the list
I was under the impression that any modern compiler worth spit will turn switch statements on small sets of integer values into an indexed jump rather than a bunch of branches. Something like this:
Code: Select all
goto label[value_as_index]
label_for_value_1:
...
label_for_value_2:
...
etc
Isn't this essentially how switch statements on integer values work? And if so does this still cause occasional pipeline stalls? And if your using compile time constants in the switch statement (which isn't the case in the code in question, but something I often do) the compiler can do it's magic to eliminate all but the matched case.
In any case it almost always causes pipeline stalls. What code it compiles to (branch-tree or indirect jump) alsodepends on whether the range of cases is reasonably contiguous and small, and whether the compiler can know that. It cannot reserve a table of 4G elements (all but 6 pointing to the defined cases, and all others to the 'default' case) when the value to switch on can be an arbitrary integer without any clue to its value. So it would have a limited table, and would have to ensure that the index is not out of bounds (which might require another test and branch).
But the indirect jump almost always will be a mispredict. Two-way branches can be reasonably well predicted as being taken or non-taken, both for branches that have a large preference for one of the two (the CPU keeps a 2-bit counter for each cached branch for that), as well as for branches that are unpredictable in themselves, but correlate very well with branches taken shortly before it. And when the prediction is 'taken', the CPU remembers where it jumped. Indirect branches, however, are always taken, but to different locations all the time. There is no way a CPU is going to remember all the possible destinations, and have some clever algorithm to decide which one is the most likely to occur next, based on the history for this branch or others. The best it can do is predict it will be the same as last time; then it has to remember only one address, in line with the prediction for two-way branches. This catches the sitution where one of the cases dominates very much over the other, but almost certainly is a mispredict when the cases vary. For a tree of two-way branches the CPU would be able to remember the destination of each of them,and try to predict whether it will be used based on the outcome of previous branches, which can cause much lower misprediction rate.
All that being said, I like your suggested alternative to using switch statements. But it would require an almost complete rewrite of my project.
Well, I guess that cannot be helped. You should decide for yourself if that is worth it. Note that you can still speed up the code I gave by making use that the number of directions in which chess pieces move tends to be a multiple of 4, so that you could 'unroll' the loop over directions so that you only test for the end of the list every 4 directions. That would not work for Pawns, but generating Pawn moves needs so many things not needed for pieces (diverging capture/non-capture, conditional double-push, promotion...) that you'd better do it in a separate loop over Pawns. So it definitely pays to have a separate Pawn section in your piece list.
Other tricks that help to speed up code is to combine branches as much as possible. E.g. if the same loop over directions is going to handle both sliders and leapers, you would want to avoid to have an if(slider) part somewhere in the loop. So you could write the loop for sliders, with an inner do-while loop inside it that iterates over the distance travelled. There would be a necessary branch at the while that decides if more steps should be added, based on if the to-square just treated was occupied (or off-board). So it would read something like while(victim == 0) if 0 encodes an empty square. You can piggy-back the test for leaper on that by writing while((victim | isLeaper) == 0), so that it will also not any more steps when the piece is a leaper, even when there is no victim. This just requires an extra OR instruction, and uses the same branch. When pieces can be dichotomized into sliders and leapers, as in orthodox Chess, the isLeaper variable can be set outside the loop over directions.
For mixed leaper-slider pieces, as you have in Capablanca Chess and Shogi, it would have to depend on the move direction you are currently treating. In Fairy-Max this is done by having a second table (next to the board step) that for each direction specifies if it is a leaper or slider (or hopper...) move. In Shokidoki I just do that based on the direction index itself, e.g. isLeaper = (direction & 16) would alternate 16 slider directions with 16 leaper directions in the steps table, and I would align the allowed-move lists of the mixed pieces such that they straddle the boundaries in the right place to declare some of their moves as leaping, and others as sliding.