Staged move generation and killers

xr_a_y · Post by **xr_a_y** » Mon Dec 31, 2018 10:10 am

Just a little feedback about this experiment I did in Minic. Short answer : I didn't succeed.

Long answer, I try a staged move generator working like this :

* return TT move (without validating it, in Minic it cannot be wrong expect in case of a real hash collision)
* Gen Cap and sort them (mvv-lva + see)
* return Good Cap
* return killers (validating them as suggested here)
* Gen quiet and sort them (history + pst)
* return quiet

In fact it cost me too much lines and involve (in my opinion) to much management. Indeed, using the HQ bitboard generator, generating all moves or only captures does not impact much cpu consomption, I kept this possibility as a tool for qsearch anyway (to avoid an expensive if isCapture inside the moves loop in qsearch). The only things that can be gain are :
- the TT move leads to a cutoff before generating moves list : I kept that without using a staged move generator
- a good capture leads to a cutoff : but the time to sort them (including SEE) is by far the most expensive task in sorting move ...
- a killer leads to a cutoff : so that quiet moves are not generated and sorted, but killers has to be validated.

So at the end, I only kept :
- the possibility to generate only cap (used in qsearch)
- try the TT move first without generating anything

What I probably still miss is :
- generating checks and checks evasions only to be used in qsearch (does someone knows how it impact strength ?, and an efficient way of doing this without incremental attack update?)
- sort capture really using the SEE value, so that bad capture are well sorted, for now my SEE only return true/false meaning if it is a good or bad cap

hgm · Post by **hgm** » Mon Dec 31, 2018 11:02 am

When I use staged move generation the main advantage comes from also staging the generation of captures themselves: I generate them by value group of the victim (i.e. Queens, Rooks, minors, Pawns). So if the capture of a Queen provides a cutoff, I never get to generating and sorting captures on Rooks, minors and Pawns.

This also strongly reduces the effort taken by sorting: often value groups consist of just a single victim, with just one or perhaps two attackers, and the moves then only have to be sorted by attacker value. (And only if there is more than one.)

Validation of killers should not be that costly, in terms of CPU time. But it will require some dedicated code that serves no other purpose.

I very much doubt if sorting bad captures by SEE will buy you anything. For one, bad captures will almost always fail low, so you would have to search them all anyway. They only stand a chance of being cut moves when there are side effects that SEE cannot recognize, and this doesn't correlate that well with the actual SEE value. So in the rare case that any of the bad captures was good, searching them by SEE value will probably not help you getting the good one any quicker, on average.

xr_a_y · Post by **xr_a_y** » Mon Dec 31, 2018 12:00 pm

hgm wrote: ↑Mon Dec 31, 2018 11:02 am When I use staged move generation the main advantage comes from also staging the generation of captures themselves: I generate them by value group of the victim (i.e. Queens, Rooks, minors, Pawns). So if the capture of a Queen provides a cutoff, I never get to generating and sorting captures on Rooks, minors and Pawns.

I like that ! In Minic I use a pseudo "GeneratorPhase" enum this way

Code: Select all

        BitBoard bb = pf[ptype-1](from, p.occupancy, p.c) & ~myPieceBB;
        if      (phase == GP_cap)   bb &= oppPieceBB;  // only target opponent piece
        else if (phase == GP_quiet) bb &= ~oppPieceBB; // do not target opponent piece

I think I can introduce GP_capQ, GP_capR, GP_capMinor, GP_capP and generate only capture of a specific target type. But then I'll have to manage a multi stage generate. I'll give it a shot somedays. Thanks

Ras · Post by **Ras** » Mon Dec 31, 2018 1:10 pm

I don't think there's much to win anyway. Just run a perft 6 on the initial position with make/unmake in the leaves. And that is already a cautious estimation because when you get cutoffs in real search, make/unmake will not be done for the remaining nodes so that the actual possible gain is even smaller. Then run a full search analysis.

You'll probably find that the full search node rate is around 15% of the perft rate. Even if you ignore the effort in make/unmake and assume that the move generator is the bottleneck in perft, making the move generator twice as efficient by not generating moves that won't be executed, you'd gain half of that 15% in speed, i.e. 7.5%. Assuming 60 Elo per doubling, we're talking about 6 Elo tops. When also considering make/unmake, maybe 3 Elo.

hgm · Post by **hgm** » Tue Jan 01, 2019 9:46 am

Ras wrote: ↑Mon Dec 31, 2018 1:10 pm I don't think there's much to win anyway. Just run a perft 6 on the initial position with make/unmake in the leaves. And that is already a cautious estimation because when you get cutoffs in real search, make/unmake will not be done for the remaining nodes so that the actual possible gain is even smaller.

Make/unmake will not be done for any leaf node in search, right? Otherwise they would not be leaf nodes, by definition. In most nodes you just generate moves (captures), to conclude that all are futile. So it seems wrong to compare with a perft that does make/unmake in the leaves; this is nothing like search.

Sven · Post by **Sven** » Tue Jan 01, 2019 1:54 pm

hgm wrote: ↑Tue Jan 01, 2019 9:46 am
Ras wrote: ↑Mon Dec 31, 2018 1:10 pm I don't think there's much to win anyway. Just run a perft 6 on the initial position with make/unmake in the leaves. And that is already a cautious estimation because when you get cutoffs in real search, make/unmake will not be done for the remaining nodes so that the actual possible gain is even smaller.
Make/unmake will not be done for any leaf node in search, right? Otherwise they would not be leaf nodes, by definition. In most nodes you just generate moves (captures), to conclude that all are futile. So it seems wrong to compare with a perft that does make/unmake in the leaves; this is nothing like search.

Perft and regular search can't be compared for other reasons as well. So I would not try to analyze possible savings of staged move generation by comparing search to Perft. But the point of Rasmus is still valid: from a theoretic viewpoint there is not much room for a speedup by using staged move generation because move generation takes only a small portion of overall engine runtime. And yet there are engines using it successfully, so there must be more to it ...

In my engine Jumbo I have almost completed the implementation of staged move generation recently. "Almost" because I am still struggling to manage the delayed processing of losing captures correctly without losing strength. The initial step of trying the TT move first without generating any moves was the most successful one, it gave quite some strength improvement, but all later additions like separating the generation of captures and quiet moves did not improve strength significantly (i.e., nothing outside the error bars of roughly +/- 7 Elo points in my setup).

Joost Buijs · Post by **Joost Buijs** » Tue Jan 01, 2019 2:17 pm

In my engine I can easily switch between staged and non staged move generation, the difference in speed or strength is negligible and disappears completely in the noise. So you can argue if it is really worth the added complexity, it is usually better to keep things as simple and stupid as possible.

Of course this has nothing to do with first trying the TT-move, which I do in both cases.

Ras · Post by **Ras** » Tue Jan 01, 2019 3:22 pm

hgm wrote: ↑Tue Jan 01, 2019 9:46 amMake/unmake will not be done for any leaf node in search, right?

Right before that, it will. Then eval would follow. My point is that search is not like the bulk counting perft version because in search, you don't generate a move list just for counting the moves and then discard them, but you will make/unmake at least some of these moves. That's why the perft version with make/unmake is more representative. Also, I chose this optimistic assumption to figure out an upper bound of the gain.

However, if you instead argue that the optimised perft version with bulk counting is more like search, then my original argument just becomes stronger because then the amount of possible time reduction is even less. Then we aren't talking about some 10M per second perft, but rather a 100M per second, like in your optimised perft code. Then the numbers instead go like this:

Search node rate 2 Mnps, which is not 15% of the perft rate anymore, but only 2%. Now we make the move generation twice as efficient and save half of that 2%, so 1% speedup. That's even less than 1 Elo.

hgm · Post by **hgm** » Tue Jan 01, 2019 3:33 pm

Actually that is exactly what a search does for most of the time: generate moves just to conclude that none of those is worth searching. Leaves where no move generation is needed because stand-pat produced a cutoff are rare, because futility pruning eliminates most of those.

Your math doesn't add up. A bulk-counting perft only does ~1Mnps, not 100Mnps. When you don't make the moves you generate, move generation of course will take a larger fraction of the time, not a smaller one.

Ras · Post by **Ras** » Tue Jan 01, 2019 3:48 pm

hgm wrote: ↑Tue Jan 01, 2019 3:33 pmActually that is exactly what a search does for most of the time: generate moves just to conclude that none of those is worth searching.

Most of the move generation is in quiescence, and making none of the moves there sounds strange. Or do you exclude high takes low captures e.g. via SEE? Of course, in QS, you only generate captures and not quiet moves (except maybe for check evasions).

A bulk-counting perft only does ~1Mnps, not 100Mnps.

My full search already does around 1.5 - 2 Mnps, and perft does around 11 already without bulk counting. But none of this changes the fact that move generation doesn't take considerable amount of time so that the potential for optimisation is not that big - with the obvious exception of only generating captures in QS. Besides, the number of nodes in QS even dominates regular search, which is another reason why staged move generation in the main search won't yield gains.

Staged move generation and killers

Re: Staged move generation and killers

Re: Staged move generation and killers

Re: Staged move generation and killers

Re: Staged move generation and killers

Re: Staged move generation and killers

Re: Staged move generation and killers

Re: Staged move generation and killers

Re: Staged move generation and killers

Re: Staged move generation and killers

Re: Staged move generation and killers