What is the best known speed of move generation?

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
smrf
Posts: 484
Joined: Mon Mar 13, 2006 11:08 am
Location: Klein-Gerau, Germany

Re: What is the best known speed of move generation?

Post by smrf »

Why we do this by discussion? Simply generate a comparable statistic as I have made with SMIRF, and check, which approach is generating that useful information more quickly.

Reinhard.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: What is the best known speed of move generation?

Post by bob »

hgm wrote:
bob wrote:Err, I do it somewhere _else_???
Ach so, somewhere else... Is that suddenly "allowed"? So why are you trying to build a court case out of it when I happen to do the pin test "somewhere else"???
1. Please show me where I have made a "court case" out of this.

2. You made a statement that eliminating moves for pinned pieces was "free". I took issue with that as it is not "free" by any measure of the word free, as in "absolutely no cost".

Otherwise, I really don't care how you do it. The original focus of this thread was mangled perft comparisons...


Determining "in check" is not very complicated in bitboards...
Neither is testing for pins or potential pins.
I do, and always have, generates pseudo-legal moves. For many reasons:

(1) at every other ply, only one move needs to be searched. Why screen all moves for legality as they are generated if I am only going to search one or two?
Because it is faster (even if only a little)?
Except that it isn't. at least 1/2 the moves don't need to be screened as they will never be searched. I've explained this several times already. It is the _exact_ same reason why we don't do a full sort of the moves after we screen them, instead we do a less efficient "selection sort" (for history counters as but one example). Why? The answer is obvious.
In fact, I avoid generating moves a significant part of the time because I can try the hash move prior to generating _anything_.
Of course. So that has no bearing on the legal vs pseudo-legal issue at all.
(2) most moves are legal. Testing for legality when generating them is a waste. And when you consider (1) above culls about 1/2 of them on average anyway, this is further justification for putting off those tests until I am certain they are needed, so that I avoid doing it most of the time.
That argument would only have any validity if you would first generate the moves, and then test them one by one. If, otoh, one suppresses generating the moves with pinned pieces, like I do, the time saved on not generating those moves already is more than the time taken by testing for pinned _pieces_ (rather than testing for illegal moves). And the factor 1/2 is no issue, as even in cut nodes you generate _all_ the moves. So you save this time in every node.
(3) I certainly don't care about "in check" in the q-search as that is a simple part of the search only following captures. Most captures are not illegal, so again, testing them for legality wastes time over discovering this later, at least in my code. This gives me the _same_ move generator for normal search and quiescence search, which is cache-friendlier.
That is a design choice, and probably a good one. But note that for me, using the same generator in QS and main Search would actually mean that _both_ of them abstain from captures with pinned pieces.
Here's a hint. Not one single design feature in Crafty was done "just for the hell of it"... _EVERYTHING_ I (and now we) do in Crafty is thoroughly tested, both for performance in terms of speed, and then for performance in terms of playing against other opponents. I used to not do in_check at all, letting capturing the king at the next ply indicate that the previous move was illegal. But when I added null-move, making such when you are in check is bad and has to be avoided. But I at least follow the old rule "never compute today what you can put off until tomorrow, because tomorrow might never come." That is a golden rule when dealing with alpha/beta and the cutoffs it produces.
So that means that the cost per illegal pseudo-legal move is indeed very much higher than the cost of generating and making that single move: it in fact requires the generation of all replies. Which is, btw, also the way uMax does it, for reasons of code compactness. (And it is yet another case where moves are generated that are not made. :idea: )
How do you reach that conclusion? But even if it were true (the cost per illegal move was high) it doesn't matter since there are so few of them. Someone posted some perft data that suggested it was a fraction of 1% of the moves. Which I had already estimated as being very low based on much older testing about pseudo-legal moves.

But this raises a quite different question: if you don't explicitly test for legality after making the move of the last ply, but wait for the move generation in the reply node to find the King capture, how can you ever get the correct perft value?
Back to planet earth. Perft is completely different from my normal search. Nothing in common except for movgen/make/unmake. In perft, _every_ move is made and verified to be legal before it is counted, at the last ply or not is irrelevant. In perft there is no quiescence search of any kind. So I am not sure where this line of reasoning originated...

It would mean you have to _generate_ a (pseudo-legal) perft(7) (but not making the final ply, just screening the moves for King capture... :lol: ), to calculate a perft(6).
Or perhaps I don't do it that way and never said I did.


You wouldn't have something in there that tests the moves for legality just after they are made, something that wouldn't be called in your normal search, but would only be there for the purpose of speeding up the perft, would you? No, that can't be... :?
of course not. Here is the _entire_ perft code from crafty. See if you see anything that is optimized for speed:

Code: Select all

void OptionPerft(TREE * RESTRICT tree, int ply, int depth, int wtm)
{
  int *mv;
  static char line[256], *p[64];

#if defined(TRACE)
  static char move[16];
#endif

  tree->last[ply] = GenerateCaptures(tree, ply, wtm, tree->last[ply - 1]);
  for (mv = tree->last[ply - 1]; mv < tree->last[ply]; mv++)
    if (Captured(*mv) == king)
      return;
  tree->last[ply] = GenerateNonCaptures(tree, ply, wtm, tree->last[ply]);
  p[1] = line;
  for (mv = tree->last[ply - 1]; mv < tree->last[ply]; mv++) {

    MakeMove(tree, ply, *mv, wtm);

    if (depth - 1)
      OptionPerft(tree, ply + 1, depth - 1, Flip(wtm));
    else if (!Check(wtm))
      total_moves++;
    UnmakeMove(tree, ply, *mv, wtm);
  }
}


To explain: I generate captures. If I capture a king, I return as last move was illegal. Else I generate the rest of the moves (non-captures). Then for each move I make the move, and if there is depth left, recursively call perft(). Else if the current side is not in check (last move was legal) I increment the counter and unmake the move.

nothing done for speed. Just straight-forward code to walk the tree and count only the moves that are legal. So, as you chuckle to yourself now, perhaps you are chuckling _at_ yourself, as I have not tried to optimize the above whatsoever. It provides exactly what I wanted it to provide, the number of terminal positions for the perft test.

Then last year I got myself a free truck. Didn't have to pay a penny the day I drove it off the lot. Of course I had to start paying some later on. So the definition of "free" you are using is bad. Free means _zero cost ever_. Not "zero cost now but I either did some work earlier or will do some more later."
Well, to me "free" can also mean paying for itself. When I bought an apartment building with one empty appartment from entirely borrowed money, and the tenants of the other apartments were paying off the mortgage, plus interest, plus maintenance cost, I indeed had the impression that I was living there for free. 8-)

As to the pin test: the code that does it is almost never reached. In the rare case that it is reached, executing it almost always saves a multiple of the time executing it, in the move genrator alone (not even considering make/unmake cost, move generation of the replies, testing them for King capture...). That is free enough for me.
I do that all the time. But I do it with profile runs rather than looking at raw ASM code, but then that is just as accurate either way. My overhead for advancing to the next ply is _really_ minimal. I don't do copy/make for example.
Well, I do that too. But considering how much difficulty I have to get across how I actually do things, I would be surprised if you actually timed my approach. That all code you have in your engine is optimally fast doesn't mean a thing, if the task that the code is performing could have been avoided alltogether.
Unless you do like _some_ and change them for perft. That is the case I was arguing against. Including the idea of hashing perft results from one node and using that if you reach the same position again rather than walking that part of the tree. That can even hide bugs...
Yes, but it can also reveal them. In particular hash bugs. So perft is both very useful without hashing, to determine the correctness and speed of your tree-walk, as well with hashing, to determine the correctness of your hash probing and the quality of your replacement algorithm.
Nope. If you can "test for free" can you not formulate that into a methodology to generate moves for free as well? The question is similar (which squares are occupied vs empty, and what is on them). There is absolute no free "test" on a computer. Unless you can show me a few instructions that execute in zero cycles...
Actually there are many instructions that take zero cycles of execution time.
Time for accurate speaking now. And that is simply false. You can _hide_ the time by burying it in the latency of another instruction, but it is not free. And often when burying it it still has a cost due to cache issues or fetch conflicts or whatever...

When they are hidden in the latency of unavoidable other instructions (e.g. the hash probe), for instance, or when they use a CPU resource that is scarcely used while other instructions queue up for scarce resources. Rarely taken conditional branches are usually completely free in that sense, as the code is usually limited by register-file access. But let us not divert into that, as it will open up an entirely fresh can of worms.

In the pin case I consider the test "free" because it is done (in the rare cases that it is done) at the point where the in-check test encounters an obstacle on the ray connecting slider and King. At that point there is a conditional branch that is taken if the obstacle is the slider. When you fall trhough that branch, you have automatically arrived in a place in the code where there is a very high probability for a pin, without doing anything specifically aimed at testing for pins yet. The control flow already exists, one just has to exploit it.
Ah, but most of the time it is not done. So you spend time trying to eliminate something that is not there, when I don't. That's my point. With alpha/beta, it is _always_ better to defer when possible/practical, because often it is a permanent deferral because of a cutoff.
As you pointed out above, about half of the nodes are all nodes. So by deferring, you at most save half the original cost, on the average. (And not all cut nodes do cutoff on the first move, especially if they fail low on the hash move so that you might start to profit from deferring something...) So if deferring something makes it more than twice as expensive, because you are deferring stuff that could have been done easily to a place where it has to be done with difficulty, you will lose on deferring it.
But your math is simply poor. In 50% of the cases, only one move is searched, so doing work to generate the rest is wasted. But of the remaining 50%, there is a tiny fraction that have the pinned piece problem. So it is absolutely not a "twofer" trade. Deferring it saves me 50% constantly. On occasion I lose .1% due to an occasional illegal move. that is like a 500:1 savings, not 1 for 2.

The 50% of the moves we do need to search are not all pinned piece moves... which is where your argument falls apart. Hardly any of them are...

I tried three positions from a typical 1. e4 game. One 8 moves into the game, one 16 moves into the game and one 24 moves into the game. I added a bit of code so that for each "rejected as illegal" move, I tested the move to see if it was moving and exposed the king to check.

For these three tests, I got a pinned piece moving count of 128 (31.2 M nodes total), 177 (33.5M nodes total) and 331 (41.7M nodes total). Those turn out to be .00057%, up to .00079%.

Now that is hardly a conclusive set of positions, but the numbers speak for themselves. They suggest that I searched a _tiny_ number of illegal moves where a pinned piece was moved. In fact, I moved the king into check more times than I moved a piece pinned on the king.

I could run more, and I suppose I could find positions where there are one or two pinned pieces to start with, but I just picked a game Crafty had just played on ICC since the PGN was handy.


So your rule has certainly no absolute validity. I would even say that it has very limited validity. Almost always it is better to do things in places where they can be done most efficiently. Only for tasks that can be done equally efficiently everywhere, deferring might be a default tie breaker.
Perhaps after you work on your math a bit, you might reach a different conclusion about "limited validity". Do I really want to do extra work on 32 million moves, to avoid generating just over 100 total moves?
You are hung up on avoiding something I do that doesn't happen very often. I don't carry a meteor umbrella around since I don't expect to be hit by one very often. That's the idea here.
I don't have the feeling I am "hung up" on anything. In building a perft and an engine I had to make design choices, and this seemed the optimal way to handle it. I merely mentioned it, and it was you who got hung up about it. But I seem to detect a shift in your position now from "this only costs extra time" to "the time you gain by this is so small that I don't care". Well, the latter might very well be true. But when in a "don't care" situation, why would I intentionally choose for the solution that should be slightly worse?
I don't believe I have changed my stance at all, other than to say you are certainly free to do it however you want. I don't believe it to be faster in any measurement of time one could make, based on the small numbers above.
Still going in circles however. If we don't do that very often, but you do your test for _every_ move, then I wonder which takes the most time? That's my point.
It is more a "beside the point"... :lol:

as I do _not_ test for every move. I do not even test once for every move generation. I only test something when my check test (wherever I do it) has already filtered out a situaton where an enemy slider aimed at my King is blocked by something. This very rarely happens. And if it happens, it is very likely to be a pin. If it is a pin in 10% of the cases, I only have to earn back 10 times the cost of the test if it actually is a pin to break even. That goal is easily achieved in the move generator alone.
No, but that is worst-case for my approach as well. If you start off in check, almost all moves are illegal. Trying them only adds 1%. What about normal positions where most positions are not "in check" positions? The difference can't be measured, at least on my code. I only see advantage to the "legal generator only" when the position is highly tactical with lots of deep checking lines. Then I get 1%. I didn't know that until I wrote the code or I would not have even done that...
Well, in any case we seem to agree that an in-check test is worth doing, even if only a little. And that check test happens to also inform me of strong pin candidates. I might as well use that information.
I doubt it. but I suppose we will never know. I am certain enough that moving an absolute pinned piece is rare enough it is not worth worrying about. If you feel differently, go for it. I suppose I could add a counter and run some positions to see how often it happens... I am pretty sure I know the answer already however...
Well, like I said, why intentionally prefer the suboptimal solution just because the difference is only small, if you have a choice?
Fine by me, so long it is the originally defined perft algorithm. Start at an initial position, generate all moves and make each one.
Well, one of the results I quoted obviously was.
After each move is made, change sides, and generate all moves and make each one. Repeat until some fixed depth. Illegal moves must be discarded and not counted. Give me the position and I'll run it.
Why not try the opening?

But I still would like to have clarity on if you decide upon the legality by running a capture generator for the reply and extracting the most-valuable victim, like you seem to do in your engine, or if you have special code there only for the benifit of making the perft faster.
See the code posted above. Nothing for the benefit of perft, sorry to burst that bubble before it got started...

Can I say this any clearer than:

"NOT IN PERFT"

We are talking about perft. We are not talking about alpha/beta. Not about anything else. Just "perft" which has been around since 1981 or 1982 I don't remember which.
Clear enough, but that means that in your definition of perft, the perft time is a pretty meaningless indicator of the performance of the search part of an engine. So you should not be surprised that they are using small variations on it. As you did not trademark the name 'perft', it is well within their rights to also call these variations 'perfts'.
I have never said otherwise. Perft gives me performance data for code that represents maybe 15% of my total search effort. That is why I always laugh at Vincent's claim that his move generator is 2x faster than mine. That is a 3% or less savings, since the move generator is about 1/2 the total for make/unmake/movgen.

Let me again explain where perft came from and exactly why...

In 1981 I used something that is almost identical to what is now called 0x88. It was the way I generated moves from the middle 70's. Harry Nelson was working on a vectorized move generator and he wanted to get rid of the 0x88 (AND) test as it was not convenient to do in a vector form. We talked about different board layouts, eventually ending up with board[120] where the board had 12 rows and 10 columns. The first two and last two rows were "off the board and gave 2 squares to catch illegal knight moves. The two edge rows were off the board (one on each side) but since the board wrapped around, it gave us 2 squares to catch illegal knight moves off the edge of the board.

Harry said "we need a way to test various ways to make/unmake and generate to see which is the fastest, without using a search since the move generators could produce moves in different orders and greatly alter the shape of the resulting tree.

So, off I went and developed the make/unmake/moven test to walk a depth-first tree from front to back, no pruning or anything, solely to time make / unmake and debug everything (that's why perft walks a tree rather than just making/unmaking/generating the same moves over and over -- the crays had to cache so that would have worked. But we wanted to make sure everything was getting generated and updates properly, and the existing "perft" was born.

The node counts were used to verify that all the different generator algorithms we tried, and the make/unmake stuff, all worked properly and produced the same numbers. We timed the test to a fixed depth to choose the fastest. And I have had that bit of code ever since. It is very complicated, taking about 10 lines of code in total...

That is why it was developed. I later exchanged those "numbers" with others that were developing chess engines since particularly en passant generation can be tricky. In theory, two programs should match exactly in total nodes (or else one has a bug), and the time used by each can be compared to measure the difference in performance of the make/unmake/movgen algorithms used in each. But today the time comparison is impossible, because many are now not walking the tree by generating/making/unmaking, but instead they are using hashing to avoid generating and making/unmaking the same moves in the same position a second time. Nothing wrong with the idea, except that now you can't compare those times with Crafty's times and draw any conclusion other than that the author of the other program spent more time making perft faster, without it having any influence on how his program plays chess at all. Out of the perft code, only make/unmake/movgen carry into the real search... In my code, that is all I time. And it gives a standard way to measure the performance of those parts of the code, for those interested in comparing.


Pretty infantile, eh? I spend 10% of my time or less in make move. If I make it 50% faster, my program becomes 5% faster. That is not going to be a big Elo boost, and I am not going to get 50% anyway. I don't believe I ever said that faster movgen had _zero_ effect. I did (and still do) say it has _minimal_ effect. That can be proven easily enough.

So what is your point? That you can make gratuitous comments when you run out of real data to talk about?
The point was that for the things one does to optimize and tune engines there are never any prizes awarded directly, and that forwarding this as a reason for not doing them is plain silly. You furthermore seem unable to escape from your own situation, which is totally different from that of people who are just starting to build their engine. (The people that typically are interested in threads like this.) For many people move generation is more than 50% of their search time. These are the people for which perft is important, and these are the people that use it. That you have gone through this stage long ago is no reason at all why others shoud not do it.

And, btw, I thought the whole purpose of your 100,000-game cluster testing was to be able to identify very small improvements. I am sure that a 5% speedup would actually show up as a big improvement on that scale.
Then you are sure about something that is dead wrong. I have been using 20K games for the longest not 100K. But for 5% it might take 100K to get a reliable estimate of the small advantage that gives.

Now I really have no idea where this is heading. SEE can be disposed of? And blow up my tree by a factor of 3-4?
Well, this number certainly has grown since you reported it last. But we have had discussions about this topic before, and not all people seemed to agree on that number. A SEE is certainly not an integral part of alpha-beta search. Alternatives have known to be used for it.
Only one I know of. MVV/LVA. A proven inferior approach. It is easy enough to test. I could probably even find the old versions of Crafty where this was an option. I did that back when Hsu and I discussed this very issue around 1995/1996 or so. MVV/LVA is ideal for hardware implementations. It is way sub-optimal for software implementations that don't have to be a synchronous finite-state machine in behavior.
If you want, I can easily hack up my eval and compare the perft speed to actual search speed using just material to make it as favorable for you as possible. In fact:

rnbq1b1r/p1pp1p1p/4k3/1p1NP1p1/2QP1p2/5N2/PP1B1KPP/n6R w - - 0 1

perft 5 ->
total moves=69252968 time=11.10

By my reckoning, that is 6.2M nodes per second. Searching that position:

time=11.57 mat=-10 n=23518178 fh=98% nps=2.0M

So I search (material only) at just under 1/3 the speed of the perft code. According to my recent numbers I posted, eval was 50%. No eval in the above. That leaves 3x the work being done that is inside perft.

Since perft is doing make/unmake/movgen, the other 2/3 of the time being spent has to be the search itself, hash probing, repetition checking, etc.

So my comment about converting perft NPS to search was dead on. Saying "almost none" is actually pretty comical.
Yes, very funny. Except, of course, that your comparison between the non-hashing perft and the hashing engine is totally meaningless, and was certainly not covered by the original statement. That applied to comparing a perft not making / unmaking the final ply to an alpha-beta search under otherwise similar conditions (of hashing / no hashing). The numbers you quote can only be seen as support for my statement that one cannot get a good indicator for engine speed from a perft that makes/unmakes the last ply. So it seems the joke is on you...
Do you not think about what you are saying? My search does hash. Perft does not. If I hashed perft the speed would go _UP_ correct? and this difference would be _larger_ not _smaller_ correct? What part of that don't you understand. No way perft and search are going to run about the same NPS. Absolutely no way. Except for an incredibly primitive engine with a search function that is 10 lines long (like the classic negamax code Ken Thompson published 20+ years ago...

So there is no "joke on me". I'm doing my best to be consistent. PERFT has only a 15% correlation with my search speed, because 15% of my total search effort is in make/unmake/generate.

What does make/unmake in the last ply have to do with anything at all, since my engine certainly does that when searching the tree... My perft behaves exactly like my search in that regard. Whatever I generate I make unless they get cut off or discarded as losing.



I actually just went through the reverse process that you are describing last year, which is how I know: adding stuff to the perft that would turn it into an alpha-beta search. I had expected some of the indispensible tasks, in particular move sorting, to be expensive enough to severly degrade the nps. But this proved not to be the case at all, probably because there are typically only a few captures, and I did not sort the non-captures.
Good. Then I don't have to worry about direct competition from your program for many years. If I am doing things you are not doing, and I consider myself to be doing the bare minimum to keep speed up, where does that leave you? Or, again, does your comment become gratuitous but meaningless???
We will see... :lol:
OK, back up a few dozen lines. I now provided you with some data without evaluation included. Now dance and bobble around (again) real data, not what you "think". perft speed is only marginally related to actual search speed. I removed my eval, made a run, and the perft numbers were _still_ 3x faster than my program. Exactly as I had said they would be. Begin to think that I _do_ know what my program is doing???
I already addressed this above. You seem to be comparing apples and oranges. Btw, I did relay the real data that I obtained when converting the perft above into an engine. Not what i "think", and in fact opposite to my own expectation.

Based on this experience, I made my statement, which I will repeat here lest it'd be forgotten: "Adding the minimal stuff (with material+PST eval only) needed to make a perft that does _not_ make/unmale the final ply into an engine (preserving the hashing conditions) would give you an engine with a speed in leaf-nodes per second only slightly worse than the perft had in terms of d=1 nodes per second."

That begs the question "Who cares?" What sense does doing material + PST in perft make? My make/unmake maintain material balance incrementally, so that is a penalty in my Make/Unmake code that is already in perft. But it is where I choose to do it, and that is the cost of doing a make/unmake in crafty. And when someone has a faster make/unmake and they report "and I do no incremental updating in make/unmake" I (and they) quickly realize that the comparison is not exactly perfect. But it is as good as it can be, because I am not going to remove it from mine, and they are not going to add it to theirs. So the real test becomes raw NPS when the engines do a real search, as that is the only way to compare the overall speed...

Jeez. No extensions, no effect. So extensions are not causing the effect. Let me get this right. Extensions on, effect observed. Extensions off, effect not observed. Yet the two are independent?
As you should know, the fact that two quantities correlate, does not always mean there is a causal relationship between the two. You seem to focus on check extensions here, which is tricky, as in-check positions have very different numbers of legal moves than others, and the nps is obviously sensitive to this number of moves.
That was my point.

So to test if extension per se affects the nps, you should test under conditions that keep the other parameters as closely the same as possible. E.g. test it on a recapture extension, or simply a random extension.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: What is the best known speed of move generation?

Post by bob »

Aleks Peshkov wrote:
hgm wrote:As you pointed out above, about half of the nodes are all nodes. So by deferring, you at most save half the original cost, on the average.
Some years ago someone (Vasik Rajlich) said that CUT-nodes number about 20 times greater then ALL-nodes number in TT... I did not test it myself, please correct if it is a wrong statement.
I am not sure of those numbers. The only thing clear is that for each ALL node, you will find N cut nodes. For each Cut node, you will find one ALL node. So there should be more. But I would expect something like 6 or less for the ratio, which ought to be somewhere around the effective branching factor or a bit higher. But probably not 20 with today's searches.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: What is the best known speed of move generation?

Post by bob »

smrf wrote:Why we do this by discussion? Simply generate a comparable statistic as I have made with SMIRF, and check, which approach is generating that useful information more quickly.

Reinhard.
I just posted a test I ran. I got far fewer pinned piece moves than you did, but I picked three positions from a real game, one from move 8, 16 and 24 in a game that started e4 e5 Nf3 Nc6 Bc4...
Uri Blass
Posts: 10793
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: What is the best known speed of move generation?

Post by Uri Blass »

bob wrote:
hgm wrote:
So your rule has certainly no absolute validity. I would even say that it has very limited validity. Almost always it is better to do things in places where they can be done most efficiently. Only for tasks that can be done equally efficiently everywhere, deferring might be a default tie breaker.
Perhaps after you work on your math a bit, you might reach a different conclusion about "limited validity". Do I really want to do extra work on 32 million moves, to avoid generating just over 100 total moves?
No but the main advantage of the legal move generator is not the fact that you generate less moves but the fact that you do not need to check the possiblity to capture the king after every move.

In your code you need to test after every move if capturing the king is possible.
With legal move generator this is not needed and you can start from capturing the queen or the biggest piece.

I do not claim which move generator is faster but I see no reason to assume as obvious that legal move generator is slower.

Uri
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: What is the best known speed of move generation?

Post by bob »

Uri Blass wrote:
bob wrote:
hgm wrote:
So your rule has certainly no absolute validity. I would even say that it has very limited validity. Almost always it is better to do things in places where they can be done most efficiently. Only for tasks that can be done equally efficiently everywhere, deferring might be a default tie breaker.
Perhaps after you work on your math a bit, you might reach a different conclusion about "limited validity". Do I really want to do extra work on 32 million moves, to avoid generating just over 100 total moves?
No but the main advantage of the legal move generator is not the fact that you generate less moves but the fact that you do not need to check the possiblity to capture the king after every move.

In your code you need to test after every move if capturing the king is possible.
With legal move generator this is not needed and you can start from capturing the queen or the biggest piece.

I do not claim which move generator is faster but I see no reason to assume as obvious that legal move generator is slower.

Uri
I don't check after every move. When I generate moves, I use a switch to factor in the piece being captured. If the piece is a king, I just abort things on the spot. A switch doesn't do extra work depending on the case being handled. It is just a jump table that jumps to a different place for king captures than when continuing normally after capturing something else...
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: What is the best known speed of move generation?

Post by hgm »

Jacob wrote:Number of pinned pieces of the side to move, calculated before any moves are generated.

Code: Select all

r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -
 1.            0 pins,             48 nodes,  0.00%
 2.            1 pins,           2039 nodes,  0.05%
 3.           41 pins,          97862 nodes,  0.04%
 4.         3587 pins,        4085603 nodes,  0.09%
 5.       155840 pins,      193690690 nodes,  0.08%

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
 1.            0 pins,             20 nodes,  0.00%
 2.            0 pins,            400 nodes,  0.00%
 3.            0 pins,           8902 nodes,  0.00%
 4.           96 pins,         197281 nodes,  0.05%
 5.         1818 pins,        4865609 nodes,  0.04%
 6.       125295 pins,      119060324 nodes,  0.11% 
I could run more positions, but it seems pretty clear that they are indeed rare. Assuming that multiple illegal moves would be generated for each pinned piece, it'd probably average near 1%.
Just out of curiosity: can you run the same test for the following position?
[d] rnb1kbnr/ppppqppp/8/8/8/8/PPPPQPPP/RNB1KBNR w KQkq -
Uri Blass
Posts: 10793
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: What is the best known speed of move generation?

Post by Uri Blass »

bob wrote:
Uri Blass wrote:
bob wrote:
hgm wrote:
So your rule has certainly no absolute validity. I would even say that it has very limited validity. Almost always it is better to do things in places where they can be done most efficiently. Only for tasks that can be done equally efficiently everywhere, deferring might be a default tie breaker.
Perhaps after you work on your math a bit, you might reach a different conclusion about "limited validity". Do I really want to do extra work on 32 million moves, to avoid generating just over 100 total moves?
No but the main advantage of the legal move generator is not the fact that you generate less moves but the fact that you do not need to check the possiblity to capture the king after every move.

In your code you need to test after every move if capturing the king is possible.
With legal move generator this is not needed and you can start from capturing the queen or the biggest piece.

I do not claim which move generator is faster but I see no reason to assume as obvious that legal move generator is slower.

Uri
I don't check after every move. When I generate moves, I use a switch to factor in the piece being captured. If the piece is a king, I just abort things on the spot. A switch doesn't do extra work depending on the case being handled. It is just a jump table that jumps to a different place for king captures than when continuing normally after capturing something else...
Some comments:

1)If you can capture the king you abort but the common case is the case that you cannot capture the king and checking if you can capture the king after legal move cost time that you do not need to spend with legal move generator.

2)Note that H.G.Muller's move generator for perft is not something that is the best for chess programs because it has no special functions to generate only captures and to generate only non captures.


Practically many programs have a special function to generate non captures(I do not have it but I may write a function to do it when I rewrite movei)

I wonder if joker has function to generate only quiet moves.
Having 2 functions when one of them generate captures and one of them generate quiet moves is slower for calculating perft but common sense tells me that it is faster for games and I believe that most programmers use it(note that I did not try it and I have only functions to generate captures(for the qsearch) and generating all moves(for the normal search).

Uri
Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: What is the best known speed of move generation?

Post by Gerd Isenberg »

Uri Blass wrote: 1)If you can capture the king you abort but the common case is the case that you cannot capture the king and checking if you can capture the king after legal move cost time that you do not need to spend with legal move generator.
If you already have a switch on piece (if any) on target square in makemove, adding a case to a jmp-table is almost free. Otherwise a conditional jump is good predictable.

Let say the extra condition and code size costs are 2 cycles per pseudo legal move in average - including the rare cases where invalid moves occur. That is N (let say 30) moves times 2 cycles per all-node == 60 cycles. You can compensate these 60 cycles easily if you consider pins outside the move-loop, but the all-node has childs as well. Assuming these childs are leaf-cuts at the horizon, there are a lot of cases where you already cut on (lazy) eval without the need of move-gen and probably to detect pinned pieces at all. If this is true, and you can generate legal moves with the extra cost of 60/2 cycles or less, legal move generation might be a break or even a winner.

Since Bob has no check evasion in qsearch - he has a point with the costs of determining pinned pieces - since otherwise pinned pieces may be scanned en passant while looking for sliding checkers. That seems all consistant with Bob's claims on the defectiveness or blindness of qsearch plus SEE on other tactical threats, like overloaded pieces, forks, pins against other valuable or hanging pieces, discoverd attacks in SEE etc. In the sense why to take care of pins if one overlooks similar threats as well. Which may yield to a somehow unbalanced search with superiour knowldege in some area (absolute pins), but lacking similar tactical issues elsewhere (eg. pins to queen). Otoh Bob has additional costs where winning captures in qsearch are invalid in check and returns alpha if mated.

The counter paradigma - if you can get something en-passant as a waste-product, "virtually" for free, take it - even if it is only used in let say 30% of the nodes. The idea is to re-use it for "better" move-ordering, more reliable evaluation, extension- and/or pruning decisions which are likely to reduce ebf.
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: What is the best known speed of move generation?

Post by hgm »

I did some statistics on the pin test:

Code: Select all

$ ./pinperft 6
 - - - - - - - - - - - -
 - - - - - - - - - - - -
 - - r n b q k b n r - -
 - - p p p p p p p p - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - P P P P P P P P - -
 - - R N B Q K B N R - -
 - - - - - - - - - - - -
 - - - - - - - - - - - -

Quick Perft by H.G. Muller
Perft mode: No hashing, bulk counting in horizon nodes

                   pins (in total tree):
Depth      moves   tests  candidates  confirmed     %   time
    1         20       0           0          0  0.00 ( 0.000 sec)
    2        400       0           0          0  0.00 ( 0.000 sec)
    3       8902       0           0          0  0.00 ( 0.000 sec)
    4     197281     148         148         96  1.08 ( 0.015 sec)
    5    4865609    4100        4091       1914  0.92 ( 0.063 sec)
    6  119060324  232325      231786     127209  2.58 ( 1.953 sec)


$ ./pinperft 5 "r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w K
 - - - - - - - - - - - -
 - - - - - - - - - - - -
 - - r . . . k . . r - -
 - - p . p p q p b . - -
 - - b n . . p n p . - -
 - - . . . P N . . . - -
 - - . p . . P . . . - -
 - - . . N . . Q . p - -
 - - P P P B B P P P - -
 - - R . . . K . . R - -
 - - - - - - - - - - - -
 - - - - - - - - - - - -

Quick Perft by H.G. Muller
Perft mode: No hashing, bulk counting in horizon nodes

                   pins (in total tree):
Depth      moves   tests  candidates  confirmed     %   time
    1         48       1           1          0  0.00 ( 0.000 sec)
    2       2039       4           4          1  2.08 ( 0.000 sec)
    3      97862    1745        1703         42  2.01 ( 0.000 sec)
    4    4085603   15316       15028       3629  3.67 ( 0.078 sec)
    5  193690690 3079700     2949422     159469  3.81 ( 2.922 sec)


$ ./pinperft 6 "rnb1kbnr/ppppqppp/8/8/8/8/PPPPQPPP/RNB1KBNR w KQkq - 0 0"
 - - - - - - - - - - - -
 - - - - - - - - - - - -
 - - r n b . k b n r - -
 - - p p p p q p p p - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - P P P P Q P P P - -
 - - R N B . K B N R - -
 - - - - - - - - - - - -
 - - - - - - - - - - - -

Quick Perft by H.G. Muller
Perft mode: No hashing, bulk counting in horizon nodes

                   pins (in total tree):
Depth      moves   tests  candidates  confirmed     %   time
    1         24       1           1          1 99.01 ( 0.000 sec)
    2        547      24          24         24 95.79 ( 0.000 sec)
    3      13629     522         520        520 90.67 ( 0.015 sec)
    4     336781   12813       12766      12674 89.18 ( 0.000 sec)
    5    9109498  296000      293090     285048 80.88 ( 0.156 sec)
    6  246238592 7703967     7618770    7335264 77.39 ( 4.282 sec)
Of course I cannot count the number of illegal moves, as my perft does not generate them. But I can count how often the test is performed.

The figures above gives these numbers (for the entire tree, not only in the d=1 nodes). To interpret the numbers, they must be referred to the perft count of the previous depth, as perft(5) tells how many times the move generator has to be called to calculate perft(6), and thus how many pin tests could be possibly done.

I put in 4 counters. The first measures the number of times the in-check detection code finds a blocker on a slider-King ray. In the given positions this is then almost always an own piece (which then is a candidate for being pinned). In practical situations this is logical, as the ray is scanned starting from the King, and we tend to shelter our King amongst own pieces. (I could add that if a single blocker of a check ray is an opponent piece, in an engine it would certainly be very productive to know it, as a discovered check usually means you are in deep shit. So what you lose in going from tests to candidates would certainly not be wasted time in an engine, although it is such in a perft.)

If the ray is blocked by an own piece, the real pin test starts. In about half the cases it confirms the pin, in the other half there were more pieces between King and slider. So it is not like you are testing many times in vain, and have to earn back that time from a very few detected pins. It is about 50-50. And as soon as you detect the pinned piece, you start saving on the generation of its moves, as this now only has to be done in a (known) direction. Note that even if there were no moves in other directions (e.g. a Pawn pinned along a file), the move generator would still have spent time on attempting to generate moves (having to test if there was something to capture on the board for that Pawn).

Only in about 5% of the move generations control flow gets to the point where the test is done. (So compared to the number of generated moves, that is about 1:500).

Note that in perfts for a position where a pin is already present at the root, the numbers are quite different...