What is the best known speed of move generation?

Jacob · Post by **Jacob** » Mon Oct 15, 2007 11:47 pm

I doubt it. but I suppose we will never know. I am certain enough that moving an absolute pinned piece is rare enough it is not worth worrying about. If you feel differently, go for it. I suppose I could add a counter and run some positions to see how often it happens... I am pretty sure I know the answer already however...

Number of pinned pieces of the side to move, calculated before any moves are generated.

Code: Select all

r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -
 1.            0 pins,             48 nodes,  0.00%
 2.            1 pins,           2039 nodes,  0.05%
 3.           41 pins,          97862 nodes,  0.04%
 4.         3587 pins,        4085603 nodes,  0.09%
 5.       155840 pins,      193690690 nodes,  0.08%

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
 1.            0 pins,             20 nodes,  0.00%
 2.            0 pins,            400 nodes,  0.00%
 3.            0 pins,           8902 nodes,  0.00%
 4.           96 pins,         197281 nodes,  0.05%
 5.         1818 pins,        4865609 nodes,  0.04%
 6.       125295 pins,      119060324 nodes,  0.11%

I could run more positions, but it seems pretty clear that they are indeed rare. Assuming that multiple illegal moves would be generated for each pinned piece, it'd probably average near 1%.

bob · Post by **bob** » Mon Oct 15, 2007 11:53 pm

Jacob wrote:

I doubt it. but I suppose we will never know. I am certain enough that moving an absolute pinned piece is rare enough it is not worth worrying about. If you feel differently, go for it. I suppose I could add a counter and run some positions to see how often it happens... I am pretty sure I know the answer already however...
Number of pinned pieces of the side to move, calculated before any moves are generated.
Code: Select all
r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -
 1.            0 pins,             48 nodes,  0.00%
 2.            1 pins,           2039 nodes,  0.05%
 3.           41 pins,          97862 nodes,  0.04%
 4.         3587 pins,        4085603 nodes,  0.09%
 5.       155840 pins,      193690690 nodes,  0.08%

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
 1.            0 pins,             20 nodes,  0.00%
 2.            0 pins,            400 nodes,  0.00%
 3.            0 pins,           8902 nodes,  0.00%
 4.           96 pins,         197281 nodes,  0.05%
 5.         1818 pins,        4865609 nodes,  0.04%
 6.       125295 pins,      119060324 nodes,  0.11% 
I could run more positions, but it seems pretty clear that they are indeed rare. Assuming that multiple illegal moves would be generated for each pinned piece, it'd probably average near 1%.

also, what is your definition of "pinned"??? "absolute pinned" (pinned on king) or just pinned on another but more valuable piece?

Another note. absolute pins don't last as long in real searches, as most programs view the limited mobility as bad and do something quickly to break the pin, which prunes away most of the lines with the pin remaining.

In any case, I don't think worrying about _that_ particular problem is worth any effort at all. Illegal moves is more interesting and even that is a small problem...

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Oct 15, 2007 11:55 pm

bob wrote: But I at least follow the old rule "never compute today what you can put off until tomorrow, because tomorrow might never come."

That is indeed a very strong argument for pseudo legal move-generation.
I think both approaches have pros and contras and are related to underlaying data-structures and qsearch policy and philosophie.

A few aspects. Even if invalid moves of absolute pinned pieces are rare, there is the cost to generate them to make and unmake plus generation and make/unmake of the invalid king capture to recognize the fault. Thus in patological but decisive positions like queen-ending it may become much more expensive than a precondition on determing pinned pieces.
In endings queens and rooks may mutually pinn each other, wasting a lot of time with illeagal moves. Yes, alpha-beta has cut-nodes. Otoh there are all-nodes, where it likely pays off to delegate stuff from the childs to the parents, aka move-loop hoisting at least with bitboards. With magic lookups it takes only two lookups from the king square, either for orthogonal or diagonal directions, to determine whether there are absolute pinners as a cheap precondition. Pinned piece information may be re-used in eval as well.

Jacob · Post by **Jacob** » Mon Oct 15, 2007 11:59 pm

also, what is your definition of "pinned"???

Pins to king of course, since only those are relevant to move generation. Isn't moving a pinned piece the main source of illegal moves? Other than check evasions?

bob · Post by **bob** » Tue Oct 16, 2007 12:02 am

Aleks Peshkov wrote:I want to ask Robert Hyatt about Crafty timing profile:
25% movegen/makemove/search
50% static eval

What is the rest? What part (search or eval) used the rest of time?

For accuracy, I just re-ran the profile using our latest source version. Here are the "biggies":

Code: Select all

Evaluate()          43%
Search()            13%
Swap()               9%
Generate()           8%
Make/Unmake          8%
Quiesce()            7%
Hash()               5%

Those are rounded so a couple of percentage points are missing. There are a few small contributors (2.5% here, 1% there, such as NextMove() that I omitted.

This is using gcc and gprof to view the results.

Dann Corbit · Post by **Dann Corbit** » Tue Oct 16, 2007 2:10 am

bob wrote: {snip}
So the idea becomes important to design the fastest perft possible to provide perft numbers for _others_ so that they can compare to those numbers? That sounds highly logical. Most people use perft for a short while and then never again, once they get their move generator debugged. The idea of spending a lot of time optimizing something that I will never use myself again leaves me blinking...

There are a lot of people very interested in deep searches to find out the actual answers to how many moves are at exactly ply n. You will find several posts from Steven J. Edwards on that topic (as well as several others).

The fastest full possible move calculation without error has become and end in itself.

There was even a distributed process to calculate the totals:
http://www.albert.nu/programs/dperft/main.htm

The latest programs have gotten so fast that even the 11 ply total is not inconceivable on modern hardware for a single machine (whereas before it took a great deal of time and multiple machines quite a long time).

bob · Post by **bob** » Tue Oct 16, 2007 3:56 am

Gerd Isenberg wrote:
bob wrote: But I at least follow the old rule "never compute today what you can put off until tomorrow, because tomorrow might never come."
That is indeed a very strong argument for pseudo legal move-generation.
I think both approaches have pros and contras and are related to underlaying data-structures and qsearch policy and philosophie.

A few aspects. Even if invalid moves of absolute pinned pieces are rare, there is the cost to generate them to make and unmake plus generation and make/unmake of the invalid king capture to recognize the fault.

Remember, that in Crafty what you are mentioning only happens in the q-search. In the normal search I verify legality when I make a move, as I can't do null-move if in check... So for me the cost is to generate the move, then make/unmake it, which is not that significant since it is so rare.

Thus in patological but decisive positions like queen-ending it may become much more expensive than a precondition on determing pinned pieces.
In endings queens and rooks may mutually pinn each other, wasting a lot of time with illeagal moves. Yes, alpha-beta has cut-nodes. Otoh there are all-nodes, where it likely pays off to delegate stuff from the childs to the parents, aka move-loop hoisting at least with bitboards. With magic lookups it takes only two lookups from the king square, either for orthogonal or diagonal directions, to determine whether there are absolute pinners as a cheap precondition. Pinned piece information may be re-used in eval as well.

Presently I do not do anything special with pinned pieces at all. Not sure that I ever will, as increasing depth exposes most pins easily enough if they are a problem...

hgm · Post by **hgm** » Tue Oct 16, 2007 1:57 pm

bob wrote:Err, I do it somewhere _else_???

Ach so, somewhere else... Is that suddenly "allowed"? So why are you trying to build a court case out of it when I happen to do the pin test "somewhere else"???

Determining "in check" is not very complicated in bitboards...

Neither is testing for pins or potential pins.

I do, and always have, generates pseudo-legal moves. For many reasons:

(1) at every other ply, only one move needs to be searched. Why screen all moves for legality as they are generated if I am only going to search one or two?

Because it is faster (even if only a little)?

In fact, I avoid generating moves a significant part of the time because I can try the hash move prior to generating _anything_.

Of course. So that has no bearing on the legal vs pseudo-legal issue at all.

(2) most moves are legal. Testing for legality when generating them is a waste. And when you consider (1) above culls about 1/2 of them on average anyway, this is further justification for putting off those tests until I am certain they are needed, so that I avoid doing it most of the time.

That argument would only have any validity if you would first generate the moves, and then test them one by one. If, otoh, one suppresses generating the moves with pinned pieces, like I do, the time saved on not generating those moves already is more than the time taken by testing for pinned _pieces_ (rather than testing for illegal moves). And the factor 1/2 is no issue, as even in cut nodes you generate _all_ the moves. So you save this time in every node.

(3) I certainly don't care about "in check" in the q-search as that is a simple part of the search only following captures. Most captures are not illegal, so again, testing them for legality wastes time over discovering this later, at least in my code. This gives me the _same_ move generator for normal search and quiescence search, which is cache-friendlier.

That is a design choice, and probably a good one. But note that for me, using the same generator in QS and main Search would actually mean that _both_ of them abstain from captures with pinned pieces.

Here's a hint. Not one single design feature in Crafty was done "just for the hell of it"... _EVERYTHING_ I (and now we) do in Crafty is thoroughly tested, both for performance in terms of speed, and then for performance in terms of playing against other opponents. I used to not do in_check at all, letting capturing the king at the next ply indicate that the previous move was illegal. But when I added null-move, making such when you are in check is bad and has to be avoided. But I at least follow the old rule "never compute today what you can put off until tomorrow, because tomorrow might never come." That is a golden rule when dealing with alpha/beta and the cutoffs it produces.

So that means that the cost per illegal pseudo-legal move is indeed very much higher than the cost of generating and making that single move: it in fact requires the generation of all replies. Which is, btw, also the way uMax does it, for reasons of code compactness. (And it is yet another case where moves are generated that are not made.

)

But this raises a quite different question: if you don't explicitly test for legality after making the move of the last ply, but wait for the move generation in the reply node to find the King capture, how can you ever get the correct perft value? It would mean you have to _generate_ a (pseudo-legal) perft(7) (but not making the final ply, just screening the moves for King capture...

), to calculate a perft(6).

You wouldn't have something in there that tests the moves for legality just after they are made, something that wouldn't be called in your normal search, but would only be there for the purpose of speeding up the perft, would you? No, that can't be...

Then last year I got myself a free truck. Didn't have to pay a penny the day I drove it off the lot. Of course I had to start paying some later on. So the definition of "free" you are using is bad. Free means _zero cost ever_. Not "zero cost now but I either did some work earlier or will do some more later."

Well, to me "free" can also mean paying for itself. When I bought an apartment building with one empty appartment from entirely borrowed money, and the tenants of the other apartments were paying off the mortgage, plus interest, plus maintenance cost, I indeed had the impression that I was living there for free.

As to the pin test: the code that does it is almost never reached. In the rare case that it is reached, executing it almost always saves a multiple of the time executing it, in the move genrator alone (not even considering make/unmake cost, move generation of the replies, testing them for King capture...). That is free enough for me.

I do that all the time. But I do it with profile runs rather than looking at raw ASM code, but then that is just as accurate either way. My overhead for advancing to the next ply is _really_ minimal. I don't do copy/make for example.

Well, I do that too. But considering how much difficulty I have to get across how I actually do things, I would be surprised if you actually timed my approach. That all code you have in your engine is optimally fast doesn't mean a thing, if the task that the code is performing could have been avoided alltogether.

Unless you do like _some_ and change them for perft. That is the case I was arguing against. Including the idea of hashing perft results from one node and using that if you reach the same position again rather than walking that part of the tree. That can even hide bugs...

Yes, but it can also reveal them. In particular hash bugs. So perft is both very useful without hashing, to determine the correctness and speed of your tree-walk, as well with hashing, to determine the correctness of your hash probing and the quality of your replacement algorithm.

Nope. If you can "test for free" can you not formulate that into a methodology to generate moves for free as well? The question is similar (which squares are occupied vs empty, and what is on them). There is absolute no free "test" on a computer. Unless you can show me a few instructions that execute in zero cycles...

Actually there are many instructions that take zero cycles of execution time. When they are hidden in the latency of unavoidable other instructions (e.g. the hash probe), for instance, or when they use a CPU resource that is scarcely used while other instructions queue up for scarce resources. Rarely taken conditional branches are usually completely free in that sense, as the code is usually limited by register-file access. But let us not divert into that, as it will open up an entirely fresh can of worms.

In the pin case I consider the test "free" because it is done (in the rare cases that it is done) at the point where the in-check test encounters an obstacle on the ray connecting slider and King. At that point there is a conditional branch that is taken if the obstacle is the slider. When you fall trhough that branch, you have automatically arrived in a place in the code where there is a very high probability for a pin, without doing anything specifically aimed at testing for pins yet. The control flow already exists, one just has to exploit it.

Ah, but most of the time it is not done. So you spend time trying to eliminate something that is not there, when I don't. That's my point. With alpha/beta, it is _always_ better to defer when possible/practical, because often it is a permanent deferral because of a cutoff.

As you pointed out above, about half of the nodes are all nodes. So by deferring, you at most save half the original cost, on the average. (And not all cut nodes do cutoff on the first move, especially if they fail low on the hash move so that you might start to profit from deferring something...) So if deferring something makes it more than twice as expensive, because you are deferring stuff that could have been done easily to a place where it has to be done with difficulty, you will lose on deferring it.

So your rule has certainly no absolute validity. I would even say that it has very limited validity. Almost always it is better to do things in places where they can be done most efficiently. Only for tasks that can be done equally efficiently everywhere, deferring might be a default tie breaker.

You are hung up on avoiding something I do that doesn't happen very often. I don't carry a meteor umbrella around since I don't expect to be hit by one very often. That's the idea here.

I don't have the feeling I am "hung up" on anything. In building a perft and an engine I had to make design choices, and this seemed the optimal way to handle it. I merely mentioned it, and it was you who got hung up about it. But I seem to detect a shift in your position now from "this only costs extra time" to "the time you gain by this is so small that I don't care". Well, the latter might very well be true. But when in a "don't care" situation, why would I intentionally choose for the solution that should be slightly worse?

Still going in circles however. If we don't do that very often, but you do your test for _every_ move, then I wonder which takes the most time? That's my point.

It is more a "beside the point"...

as I do _not_ test for every move. I do not even test once for every move generation. I only test something when my check test (wherever I do it) has already filtered out a situaton where an enemy slider aimed at my King is blocked by something. This very rarely happens. And if it happens, it is very likely to be a pin. If it is a pin in 10% of the cases, I only have to earn back 10 times the cost of the test if it actually is a pin to break even. That goal is easily achieved in the move generator alone.

No, but that is worst-case for my approach as well. If you start off in check, almost all moves are illegal. Trying them only adds 1%. What about normal positions where most positions are not "in check" positions? The difference can't be measured, at least on my code. I only see advantage to the "legal generator only" when the position is highly tactical with lots of deep checking lines. Then I get 1%. I didn't know that until I wrote the code or I would not have even done that...

Well, in any case we seem to agree that an in-check test is worth doing, even if only a little. And that check test happens to also inform me of strong pin candidates. I might as well use that information.

I doubt it. but I suppose we will never know. I am certain enough that moving an absolute pinned piece is rare enough it is not worth worrying about. If you feel differently, go for it. I suppose I could add a counter and run some positions to see how often it happens... I am pretty sure I know the answer already however...

Well, like I said, why intentionally prefer the suboptimal solution just because the difference is only small, if you have a choice?

Fine by me, so long it is the originally defined perft algorithm. Start at an initial position, generate all moves and make each one.

Well, one of the results I quoted obviously was.

After each move is made, change sides, and generate all moves and make each one. Repeat until some fixed depth. Illegal moves must be discarded and not counted. Give me the position and I'll run it.

Why not try the opening?

But I still would like to have clarity on if you decide upon the legality by running a capture generator for the reply and extracting the most-valuable victim, like you seem to do in your engine, or if you have special code there only for the benifit of making the perft faster.

Can I say this any clearer than:

"NOT IN PERFT"

We are talking about perft. We are not talking about alpha/beta. Not about anything else. Just "perft" which has been around since 1981 or 1982 I don't remember which.

Clear enough, but that means that in your definition of perft, the perft time is a pretty meaningless indicator of the performance of the search part of an engine. So you should not be surprised that they are using small variations on it. As you did not trademark the name 'perft', it is well within their rights to also call these variations 'perfts'.

Pretty infantile, eh? I spend 10% of my time or less in make move. If I make it 50% faster, my program becomes 5% faster. That is not going to be a big Elo boost, and I am not going to get 50% anyway. I don't believe I ever said that faster movgen had _zero_ effect. I did (and still do) say it has _minimal_ effect. That can be proven easily enough.

So what is your point? That you can make gratuitous comments when you run out of real data to talk about?

The point was that for the things one does to optimize and tune engines there are never any prizes awarded directly, and that forwarding this as a reason for not doing them is plain silly. You furthermore seem unable to escape from your own situation, which is totally different from that of people who are just starting to build their engine. (The people that typically are interested in threads like this.) For many people move generation is more than 50% of their search time. These are the people for which perft is important, and these are the people that use it. That you have gone through this stage long ago is no reason at all why others shoud not do it.

And, btw, I thought the whole purpose of your 100,000-game cluster testing was to be able to identify very small improvements. I am sure that a 5% speedup would actually show up as a big improvement on that scale.

Now I really have no idea where this is heading. SEE can be disposed of? And blow up my tree by a factor of 3-4?

Well, this number certainly has grown since you reported it last. But we have had discussions about this topic before, and not all people seemed to agree on that number. A SEE is certainly not an integral part of alpha-beta search. Alternatives have known to be used for it.

If you want, I can easily hack up my eval and compare the perft speed to actual search speed using just material to make it as favorable for you as possible. In fact:

rnbq1b1r/p1pp1p1p/4k3/1p1NP1p1/2QP1p2/5N2/PP1B1KPP/n6R w - - 0 1

perft 5 ->
total moves=69252968 time=11.10

By my reckoning, that is 6.2M nodes per second. Searching that position:

time=11.57 mat=-10 n=23518178 fh=98% nps=2.0M

So I search (material only) at just under 1/3 the speed of the perft code. According to my recent numbers I posted, eval was 50%. No eval in the above. That leaves 3x the work being done that is inside perft.

Since perft is doing make/unmake/movgen, the other 2/3 of the time being spent has to be the search itself, hash probing, repetition checking, etc.

So my comment about converting perft NPS to search was dead on. Saying "almost none" is actually pretty comical.

Yes, very funny. Except, of course, that your comparison between the non-hashing perft and the hashing engine is totally meaningless, and was certainly not covered by the original statement. That applied to comparing a perft not making / unmaking the final ply to an alpha-beta search under otherwise similar conditions (of hashing / no hashing). The numbers you quote can only be seen as support for my statement that one cannot get a good indicator for engine speed from a perft that makes/unmakes the last ply. So it seems the joke is on you...

I actually just went through the reverse process that you are describing last year, which is how I know: adding stuff to the perft that would turn it into an alpha-beta search. I had expected some of the indispensible tasks, in particular move sorting, to be expensive enough to severly degrade the nps. But this proved not to be the case at all, probably because there are typically only a few captures, and I did not sort the non-captures.

Good. Then I don't have to worry about direct competition from your program for many years. If I am doing things you are not doing, and I consider myself to be doing the bare minimum to keep speed up, where does that leave you? Or, again, does your comment become gratuitous but meaningless???

We will see...

OK, back up a few dozen lines. I now provided you with some data without evaluation included. Now dance and bobble around (again) real data, not what you "think". perft speed is only marginally related to actual search speed. I removed my eval, made a run, and the perft numbers were _still_ 3x faster than my program. Exactly as I had said they would be. Begin to think that I _do_ know what my program is doing???

I already addressed this above. You seem to be comparing apples and oranges. Btw, I did relay the real data that I obtained when converting the perft above into an engine. Not what i "think", and in fact opposite to my own expectation.

Based on this experience, I made my statement, which I will repeat here lest it'd be forgotten: "Adding the minimal stuff (with material+PST eval only) needed to make a perft that does _not_ make/unmale the final ply into an engine (preserving the hashing conditions) would give you an engine with a speed in leaf-nodes per second only slightly worse than the perft had in terms of d=1 nodes per second."

Jeez. No extensions, no effect. So extensions are not causing the effect. Let me get this right. Extensions on, effect observed. Extensions off, effect not observed. Yet the two are independent?

As you should know, the fact that two quantities correlate, does not always mean there is a causal relationship between the two. You seem to focus on check extensions here, which is tricky, as in-check positions have very different numbers of legal moves than others, and the nps is obviously sensitive to this number of moves.

So to test if extension per se affects the nps, you should test under conditions that keep the other parameters as closely the same as possible. E.g. test it on a recapture extension, or simply a random extension.

Aleks Peshkov · Post by **Aleks Peshkov** » Tue Oct 16, 2007 2:52 pm

hgm wrote:As you pointed out above, about half of the nodes are all nodes. So by deferring, you at most save half the original cost, on the average.

Some years ago someone (Vasik Rajlich) said that CUT-nodes number about 20 times greater then ALL-nodes number in TT... I did not test it myself, please correct if it is a wrong statement.

Gerd Isenberg · Post by **Gerd Isenberg** » Tue Oct 16, 2007 3:57 pm

Aleks Peshkov wrote:
hgm wrote:As you pointed out above, about half of the nodes are all nodes. So by deferring, you at most save half the original cost, on the average.
Some years ago someone (Vasik Rajlich) said that CUT-nodes number about 20 times greater then ALL-nodes number in TT... I did not test it myself, please correct if it is a wrong statement.

What is in hashtable depends on what you store or replace.

I have some counters in my program:

In typical middlegame I have about 20-30% (of the total number of nodes) beta cuts after playing a move (with ~90..95 of them on first move). 30..40% leaf-cuts at or behind the horizon, where static (lazy) eval already fails high. Then about 5-10% other leaves. About 30% all-nodes, but some only investigate a few childs due captures only in qsearch and futility pruning.

What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?

Re: What is the best known speed of move generation?