bob wrote:
First, it is not my job to run all these tests to prove to you that what I am doing is the most efficient way to address this issue. If you want to measure the cost, feel free. I've already done it, and reported on it years ago. The cost of maintaining the PV is tiny. The PV only gets backed up on non-fail-high/non-fail-low nodes, which within a PVS framework is _extremely_ rare.

hgm wrote:
I can't understand why this got you in such a foul mood. I only asked a question, which was closely related to the original question. Now I am not supposed to do that unless I already know the answer? I would think that this is exactly what forums and questions are for: to probe for knowledge you don't have... And when I did ask it, I only asked it because I had the impression that you would be able to answer it off hand. (You know, "nothing in Crafty is ever done just for the heck of it...".) You know everything, you did it all before, but if someone asks for it you apparently feel compelled to remeasure it from scratch, and then start blaming the asker that he engages you in slave labor...

OK, I will apologize for the "foul mood". But my point was this: I first ran it in a simplistic way which I thought was still reasonably close to the truth. You did not like the result and claimed it was way off. I ran it a second way that was closer to the real example we were talking about. The results were similar. You didn't like that either. So then I just modified the code to not stuff the PV, and the results were the same as the other two. That was what I was talking about... it takes time to change, test, and run this stuff, all for naught.
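For readers following along, the "PV backup" being argued about is cheap because it only fires on exact-score nodes. A minimal triangular-array sketch of the idea (the names below are illustrative, not Crafty's actual code):

    #define MAX_PLY 64

    static int pv_length[MAX_PLY + 1];          /* length of the line kept at each ply */
    static int pv[MAX_PLY + 1][MAX_PLY + 1];    /* pv[ply] = best line from 'ply' down */

    /* called from search() only in the rare exact-score case, i.e. when a
       move at 'ply' came back with alpha < score < beta */
    static void backup_pv(int ply, int move) {
        pv[ply][ply] = move;
        for (int i = ply + 1; i < pv_length[ply + 1]; i++)
            pv[ply][i] = pv[ply + 1][i];        /* splice the child's line under this move */
        pv_length[ply] = pv_length[ply + 1];
    }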
....
But answer the questions posed here, not the questions you want to answer, regardless of what the poster actually asked...
Blame it on laziness. The PGO compile takes too long, but in a given position, Crafty's NPS is very stable. I often test like this until I have something that looks promising, then I PGO both the original and the new version so the comparison is more apples-to-apples, and then I compare time over a significant number of positions, although I do watch the node counts as well.
bob wrote:
You are looking at the wrong data. Look at the nodes searched. I don't take the time to do a profile-guided optimization compile when I make test changes like this, so the times are not comparable. But the nodes searched are, and 3M nodes is 1.5 secs on my core-2 duo laptop, using one processor on that particular position.

OK, I did not realize that. I thought the extra nodes were a lot faster (e.g. because most of them are IID nodes, which all give cache hits on hash probing). So in fact stuffing the PV saves you 3M nodes out of 59M, which is about 5%.
I still think this is a very surprising result, one that I would like to understand. The reasoning behind my original question seems sound and simple enough:
An alpha-beta tree of depth N with perfect move ordering and branching ratio B has about B^(N/2) nodes (give or take a factor 2), while the absolute worst case of reverse move ordering needs B^N nodes (the same as the unpruned minimax tree). So at the very worst (move ordering far worse than random), the cost of the un-informed search of a 7-ply subtree at the end of the PV branch of the previous iteration would be as large as that of a 14-ply perfectly informed search. And compared to the 20-ply search, even the cost of a 14-ply search should be quite negligible.
Even with an effective branching ratio as low as 3, the 6 extra ply (20-14) would make the search 3^6 = 729 times bigger, so the uninformed 7-ply search at the end of the first 13 ply of the previous iteration's PV could only amount to 0.13% of the search time (node count). And that only if the move ordering in that un-informed 7-ply search was total crap; if it was merely random, you would on average find the cut-move after searching half the moves, saving you a factor 2 at every all-node (so a factor 8 in a 7-ply search). Thus 0.02% seems a reasonable _upper limit_ on what you can save compared to an un-informed depth-first search with random move ordering. And IID should already harvest most of that saving (and even if you didn't do IID, your move ordering is likely better than random...).
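The arithmetic can be checked mechanically. A throwaway C snippet using the same assumptions as above (B = 3, a 20-ply search, a 7-ply un-informed subtree):

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double B = 3.0;                    /* assumed effective branching ratio */
        double blowup = pow(B, 20 - 14);   /* 3^6: how much bigger the 20-ply search
                                              is than the 14-ply worst-case equivalent */
        printf("3^6 = %.0f\n", blowup);                          /* 729 */
        printf("worst case: %.3f%%\n", 100.0 / blowup);          /* ~0.137% */
        /* with merely random ordering, each of the ~3 all-node levels of a
           7-ply tree finds its cut-move after half the moves: a factor 2^3 = 8 */
        printf("random ordering: %.3f%%\n", 100.0 / blowup / 8); /* ~0.017% */
        return 0;
    }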
Do you see any obvious logical flaws in the argument above? (Note that I am not asking you to do any programming, just some thinking!) That you observe 5% where <<0.02% is expected does point to a serious lack of understanding of the underlying process, so it might be worthwhile to figure out what is causing the discrepancy.
I think the problem is this: at nodes where I have no hash move, particularly on the PV, I have to "hunt around" to find the best move. "Good enough" won't cut it on the PV, because I have to get real scores, not just prove each move is better or worse. And, based on that test, the effort of hunting around hurts when the N-1 iteration was done purely to provide better ordering for iteration N.
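To spell out the "real scores" point: in PVS only the first move at a PV node gets the full window; every other move is merely proved better or worse with a null window, and has to be re-searched when it looks better. A generic sketch of that logic (stand-in names, not Crafty's code):

    /* inside a PV node of a PVS search; make(), unmake(), search() and the
       move list are stand-ins for the engine's own machinery */
    make(moves[0]);
    best = -search(-beta, -alpha, depth - 1);   /* first move: full window, exact score */
    unmake(moves[0]);
    if (best > alpha) alpha = best;
    for (i = 1; i < nmoves && best < beta; i++) {
        make(moves[i]);
        /* null window: only proves the move better or worse than alpha */
        score = -search(-alpha - 1, -alpha, depth - 1);
        if (score > alpha && score < beta)
            /* it looked better: re-search with the full window for a real score */
            score = -search(-beta, -alpha, depth - 1);
        unmake(moves[i]);
        if (score > best) best = score;
        if (best > alpha) alpha = best;
    }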
I originally did this in Cray Blitz when I saw deep searches start off with no ordering info, and back then I did not use IID (in fact, I had never heard of it in the 80's). And it might be that the IID overhead when there is no hash move on the PV is what hurts, compared to just looking up a hash move... I will run some tests to better analyze what is going on.
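The IID fallback in question looks roughly like this (a generic sketch; the 2-ply reduction, the depth threshold, and helpers like hash_probe_move() are illustrative, not Crafty's actual routines):

    /* at a PV node with no hash move: internal iterative deepening */
    if (hash_move == NO_MOVE && depth >= 4 && beta - alpha > 1) {
        search(alpha, beta, depth - 2);     /* shallow search of this same node */
        hash_move = hash_probe_move();      /* the shallow search left its best
                                               move in the hash table */
    }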