Code:
1.320 1.833 1.817
Code:
1.295 1.749 1.792
That had me flabbergasted the first time I tried a two-pass search. I was confused about why two-pass search was significantly slower than single-pass search, and I was looking in all the wrong places (too many TT calls? etc.). In the end the culprit was a wrong application of LMR, which is obvious in hindsight. You have to make sure to reset the legal_moves counter to 0 and save the location where you start testing for LMR reduction (usually the non-capture phase). Since the moves are already generated in the second pass, you start reducing when legal_moves > non_cap_start. I guess simply generating the moves again may save some trouble for a little cost. It took me a day to solve all the nuisances and make sure every move gets reduced by the same amount whether it is searched with the single-pass or the two-pass scheme. I saved the single-pass reduction value when skipping the move, compared it with the current reduction amount during the second pass, and printed any mismatches. That is how I caught all the problems due to LMR.
Michel wrote:
All this seems pretty encouraging.
I have one more question. How do you deal with LMR and LMP?
I now control these with a move counter. However, this will be wrong when a thread revisits moves (I would like to make the revisiting transparent by sending the delayed moves back to the move picker object).
Of course it is possible to fix this by letting the move picker keep track of the move count, but I wonder if one should really worry about it.
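The LMR bookkeeping discussed here can be illustrated with a small Python sketch. It is a variant, not anyone's engine code: instead of resetting the counter and re-walking the list, it stores each deferred move's count alongside the move. The names `lmr_reduction`, `single_pass`, `two_pass`, and `busy`, and the toy reduction rule, are all assumptions made for illustration; only `legal_moves` and `non_cap_start` come from the post. The final check mirrors the debugging trick of comparing single-pass and two-pass reductions and printing mismatches.

```python
def lmr_reduction(legal_moves, non_cap_start, depth):
    """Toy rule (an assumption): reduce late non-captures by one ply."""
    if depth >= 3 and legal_moves > non_cap_start:
        return 1
    return 0


def single_pass(moves, non_cap_start, depth):
    """Reduction per move when every move is searched in list order."""
    return {m: lmr_reduction(i + 1, non_cap_start, depth)
            for i, m in enumerate(moves)}


def two_pass(moves, non_cap_start, depth, busy):
    """First pass defers moves found in `busy`; second pass revisits them.

    Each deferred move keeps the legal_moves count it had in the first
    pass, so it is reduced exactly as in a single-pass search.
    """
    reductions, deferred = {}, []
    for i, m in enumerate(moves):
        legal_moves = i + 1
        if i > 0 and m in busy:              # the first move is never deferred
            deferred.append((legal_moves, m))
            continue
        reductions[m] = lmr_reduction(legal_moves, non_cap_start, depth)
    for legal_moves, m in deferred:          # second pass
        reductions[m] = lmr_reduction(legal_moves, non_cap_start, depth)
    return reductions


# Two captures first, then quiet moves; non_cap_start counts the captures.
moves = ["Qxd5", "Nxe4", "Nf3", "a3", "h3", "Kh1"]
expected = single_pass(moves, non_cap_start=2, depth=6)
actual = two_pass(moves, non_cap_start=2, depth=6, busy={"a3", "Kh1"})
mismatches = [m for m in moves if expected[m] != actual[m]]
print(mismatches)  # a non-empty list would point at an LMR bookkeeping bug
```

Keeping the count with the move (or inside the move picker, as suggested above) means the search loop does not need to know whether it is in the first or the second pass.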
It is almost too good to be true. Now with 4 threads I got a speedup of 3x. The search times are getting shorter even though I used a fixed depth of 20, so the result is not reliable. But it is indicative of what can be achieved. There are some rare crashes that I haven't fixed yet, and they probably wouldn't have happened if I had used processes. Also, I synchronize the threads after every iteration of the iterative deepening, but there is no need for that with processes.
Michel wrote:
All this seems pretty encouraging.
Code:
1.253 1.697 1.909
Code:
1.607 3.010 3.424
Another paper mentions a speedup of 70 for 128 threads, if I remember correctly.
I think that for 8 or more threads performance may start to degrade, because the way I see it all processors tend to work on the same ply. We usually have a limit of <= 4 processors at a split point etc. for a YBW search, but I don't know how to apply that for ABDADA.
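To put the two figures mentioned in this thread on the same scale, here is a tiny Python sketch of parallel efficiency (speedup divided by thread count). The function name is an assumption, and the inputs are only the rough numbers quoted above: 3x on 4 threads from an admittedly unreliable fixed-depth run, and 70x on 128 threads recalled from memory.

```python
def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / threads


print(parallel_efficiency(3.0, 4))     # 0.75
print(parallel_efficiency(70.0, 128))  # 0.546875
```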
Well, nproc gives the number of threads searching a node. I don't see how you can avoid this, since it controls the effect of the exclusive access flag.
Just found out why I can't apply the limit on the number of processors at a ply. The original algorithm stores 'nproc' in the TT, which I avoided since I didn't see any use for it.
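A rough Python sketch of how an `nproc` counter in the table could support such a limit. This is an illustration under assumptions, not the original algorithm's code: the `Table`, `Entry`, `enter`, `leave`, and `max_procs` names are hypothetical, and the idea of deferring only once `nproc` reaches a limit (rather than whenever it is non-zero) is a guess at how a YBW-style cap might carry over. A real engine would also need atomic updates or locking, which are omitted here.

```python
from dataclasses import dataclass

ON_EVALUATION = object()   # sentinel: "enough searchers are already on this node"


@dataclass
class Entry:
    nproc: int = 0          # number of threads currently searching this node
    score: object = None    # stored result, if any


class Table:
    def __init__(self, max_procs=1):
        self.max_procs = max_procs   # 1 behaves like a plain exclusive flag
        self.entries = {}

    def enter(self, key, exclusive):
        """Register a searcher; an exclusive probe may be told to defer."""
        e = self.entries.setdefault(key, Entry())
        if exclusive and e.nproc >= self.max_procs:
            return ON_EVALUATION
        e.nproc += 1
        return e

    def leave(self, key, score):
        """Store the result and release this searcher's slot."""
        e = self.entries[key]
        e.nproc -= 1
        e.score = score


tt = Table(max_procs=2)
a = tt.enter(0xABC, exclusive=True)    # first searcher: admitted
b = tt.enter(0xABC, exclusive=True)    # second searcher: admitted, limit is 2
c = tt.enter(0xABC, exclusive=True)    # third searcher: asked to defer
d = tt.enter(0xABC, exclusive=False)   # second pass: always admitted
```

With `max_procs=1` the counter degenerates to a single busy flag, which is why dropping `nproc` from the table removes the ability to cap the number of searchers per node.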
Code:
first pass:
  for all moves (other than the first one) do
    MakeMove
    search(depth-1, exclusive=1)
      if CRAP is set in child node
        return to parent
        unmake move and store move for second pass
      else
        set CRAP flag in child
        search child node
        store result and reset CRAP
        return to parent
second pass:
  search all stored moves with search(depth-1, exclusive=0); children
  now ignore the CRAP flag.