I've written a C language non recursive perft() routine for the perft(14) project. It can be easily added to an existing chess program which already has routines for Generate, Execute, and Retract.
The routine is called EMP() for "Enumerate move paths".
This was written for the OpenCL environment which does not support recursive routines or pointers to routines.
Much can be added. It would be possible to implement a checkpoint/restart capability which would save/restore the routine's internal state. Also, instead of doing perft(), a non recursive Search() routine could be implemented in the same style.
Testing the above non recursive perft() shows a very modest speed-up vs the recursive version. Apparently the total overhead needed for the state switch loop in the non recursive variant is slightly less than that caused by nested procedure invocation.
The recursive version is certainly easier to write and is attractive in part because it's the more "Lisp-ish" approach. But the non recursive version forces a bit more discipline on the coder and is easier to debug. (It can be stepped, one state change at a time; also, there are no objects hidden on the call stack because there is no call stack.)
As long as the objects internal to the non recursive code can be externalized in a portable format, then it would be possible to implement a checkpoint/restart capability. This could include a means for interrupting the routine on one machine and then restarting it on another.
I've produced a version of the above to handle bulk counting at the penultimate ply -- easy to do. I've done anther version which incorporates transposition table assistance, and that was a bit trickier because of the need to have per ply partial totals instead of just a single summation variable. The partial totals are the values stashed/fetched from a transposition table.
Writing a non recursive search() instead of a perft() needs rather more effort, but the overall scheme remains the same.
First, there are more input parameters, and each of these has to be internalized on an explicit, ply-indexed stack. The same is true for any output results.
Second, there are more ply-indexed variables, each temporally local to a given ply. The best idea is to bundle all the ply indexed objects into a single struct/object and have a single stack of these.
Third, the object values passed as inputs to a new (deeper) ply may not be simple scalars but could have complex representation and variable sizes. The same is true of results passed back to an earlier (shallower) ply.
Fourth, there is the bootstrap problem with the initialization code for setting up the ply zero values, and this code is probably going to be different from the code which does the same initialization at all other ply. On the other end, when the routine finally finises, there's a need for some special code for recovering the final results.
Fifth, a search routine likely has multiple points where it needs to invoke itself at deeper ply, each with their own return state. This adds two states for each invocation point: one for the ply increment code, and one for the ply decrement code. The obvious use: a null move search made prior to the full scan move search.
The sample perft() code has a slightly annoying limitation in that it has a fixed, compile-time limit on the depth (preprocessor constant PlyLen). This will either waste space if set too high or it will cause a crash if set too low.
Better is to use a dynamically allocated, variable length structure for storing ply indexed objects. For the C++ coder, there are templates for such in the C++ STL. My preference is to use my own two-way linked list template. Standard or custom, the effect is the same: a very slight overhead vs a fixed length vector, but no fixed limit on ply depth.
In Symbolic, there is a PIR class (Ply Indexed Record; holds all objects local to a given ply), and the non recursive search routine has both a ply variable and a pirptr variable which points to the current PIR object. Each time ply is changed, so is the value of pirptr. At any time, the next shallow ply PIR object is at pirptr->GetPrev() while the next deeper PIR record is at pirptr->GetNext().
Just before each ply increment, the PIR list is checked to see if there's already a PIR record in place; if not, one is created and appended. When the search routine exits, the C++ destructor for the two-way list cleans up everything.
Here's the C++ code from Symbolic which implements a non recursive perft() including bulk counting, transposition table assistance, and an linked list of dynamically allocated ply indexed records.
vittyvirus wrote:This might be helping, as I just skimmed through the code:
Use ++var instead of var++
Why do you think this would matter?
++x; is no better or worse than x++; At least that was the only way I saw ++ used. The net result, no matter what, is x = x + 1 in either case.
In the Old Days, some hardware like the pdp11 series had pre-decrement and post-increment addressing modes, but did not have post-decrement and pre-increment modes. So, it was common to see:
People often prefer ++var to var++ in C++, because if var is a complex iterator type, var++ requires making a copy of the old value of the iterator, while ++var doesn't.
Of course, the situation here is completely different, and the advice is ridiculous.