sje wrote: The run uses a 12 GB transposition table with 2^29 entries split evenly between WTM and BTM. Each half of the table is partitioned into 256 regions, each guarded by a spinlock. There are eight counting threads running, one for each of the eight hyperthreads supported by the four-core Intel i7-2600. Each thread is assigned one of the twenty ply zero moves; the moves are dealt in SAN order from a supervisory thread. When a counting thread completes its draft twelve calculation, it's given another ply zero move to handle (if any remain).
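A minimal C11 sketch of the region locking described in the quote. The table is shrunk to 2^16 entries per side, regions are assumed to be contiguous slices of each side's index range, and the names `tt_store`, `region_of`, `region_acquire` and `region_release` are hypothetical; this is an illustration, not sje's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sizes: the post uses 2^29 entries in total (2^28 per
   side); 2^16 per side keeps this sketch small. */
#define HALF_BITS   16                  /* entries per side = 2^16 */
#define REGION_BITS 8                   /* 256 regions per side */
#define REGIONS     (1u << REGION_BITS)

enum { WTM = 0, BTM = 1 };

typedef struct {
    uint64_t key;     /* position hash (hypothetical layout) */
    uint64_t count;   /* perft subtotal */
} Entry;

static Entry table[2][1u << HALF_BITS];

/* One spinlock per region per side: 0 = free, 1 = held.
   Atomics with static storage start zeroed, i.e. all locks free. */
static atomic_int region_lock[2][REGIONS];

/* Regions are contiguous slices of one side's index range. */
static unsigned region_of(uint32_t index) {
    assert(index < (1u << HALF_BITS));
    return index >> (HALF_BITS - REGION_BITS);
}

static void region_acquire(int side, unsigned region) {
    /* Busy-wait until this thread flips the lock from 0 to 1. */
    while (atomic_exchange_explicit(&region_lock[side][region], 1,
                                    memory_order_acquire))
        ;
}

static void region_release(int side, unsigned region) {
    atomic_store_explicit(&region_lock[side][region], 0,
                          memory_order_release);
}

/* Store a subtotal while holding only the lock for its region. */
static void tt_store(int side, uint64_t key, uint64_t count) {
    uint32_t index = (uint32_t)(key & ((1u << HALF_BITS) - 1));
    unsigned region = region_of(index);
    region_acquire(side, region);
    table[side][index] = (Entry){ key, count };
    region_release(side, region);
}
```

With 256 regions per side, two threads only contend when they hit the same 1/256th of the same half, which is the point of splitting the lock.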
When a store is made into the transposition table, if the record is draft eight or higher then it's also written to the checkpoint file. Upon a restart, the contents of the checkpoint file are read into a fresh transposition table before counting resumes.
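The checkpoint scheme can be sketched as an append-only log of deep records that is replayed into an empty table on restart. Only the draft-8 threshold comes from the post; the record layout, the direct-mapped table and the names `tt_store`, `tt_put` and `tt_restore` are hypothetical, and a single writer is assumed (with eight counting threads the log writes would need their own lock or a per-thread file).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CHECKPOINT_MIN_DRAFT 8    /* threshold from the post */
#define TT_SIZE 1024              /* hypothetical, tiny for the sketch */

typedef struct {
    uint64_t key;      /* position hash */
    uint64_t count;    /* perft subtotal */
    uint8_t  draft;    /* remaining depth of the subtotal */
} Record;

static Record tt[TT_SIZE];        /* direct-mapped for brevity */

static void tt_put(const Record *r) {
    tt[r->key % TT_SIZE] = *r;
}

/* Store a subtotal; deep ones are also appended to the checkpoint log.
   Returns 0 on success, -1 if the log write failed. */
static int tt_store(FILE *ckpt, uint64_t key, int draft, uint64_t count) {
    Record r;
    assert(ckpt != NULL && draft >= 0 && draft <= UINT8_MAX);
    memset(&r, 0, sizeof r);      /* zero the padding bytes too */
    r.key = key;
    r.count = count;
    r.draft = (uint8_t)draft;
    tt_put(&r);
    if (draft >= CHECKPOINT_MIN_DRAFT) {
        if (fwrite(&r, sizeof r, 1, ckpt) != 1)
            return -1;
        fflush(ckpt);             /* hand the record to the OS promptly */
    }
    return 0;
}

/* Restart: start from an empty table and replay every logged record. */
static size_t tt_restore(FILE *ckpt) {
    Record r;
    size_t loaded = 0;
    memset(tt, 0, sizeof tt);
    rewind(ckpt);
    while (fread(&r, sizeof r, 1, ckpt) == 1) {
        tt_put(&r);
        loaded++;
    }
    return loaded;
}
```

Shallow subtotals are lost on a restart, but they are cheap to recompute, which is presumably why only draft eight and higher is logged.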
The transposition table is four-way associative; up to four entries may be tried with overwrite preference given to the entry having the lowest subtotal count. Unused entries have a subtotal count of zero. Only subtotals of draft two or greater are stored in the table.
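A sketch of one four-way bucket with the replacement rule from the post: an entry for the same position and draft is overwritten in place, otherwise the slot with the smallest subtotal is evicted, and because unused slots hold a count of zero they are always taken first. The bucket count, the entry layout (including keeping the draft next to the key) and the function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define WAYS 4
#define BUCKETS 1024              /* hypothetical */
#define MIN_STORED_DRAFT 2        /* from the post */

static_assert((BUCKETS & (BUCKETS - 1)) == 0, "BUCKETS must be a power of two");

typedef struct {
    uint64_t key;     /* position hash (hypothetical layout) */
    uint64_t count;   /* subtotal; 0 marks an unused slot */
    int      draft;
} Entry;

static Entry table[BUCKETS][WAYS];

static Entry *bucket_of(uint64_t key) {
    return table[key & (BUCKETS - 1)];
}

/* Returns 1 and sets *count on a hit for this key and draft. */
static int tt_probe(uint64_t key, int draft, uint64_t *count) {
    Entry *b = bucket_of(key);
    for (int i = 0; i < WAYS; i++) {
        if (b[i].count != 0 && b[i].key == key && b[i].draft == draft) {
            *count = b[i].count;
            return 1;
        }
    }
    return 0;
}

/* A genuine zero subtotal is indistinguishable from an empty slot,
   so it is effectively not cached; that costs time, not correctness. */
static void tt_store(uint64_t key, int draft, uint64_t count) {
    if (draft < MIN_STORED_DRAFT)
        return;
    Entry *b = bucket_of(key);
    Entry *victim = &b[0];
    for (int i = 0; i < WAYS; i++) {
        if (b[i].count != 0 && b[i].key == key && b[i].draft == draft) {
            victim = &b[i];       /* same position and draft: update it */
            break;
        }
        if (b[i].count < victim->count)
            victim = &b[i];       /* prefer the smallest subtotal */
    }
    victim->key = key;
    victim->draft = draft;
    victim->count = count;
}
```

Evicting the smallest subtotal keeps the entries that would cost the most to recount.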
I took a short look at perft myself recently, as I rewrote a move generator of mine and used perft to test it, which is what perft is meant for.
Then I stumbled upon some record attempts with perft. I noticed that you can organize your hashtables much more easily around a depthleft hierarchy, since a node with depthleft 4 can only get a cutoff from a result that was stored at depthleft 4.
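One possible reading of the depthleft hierarchy is a separate table per remaining depth: a probe at depthleft d only ever looks in table d, so entries do not need to store their depth and each level's size can be tuned on its own. This is a sketch under that assumption; `MAX_DEPTH`, the always-replace policy and the names `tables_init`, `dt_probe` and `dt_store` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_DEPTH 16              /* hypothetical deepest depthleft */

typedef struct { uint64_t key; uint64_t count; } Slot;

typedef struct {
    Slot  *slots;
    size_t mask;                  /* size - 1; size is a power of two */
} DepthTable;

static DepthTable tables[MAX_DEPTH + 1];

/* Give each remaining depth its own power-of-two sized table.
   Returns 1 on success, 0 if an allocation failed. */
static int tables_init(const unsigned log2_size[MAX_DEPTH + 1]) {
    for (int d = 0; d <= MAX_DEPTH; d++) {
        size_t n = (size_t)1 << log2_size[d];
        tables[d].slots = calloc(n, sizeof(Slot));
        if (tables[d].slots == NULL)
            return 0;
        tables[d].mask = n - 1;
    }
    return 1;
}

/* A hit requires only a key match: the depth is implied by the table. */
static int dt_probe(int depthleft, uint64_t key, uint64_t *count) {
    assert(depthleft >= 0 && depthleft <= MAX_DEPTH);
    const Slot *s = &tables[depthleft].slots[key & tables[depthleft].mask];
    if (s->count != 0 && s->key == key) {
        *count = s->count;
        return 1;
    }
    return 0;
}

/* Always-replace within the table for this depth. */
static void dt_store(int depthleft, uint64_t key, uint64_t count) {
    assert(depthleft >= 0 && depthleft <= MAX_DEPTH);
    DepthTable *t = &tables[depthleft];
    t->slots[key & t->mask] = (Slot){ key, count };
}
```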
Furthermore, I noticed it's possible to speed up the generating process a lot during the last few plies by not using a hashtable there at all and just using some clever counting tricks. Clever counting tricks don't catch 100% of all transpositions, yet they speed up the search a lot.
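The post does not say which counting tricks are meant. One widely used trick, bulk counting, is sketched here as an example: at depthleft 1 every legal move is exactly one leaf, so the move count is returned without making and unmaking the moves. It detects no transpositions at all; it only removes the work of the final ply. Tic-tac-toe stands in for chess to keep the sketch self-contained, and every name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in game: tic-tac-toe. 0 = empty, 1 = X, 2 = O. */
static const int LINES[8][3] = {
    {0, 1, 2}, {3, 4, 5}, {6, 7, 8},   /* rows */
    {0, 3, 6}, {1, 4, 7}, {2, 5, 8},   /* columns */
    {0, 4, 8}, {2, 4, 6}               /* diagonals */
};

static int has_winner(const int b[9]) {
    for (int i = 0; i < 8; i++) {
        int p = b[LINES[i][0]];
        if (p != 0 && p == b[LINES[i][1]] && p == b[LINES[i][2]])
            return 1;
    }
    return 0;
}

/* Fills moves[] with the legal moves and returns how many there are. */
static int gen_moves(const int b[9], int moves[9]) {
    int n = 0;
    if (has_winner(b))
        return 0;                      /* game over: no legal moves */
    for (int sq = 0; sq < 9; sq++)
        if (b[sq] == 0)
            moves[n++] = sq;
    return n;
}

/* Reference perft: makes and unmakes every move down to depth 0. */
static uint64_t perft_plain(int b[9], int side, int depth) {
    int moves[9];
    uint64_t total = 0;
    if (depth == 0)
        return 1;
    int n = gen_moves(b, moves);
    for (int i = 0; i < n; i++) {
        b[moves[i]] = side;
        total += perft_plain(b, 3 - side, depth - 1);
        b[moves[i]] = 0;
    }
    return total;
}

/* Bulk counting: at depth 1 return the move count directly. */
static uint64_t perft_bulk(int b[9], int side, int depth) {
    int moves[9];
    uint64_t total = 0;
    assert(depth >= 0);
    if (depth == 0)
        return 1;
    int n = gen_moves(b, moves);
    if (depth == 1)
        return (uint64_t)n;            /* last ply is never made */
    for (int i = 0; i < n; i++) {
        b[moves[i]] = side;
        total += perft_bulk(b, 3 - side, depth - 1);
        b[moves[i]] = 0;
    }
    return total;
}
```

In chess the saving is larger, because making and unmaking a move there is far more expensive than in this toy game.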
An alternative to that, which you could try first, is to use a very tiny hashtable for the last few plies that more or less fits in the L2 or L3 cache, as that will have the biggest impact. In the end you just save a number of cycles there, since the last plies fit easily in L1 and stay a core-local process.
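A sketch of a small per-thread table reserved for the last few plies. The 2^14-entry size (16 bytes per entry, so 256 KiB per thread), the cutoff at depthleft 3 and all names are hypothetical; the idea is only that the table is small enough to stay cache-resident and private to one thread, so it needs no locking. Packing the count into 32 bits relies on subtotals at depthleft 3 or less being small: with at most 218 legal moves in any chess position, 218^3 is about 10.4 million.

```c
#include <assert.h>
#include <stdint.h>

#define SMALL_TT_BITS 14                   /* hypothetical: 2^14 entries */
#define SMALL_TT_SIZE (1u << SMALL_TT_BITS)
#define SMALL_TT_MAX_DEPTH 3               /* hypothetical cutoff */

typedef struct {
    uint64_t key;
    uint32_t count;   /* subtotals at depthleft <= 3 fit in 32 bits */
    uint32_t depth;
} SmallEntry;         /* 16 bytes */

/* _Thread_local gives every thread its own private copy. */
static _Thread_local SmallEntry small_tt[SMALL_TT_SIZE];

static uint32_t slot_of(uint64_t key, int depth) {
    /* Mix the depth in so one position at two depths usually
       lands in two different slots. */
    uint64_t h = key ^ ((uint64_t)depth * 0x9E3779B97F4A7C15ull);
    return (uint32_t)(h & (SMALL_TT_SIZE - 1));
}

static int small_probe(uint64_t key, int depth, uint64_t *count) {
    assert(depth >= 2 && depth <= SMALL_TT_MAX_DEPTH);
    const SmallEntry *e = &small_tt[slot_of(key, depth)];
    if (e->count != 0 && e->key == key && e->depth == (uint32_t)depth) {
        *count = e->count;
        return 1;
    }
    return 0;
}

/* Always-replace: the table is only a cache of recent subtrees. */
static void small_store(uint64_t key, int depth, uint64_t count) {
    assert(depth >= 2 && depth <= SMALL_TT_MAX_DEPTH);
    assert(count <= UINT32_MAX);
    SmallEntry *e = &small_tt[slot_of(key, depth)];
    e->key = key;
    e->count = (uint32_t)count;
    e->depth = (uint32_t)depth;
}
```

A miss simply falls back to counting the subtree, so an overwritten entry costs time, never correctness.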
Spinlocking is never needed and never clever, of course.
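One well-known way to share a table between threads without any locks is the XOR trick often called lockless hashing: each entry stores `key ^ data` next to `data`, and a reader accepts the entry only if the two words XOR back to its key, so a half-finished write by another thread shows up as a miss instead of a wrong count. This is a minimal C11 sketch with relaxed atomics; it is not necessarily what is meant here, and the table size and the names `ll_store` and `ll_probe` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define LL_SIZE 4096                       /* hypothetical */

static_assert((LL_SIZE & (LL_SIZE - 1)) == 0, "LL_SIZE must be a power of two");

typedef struct {
    _Atomic uint64_t check;   /* key ^ data */
    _Atomic uint64_t data;    /* payload, e.g. a perft subtotal */
} LocklessEntry;

static LocklessEntry ll_table[LL_SIZE];

static void ll_store(uint64_t key, uint64_t data) {
    LocklessEntry *e = &ll_table[key & (LL_SIZE - 1)];
    atomic_store_explicit(&e->check, key ^ data, memory_order_relaxed);
    atomic_store_explicit(&e->data, data, memory_order_relaxed);
}

/* Returns 1 and sets *data only if the entry is consistent with key.
   An empty slot (both words 0) would match key 0, which a random
   64-bit hash key practically never is. */
static int ll_probe(uint64_t key, uint64_t *data) {
    LocklessEntry *e = &ll_table[key & (LL_SIZE - 1)];
    uint64_t c = atomic_load_explicit(&e->check, memory_order_relaxed);
    uint64_t d = atomic_load_explicit(&e->data, memory_order_relaxed);
    if ((c ^ d) != key)
        return 0;             /* empty, other position, or torn write */
    *data = d;
    return 1;
}
```

The relaxed atomics only make the concurrent loads and stores well-defined in C11; the XOR check is what rejects mixed-up entries.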
The key to doing perft faster is a clever hashtable implementation that uses the properties of the perft counting process.