Spinlocks galore

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Spinlocks galore

Post by sje »

A spinlock is preferred over a traditional mutex for situations where the contention rate is low and the lock time is short. These are relative terms of course, so some experimentation is needed to determine actual deployment strategy.

I've been working with using spinlocks to mediate access to a transposition table that holds partial movepath enumeration results. My tests have shown that there is a marked improvement when moving from a "lock the whole table" technique to a "lock a small table region" technique. Specifically, the program allocates a vector of 256 spinlocks and each spinlock is assigned to its own 1/256th region of the transposition table. Contention is low and locking an uncontested spinlock is very fast.

Why the magic number 256 for the table division? It's a power of two (handy for region splitting) and increasing the split count to 512 doesn't show any noticeable improvement.

Next: Using the same spinlock table division idea for the other transposition tables.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Spinlocks galore

Post by sje »

I have implemented the 256-way spinlock scheme for the main transposition table and have measured the timing results.

The slowdown due to locking is just under one percent. This might be improved by various hacks, but it looks pretty good at this point. This testing was done with Mac OS/X 10.6; performance under Linux could be better or worse depending on the underlying spinlock implementation.

Next: implementing the scheme for the evaluation transposition table and the pawn structure transposition table.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Spinlocks galore

Post by sje »

The remaining transposition tables (evaluation, pawn structure, deep evaluation, tablebase) have been converted to the 256-way spinlock scheme and all appear to be working correctly. There is a very slight slowdown, but in return there is a guarantee of absolute thread safety done without assembly language or memory architecture assumptions.

In Symbolic, the deep evaluation code is a no-frills, limited depth search that provides a position score and it's used in place of internal iterative deepening for purposes of move ordering at certain nodes.

The tablebase subsystem actually has two sets of spinlocks. The first set guards the tablebase score transposition table in the same way that the other tables are guarded. The second set of locks corresponds to each possible tablebase file (280 in the current implementation) and guards against simultaneous stream operations on any given file.