APassionForCriminalJustic wrote:
Ronald's point is basically that Brainfish is nothing more than Stockfish development with the Cerebellum book. Brainfish's implementation of NUMA was actually taken from Mohammed Li (asmFish's author).
Indeed, and the Cerebellum book and in particular the automatic generation of it is obviously very nice work, but that is not why it was inserted into this thread.
There is no question that Brainfish is Stockfish with:
- Cerebellum book code by Thomas;
- NUMA patch by Mohammed (which I think is itself based on Texel's code);
- Windows LP patch by I don't know who (possibly Thomas himself, but I suspect not).
The NUMA patch reverts the "per-thread CMH table" patch, resulting in a per-node CMH table. On a non-NUMA system this means a single shared CMH table, which the official SF had until recently.
Going from shared to per-thread was known and accepted to lose a few Elo when using multiple threads on a non-NUMA machine (at least at STC, probably a much smaller effect at LTC where the per-thread tables get enough time to fill up). It was accepted because it removed a bottleneck on NUMA machines. It has no effect on single-threaded play or speed.
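To make the difference between the two layouts concrete, here is a minimal sketch in C++. It is not code from Stockfish, Brainfish or asmFish: CmhTable, Sharing, assignTables and distinctTables are hypothetical names, the table is a tiny stand-in, and both thread safety and node-local memory allocation are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <vector>

// Hypothetical stand-in for a counter-move history (CMH) table.
// A real table is far larger; only the sharing pattern matters here.
struct CmhTable {
    std::vector<int> stats = std::vector<int>(1024, 0);
};

enum class Sharing { PerThread, PerNode };

// For each thread, returns the table that thread should use.
// nodeOfThread[i] is the NUMA node thread i runs on
// (all zeros on a non-NUMA machine).
std::vector<std::shared_ptr<CmhTable>>
assignTables(const std::vector<int>& nodeOfThread, Sharing policy) {
    std::vector<std::shared_ptr<CmhTable>> perNode;  // one slot per node
    std::vector<std::shared_ptr<CmhTable>> result;   // one entry per thread
    for (int node : nodeOfThread) {
        assert(node >= 0);
        if (policy == Sharing::PerThread) {
            // Every thread gets a private table.
            result.push_back(std::make_shared<CmhTable>());
            continue;
        }
        // Per-node: create the node's table on first use, then share it.
        const std::size_t idx = static_cast<std::size_t>(node);
        if (perNode.size() <= idx)
            perNode.resize(idx + 1);
        if (!perNode[idx])
            perNode[idx] = std::make_shared<CmhTable>();
        result.push_back(perNode[idx]);
    }
    return result;
}

// Counts how many distinct tables an assignment contains.
std::size_t distinctTables(const std::vector<std::shared_ptr<CmhTable>>& tables) {
    std::set<const CmhTable*> seen;
    for (const auto& t : tables)
        seen.insert(t.get());
    return seen.size();
}
```

With four threads on a single node, the per-node policy hands every thread the same table, which is the shared-table behaviour described above. With threads spread over two nodes it yields two tables, and the per-thread policy always yields one table per thread.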
Everything can be verified by inspecting the source code. If I missed something, I can easily be corrected by pointing out the relevant source file and line numbers.
The LP patch, once it is enabled via a UCI option, obviously does improve speed a lot when using larger hash tables. (And I can only applaud anyone who adds LP support to an engine. Not using this facility where it is available unnecessarily leaves the hardware underused.)
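For readers who have not seen a large-page allocation, below is a rough sketch of the Linux analogue, which asks the kernel for transparent huge pages. This is not the Windows LP patch discussed here; on Windows the corresponding mechanism is VirtualAlloc with MEM_LARGE_PAGES, which additionally requires the "Lock pages in memory" privilege. The function name allocHashLargePages and the 2 MiB page size are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <cstdlib>     // posix_memalign, free
#include <sys/mman.h>  // madvise (Linux)

// Allocates at least `bytes` for a hash table, rounded up to a multiple of
// 2 MiB and aligned to 2 MiB, and hints that huge pages should back it.
// The 2 MiB figure is an assumption (typical on x86-64).
// Returns nullptr on failure; release the memory with free().
void* allocHashLargePages(std::size_t bytes, std::size_t* allocated) {
    constexpr std::size_t kAlign = 2 * 1024 * 1024;
    if (bytes == 0)
        return nullptr;
    const std::size_t size = (bytes + kAlign - 1) / kAlign * kAlign;

    void* mem = nullptr;
    if (posix_memalign(&mem, kAlign, size) != 0)
        return nullptr;

#ifdef MADV_HUGEPAGE
    // Only a hint: the kernel may ignore it, so the result is not checked.
    madvise(mem, size, MADV_HUGEPAGE);
#endif

    if (allocated)
        *allocated = size;
    return mem;
}
```

The speed gain comes from needing far fewer TLB entries to cover a large hash table, which is why the effect grows with hash size.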
That NUMA patch failed to show an Elo gain when tested at the Fishtest framework using 32 cores if I am not mistaken.
Yes, so far it has not shown itself to be a win on NUMA machines except on Windows systems with more than 64 logical cores (where it overcomes the processor group limitation discussed elsewhere).
I have no doubt that scaling can be improved by optimising memory use on NUMA machines, but it is not yet clear by how much. For example, if SF already scales close to perfectly on a two-node machine, then any gains from NUMA modifications on that system will necessarily be very limited.
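The bound in that last sentence is simple arithmetic, sketched below. The function name maxRemainingGain is made up for this illustration, and the numbers in the note that follows are hypothetical, not measurements.

```cpp
#include <cassert>

// Upper bound on the relative speed gain that better scaling could still
// deliver, given the speedup actually measured and the ideal (linear) one.
double maxRemainingGain(double measuredSpeedup, double idealSpeedup) {
    assert(measuredSpeedup > 0.0 && measuredSpeedup <= idealSpeedup);
    return idealSpeedup / measuredSpeedup - 1.0;
}
```

If going from one node to two already gave a hypothetical 1.9x speedup against an ideal 2x, maxRemainingGain(1.9, 2.0) is about 0.053, so no NUMA modification could add more than roughly 5% speed on that machine.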