Teemu Pudas wrote:Code: Select all
#if !defined(ZCT_WINDOWS) && defined(SMP)
GLOBALS *zct;
#else
THREAD_LOCAL GLOBALS zct[1];
#endif
Code: Select all
#if !defined(ZCT_WINDOWS) && defined(SMP)
zct = (GLOBALS *)shared_alloc(sizeof(GLOBALS));
#endif
I investigated, and now I'm confused. It _looks_ like the Windows version never initializes zct->hash_table for anything but the main thread. MSVC's debugger claims the exact opposite (NULL for main, okay for worker). In either case, it should instantly crash and burn.
Wow, that's actually a really old bug. I am almost certain that I had fixed that before. In any case, that bug is your fault, since you did the thread local changes.
"zct" should of course be global, not local. That's the reason the Unix version is a pointer--it has to be allocated in shared memory.
MSDN to the rescue:
You must use the thread attribute for the declaration and the definition of a thread local object, whether the declaration and definition occur in the same file or separate files.
Oops.
The THREAD_LOCAL should be removed. But should zct->killer_move[][] really be shared? The processors will be constantly fighting over those cache lines...
That's already changed.
After the thread on cache coherency earlier I looked through ZCT and found that all the move ordering was global, and even worse, all the statistics (fail_highs, hash_probes, etc.) were too. I changed the move ordering stuff to be inside BOARD (rather than making more thread local stuff, only needs to be one thing), and I commented out the statistics. NPS scaling improved significantly.
OTOH I had it hang on me earlier when doing benchmarks, but it was in heavily optimized code, and it was in an ssh session, so I didn't get enough time to walk through the disassembly...
I will make 2501 once I get around to fixing that, or at least adding back in the statistics inside BOARD.