Hmmm, I was thinking about this more. If analysis were driven with powerful engines, such as Stockfish vs LeelaZero, then maybe you can do something with tons and tons of cores.Dann Corbit wrote: ↑Mon Aug 19, 2019 10:47 pmThis leads to an interesting idea.dragontamer5788 wrote: ↑Mon Aug 19, 2019 9:10 pmI've done thought-experiments with such architectures. My main issue is how to organize the volume of data you get.Dann Corbit wrote: ↑Mon Aug 19, 2019 8:35 pm In fact, I might run 250 instances of a chess program with 1 thread each (leaving a few threads idle to keep the machine responsive).
The reason is that there is no SMP loss at one thread per engine instance.
1. If you are making an opening book, MCTS seems superior as a methodology to organize all those threads. You'll want the statistics of all threads to coalesce together on the more "interesting" branches of a tree.
2. In the case of Alpha-Beta search (finding the best move for a given position), the other threads would benefit from having alpha-bounds or beta-bounds set. If the threads could somehow "message pass" bounds forward. Ex: Thread #0 discovers that its evaluation is going to be somewhere between <-2, +1>, but hasn't figured out the exact bounds yet. Thread #250 may use that knowledge to set its alpha-beta bounds to <-2, +1>.
In both cases, your analysis benefits from thread communication. Granted, #2 doesn't seem to be super-well researched yet, but #1 is easy to do with MCTS virtual-loss as a coordination scheme.
Have a collection of engines performing independent searches, but still share the same hashtable by using shared memory.
I often analyze related positions, and in such a case there could be a clear benefit.
Lets say you have position P, with potential moves P1, P2, P3... P35. And position P1 has its children P1.1, P1.2, P1.3... all the way up to P1.35 as well. Your analysis can be:
* Play(P1.1, Stockfish, LeelaZero)
* Play(P1.1, LeelaZero, Stockfish)
* Play(P1.2, Stockfish, LeelaZero)
* Play(P1.2, LeelaZero, Stockfish)
...
* Play(P35.35, Stockfish, LeelaZero)
* Play(P35.35, LeelaZero, Stockfish)
Each position can collect win/loss information. In this case, you don't want the engines to share a transposition table, because it'd affect the "strength" of the engines. You'd want the engines to be as consistent as possible. In this case, you can benefit from many, many cores.
The best move will be analyzed with your typical NegaMax at the end. NegaMax(P) == -max(-NegaMax(P1), -NegaMax(P2), -NegaMax(P3) ... -NegaMax(P35) == -max(max(P1.1, P1.2, P1.3...), max(P2.1, P2.2, P2.3...), ...)
Obviously, this scheme can extend as far out as you desire. Play P1.1.1.1 (4-ply deep) if it aids your analysis, but it would take more and more CPU-power to go deeper and deeper.