Building PC

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

dragontamer5788
Posts: 201
Joined: Thu Jun 06, 2019 8:05 pm
Full name: Percival Tiglao

Re: Building PC

Post by dragontamer5788 »

Dann Corbit wrote: Mon Aug 19, 2019 10:47 pm
dragontamer5788 wrote: Mon Aug 19, 2019 9:10 pm
Dann Corbit wrote: Mon Aug 19, 2019 8:35 pm In fact, I might run 250 instances of a chess program with 1 thread each (leaving a few threads idle to keep the machine responsive).
The reason is that there is no SMP loss at one thread per engine instance.
I've done thought-experiments with such architectures. My main issue is how to organize the volume of data you get.

1. If you are making an opening book, MCTS seems superior as a methodology to organize all those threads. You'll want the statistics of all threads to coalesce together on the more "interesting" branches of a tree.

2. In the case of Alpha-Beta search (finding the best move for a given position), the other threads would benefit from having alpha-bounds or beta-bounds set. If the threads could somehow "message pass" bounds forward. Ex: Thread #0 discovers that its evaluation is going to be somewhere between <-2, +1>, but hasn't figured out the exact bounds yet. Thread #250 may use that knowledge to set its alpha-beta bounds to <-2, +1>.

In both cases, your analysis benefits from thread communication. Granted, #2 doesn't seem to be super-well researched yet, but #1 is easy to do with MCTS virtual-loss as a coordination scheme.
This leads to an interesting idea.
Have a collection of engines performing independent searches, but still share the same hashtable by using shared memory.
I often analyze related positions, and in such a case there could be a clear benefit.
Hmmm, I was thinking about this more. If analysis were driven with powerful engines, such as Stockfish vs LeelaZero, then maybe you can do something with tons and tons of cores.

Lets say you have position P, with potential moves P1, P2, P3... P35. And position P1 has its children P1.1, P1.2, P1.3... all the way up to P1.35 as well. Your analysis can be:

* Play(P1.1, Stockfish, LeelaZero)
* Play(P1.1, LeelaZero, Stockfish)
* Play(P1.2, Stockfish, LeelaZero)
* Play(P1.2, LeelaZero, Stockfish)
...
* Play(P35.35, Stockfish, LeelaZero)
* Play(P35.35, LeelaZero, Stockfish)

Each position can collect win/loss information. In this case, you don't want the engines to share a transposition table, because it'd affect the "strength" of the engines. You'd want the engines to be as consistent as possible. In this case, you can benefit from many, many cores.

The best move will be analyzed with your typical NegaMax at the end. NegaMax(P) == -max(-NegaMax(P1), -NegaMax(P2), -NegaMax(P3) ... -NegaMax(P35) == -max(max(P1.1, P1.2, P1.3...), max(P2.1, P2.2, P2.3...), ...)

Obviously, this scheme can extend as far out as you desire. Play P1.1.1.1 (4-ply deep) if it aids your analysis, but it would take more and more CPU-power to go deeper and deeper.