phhnguyen wrote: ↑Fri Jan 04, 2019 12:41 amFor mine I/O is not an issue since it is designed to work in memory only: everything it needs will be loaded into memory ...
That also holds for FairyGen. Yet it takes 2-5 min to generate a Pawnless (i.e. 8-fold symmetric) 5-men EGT. Marcel van Kervinck's "Pretty Fast" generator (which is bit-based, rather than byte-based) is a lot faster: a totally unsymmetric version of that needs about 110 msec for generating a 4-men EGT like KBNK or KQKR (IIRC). That is more than an order of magnitude faster. So speeds can vary a lot. Even between algorithms that in themselves appear well-optimized.
Extrapolating from the 110 msec would give ~6 sec for a 5-men, and about 6 min for a 6-men... Even for FairyGen a 6-men Pawnless EGT should only take 2-5 hours. (But I never tried that, as the memory requirement would exceed what you can address in a 32-bit environment, and I did not have a 64-bit compiler at that time, and only 4GB DRAM in my laptop anyway.)
Sure it is not really fast since it always misses all caches due to access randomly huge memory buffers (say, over 60 GB) and with multi threads (say over 10 threads).
The number of cache misses can be strongly reduced by treating moves in a smart order. (E.g. through the leap-frog algorithm.)
On the other hand, to process an index (typically an item of 1 byte data) in huge buffers, my generator needs to convert that index into a chess position, then generate moves for that position, for each move, do make into a new position, do incheck, then convert the new position into new index, get score, take back...
That sounds very bad. My generators do almost none of that. And doing nothing is always far faster than doing something, even on a GPU. I never convert indices to positions or vice versa, and I never set up entire positions. E.g. the work involved in setting up positions only consists of moving a single piece once every 64 positions. (There is no need to place the black King, as it cannot block any moves without being in check, and the next group of 64 positions in 63 out of 64 cases has only a single other piece in a different location.)
that is a lot calculating and I am sure it (calculating) takes much more time than reading/writing from/to (even huge) buffers. Multi threads are really helpful for my generator (proved via practising).
Well, if you would require several hundred times as much calculation per position as is really needed, I would have no difficulty believing that. So the key question is: how fast is your generator (on a single-threaded CPU)? If that would be orders of magnitude slower than optimum, trying to cure it with faster hardware IMO would not be the best strategy. I would first take the best conceivable algorithm, and then address the hardware requirements of that.
Note that for Xiangqi efficient creation of EGT is pretty much an unsolved problem. The rules for perpetual checking and chasing are basically incompatible with the idea of retrograde generation. So although it is possible to do some end-games (namely those where one side only has defenders) in the conventional way, the more interesting ones (such as KRPKR, KRCKR) would need a completely different treatment, and I am not sure whether it is actually public knowledge how exactly to do that, and how much the solution hurts speedwise compared to not having to deal with perpetuals.