Good luck with your new engine. Looks like you've done a lot of work on the subject.

Edsel Apostol wrote: ↑Sat Sep 01, 2018 6:57 am
So here are the test results from my implementation of ABDADA and Lazy SMP in the new engine, and from the modified YBWC in Hannibal.
I'm using the following set of positions (from Hannibal games and from some positions posted here in CCC). The positions were chosen for their complexity: the engine needs at least a minute or two to complete the depth 20 iteration on a single core.

Code: Select all
"r3k2r/pbpnqp2/1p1ppn1p/6p1/2PP4/2PBPNB1/P4PPP/R2Q1RK1 w kq - 2 12",
"2kr3r/pbpn1pq1/1p3n2/3p1R2/3P3p/2P2Q2/P1BN2PP/R3B2K w - - 4 22",
"r2n1rk1/1pq2ppp/p2pbn2/8/P3Pp2/2PBB2P/2PNQ1P1/1R3RK1 w - - 0 17",
"1r2r2k/1p4qp/p3bp2/4p2R/n3P3/2PB4/2PB1QPK/1R6 w - - 1 32",
"1b3r1k/rb1q3p/pp2pppP/3n1n2/1P2N3/P2B1NPQ/1B3P2/2R1R1K1 b - - 1 32",
"1r1r1qk1/pn1p2p1/1pp1npBp/8/2PB2QP/4R1P1/P4PK1/3R4 w - - 0 1",
"3rr1k1/1b2nnpp/1p1q1p2/pP1p1P2/P1pP2P1/2N1P1QP/3N1RB1/2R3K1 w - - 0 1",
"rn3rq1/p5k1/2p2bp1/1p4p1/8/2P1B1PQ/5PK1/3R3R w - - 0 1",
"1r3rk1/3bb1pp/1qn1p3/3pP3/3P1N2/2Q2N2/2P3PP/R1BR3K w - - 0 1",
"rn1q1rk1/2pbb3/pn2p3/1p1pPpp1/3P4/1PNBBN2/P1P1Q1PP/R4R1K w - - 0 1"
Results are from a dual Intel Xeon E5-2698V3 https://ark.intel.com/products/81060/In ... e-2_30-GHz
The machine is 32 cores with 64 threads.
Hannibal's modified YBWC searched with a hash size of 512MB and a fixed depth of 25, for thread counts 1, 2, 4, 8, 16, 32, and 64. The values are summed, divided by the number of positions, and then divided by the result of the single-thread search. This doesn't account for turbo boost during the single-core run, so the results are probably a bit lower than the true values. A ucinewgame is issued at the start of every test, so each search starts from scratch with cleared hashes.

The Threads: 1 row shows the average time spent in seconds and nodes in kNPS. The values for higher thread counts are multipliers for nodes, and the inverse (i.e. the speedup) for time.

As can be seen, Hannibal's NPS scaling is only good up to 16 cores. Since the machine has two CPUs, the engine probably struggled with NUMA, due to Hannibal's internal design for handling split points and repetition detection. Hannibal had only been tested on an 8-core machine before this.

Code: Select all
Threads: 1  time: 74.990600  nodes: 937.200000
Threads: 2  time: 2.149378   nodes: 1.942089
Threads: 4  time: 3.346066   nodes: 3.753011
Threads: 8  time: 5.226519   nodes: 6.973204
Threads: 16 time: 6.626765   nodes: 12.046264
Threads: 32 time: 8.139148   nodes: 16.006453
Threads: 64 time: 4.880303   nodes: 13.368858
This is the result for the new chess engine with the modified ABDADA as implemented by Tom Kerrigan. http://www.tckerrigan.com/Chess/Paralle ... ed_ABDADA/
Searched with a 512MB hash and a fixed depth of 20.

NPS scaling is not perfect, probably due to turbo boost on the single-core run and to the signalling that tells the other threads to quit the current iteration as soon as one thread completes it. This synchronizes the threads and focuses the effort on the next iteration, and is done without waiting for any thread.

Code: Select all
Threads: 1  time: 108.299200 nodes: 3127.100000
Threads: 2  time: 2.168862   nodes: 1.787663
Threads: 4  time: 3.603405   nodes: 3.492227
Threads: 8  time: 6.197661   nodes: 6.876950
Threads: 16 time: 8.440115   nodes: 13.613154
Threads: 32 time: 10.724592  nodes: 26.575884
Threads: 64 time: 11.759754  nodes: 31.082229
This is the result for Lazy SMP.

Lazy SMP is implemented with 50% of the threads searching at depth and the other 50% at depth+1. As can be seen, there is some kind of super-linear speedup at 2 and 4 threads. This is probably why Lazy SMP is so strong at 4 cores, which is currently the standard in the rating lists.

Code: Select all
Threads: 1  time: 109.826200 nodes: 3084.100000
Threads: 2  time: 3.847632   nodes: 1.844580
Threads: 4  time: 6.209981   nodes: 3.616316
Threads: 8  time: 6.069562   nodes: 7.212488
Threads: 16 time: 7.001232   nodes: 14.255216
Threads: 32 time: 13.694476  nodes: 27.848897
Threads: 64 time: 10.592256  nodes: 31.428827
Invictus chess engine source code can be found here:
ABDADA https://github.com/ed-apostol/InvictusChess
LazySMP https://github.com/ed-apostol/InvictusC ... ee/LazySMP
In your experience, does ABDADA gain any Elo? (Not time to depth; I'm not interested in that metric.)
For me, ABDADA is a regression compared to standard Lazy SMP, including Tom Kerrigan's form of it. So messing up the code base to lose Elo, no thanks...
What worked best for me is:
* work balancing: 1/2 workers at depth d, 1/2 workers at depth d+1.
* stop useless iterations: when any worker completes a given depth d, signal all other workers running depth <= d to stop immediately and report back to base, where they can be assigned useful work (i.e. depth >= d+1).