lucasart wrote: How to best parallelize your perft, how to best implement a hash replacement strategy for perft, making a special program that does perft only in a super tuned way, etc.
Interesting questions :)
This inspired me to make some more tests and it turns out that for my perft program it is better to keep the table for depth=1 small enough to fit in L3 cache. Some results:

MB    d1 KB   time(s)   perft(8)/time
0     0       317.25    267926000
512   256     29.14     2917220000
512   512     29.02     2928580000
512   1024    28.18     3016070000
512   2048    27.90     3046120000
512   4096    27.16     3129220000
512   8192    29.98     2835000000
512   Inf     40.99     2073700000
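
For readers who want to try the same split, here is a minimal sketch of a perft hash with a large main table plus a separate, small table used only for depth=1 entries. This is not the code behind the numbers above: the names `PerftEntry` and `PerftHash`, the always-replace policy, and sizing the tables in entries rather than KB are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One slot. The full key is stored to detect index collisions.
// depth == 0 marks an empty slot, so only depth >= 1 is ever stored.
struct PerftEntry {
    std::uint64_t key = 0;
    std::uint64_t count = 0;
    std::uint8_t depth = 0;
};

// Hypothetical two-table perft hash: depth-1 results go to their own small
// table (intended to stay cache resident), everything else to the main table.
class PerftHash {
public:
    // Sizes are numbers of entries and must be powers of two.
    PerftHash(std::size_t mainEntries, std::size_t d1Entries)
        : main_(mainEntries), d1_(d1Entries) {
        assert(isPow2(mainEntries) && isPow2(d1Entries));
    }

    // Returns true and fills `count` if (key, depth) is present.
    bool probe(std::uint64_t key, int depth, std::uint64_t& count) const {
        assert(depth >= 1);
        const std::vector<PerftEntry>& t = (depth == 1) ? d1_ : main_;
        const PerftEntry& e = t[key & (t.size() - 1)];
        if (e.depth == depth && e.key == key) {
            count = e.count;
            return true;
        }
        return false;
    }

    // Always-replace store (an assumption; other replacement schemes exist).
    void store(std::uint64_t key, int depth, std::uint64_t count) {
        assert(depth >= 1);
        std::vector<PerftEntry>& t = (depth == 1) ? d1_ : main_;
        PerftEntry& e = t[key & (t.size() - 1)];
        e.key = key;
        e.count = count;
        e.depth = static_cast<std::uint8_t>(depth);
    }

private:
    static bool isPow2(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

    std::vector<PerftEntry> main_;  // large table, depth >= 2
    std::vector<PerftEntry> d1_;    // small table, depth == 1 only
};
```

In a perft loop one would call `probe(zobristKey, depth, n)` before generating moves and `store(zobristKey, depth, n)` after counting. The `d1Entries` argument is the knob that corresponds to the "d1 KB" column above.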

Threads   time(s)   perft(8)/time
1         27.16     3129220000
2         14.70     5782640000
4         8.39      10134600000
8         8.09      10511900000

Threads   time(s)   perft(8)/time
1         38.58     2202960000
2         19.92     4267870000
4         10.80     7873920000
8         8.77      9687600000
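
One way to read the thread tables is as speedup relative to the single-threaded run (t1 / tN) and parallel efficiency (speedup divided by thread count). The helpers below are a small sketch for that arithmetic. The function names are made up for illustration and are not part of any perft program.

```cpp
#include <cassert>

// Speedup of an N-thread run over the 1-thread run, from wall-clock times.
double speedup(double t1, double tN) {
    assert(tN > 0.0);
    return t1 / tN;
}

// Parallel efficiency: speedup divided by the number of threads (1.0 = ideal).
double efficiency(double t1, double tN, int threads) {
    assert(threads > 0);
    return speedup(t1, tN) / threads;
}
```

For example, `speedup(27.16, 14.70)` gives the 2-thread speedup for the first thread table, and `efficiency(27.16, 8.39, 4)` the 4-thread efficiency.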

