I'm trying to investigate why Minic is slow. It is still copy/make, but it now uses magic bitboards instead of HQBB (although this change gave no dramatic speed improvement). I wonder whether the (too) big tables in Minic could be the root cause, through cache misses.
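As a rough sanity check on the table-size worry, here is a minimal Python sketch of the attack-table footprint. It assumes "plain" magic bitboards (a full 4096-entry rook slot and a 512-entry bishop slot per square, 8 bytes per entry). That layout is only an assumption for illustration; Minic's real tables may be laid out differently.

```python
# Footprint of sliding-piece attack tables under an ASSUMED "plain magic" layout.
# Minic's actual layout is not stated in this post, so treat this as an estimate.
SQUARES = 64
ENTRY_BYTES = 8  # one 64-bit attack bitboard per entry


def table_kib(entries_per_square: int) -> float:
    """Size in KiB of a table with a fixed number of entries per square."""
    return SQUARES * entries_per_square * ENTRY_BYTES / 1024


rook_kib = table_kib(4096)   # at most 2^12 relevant occupancies per rook square
bishop_kib = table_kib(512)  # at most 2^9 relevant occupancies per bishop square
total_kib = rook_kib + bishop_kib

print(f"rook: {rook_kib:.0f} KiB, bishop: {bishop_kib:.0f} KiB, total: {total_kib:.0f} KiB")
```

Under that assumption the sliding-piece tables alone come to roughly 2.25 MiB, far more than a typical 32 KiB L1 data cache, so some misses on attack lookups would be plausible. Whether they matter compared with the other tables is exactly what the measurements are meant to show.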
Using perf on the Shirov position, searched to depth 25 with a 256 MB TT and the usual other tables (material, pawn TT, ...), I get this:
Code:
          23 176,20 msec task-clock        # 0,996 CPUs utilized
              1 332      context-switches  # 0,057 K/sec
                  2      cpu-migrations    # 0,000 K/sec
             46 433      page-faults       # 0,002 M/sec
    102 002 645 618      cycles            # 4,401 GHz
    192 062 839 497      instructions      # 1,88 insn per cycle
     23 903 716 812      branches          # 1031,391 M/sec
        705 826 801      branch-misses     # 2,95% of all branches
       23,259313445 seconds time elapsed
Code:
          23 371,43 msec task-clock        # 0,974 CPUs utilized
    102 900 155 607      cycles            # 4,403 GHz
    192 077 762 835      instructions      # 1,87 insn per cycle
      1 738 195 101      cache-references  # 74,373 M/sec
        589 395 636      cache-misses      # 33,908 % of all cache refs
       23,986881200 seconds time elapsed
       23,332399000 seconds user
        0,035920000 seconds sys
But Igel, for example, is no better (here on the start position to depth 18), with almost 40% cache misses:
Code:
           3 300,62 msec task-clock        # 0,294 CPUs utilized
     14 551 909 458      cycles            # 4,409 GHz
     24 034 823 891      instructions      # 1,65 insn per cycle
        187 602 212      cache-references  # 56,839 M/sec
         74 768 541      cache-misses      # 39,855 % of all cache refs
       11,216000712 seconds time elapsed
        3,256560000 seconds user
        0,043845000 seconds sys
In fact, Minic has fewer cache misses and fewer branch mispredictions than Stockfish ...
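The miss percentage alone can mislead, because it depends on how many cache references each engine makes. A small Python sketch that derives IPC, miss ratio and misses per 1000 instructions (MPKI) from the Minic and Igel counters quoted above may make the comparison easier. The `PerfRun` class and its method names are hypothetical helpers, not part of perf. Note also that the two runs use different positions and depths, so this is only a rough comparison.

```python
from dataclasses import dataclass


@dataclass
class PerfRun:
    """Raw counters copied from a `perf stat` run (hypothetical helper class)."""
    name: str
    cycles: int
    instructions: int
    cache_refs: int
    cache_misses: int

    def ipc(self) -> float:
        # Instructions retired per cycle.
        return self.instructions / self.cycles

    def miss_ratio(self) -> float:
        # Fraction of cache references that missed (what perf prints as a percentage).
        return self.cache_misses / self.cache_refs

    def mpki(self) -> float:
        # Cache misses per 1000 instructions, independent of the reference count.
        return 1000 * self.cache_misses / self.instructions


# Counters taken from the second Minic run and the Igel run in this post.
minic = PerfRun("Minic", 102_900_155_607, 192_077_762_835, 1_738_195_101, 589_395_636)
igel = PerfRun("Igel", 14_551_909_458, 24_034_823_891, 187_602_212, 74_768_541)

for run in (minic, igel):
    print(f"{run.name}: IPC {run.ipc():.2f}, "
          f"miss ratio {100 * run.miss_ratio():.1f}%, MPKI {run.mpki():.2f}")
```

Measured per instruction, both engines land at about 3.1 misses per 1000 instructions, so on these numbers Minic does not look unusually cache-unfriendly. One caveat: what perf's generic `cache-references` and `cache-misses` events count depends on the CPU (often last-level cache accesses), so they say little about L1 or L2 behaviour.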
Perft 6 from the start position takes 4.7 s, so move generation + copy/make runs at about 25 Mnps ... probably not the issue
Pure eval (during Texel tuning) runs at 2.7 Mnps, which is probably a bit slow
Standard search on the same hardware runs at 1.9 Mnps
I don't get why SMP Minic is so slow at TCEC ... only 70 Mnps on 176 threads, where many others are around 120 Mnps
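To put these rates on one scale, here is a small Python sketch that converts them into cycles per node and per-thread throughput. It assumes the well-known perft(6) count of 119,060,324 nodes from the start position and the roughly 4.4 GHz clock reported by perf above. The function names are made up for this example, and the TCEC machine is different hardware, so the per-thread figures are not directly comparable with the single-thread ones.

```python
PERFT6_STARTPOS_NODES = 119_060_324  # standard perft(6) node count from the start position
CLOCK_HZ = 4.4e9                     # approximate clock from the perf output (assumption for all rates)


def mnps(nodes: int, seconds: float) -> float:
    """Throughput in million nodes per second."""
    return nodes / seconds / 1e6


def cycles_per_node(rate_mnps: float, clock_hz: float = CLOCK_HZ) -> float:
    """Average CPU cycles spent per node at the given rate."""
    return clock_hz / (rate_mnps * 1e6)


perft_mnps = mnps(PERFT6_STARTPOS_NODES, 4.7)
for label, rate in (("perft", perft_mnps), ("eval", 2.7), ("search", 1.9)):
    print(f"{label}: {rate:.1f} Mnps, about {cycles_per_node(rate):.0f} cycles per node")

# SMP figures quoted for TCEC: total Mnps spread over 176 threads.
THREADS = 176
per_thread = {name: total / THREADS for name, total in (("Minic", 70.0), ("others", 120.0))}
for name, rate in per_thread.items():
    print(f"{name}: {rate:.2f} Mnps per thread")
```

On these figures one eval call costs on the order of 1600 cycles against roughly 2300 cycles per search node, while move generation and copy/make take under 200, which fits the impression that eval is the expensive part (keeping in mind that not every search node necessarily runs a full eval). Per thread, 70 Mnps over 176 threads is about 0.4 Mnps, versus about 0.7 Mnps for engines reaching 120 Mnps.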