I'm trying to investigate why Minic is slow. It is still copy/make, but it now uses magic bitboards instead of HQBB (although this change gives no dramatic speed improvement). I wonder whether the (too) big tables in Minic could be the root cause, through cache misses.
Using perf on the Shirov position to depth 25, with a 256MB TT plus the usual other tables (material, pawn TT, ...), I get this:
         23 176,20 msec task-clock                #    0,996 CPUs utilized
             1 332      context-switches          #    0,057 K/sec
                 2      cpu-migrations            #    0,000 K/sec
            46 433      page-faults               #    0,002 M/sec
   102 002 645 618      cycles                    #    4,401 GHz
   192 062 839 497      instructions              #    1,88  insn per cycle
    23 903 716 812      branches                  # 1031,391 M/sec
       705 826 801      branch-misses             #    2,95% of all branches
      23,259313445 seconds time elapsed
And a second run with the cache counters:
         23 371,43 msec task-clock                #    0,974 CPUs utilized
   102 900 155 607      cycles                    #    4,403 GHz
   192 077 762 835      instructions              #    1,87  insn per cycle
     1 738 195 101      cache-references          #   74,373 M/sec
       589 395 636      cache-misses              #   33,908 % of all cache refs
      23,986881200 seconds time elapsed
      23,332399000 seconds user
       0,035920000 seconds sys
But Igel, for example, is no better (here on the start position to depth 18), with almost 40% misses:
          3 300,62 msec task-clock                #    0,294 CPUs utilized
    14 551 909 458      cycles                    #    4,409 GHz
    24 034 823 891      instructions              #    1,65  insn per cycle
       187 602 212      cache-references          #   56,839 M/sec
        74 768 541      cache-misses              #   39,855 % of all cache refs
      11,216000712 seconds time elapsed
       3,256560000 seconds user
       0,043845000 seconds sys
In fact, Minic has fewer cache misses and fewer branch mispredictions than Stockfish ...
Perft 6 from the start position takes 4.7 sec, so move generation + copy/make runs at 25Mnps ... probably not the issue.
Pure eval (during Texel tuning) runs at 2.7Mnps, which is maybe a bit slow.
Standard search on the same hardware runs at 1.9Mnps.
I don't get why SMP Minic is so slow at TCEC ... only 70Mnps on 176 threads, where many others are around 120Mnps.
