Removing Large Arrays

Posted: Tue Mar 10, 2020 3:43 am
by Deberger
I was appalled by this recent change to Stockfish:

https://github.com/official-stockfish/S ... eb205c0c07

This "large" array of 64 integers,

Code: Select all

  constexpr int PushToEdges[SQUARE_NB] = {
    100, 90, 80, 70, 70, 80, 90, 100,
     90, 70, 60, 50, 50, 60, 70,  90,
     80, 60, 40, 30, 30, 40, 60,  80,
     70, 50, 30, 20, 20, 30, 50,  70,
     70, 50, 30, 20, 20, 30, 50,  70,
     80, 60, 40, 30, 30, 40, 60,  80,
     90, 70, 60, 50, 50, 60, 70,  90,
    100, 90, 80, 70, 70, 80, 90, 100
  };

is replaced with a function call which must be more than 200 times slower, on any platform.

Code: Select all

  inline int push_to_edge(Square s) {
      int rd = edge_distance(rank_of(s)), fd = edge_distance(file_of(s));
      return 90 - (7 * fd * fd / 2 + 7 * rd * rd / 2);
  }

Re: Removing Large Arrays

Posted: Tue Mar 10, 2020 5:18 am
by MOBMAT
Without looking at where it is called, I'm guessing that since it relates to endgame positions, the slowdown isn't noticed as much. But then again, how much memory is actually saved?
If it is a memory/speed issue, one could use char or uchar to really save space at the expense of a slight slowdown, but I'm guessing any CPU casting code is still going to be faster than the function.

Re: Removing Large Arrays

Posted: Tue Mar 10, 2020 5:47 am
by PK
Please note that speed is language-specific. In C/C++, arrays of precalculated values tend to gain speed; in C# they often lose it. (I haven't tried this with piece/square tables, but I noticed, for example, that occluded-fill bitboards outperformed kindergarten bitboards.)

Re: Removing Large Arrays

Posted: Tue Mar 10, 2020 6:14 am
by Dann Corbit
Deberger wrote:
Tue Mar 10, 2020 3:43 am
I was appalled by this recent change to Stockfish:

https://github.com/official-stockfish/S ... eb205c0c07

This "large" array of 64 integers,

Code: Select all

  constexpr int PushToEdges[SQUARE_NB] = {
    100, 90, 80, 70, 70, 80, 90, 100,
     90, 70, 60, 50, 50, 60, 70,  90,
     80, 60, 40, 30, 30, 40, 60,  80,
     70, 50, 30, 20, 20, 30, 50,  70,
     70, 50, 30, 20, 20, 30, 50,  70,
     80, 60, 40, 30, 30, 40, 60,  80,
     90, 70, 60, 50, 50, 60, 70,  90,
    100, 90, 80, 70, 70, 80, 90, 100
  };

is replaced with a function call which must be more than 200 times slower, on any platform.

Code: Select all

  inline int push_to_edge(Square s) {
      int rd = edge_distance(rank_of(s)), fd = edge_distance(file_of(s));
      return 90 - (7 * fd * fd / 2 + 7 * rd * rd / 2);
  }
did you benchmark it?

I guess it is faster, because a memory load is very slow, and the optimizer will reduce the math.

Re: Removing Large Arrays

Posted: Tue Mar 10, 2020 8:09 am
by mar
If they made it an array of bytes instead and forced cache alignment, that table would shrink to precisely 64 bytes (one cache line), instead of the 256 bytes of 4-byte-aligned ints it occupies now.
A memory load is only slow when it misses the cache, which is not the case here.
And assuming the cryptic, unreadable formula is supposed to produce the same values, I don't see how it could have passed the tests.

Re: Removing Large Arrays

Posted: Tue Mar 10, 2020 8:34 am
by hgm
It probably passes because it is never called, except when checkmating a bare King. (Which you would win anyway, no matter how large a slowdown you would introduce.)

BTW, the function does not result in the same values as the table. But what it produces has a similar trend.

Also note that there is no slowdown at all from making the table char instead of int, whether signed or unsigned: loading a byte into a 32-bit register with zero- or sign-extension is just as fast as loading an int into it.

Re: Removing Large Arrays

Posted: Tue Mar 10, 2020 11:30 am
by Sesse
hgm wrote:
Loading a byte into a 32-bit register with zero or sign-extension is just as fast as loading an int into it.
Generally this is true, but not always. I remember a case on older Opteron CPUs where using uint32_t instead of uint8_t helped, due to a port-contention issue (the movzx could only run on one specific port).

Re: Removing Large Arrays

Posted: Tue Mar 10, 2020 11:31 am
by Sesse
Deberger wrote:
Tue Mar 10, 2020 3:43 am
is replaced with a function call which must be more than 200 times slower, on any platform.
Why would it be? It really depends on whether the array is in L1d or not.

Re: Removing Large Arrays

Posted: Tue Mar 10, 2020 12:15 pm
by JohnWoe
I saw this patch and wondered how a simple array that even a dog can understand is worse than a complex algorithm.

Indeed, at STC it actually performs (slightly) worse: Total: 174724 W: 33325 L: 33404 D: 107995

But it is now more "professional" looking.

Re: Removing Large Arrays

Posted: Tue Mar 10, 2020 12:41 pm
by AndrewGrant
I've long disagreed with a certain user's string of "simplifications".

Is an array of 64 values a bit big and bulky? Yes. But any adult can view that array and identify the trend in the values.

I would keep the array, because it's very clear to the reader what is being done. Now you need far more mathematical intuition to see the pattern quickly.

But I'd ask a broader question than "Should this have been committed?" I ask: should this even have been tested? Why bother? If it fails, it fails; and if it succeeds, you gain nothing but obfuscate a complex code base even further.