Removing Large Arrays

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Removing Large Arrays

Post by Ovyron »

At least they shouldn't call it "simplification", as this doesn't simplify anything under any definition. Looks like change for the sake of change.
RubiChess
Posts: 584
Joined: Fri Mar 30, 2018 7:20 am
Full name: Andreas Matthies

Re: Removing Large Arrays

Post by RubiChess »

Look at the discussion here: https://github.com/official-stockfish/S ... /pull/2570
Developers seem to agree that equation is faster than memory lookup to a table. But we agree that is not what we would call 'simplification'.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Removing Large Arrays

Post by mar »

no way the formula is faster if the table is in L1
the patch seems to do more than what was posted here, we were misled.
it replaces two tables plus it allegedly fixes some EG bug, so there are functional changes (also tables differ from formulas).
so it's not about performance at all.
Martin Sedlak
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Removing Large Arrays

Post by mvanthoor »

AndrewGrant wrote: Tue Mar 10, 2020 1:41 pm I've long disagreed with a certain users string of "simplifications"....

if it succeeds you gain nothing but obfuscate a complex code base even further.
In the past, I've often noticed that C programmers tend to try and make code as brief as possible.

I can understand why you would to this back in the old days of the 90's, 80's or even before. (I'm just old enough to have experienced the tail end of the 80's with regard to computers.) Text screens were only 80 columns wide, and 25 lines high. Code completion wasn't as good as it is now, or didn't even exist. Compilers weren't as smart. Shorter code meant much more code on screen and faster editing. Less variables often meant more speed and less memory usage. With current IDE's and compilers, those things are often moot points.

A tiny piece of the magic bitboard generation/initialization in my engine (under development) looks like this:

Code: Select all

        let mask = if is_rook {
        	create_rook_mask(square)
        } else {
            create_bishop_mask(square)
        };
        let bits = mask.count_ones();
        let permutations = 2u64.pow(bits);
        let end_offset = offset + permutations - 1;
        ...
I could also have written it like this (if it's short enough, Rust will auto-format the if-statement as a ternary operator):

Code: Select all

let m = if isr { crrm(s) } else { crbm(s) };
let b = m.count_ones();
let p = 2u64.pow(b);
let e = o + p - 1;
or even shorter...

Code: Select all

let m = if isr { crrm(s) } else { crbm(s) };
let e = o + 2u64.pow(m.count_ones()) - 1;
There is no reason to write my code in the second way, let alone the third. Nobody (not even myself) will ever understand version 2 or 3 of this code, as opposed to version 1, where it's completely clear that I'm calculating the end of a range that starts at offset and ends at end_offset, and this range has something to do with all possible permutations of the rook mask for a certain square. (Further down in the code it'll be clear that each permutation of the mask is a blocker board, that I'm generating the attack board, and that the index of the attack board is going to fall from offset up to and including end_offset within the attack table.)

The compiler will do all of the inlines, stripping, and optimizations for me.



Watch this video, where the speaker actually shows how it works. A piece of code containing methods and structs and more stuff to make code readable and manageable for _humans_, is stripped down to "3 + input" by the compiler.

Because of that, I'm all for clarity over brevity, and clarity over cleverness.

So I agree: replacing that table with a function is a bad idea, because I can actually understand and surmise at first glance what that table is doing (especially when I know that this table comes from a chess program), but I can't do the same with the function.

I even disagree with the short syntax Rust uses:

fn: why not function?
impl: why not "implement"?
Vec and vec! (a type, and a macro to fill such a type): why not Vector and vector! ?
Implicit returns when the last line of a function is an expression:

So instead of (some random pseudo-code here):

Code: Select all

fn get_stuff() -> u64 {
	let mut number = random::generate();
	number += current_time_stamp();
	return hash(number);
}
The linter will complain that you're returning a variable that you didn't use for anything else but temporary storage, and it will suggest to remove it, and the return as well, because it's unneeded:

Code: Select all

fn get_stuff() -> u64 {
	hash(random::generate() + current_time_stamp())
}
And don't you dare put a ";" at the end of that line, because that would make it a statement, instead of a returnable expression, and you'd get the error that your function isn't returning a u64. IMHO, things like that are brevity for brevity's sake, and it creates cryptic code.

Everyone has to program the way they like it, especially if a project is personal. That's even true if it's an open source project, if one is the originator or even sole maintainer. People can join, or not, at their discretion. But, I personally think that writing code more clearly with less brevity is the better way to go if cooperation and having contribution from others to advance the project faster are the main goals.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
D Sceviour
Posts: 570
Joined: Mon Jul 20, 2015 5:06 pm

Re: Removing Large Arrays

Post by D Sceviour »

Often, I test the speed of different function methods. The results indicate the fixed array PushToEdges[] seems to be faster on average:

10 second test ...
PushToEdges 16519006400
push_to_edge 16610155136

Code: Select all

// gcc -O3 -Wall -m64 test.c -o test -lm -s
// speed test method for different functions

#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define rank_of(sq)                  ((sq) >> 3)
#define file_of(sq)                  ((sq) & 0x7)
#define _max(a, b)                   ((a) >= (b) ? (a) : (b))
#define _min(a, b)                   ((a) < (b) ? (a) : (b))

const int PushToEdges[64] = {
    100, 90, 80, 70, 70, 80, 90, 100,
     90, 70, 60, 50, 50, 60, 70,  90,
     80, 60, 40, 30, 30, 40, 60,  80,
     70, 50, 30, 20, 20, 30, 50,  70,
     70, 50, 30, 20, 20, 30, 50,  70,
     80, 60, 40, 30, 30, 40, 60,  80,
     90, 70, 60, 50, 50, 60, 70,  90,
    100, 90, 80, 70, 70, 80, 90, 100
  };

enum { RANK_1, RANK_2, RANK_3, RANK_4, RANK_5, RANK_6, RANK_7, RANK_8, N_RANK };
enum { FILE_A, FILE_B, FILE_C, FILE_D, FILE_E, FILE_F, FILE_G, FILE_H, N_FILE };

inline int edge_distance_f(int f) { return _min(f, (FILE_H - f)); } 
inline int edge_distance_r(int r) { return _min(r, (RANK_8 - r)); }

inline int push_to_edge(int s) {
   int rd = edge_distance_r(rank_of(s)), fd = edge_distance_f(file_of(s));
   return 90 - (7 * fd * fd / 2 + 7 * rd * rd / 2);
}

uint64_t time_in_ms() {
struct timespec t;
  clock_gettime(CLOCK_MONOTONIC, &t);
  return (uint64_t)t.tv_sec * 1000 + t.tv_nsec / 1000000;
}

int main() {
uint64_t c,d,e;
int start;

  printf("10 second test ...\n");
  start = time_in_ms();
  e = 0;

  while((time_in_ms() - start) < 10000) {
       for (c = 0; c < 64; c++) {
          d = PushToEdges[c];
          e++;
       }
  }
  printf(" PushToEdges  %I64d\n",e);
  start = time_in_ms();
  e = 0;

  while((time_in_ms() - start) < 10000) {
       for (c = 0; c < 64; c++) {
          d = push_to_edge(c);
          e++;
       }
  }
  printf(" push_to_edge %I64d\n",e);
  return 0;
}
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Removing Large Arrays

Post by mar »

D Sceviour wrote: Tue Mar 10, 2020 8:30 pm Often, I test the speed of different function methods. The results indicate the fixed array PushToEdges[] seems to be faster on average:

10 second test ...
PushToEdges 16519006400
push_to_edge 16610155136
nonsense, the compiler completely eliminates the inner loops, so it boils down to nothing (busy-loop for 10 seconds).

EDIT: making the counters volatile, I got this:

Code: Select all

10 second test ...
 PushToEdges  3889291392
 push_to_edge 2689210048
since you count loops, more is better so table-based version (if in cache) is indeed significantly faster
the problem is that the patch probably succeeded due to a bugfix, so the "simplification" made it into the code undeservedly it seems.
Martin Sedlak
D Sceviour
Posts: 570
Joined: Mon Jul 20, 2015 5:06 pm

Re: Removing Large Arrays

Post by D Sceviour »

mar wrote: Tue Mar 10, 2020 8:39 pm
D Sceviour wrote: Tue Mar 10, 2020 8:30 pm Often, I test the speed of different function methods. The results indicate the fixed array PushToEdges[] seems to be faster on average:

10 second test ...
PushToEdges 16519006400
push_to_edge 16610155136
nonsense, the compiler completely eliminates the inner loops, so it boils down to nothing (busy-loop for 10 seconds).
It is not nonsense. For one thing, you can understand the busy-loop. I only spent about 20 minutes writing this example, and there are better ways to do it. The timer test could be performed with much less frequency to give the busy-loop more activity. Still, I have been using this type of method for a long time and have no complaints in the methodology. The method does not describe the effects of internal optimization of the caches which could have an effect on results. However, that would be more due to hardware than software.

EDIT: I see your edit now, and am surpised you get even better results.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Removing Large Arrays

Post by Gerd Isenberg »

D Sceviour wrote: Tue Mar 10, 2020 8:30 pm Often, I test the speed of different function methods. The results indicate the fixed array PushToEdges[] seems to be faster on average:

10 second test ...
PushToEdges 16519006400
push_to_edge 16610155136
In a real chess program with many instances fighting for the caches, things may turn out differently than a tiny loop test suggests.
Even small array lookups will throw other cachelines out of L1, computation may be done concurrently with other instructions improving ipc.
D Sceviour
Posts: 570
Joined: Mon Jul 20, 2015 5:06 pm

Re: Removing Large Arrays

Post by D Sceviour »

Gerd Isenberg wrote: Tue Mar 10, 2020 9:16 pm
D Sceviour wrote: Tue Mar 10, 2020 8:30 pm Often, I test the speed of different function methods. The results indicate the fixed array PushToEdges[] seems to be faster on average:

10 second test ...
PushToEdges 16519006400
push_to_edge 16610155136
In a real chess program with many instances fighting for the caches, things may turn out differently than a tiny loop test suggests.
Even small array lookups will throw other cachelines out of L1, computation may be done concurrently with other instructions improving ipc.
I agree that optimization is a problem for comparison of small arrays and small loops. However, I used the timer type test to compare large syzygy reads with coded endgame table base calculations and demonstrated syzygy was about 30,000 times slower.

In the posted code, the volatile variable is similar to deleting the -O3 optimization.
User avatar
Deberger
Posts: 91
Joined: Sat Nov 02, 2019 6:42 pm
Full name: ɹǝƃɹǝqǝᗡ ǝɔnɹꓭ

Re: Removing Large Arrays

Post by Deberger »

D Sceviour wrote: Tue Mar 10, 2020 8:30 pm Often, I test the speed of different function methods. The results indicate the fixed array PushToEdges[] seems to be faster on average:

10 second test ...
PushToEdges 16519006400
push_to_edge 16610155136
The optimizer can remove what we are testing because the results are unused.

(warning: variable ‘d’ set but not used)

What happens if you propagate the edge values?

For example:

Code: Select all

int main() {
uint64_t c,d,e;
int start;

  printf("10 second test ...\n");
  start = time_in_ms();
  e = 0;
  d = 0;

  while((time_in_ms() - start) < 10000) {
       for (c = 0; c < 64; c++) {
          d += PushToEdges[c];
          e++;
       }
  }
  printf(" PushToEdges  %ld\n",e);
  printf(" prevent optimization removal %ld\n",d);
  start = time_in_ms();
  e = 0;
  d = 0;

  while((time_in_ms() - start) < 10000) {
       for (c = 0; c < 64; c++) {
          d += push_to_edge(c);
          e++;
       }
  }
  printf(" push_to_edge %ld\n",e);
  printf(" prevent optimization removal %ld\n",d);
  return 0;
}