Removing Large Arrays
-
- Posts: 4556
- Joined: Tue Jul 03, 2007 4:30 am
Re: Removing Large Arrays
At least they shouldn't call it "simplification", as this doesn't simplify anything under any definition. Looks like change for the sake of change.
-
- Posts: 584
- Joined: Fri Mar 30, 2018 7:20 am
- Full name: Andreas Matthies
Re: Removing Large Arrays
Look at the discussion here: https://github.com/official-stockfish/S ... /pull/2570
Developers seem to agree that the equation is faster than a memory lookup into a table. But we agree that this is not what we would call 'simplification'.
-
- Posts: 2554
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Removing Large Arrays
no way the formula is faster if the table is in L1
the patch seems to do more than what was posted here; we were misled.
it replaces two tables, plus it allegedly fixes some EG bug, so there are functional changes (also, the tables differ from the formulas).
so it's not about performance at all.
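for the record, a quick check (my own sketch, not code from the patch) shows the old table and the new formula really do disagree:
Code: Select all
// my sketch: compare the old PushToEdges table with the closed-form
// push_to_edge replacement and print the squares where they disagree
#include <stdio.h>

static const int PushToEdges[64] = {
    100, 90, 80, 70, 70, 80, 90, 100,
     90, 70, 60, 50, 50, 60, 70,  90,
     80, 60, 40, 30, 30, 40, 60,  80,
     70, 50, 30, 20, 20, 30, 50,  70,
     70, 50, 30, 20, 20, 30, 50,  70,
     80, 60, 40, 30, 30, 40, 60,  80,
     90, 70, 60, 50, 50, 60, 70,  90,
    100, 90, 80, 70, 70, 80, 90, 100
};

int main(void) {
    for (int s = 0; s < 64; s++) {
        int r = s >> 3, f = s & 7;
        int rd = r < 7 - r ? r : 7 - r;  /* distance to nearest rank edge */
        int fd = f < 7 - f ? f : 7 - f;  /* distance to nearest file edge */
        int formula = 90 - (7 * fd * fd / 2 + 7 * rd * rd / 2);
        if (formula != PushToEdges[s])
            printf("sq %2d: table %3d, formula %3d\n", s, PushToEdges[s], formula);
    }
    return 0;
}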
Martin Sedlak
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Removing Large Arrays
AndrewGrant wrote: ↑Tue Mar 10, 2020 1:41 pm
I've long disagreed with a certain users string of "simplifications".... if it succeeds you gain nothing but obfuscate a complex code base even further.
In the past, I've often noticed that C programmers tend to try and make code as brief as possible.
I can understand why you would do this back in the old days of the 90's or 80's, or even before. (I'm just old enough to have experienced the tail end of the 80's with regard to computers.) Text screens were only 80 columns wide and 25 lines high. Code completion wasn't as good as it is now, or didn't even exist. Compilers weren't as smart. Shorter code meant more code on screen and faster editing. Fewer variables often meant more speed and less memory usage. With current IDEs and compilers, those points are often moot.
A tiny piece of the magic bitboard generation/initialization in my engine (under development) looks like this:
Code: Select all
let mask = if is_rook {
    create_rook_mask(square)
} else {
    create_bishop_mask(square)
};
let bits = mask.count_ones();
let permutations = 2u64.pow(bits);
let end_offset = offset + permutations - 1;
...
Written in the terse style I described, it could look like this:
Code: Select all
let m = if isr { crrm(s) } else { crbm(s) };
let b = m.count_ones();
let p = 2u64.pow(b);
let e = o + p - 1;
Or taken even further:
Code: Select all
let m = if isr { crrm(s) } else { crbm(s) };
let e = o + 2u64.pow(m.count_ones()) - 1;
The compiler will do all of the inlines, stripping, and optimizations for me.
Watch this video, where the speaker actually shows how it works. A piece of code containing methods, structs, and more stuff to make code readable and manageable for _humans_ is stripped down to "3 + input" by the compiler.
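To give a flavor of what the speaker demonstrates, here's a toy example of my own (in C, not the code from the video). The struct, the constructor, and the helper function all evaporate; at -O2, gcc and clang reduce compute() to a single addition:
Code: Select all
// toy example (mine, not from the video): layers of abstraction that
// the optimizer folds away completely
#include <stdio.h>

struct Offset { int amount; };

static struct Offset make_offset(void) {
    struct Offset o = { 1 + 2 };        /* constant-folded to 3 */
    return o;
}

static int apply(struct Offset o, int input) {
    return input + o.amount;
}

int compute(int input) {
    /* at -O2 this whole function compiles down to "input + 3" */
    return apply(make_offset(), input);
}

int main(void) {
    printf("%d\n", compute(39));        /* prints 42 */
    return 0;
}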
Because of that, I'm all for clarity over brevity, and clarity over cleverness.
So I agree: replacing that table with a function is a bad idea, because I can actually understand and surmise at first glance what that table is doing (especially when I know that this table comes from a chess program), but I can't do the same with the function.
I even disagree with the short syntax Rust uses:
fn: why not function?
impl: why not "implement"?
Vec and vec! (a type, and a macro to fill such a type): why not Vector and vector!?
Implicit returns when the last line of a function is an expression:
So instead of (some random pseudo-code here):
Code: Select all
fn get_stuff() -> u64 {
    let mut number = random::generate();
    number += current_time_stamp();
    return hash(number);
}
Rust wants you to write:
Code: Select all
fn get_stuff() -> u64 {
    hash(random::generate() + current_time_stamp())
}
Everyone has to program the way they like, especially if a project is personal. That's even true for an open source project, if one is the originator or sole maintainer; people can join, or not, at their discretion. But I personally think that writing code more clearly, with less brevity, is the better way to go if cooperation and contributions from others to advance the project faster are the main goals.
-
- Posts: 570
- Joined: Mon Jul 20, 2015 5:06 pm
Re: Removing Large Arrays
I often test the speed of different function implementations. The results indicate the fixed array PushToEdges[] seems to be faster on average:
10 second test ...
PushToEdges 16519006400
push_to_edge 16610155136
Code: Select all
// gcc -O3 -Wall -m64 test.c -o test -lm -s
// speed test method for different functions
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <time.h>

#define rank_of(sq) ((sq) >> 3)
#define file_of(sq) ((sq) & 0x7)
#define _max(a, b) ((a) >= (b) ? (a) : (b))
#define _min(a, b) ((a) < (b) ? (a) : (b))

const int PushToEdges[64] = {
    100, 90, 80, 70, 70, 80, 90, 100,
     90, 70, 60, 50, 50, 60, 70,  90,
     80, 60, 40, 30, 30, 40, 60,  80,
     70, 50, 30, 20, 20, 30, 50,  70,
     70, 50, 30, 20, 20, 30, 50,  70,
     80, 60, 40, 30, 30, 40, 60,  80,
     90, 70, 60, 50, 50, 60, 70,  90,
    100, 90, 80, 70, 70, 80, 90, 100
};

enum { RANK_1, RANK_2, RANK_3, RANK_4, RANK_5, RANK_6, RANK_7, RANK_8, N_RANK };
enum { FILE_A, FILE_B, FILE_C, FILE_D, FILE_E, FILE_F, FILE_G, FILE_H, N_FILE };

static inline int edge_distance_f(int f) { return _min(f, (FILE_H - f)); }
static inline int edge_distance_r(int r) { return _min(r, (RANK_8 - r)); }

static inline int push_to_edge(int s) {
    int rd = edge_distance_r(rank_of(s)), fd = edge_distance_f(file_of(s));
    return 90 - (7 * fd * fd / 2 + 7 * rd * rd / 2);
}

uint64_t time_in_ms() {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return (uint64_t)t.tv_sec * 1000 + t.tv_nsec / 1000000;
}

int main() {
    uint64_t c, d, e;
    uint64_t start;
    printf("10 second test ...\n");
    start = time_in_ms();
    e = 0;
    while ((time_in_ms() - start) < 10000) {
        for (c = 0; c < 64; c++) {
            d = PushToEdges[c];
            e++;
        }
    }
    printf(" PushToEdges %" PRIu64 "\n", e);
    start = time_in_ms();
    e = 0;
    while ((time_in_ms() - start) < 10000) {
        for (c = 0; c < 64; c++) {
            d = push_to_edge(c);
            e++;
        }
    }
    printf(" push_to_edge %" PRIu64 "\n", e);
    return 0;
}
-
- Posts: 2554
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: Removing Large Arrays
D Sceviour wrote: ↑Tue Mar 10, 2020 8:30 pm
I often test the speed of different function implementations. The results indicate the fixed array PushToEdges[] seems to be faster on average:
10 second test ...
PushToEdges 16519006400
push_to_edge 16610155136
nonsense, the compiler completely eliminates the inner loops, so it boils down to nothing (busy-loop for 10 seconds).
EDIT: making the counters volatile, I got this:
Code: Select all
10 second test ...
PushToEdges 3889291392
push_to_edge 2689210048
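for reference, a sketch of the change (assuming the PushToEdges table and time_in_ms() from the post above): volatile forces the compiler to perform every store, so the loop body survives -O3 instead of being eliminated as dead code:
Code: Select all
#include <stdint.h>

extern const int PushToEdges[64];  /* table from the post above */
extern uint64_t time_in_ms(void);  /* timer from the post above */

uint64_t table_test(void) {
    /* volatile: every store must actually happen, so the table
       lookup can no longer be optimized away */
    volatile uint64_t d = 0, e = 0;
    uint64_t start = time_in_ms();
    while ((time_in_ms() - start) < 10000) {
        for (int c = 0; c < 64; c++) {
            d = PushToEdges[c];
            e = e + 1;
        }
    }
    return e;
}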
the problem is that the patch probably succeeded due to a bugfix, so the "simplification" made it into the code undeservedly, it seems.
Martin Sedlak
-
- Posts: 570
- Joined: Mon Jul 20, 2015 5:06 pm
Re: Removing Large Arrays
mar wrote: ↑Tue Mar 10, 2020 8:39 pm
nonsense, the compiler completely eliminates the inner loops, so it boils down to nothing (busy-loop for 10 seconds).
It is not nonsense. For one thing, you can understand the busy-loop. I only spent about 20 minutes writing this example, and there are better ways to do it. The timer test could be performed much less frequently to give the busy-loop more activity. Still, I have been using this type of method for a long time and have no complaints about the methodology. The method does not capture the effects of cache behavior, which could influence the results; however, that would be due more to hardware than software.
EDIT: I see your edit now, and am surprised you get even better results.
-
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Removing Large Arrays
D Sceviour wrote: ↑Tue Mar 10, 2020 8:30 pm
I often test the speed of different function implementations. The results indicate the fixed array PushToEdges[] seems to be faster on average:
10 second test ...
PushToEdges 16519006400
push_to_edge 16610155136
In a real chess program, with many instances fighting for the caches, things may turn out differently than a tiny loop test suggests.
Even small array lookups will throw other cache lines out of L1, while the computation may be done concurrently with other instructions, improving IPC.
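One rough way to simulate that pressure in the micro-benchmark (my sketch, not a real engine workload): stream through a buffer larger than L1 between lookups, so the table keeps getting evicted, while the formula has no such memory dependency.
Code: Select all
#include <stdint.h>
#include <stddef.h>

#define POLLUTE_BYTES (256 * 1024)   /* larger than a typical 32 KiB L1d */
static uint8_t pollute[POLLUTE_BYTES];

/* touch one byte per 64-byte cache line so PushToEdges gets evicted */
static uint64_t thrash_l1(void) {
    uint64_t sum = 0;
    for (size_t i = 0; i < POLLUTE_BYTES; i += 64)
        sum += pollute[i];
    return sum;
}
Calling thrash_l1() inside the timing loop of the earlier test makes each PushToEdges[c] lookup start from a cold cache; push_to_edge() is unaffected.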
-
- Posts: 570
- Joined: Mon Jul 20, 2015 5:06 pm
Re: Removing Large Arrays
Gerd Isenberg wrote: ↑Tue Mar 10, 2020 9:16 pm
In a real chess program, with many instances fighting for the caches, things may turn out differently than a tiny loop test suggests. Even small array lookups will throw other cache lines out of L1, while the computation may be done concurrently with other instructions, improving IPC.
I agree that optimization is a problem when comparing small arrays and small loops. However, I have used this type of timer test to compare large syzygy reads with coded endgame tablebase calculations, and it demonstrated that syzygy was about 30,000 times slower.
In the posted code, making the variables volatile has an effect similar to removing the -O3 optimization.
-
- Posts: 91
- Joined: Sat Nov 02, 2019 6:42 pm
- Full name: ɹǝƃɹǝqǝᗡ ǝɔnɹꓭ
Re: Removing Large Arrays
D Sceviour wrote: ↑Tue Mar 10, 2020 8:30 pm
I often test the speed of different function implementations. The results indicate the fixed array PushToEdges[] seems to be faster on average:
10 second test ...
PushToEdges 16519006400
push_to_edge 16610155136
The optimizer can remove what we are testing because the results are unused.
(warning: variable ‘d’ set but not used)
What happens if you propagate the edge values?
For example:
Code: Select all
int main() {
    uint64_t c, d, e;
    uint64_t start;
    printf("10 second test ...\n");
    start = time_in_ms();
    e = 0;
    d = 0;
    while ((time_in_ms() - start) < 10000) {
        for (c = 0; c < 64; c++) {
            d += PushToEdges[c];   /* accumulate so the lookup is used */
            e++;
        }
    }
    printf(" PushToEdges %" PRIu64 "\n", e);
    printf(" prevent optimization removal %" PRIu64 "\n", d);
    start = time_in_ms();
    e = 0;
    d = 0;
    while ((time_in_ms() - start) < 10000) {
        for (c = 0; c < 64; c++) {
            d += push_to_edge(c);  /* accumulate so the call is used */
            e++;
        }
    }
    printf(" push_to_edge %" PRIu64 "\n", e);
    printf(" prevent optimization removal %" PRIu64 "\n", d);
    return 0;
}