Compiler Optimization Question
Moderators: hgm, Rebel, chrisw
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Compiler Optimization Question
A really large function has many local variables and accesses many global variables. Will modern compilers be able to optimize register usage effectively, or is it better to break the large function up into a few subfunctions with fewer variables so the compiler can better optimize register usage?
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
-
- Posts: 12542
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Compiler Optimization Question
I would not worry about trying to out-think the compiler.
Modern compilers can even eliminate tail recursion.
The first rule of optimization is "Don't do it."
The second rule, for experts only, is "Don't do it yet."
Of course it is kind of a gag, but the point is never to waste time optimizing until you have run the code through the profiler.
You might make a routine 100 times faster and have no measurable impact on the performance of the program.
The most important thing for writing fast, correct code is to choose good algorithms and debug them carefully.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: Compiler Optimization Question
That is good advice, Dann! However, my question is not about me optimizing the code but about helping the compiler optimize better, which is almost but not quite the same thing. Let me rephrase: can the compiler better optimize one really huge function, or several smaller functions? Or does it not matter?
Dann Corbit wrote: ↑Mon Apr 13, 2020 1:45 am I would not worry about trying to out-think the compiler.
Modern compilers can even eliminate tail recursion.
The first rule of optimization is "Don't do it."
The second rule, for experts only, is "Don't do it yet."
Of course it is kind of a gag, but the point is never to waste time optimizing until you have run the code through the profiler.
You might make a routine 100 times faster and have no measurable impact on the performance of the program.
The most important thing for writing fast, correct code is to choose good algorithms and debug them carefully.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Compiler Optimization Question
It is not likely to matter much. Usually you have a few performance-critical parts in any program; optimizing anything else doesn't change the overall performance very much.
Michael Sherwin wrote: ↑Mon Apr 13, 2020 3:09 am Let me rephrase, can the compiler optimize better one really huge function or several smaller functions? Or does it not matter?
The one place function size matters is in relation to inlining. Compilers have rules that determine whether a function can be inlined or not. Generally small ones are inlined by the optimizer; larger ones may not be (though with gcc, at least, this is adjustable via options).
-
- Posts: 3232
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Compiler Optimization Question
I don't think you can get a good answer on talkchess. There are as many such questions as there are examples of such code. Just measure it, and answer your own question.
Michael Sherwin wrote: ↑Mon Apr 13, 2020 1:11 am A really large function has many local variables and accesses many global variables. Will modern compilers be able to optimize register usage effectively or is it better to break up the large function into a few subfunctions with fewer variables so the compiler can better optimize register usage?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: Compiler Optimization Question
Thanks guys. I found this online.
WRITE CLEAR, SIMPLE CODE
Some of the very things that make code clear and readable to humans also make it clear and readable to compilers. Complicated expressions are harder to optimize and can cause the compiler to fall back to a less intense mode of optimization. I've seen one compiler that has translation-unit limits which, when overrun, cause the entire module to be de-optimized by one level.
Part of the clarity is making hunks of code into functions when appropriate. The cost of a function call is extremely small on modern machines, so optimization is NOT a valid excuse for writing ten-page functions.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
-
- Posts: 199
- Joined: Sun Nov 03, 2013 9:32 am
Re: Compiler Optimization Question
Regardless of any optimising issues, it's just good practice to break things into small, easily readable and debuggable functions.
One function = one job. If it spans more than a screen, break it up into small logical parts. They are easier to prove correct, and then your mind can abstract them away.
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: Compiler Optimization Question
And maybe easier for the compiler to do the same?
lauriet wrote: ↑Mon Apr 13, 2020 3:31 am Regardless of any optimising issues, it's just good practice to break things into small, easily readable and debuggable functions.
One function = one job. If it spans more than a screen, break it up into small logical parts. They are easier to prove correct, and then your mind can abstract them away.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: Compiler Optimization Question
Here is the function I'm concerned about. It is almost complete; I just have to add the castling code. It generates all moves one at a time and calls Qsearch(). Only legal moves are stored in the move list. The variable "t" is the thread index, and it uses arrays rather than a structure.
Code: Select all
void GenAllMovesQ(s32 t, s32 alpha, s32 beta, s32 depth) {
s32 pce, typ, target, adjtyp, score, i;
u64 bb, pieces, wPieces, bPieces, aPieces, empty, notme;
u32 fs, ts, sq;
i = first[t][ply[t]];
pieces = piece[t][wtm[t]];
wPieces = piece[t][WHITE];
bPieces = piece[t][BLACK];
aPieces = wPieces | bPieces;
empty = 0xffffffffffffffff ^ aPieces;
notme = 0xffffffffffffffff ^ pieces;
do {
bb = 0; // guard: the unreachable case labels below leave bb unset
_BitScanForward64(&fs, pieces);
pce = board[t][fs];
switch (pce) {
case OO: break;
case WP:
typ = fs >> 3;
switch (typ) {
case RANK1: break;
case RANK2:
_BitScanForward64(&sq, wPawnMoves[fs] & aPieces);
bb = (wPawnMoves[fs] & below[sq]) | (wPawnCapts[fs] & bPieces);
typ = WDOU;
break;
case RANK3:
bb = (wPawnMoves[fs] & empty) | (wPawnCapts[fs] & bPieces);
typ = WSGL;
break;
case RANK4:
bb = (wPawnMoves[fs] & empty) | (wPawnCapts[fs] & bPieces);
typ = WSGL;
break;
case RANK5:
bb = (wPawnMoves[fs] & empty) | (wPawnCapts[fs] & (bPieces | epbit[t][ply[t]]));
typ = WCEP;
break;
case RANK6:
bb = (wPawnMoves[fs] & empty) | (wPawnCapts[fs] & bPieces);
typ = WSGL;
break;
case RANK7:
bb = (wPawnMoves[fs] & empty) | (wPawnCapts[fs] & bPieces);
typ = WPRO;
break;
}
break;
case WN:
bb = knightMoves[fs] & notme;
typ = WNMV;
break;
case WB:
bb = qss[fs][(aPieces >> (fs & 56)) & 127][0]
& qss[fs][(aPieces >> 8) & 255][1]
& qss[fs][(aPieces >> 16) & 255][2]
& qss[fs][(aPieces >> 24) & 255][3]
& qss[fs][(aPieces >> 32) & 255][4]
& qss[fs][(aPieces >> 40) & 255][5]
& qss[fs][(aPieces >> 48) & 255][6]
& bob[fs]
& notme;
typ = WBMV;
break;
case WR:
bb = qss[fs][(aPieces >> (fs & 56)) & 127][0]
& qss[fs][(aPieces >> 8) & 255][1]
& qss[fs][(aPieces >> 16) & 255][2]
& qss[fs][(aPieces >> 24) & 255][3]
& qss[fs][(aPieces >> 32) & 255][4]
& qss[fs][(aPieces >> 40) & 255][5]
& qss[fs][(aPieces >> 48) & 255][6]
& rob[fs]
& notme;
typ = WRMV;
break;
case WQ:
bb = qss[fs][(aPieces >> (fs & 56)) & 127][0]
& qss[fs][(aPieces >> 8) & 255][1]
& qss[fs][(aPieces >> 16) & 255][2]
& qss[fs][(aPieces >> 24) & 255][3]
& qss[fs][(aPieces >> 32) & 255][4]
& qss[fs][(aPieces >> 40) & 255][5]
& qss[fs][(aPieces >> 48) & 255][6]
& notme;
typ = WQMV;
break;
case WK:
bb = kingMoves[fs] & notme;
typ = WKMV;
break;
case BP:
typ = fs >> 3;
switch (typ) {
case RANK1: break;
case RANK2:
bb = (bPawnMoves[fs] & empty) | (bPawnCapts[fs] & wPieces);
typ = BPRO;
break;
case RANK3:
bb = (bPawnMoves[fs] & empty) | (bPawnCapts[fs] & wPieces);
typ = BSGL;
break;
case RANK4:
bb = (bPawnMoves[fs] & empty) | (bPawnCapts[fs] & (wPieces | epbit[t][ply[t]]));
typ = BCEP;
break;
case RANK5:
bb = (bPawnMoves[fs] & empty) | (bPawnCapts[fs] & wPieces);
typ = BSGL;
break;
case RANK6:
bb = (bPawnMoves[fs] & empty) | (bPawnCapts[fs] & wPieces);
typ = BSGL;
break;
case RANK7:
_BitScanForward64(&sq, bPawnMoves[fs] & aPieces);
bb = (bPawnMoves[fs] & above[sq]) | (bPawnCapts[fs] & wPieces);
typ = BPRO;
break;
}
break;
case BN:
bb = knightMoves[fs] & notme;
typ = BNMV;
break;
case BB:
bb = qss[fs][(aPieces >> (fs & 56)) & 127][0]
& qss[fs][(aPieces >> 8) & 255][1]
& qss[fs][(aPieces >> 16) & 255][2]
& qss[fs][(aPieces >> 24) & 255][3]
& qss[fs][(aPieces >> 32) & 255][4]
& qss[fs][(aPieces >> 40) & 255][5]
& qss[fs][(aPieces >> 48) & 255][6]
& bob[fs]
& notme;
typ = BBMV;
break;
case BR:
bb = qss[fs][(aPieces >> (fs & 56)) & 127][0]
& qss[fs][(aPieces >> 8) & 255][1]
& qss[fs][(aPieces >> 16) & 255][2]
& qss[fs][(aPieces >> 24) & 255][3]
& qss[fs][(aPieces >> 32) & 255][4]
& qss[fs][(aPieces >> 40) & 255][5]
& qss[fs][(aPieces >> 48) & 255][6]
& rob[fs]
& notme;
typ = BRMV;
break;
case BQ:
bb = qss[fs][(aPieces >> (fs & 56)) & 127][0]
& qss[fs][(aPieces >> 8) & 255][1]
& qss[fs][(aPieces >> 16) & 255][2]
& qss[fs][(aPieces >> 24) & 255][3]
& qss[fs][(aPieces >> 32) & 255][4]
& qss[fs][(aPieces >> 40) & 255][5]
& qss[fs][(aPieces >> 48) & 255][6]
& notme;
typ = BQMV;
break;
case BK:
bb = kingMoves[fs] & notme;
typ = BKMV;
break;
}
while (bb) {
_BitScanForward64(&ts, bb);
board[t][fs] = OO;
piece[t][wtm[t]] ^= one << fs;
piece[t][wtm[t]] ^= one << ts;
adjtyp = adjTyp[typ];
switch (adjtyp) {
case WMOV:
target = board[t][ts];
mat[t] += pceval[target];
piece[t][1 - wtm[t]] ^= (u64)shftv[target] << ts;
board[t][ts] = pce;
break;
case WDUO:
target = OO; // destination of a double push is empty
board[t][ts] = pce;
epbit[t][ply[t] + 1] = (u64)(ts - fs == 16) << (fs + 8); // method from H.G.Muller
break;
case WEPC:
sq = ts - ((u64)(epbit[t][ply[t]] == one << ts) << 3); // method from H.G.Muller
target = board[t][sq];
mat[t] += pceval[target];
piece[t][1 - wtm[t]] ^= (u64)shftv[target] << sq;
board[t][ts] = pce;
break;
case WPRM:
target = board[t][ts];
mat[t] += pceval[target];
piece[t][1 - wtm[t]] ^= (u64)shftv[target] << ts;
board[t][ts] = WQ;
mat[t] += 800;
break;
case BMOV:
target = board[t][ts];
mat[t] += pceval[target];
piece[t][1 - wtm[t]] ^= (u64)shftv[target] << ts;
board[t][ts] = pce;
break;
case BDUO:
target = OO; // destination of a double push is empty
board[t][ts] = pce;
epbit[t][ply[t] + 1] = (u64)(fs - ts == 16) << (fs - 8); // method from H.G.Muller
break;
case BEPC:
sq = ts + ((u64)(epbit[t][ply[t]] == one << ts) << 3); // method from H.G.Muller
target = board[t][sq];
mat[t] += pceval[target];
piece[t][1 - wtm[t]] ^= (u64)shftv[target] << sq;
board[t][ts] = pce;
break;
case BPRM:
target = board[t][ts];
mat[t] += pceval[target];
piece[t][1 - wtm[t]] ^= (u64)shftv[target] << ts;
board[t][ts] = BQ;
mat[t] -= 800;
break;
}
wtm[t] = 1 - wtm[t];
ply[t]++;
score = Qsearch(t, alpha, beta);
ply[t]--;
wtm[t] = 1 - wtm[t];
mat[t] -= pceval[target];
board[t][fs] = pce;
piece[t][wtm[t]] ^= one << fs;
piece[t][wtm[t]] ^= one << ts;
switch (adjtyp) {
case WMOV:
board[t][ts] = target;
piece[t][1 - wtm[t]] ^= (u64)shftv[target] << ts;
break;
case WDUO:
board[t][ts] = target;
piece[t][1 - wtm[t]] ^= (u64)shftv[target] << ts;
epbit[t][ply[t] + 1] = 0;
break;
case WEPC:
board[t][sq] = target;
piece[t][1 - wtm[t]] ^= (u64)shftv[target] << sq;
break;
case WPRM:
board[t][ts] = target;
piece[t][1 - wtm[t]] ^= (u64)shftv[target] << ts;
mat[t] -= 800;
break;
case BMOV:
board[t][ts] = target;
piece[t][1 - wtm[t]] ^= (u64)shftv[target] << ts;
break;
case BDUO:
board[t][ts] = target;
piece[t][1 - wtm[t]] ^= (u64)shftv[target] << ts;
epbit[t][ply[t] + 1] = 0;
break;
case BEPC:
board[t][sq] = target;
piece[t][1 - wtm[t]] ^= (u64)shftv[target] << sq;
break;
case BPRM:
board[t][ts] = target;
piece[t][1 - wtm[t]] ^= (u64)shftv[target] << ts;
mat[t] += 800;
break;
}
if (score != -INFINITY && (score > alpha || depth > 0)) {
frsq[t][i] = fs;
tosq[t][i] = ts;
trgt[t][i] = target;
type[t][i] = typ;
scor[t][i] = score;
i++;
}
bb ^= one << ts;
}
pieces ^= (one << fs);
} while (pieces);
// castling code goes here
}
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Compiler Optimization Question
Here's the simple answer from a compiler guy. Small functions are just fine. If they SHOULD be inlined, the compiler will do it during the compilation process. If it won't help, it won't. Not inlining can improve cache efficiency if the function is executed infrequently. Better to have the natural instruction stream sequential rather than broken up into blocks with interspersed dead code.
With today's optimizers, you can do pretty well to just write readable code you can understand and work on. Let the compiler do its thing.
For example
Code: Select all
if (condition) {
do some stuff
}
Might be compiled as is. Or if the condition is mostly false, the compiler might turn that into this:
Code: Select all
if (!condition)
goto xx;
getback:
Somewhere else in memory:
Code: Select all
xx:
do some stuff
goto getback;
I used to have to write code like the latter in the early FORTRAN days; the optimizers were not very good. Today the compilers can do this so that your code looks readable and then gets reorganized when converting from source to object code, leaving the source easy to read.
Yes, there are places you might do better, in particular if you have information the compiler doesn't have, i.e. you know that ALL the cases in your switch statement are covered explicitly. The compiler still has to check for the rest and skip all the way to the closing "}". You don't. That can eliminate instructions.