To reduce the risk for cache thrashing, I would like to merge all stacks into one. In particular, the move stack should go into the stack frame of local variables. Problem is that the number of moves per ply is variable, with a huge worst case. And leaving large unused holes in the combined stack, would defeat the purpose, as it would quickly make this stack 'wrap around' the cache, and collide with itself.
So I was thinking about allocating space on the system stack through an assembler routine that adjusts the stack pointer, and returns a pointer to the area thus allocated. I can then allocate ample space, pass the pointer to my move generator, (which will put its own locals on top of the expanded stack), and let it fill the area with all moves it generates. After that, I call the assembly routine again, to deallocate the unused part of the move array. The recursive call will then put its locals directly on top of the moves, without any hole. Before the final return, I deallocate the used part of the move array.
Problem is that I have to know how much space gcc wants to use near the top of the stack, to know which (say) 100 bytes to use for the move list if I expand the stack by 100 bytes. If I look how my version of gcc handles the stack frame, there seems an inconsistency. I compiled the following routine (gcc -O2):
Code: Select all
int test(int a, int b)
{
int m[10];
m[a] += m[b];
printf("hello\n");
return m[b];
}
Code: Select all
.file "test.c"
.section .rdata,"dr"
LC0:
.ascii "hello\0"
.text
.p2align 4,,15
.globl _test
.def _test; .scl 2; .type 32; .endef
_test:
pushl %ebp
movl %esp, %ebp
pushl %ebx
subl $68, %esp
movl 12(%ebp), %ebx
movl $LC0, (%esp)
movl 8(%ebp), %edx
movl -56(%ebp,%ebx,4), %eax
addl %eax, -56(%ebp,%edx,4)
call _puts
movl -56(%ebp,%ebx,4), %eax
addl $68, %esp
popl %ebx
popl %ebp
ret
.def _puts; .scl 3; .type 32; .endef
0(%ebp): the saved old frame pointer (that was just pushed).
4(%ebp): the return address
8(%ebp): argument a
12(%ebp): argument b
The rest is then used, after expanding the stack, as:
-4(%ebp): the saved register %ebp (which is used as a temporary)
-56(%ebp): the array m[]. As this is 40 bytes, the first location behind it is -16(%ebp).
now -16(%ebp), -12(%ebp) and -8(%ebp) are unused. In more complicated routines, though, it seems these are used for saving other registers that are used as scratch registers in the routine: %edi, %esi, and %ecx. (%edx and %eax are never saved, even if they are used as scratch registers, as they will contain the returned value). A bit sloppy that the optimizer does not squeeze out this unused space, but at least it is understandable.
Now the puzzle is that in allocating the stack frame, 68 is subtracted from %esp. Together with the pushed %ebp, this makes the new top of the stack the same as -72(%ebp). This top is used to pass the argument LC0 for the call to printf (which apparently is optimized to puts, as there are no variables printed). but from -68(%ebp) to -56(%ebp), there seems another unused area.
I have no idea what purpose that area serves. If I squeeze in the move list, should I leave that mystery area of 12 bytes before it (closer to the top of the stack) or after it (adjacent to the other locals)? What is this area for? Is it a bug in gcc that it wastes this space? Or is it reserved for returning values that wouldn't fit a register?