Re: cache alignment of tt

Posted: Mon Mar 12, 2012 4:54 pm
by bob
rbarreira wrote:
bob wrote: Pre-fetching is an unclear issue.
From what I've heard, it's quite clear that prefetch speeds up many engines. I guess Crafty might not benefit much from it since it doesn't access the tt in Qsearch?
Hard to say. I'd expect prefetch to work in the large majority of the cases, unless it is a speculative pre-fetch. For example, if you do a repetition test before a hash probe, there is a risk in pre-fetching the tt entry because you might not need it. And in the right kinds of positions, it will certainly hurt. Probably not when averaged over many games of course...

However, most engines prefetch at a sub-optimal place, such as at the top of search. Ideally you would prefetch right after making a move, where you have just updated the hash signature. You could perhaps improve on that with a hash signature update done outside of make move, where the signature update itself is responsible for the prefetch, as early as possible... But now you have changed the actual structure of the program, and I have not noticed anyone going that far, although it is possible.

I'll try to run some tests to see if it helps today...
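
A minimal sketch of the placement described above, where the signature update itself issues the prefetch. This is not Crafty's code: TTEntry, tt_slot, tt_prefetch and update_signature are hypothetical names, the table size is assumed to be a power of two, and __builtin_prefetch is the GCC/Clang builtin (other compilers fall through to a no-op here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte transposition table entry; layout is illustrative only. */
typedef struct {
    uint64_t key;
    uint64_t data;
} TTEntry;

/* Map a hash signature to its table slot. Assumes entries is a power of two. */
static TTEntry *tt_slot(TTEntry *table, size_t entries, uint64_t signature) {
    assert(entries != 0 && (entries & (entries - 1)) == 0);
    return &table[signature & (entries - 1)];
}

/* Hint that the slot for this signature will be read soon. */
static void tt_prefetch(TTEntry *table, size_t entries, uint64_t signature) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(tt_slot(table, entries, signature));
#else
    (void) table; (void) entries; (void) signature; /* no-op on other compilers */
#endif
}

/* Hypothetical helper: apply a move's key to the signature and prefetch
   immediately, before the rest of make-move work is done. */
static uint64_t update_signature(TTEntry *table, size_t entries,
                                 uint64_t signature, uint64_t move_key) {
    signature ^= move_key;
    tt_prefetch(table, entries, signature);
    return signature;
}
```

The prefetch is only a hint, so the probe still works if it is dropped. If a repetition test ends the node before the probe, the fetched line is simply wasted, which is the speculative risk mentioned above.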

Re: cache alignment of tt

Posted: Tue Mar 13, 2012 1:33 am
by Cardoso
Hi Bob, sorry to go back to the AlignedMalloc.
As you might remember, I posted a question here about your AlignedMalloc not working on Windows 7 x64 using MS Visual C++ 2010.
I compiled Crafty 23.4 and couldn't allocate more than 1 GB of hash.
I wonder if the bug is in the "(long)" cast in your code.
Shouldn't it be an _int64 on x64 Windows?

best regards,
Alvaro

Re: cache alignment of tt

Posted: Thu Mar 15, 2012 3:15 pm
by bob
Cardoso wrote:
Hi Bob, sorry to go back to the AlignedMalloc.
As you might remember, I posted a question here about your AlignedMalloc not working on Windows 7 x64 using MS Visual C++ 2010.
I compiled Crafty 23.4 and couldn't allocate more than 1 GB of hash.
I wonder if the bug is in the "(long)" cast in your code.
Shouldn't it be an _int64 on x64 Windows?

best regards,
Alvaro
I believe you are correct. I think that "long" on 64 bit windows is STILL 32 bits for reasons I can't fathom...

I will fix this in 23.5 since I use the int64_t type anyway... Will break things for 32 bit machines it would appear, but those machines are pretty much going away anyway...
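
A small sketch of why a 32-bit intermediate breaks the alignment arithmetic. It simulates addresses with uint64_t so the result does not depend on the platform. The function names are hypothetical, and whether this is exactly what limited the hash size in 23.4 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Round an address up to a power-of-two alignment through a 32-bit
   intermediate, which is what a (long) cast amounts to when long is
   32 bits. The upper 32 bits of the address are lost. */
static uint64_t align_up_32bit(uint64_t address, uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uint32_t truncated = (uint32_t) address;
    uint32_t mask = (uint32_t) alignment - 1;
    return (uint64_t) ((truncated + mask) & ~mask);
}

/* Same rounding done at full width: the high bits survive. */
static uint64_t align_up_full(uint64_t address, uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (address + alignment - 1) & ~(alignment - 1);
}
```

For an address just above 4 GB such as 0x100000010 with alignment 64, the full-width version gives 0x100000040, while the 32-bit version gives 0x40, which is not inside the allocated block at all.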

Re: cache alignment of tt

Posted: Thu Mar 15, 2012 5:42 pm
by Rémi Coulom
bob wrote:
I believe you are correct. I think that "long" on 64 bit windows is STILL 32 bits for reasons I can't fathom...

I will fix this in 23.5 since I use the int64_t type anyway... Will break things for 32 bit machines it would appear, but those machines are pretty much going away anyway...
I am not sure exactly what this discussion is about, but it seems to me that using size_t would be the proper way.

Rémi

Re: cache alignment of tt

Posted: Thu Mar 15, 2012 6:03 pm
by bob
Rémi Coulom wrote:
I am not sure exactly what this discussion is about, but it seems to me that using size_t would be the proper way.

Rémi
I took a quick look at stdint.h, and it looks like "uintptr_t" is the correct type for portably holding a pointer value as an integer. I saw examples of 32 bit processors with > 32 bit address spaces, and even a 64 bit processor with a < 32 bit address space (Cray 1 series through the T90 in fact).

size_t isn't guaranteed to be as big as a pointer, just big enough to hold the largest sizeof() value that can be returned... There is also ptrdiff_t, which might work as well since it is meant to hold the difference between two pointers, but it is signed and might cause an issue...

I tried uintptr_t and it worked on my linux box...

Actually after looking at this, I am not sure what I am currently doing is actually correct for all platforms. Here's my "AlignedMalloc()" function:

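void AlignedMalloc(void **pointer, int alignment, size_t size) {
  /* over-allocate so an aligned address always fits inside the block */
  segments[nsegments][0] = malloc(size + alignment - 1);
  /* round the raw address up to the next multiple of alignment,
     which must be a power of 2 */
  segments[nsegments][1] =
      (void *) (((uintptr_t) segments[nsegments][0] + alignment - 1)
                & ~(alignment - 1));
  *pointer = segments[nsegments][1];
  nsegments++;
}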

You pass it 3 values: a pointer to a pointer where it should store the address of the malloc'ed memory, an alignment value (64 is currently used), and a size (how much memory to malloc).

I am not sure size_t is correct for the third argument after reading a bit. It almost seems that this should be uintptr_t as well, or at least ptrdiff_t. But a negative value makes no sense...

Will have to do some research on this (again)... but for the moment, the above works on a 64 bit machine with 12 gigs of memory, and I tested with hash=8192M with no problems...
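
For comparison, a self-contained sketch of the same over-allocate-and-round-up technique without the segments bookkeeping. It is not the Crafty function: aligned_alloc_sketch is a hypothetical name, alignment is assumed to be a power of two, and the caller is assumed to keep the raw pointer for free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate size bytes whose start address is a multiple of alignment.
   *raw receives the pointer returned by malloc(); that is the one to
   pass to free() later, not the aligned pointer. */
static void *aligned_alloc_sketch(void **raw, size_t alignment, size_t size) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* over-allocate by alignment - 1 so the rounded address stays in bounds */
    *raw = malloc(size + alignment - 1);
    if (*raw == NULL)
        return NULL;
    /* do the arithmetic in uintptr_t so no address bits are dropped */
    uintptr_t address = (uintptr_t) *raw;
    uintptr_t mask = (uintptr_t) alignment - 1;
    return (void *) ((address + mask) & ~mask);
}
```

Here the size stays a size_t because that is the type malloc() takes, and only the address arithmetic uses uintptr_t.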

Re: cache alignment of tt

Posted: Thu Mar 15, 2012 6:08 pm
by Daniel Shawul
Yes uintptr_t is what I used too.

Re: cache alignment of tt

Posted: Thu Mar 15, 2012 9:12 pm
by diep
jdart wrote:I do align hash tables and other large structures.

I think it helps but you should not expect a large gain. Most times your threads are not accessing overlapping cache lines in the hash table, anyway.

--Jon
It might help on old hardware.

On newer hardware I measure no difference with diep, which aligns itself.