Hi.
In the last version of Andscacs I added a little hash of evaluations. It helps quite to win speed because the function of evaluation does a lot of things.
I tried to do it with dynamic allocation (calloc) but it’s quite slower than a fixed length array.
For the moment I keep a fixed array but small, 4MB, just to not disturb anything in the final user system.
So the questions are, if it’s normal that exists such a speed difference and it can be arranged in any way, and if it cannot be arranged, which you consider it’s the maximum size I can use for this array without considering that the engine is using unfair advantage over the other engines, or disturbing the system.
I understand that the best way is to do it dynamical, and allow the user/gui to configure the size.
As additional data, if I increase it to 8MB, the general speed increases easily 2% or more.
As a complementary question, as all the engines use memory for a lot of static variables and arrays, is there any non writed rule of what is considered maximum MB of this static memory?
Thanks!
Hash speed and memory usage
Moderator: Ras
-
cdani
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Hash speed and memory usage
Daniel José -
http://www.andscacs.com
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Hash speed and memory usage
I am not sure I understand what you mean by dynamic. If you create such a hash table by a static declaration (global variable in the form of a large array) or if you use malloc() to grab the same amount of memory, you should see exactly zero difference in performance. If you mean allocating a new ENTRY each time you store something, that would be horribly inefficient.cdani wrote:Hi.
In the last version of Andscacs I added a little hash of evaluations. It helps quite to win speed because the function of evaluation does a lot of things.
I tried to do it with dynamic allocation (calloc) but it’s quite slower than a fixed length array.
For the moment I keep a fixed array but small, 4MB, just to not disturb anything in the final user system.
So the questions are, if it’s normal that exists such a speed difference and it can be arranged in any way, and if it cannot be arranged, which you consider it’s the maximum size I can use for this array without considering that the engine is using unfair advantage over the other engines, or disturbing the system.
I understand that the best way is to do it dynamical, and allow the user/gui to configure the size.
As additional data, if I increase it to 8MB, the general speed increases easily 2% or more.
As a complementary question, as all the engines use memory for a lot of static variables and arrays, is there any non writed rule of what is considered maximum MB of this static memory?
Thanks!
-
cdani
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Hash speed and memory usage
I will explain with an example.bob wrote: I am not sure I understand what you mean by dynamic. If you create such a hash table by a static declaration (global variable in the form of a large array) or if you use malloc() to grab the same amount of memory, you should see exactly zero difference in performance. If you mean allocating a new ENTRY each time you store something, that would be horribly inefficient.
Code: Select all
int _tmain(int argc, _TCHAR* argv[])
{
char * var_dynamic;
int var_static[10];
char var_index, var_resul;
var_dynamic = (char *)malloc(10);
var_index = 5;
var_resul = var_dynamic[var_index];
var_resul = var_static[var_index];
return 0;
}
Code: Select all
var_resul = var_dynamic[var_index];
009D13F3 movsx eax,byte ptr [ebp-45h]
009D13F7 mov ecx,dword ptr [ebp-0Ch]
009D13FA mov dl,byte ptr [ecx+eax]
009D13FD mov byte ptr [ebp-51h],dl
var_resul = var_static[var_index];
009D1400 movsx eax,byte ptr [ebp-45h]
009D1404 mov cl,byte ptr [ebp+eax*4-3Ch]
009D1408 mov byte ptr [ebp-51h],cl
So it must be slower, and is noticeable slower in my tests, unless there is a workaround that I was not able to find.
Thanks!
Daniel José -
http://www.andscacs.com
-
Gerd Isenberg
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Hash speed and memory usage
There might be copy-on-write issues with calloc. Have you tried malloc and memset?
http://blogs.fau.de/hager/archives/825
Otherwise, beside of "chaotic" changes due to compiler optimization and code-alignment issues etc., there should be no big difference except loading the pointer which takes a register on x86-32, which is possibly nil on x86-64.
On Win32/64 memory sizes:
http://www.viva64.com/en/k/0036/
http://blogs.fau.de/hager/archives/825
Otherwise, beside of "chaotic" changes due to compiler optimization and code-alignment issues etc., there should be no big difference except loading the pointer which takes a register on x86-32, which is possibly nil on x86-64.
On Win32/64 memory sizes:
http://www.viva64.com/en/k/0036/
-
Antonio Torrecillas
- Posts: 92
- Joined: Sun Nov 02, 2008 4:43 pm
- Location: Madrid
- Full name: Antonio Torrecillas
Re: Hash speed and memory usage
note that your var_static is allocated in the stack, a pointer exist (the stack pointer ) that indicate the memory region your are looking for. Your var_dinamic doesn't benefit from this and need to be loaded first in a register. I'm not an assembler expert but this is probably the extra instruction that you've noticed. Defining var_static outside the main function must make this two approach similar.
have a look at http://www.agner.org/optimize/optimizing_cpp.pdf
there is so much things to learn
have a look at http://www.agner.org/optimize/optimizing_cpp.pdf
there is so much things to learn
-
cdani
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Hash speed and memory usage
Yes. The overhead defining inside or outside the function is the same; always need an instruction more to load the pointer, so it's slower. In Andscacs it's defined outside, so it's not in the stack.Antonio Torrecillas wrote:note that your var_static is allocated in the stack, a pointer exist (the stack pointer ) that indicate the memory region your are looking for. Your var_dinamic doesn't benefit from this and need to be loaded first in a register. I'm not an assembler expert but this is probably the extra instruction that you've noticed. Defining var_static outside the main function must make this two approach similar.
Did not know this! It's very interesting. So I tested:Gerd Isenberg wrote:There might be copy-on-write issues with calloc. Have you tried malloc and memset?
In milliseconds, depth 20 from start position, average of 3 measures for each case:
static 35064
malloc 35875
calloc 36085
Slightly better with malloc, but nowhere near static array.
Many times it's not only having to access the memory to load the pointer, but also using a register that will not be available for other functions, as you know.
I think this cannot be fixed easily, because will be necessary to change the assembly instructions dynamically on runtime to point to the allocated memory region.
Daniel José -
http://www.andscacs.com
-
cdani
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Hash speed and memory usage
Just to highlight more this performance issue, even with 32 MB of hash and malloc I don't get the same performance gain as with 4MB! of static array.
It must be similar with the transposition table hash, I think.
But may be it's another thing that I'm not able to see...
It must be similar with the transposition table hash, I think.
But may be it's another thing that I'm not able to see...
Daniel José -
http://www.andscacs.com
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Hash speed and memory usage
I don't see how 1 instruction can make it "noticeably slower". You only do this once per node at most. It takes a LOT of instructions to handle a single node. move generation, ordering, search recursion, ttable probe, repeat check, legality checks, evaluation.cdani wrote:I will explain with an example.bob wrote: I am not sure I understand what you mean by dynamic. If you create such a hash table by a static declaration (global variable in the form of a large array) or if you use malloc() to grab the same amount of memory, you should see exactly zero difference in performance. If you mean allocating a new ENTRY each time you store something, that would be horribly inefficient.
This code when in assembly translates to something like:Code: Select all
int _tmain(int argc, _TCHAR* argv[]) { char * var_dynamic; int var_static[10]; char var_index, var_resul; var_dynamic = (char *)malloc(10); var_index = 5; var_resul = var_dynamic[var_index]; var_resul = var_static[var_index]; return 0; }
There is one line plus of assembly code when using memory allocated by malloc, because it's necessary to load the address of the pointer. This should be true regardless of the compiler.Code: Select all
var_resul = var_dynamic[var_index]; 009D13F3 movsx eax,byte ptr [ebp-45h] 009D13F7 mov ecx,dword ptr [ebp-0Ch] 009D13FA mov dl,byte ptr [ecx+eax] 009D13FD mov byte ptr [ebp-51h],dl var_resul = var_static[var_index]; 009D1400 movsx eax,byte ptr [ebp-45h] 009D1404 mov cl,byte ptr [ebp+eax*4-3Ch] 009D1408 mov byte ptr [ebp-51h],cl
So it must be slower, and is noticeable slower in my tests, unless there is a workaround that I was not able to find.
Thanks!
In fact, you would make it faster if you got rid of the char nonsense. Char subscripts are bad from the get-go.
You are sort of apple and oranging things here too. That local variable is allocated on the stack, so that EBP always points close to it. Declare it globally outside the procedure... for the dynamic variable, malloc() it outside the procedure. Then compare to see how it looks.
Looks the same to me. I'd get rid of char subscripts to get rid of the mogsx stuff.
-
cdani
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Hash speed and memory usage
Sorry if I did not explained it. I don't use an array of chars. This was just an example I wrote to show what the compiler does. But the result in assembly it's exactly the same as if you see what the compiler does when Stockfish accesses his TT.bob wrote: In fact, you would make it faster if you got rid of the char nonsense. Char subscripts are bad from the get-go.
What you suggest is exactly what my program does when I use malloc, and it's slower as you can see from the post I put before.
May be it's another thing, and not this added instruction.
Daniel José -
http://www.andscacs.com
-
cdani
- Posts: 2204
- Joined: Sat Jan 18, 2014 10:24 am
- Location: Andorra
Re: Hash speed and memory usage
I have found more or less a solution! I suppose it's related on how the cpu handles the memory.
It was enough to manually setting the hash memory to 0 with a for loop after doing the same with memset, and now it goes quite faster. Following the example I put before, now requires 35250 milliseconds. It's slower than with a static array, but a lot better.
I tried in different ways, and the faster was this dirty one, at least on my system:
If I do with only the second for loop, it's slower. And if I add a third for loop, also it's slower.
With static array this trick it's not necessary, so clearly the system handles different the two types of memory.
But all those things sure will be different from system to system, so I suppose the best approach is to use static arrays for hash that will be little in size, and malloc hash for the bigger ones.
At last, may be this will be useful to someone, at least to avoid crashing his head
And now we return to one of the starting questions, about the size of static memory usage that you consider fair.
Thanks for your patience!
It was enough to manually setting the hash memory to 0 with a for loop after doing the same with memset, and now it goes quite faster. Following the example I put before, now requires 35250 milliseconds. It's slower than with a static array, but a lot better.
I tried in different ways, and the faster was this dirty one, at least on my system:
Code: Select all
memset(thishash, 0, numofbytes);
char *j;
int i;
j=(char*)uintptr_t(thishash);
for (i=0;i<numofbytes;i++)
*j=1;
for (i=0;i<numofbytes;i++)
*j=0;
With static array this trick it's not necessary, so clearly the system handles different the two types of memory.
But all those things sure will be different from system to system, so I suppose the best approach is to use static arrays for hash that will be little in size, and malloc hash for the bigger ones.
At last, may be this will be useful to someone, at least to avoid crashing his head
And now we return to one of the starting questions, about the size of static memory usage that you consider fair.
Thanks for your patience!
Daniel José -
http://www.andscacs.com