Let's say you have a Q6700 system running at 2.66 GHz with 8 GB of RAM. What should hash tables be set to for the following time controls:
3 minute game
15 minute game
30 minute game
120 minute game
Lastly, would these values change if you overclock?
What is the formula to calculate hash tables for quad-core systems?
Best Hashtable Settings
Moderator: Ras
- AdminX
- Posts: 6363
- Joined: Mon Mar 13, 2006 2:34 pm
- Location: Acworth, GA
Best Hashtable Settings
"Good decisions come from experience, and experience comes from bad decisions."
__________________________________________________________________
Ted Summers
-
- Posts: 2094
- Joined: Mon Mar 13, 2006 2:31 am
- Location: North Carolina, USA
Re: Best Hashtable Settings
Ted,
Your query requires knowledge of the engine that you will use. While giving an engine all the memory you can for a TC works, some notice degraded performance at small TCs with large amounts of memory used for hash tables.
Here is an equation that works. Notice it doesn't have any term for the number of processors; that is factored into the speed term.
Code:
Speed = number of nodes per second that the engine goes through in a middlegame position.
Time  = average amount of time spent on a position for a given TC.
NP    = number of positions examined per move = Speed x Time.
HS    = hash table size = NP x size of a hash entry x 1.2
The issue is that various engines have different sizes for a hash entry. Crafty may have one size while Rybka may have another, and so on. Here you have to guess unless the author has divulged that info. What is a reasonable guess? Telepath uses a hash entry size of 81 bytes, but it uses a 3-way hash; some engines use a 1-way or a 2-way hash. A three-way hash means each hash entry holds three positions, so Telepath uses 27 bytes per position.
The idea is this: if an engine at a given TC will search an average
of 10 million positions, then set the hash size so that it can hold that
number of positions or slightly larger.
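To make that concrete, here is a small worked example of the equation above. The speed, time, and entry-size figures are placeholders chosen only for illustration, not measurements of any particular engine.
Code:
#include <stdio.h>

int main(void)
{
    /* Placeholder figures -- measure your own engine's speed and check its
       per-position entry size; nothing here comes from a real benchmark. */
    double speed      = 1.5e6;  /* nodes per second in a middlegame position */
    double time       = 20.0;   /* average seconds spent per position at this TC */
    double entry_size = 27.0;   /* bytes per stored position (3-way example above) */

    double np = speed * time;            /* NP = positions examined per move */
    double hs = np * entry_size * 1.2;   /* HS = hash size with the 1.2 margin */

    printf("NP = %.0f positions per move\n", np);
    printf("HS = %.0f MB\n", hs / (1024.0 * 1024.0));
    return 0;
}
Plug in your own measured node rate and average move time, then round the result to whatever hash sizes your engine or GUI accepts.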
Yes, overclocking changes the result because it increases the equation's speed term.
- bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Best Hashtable Settings
The simple formula for programs that hash in the q-search is as follows:
hash size = entry_size * NPS * time per move in seconds.
In a 120 minute game, a program will average at least 180 seconds per move and usually closer to 300, thanks to early book moves and correct pondering predictions. So 300 seconds times (say) 2M nodes per second is a tree of some 600M positions, and you need a hash table that big. For a program that doesn't hash in q-search (Crafty, Junior, who knows who else), you can divide that by 10. That is a hash table big enough to hold everything, and going bigger only helps in collision cases where two signatures lead to the same table address.
In the case of Crafty, a hash entry is 16 bytes, which is probably typical. So 600M positions becomes roughly 9.6 gigabytes of hash. In my case, on the 8-core box I use, NPS goes up to about 20M, but then I don't hash in q-search, so divide by 10, and 8 gigs would be about right for Crafty on that particular box.
This formula holds true for any program. The downside is that big hash tables will thrash the TLB and increase memory latency. Smaller hash tables will avoid that, but will result in excessive overwriting and loss of data that would speed up the search.
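As a sketch of how that formula plays out for the four time controls in the original question, here is the arithmetic in code. The node rate, entry size, and the 40-moves-per-game guess for average move time are assumptions made for illustration, not measurements of a Q6700.
Code:
#include <stdio.h>

int main(void)
{
    /* Hypothetical figures -- measure your own engine's NPS and check its
       entry size; these are placeholders, not Q6700 benchmarks. */
    double nps        = 6.0e6;   /* nodes per second for the whole quad core */
    double entry_size = 16.0;    /* bytes per hash entry */

    /* Game lengths in minutes, with a crude average-move-time guess of
       game_time / 40 moves (ignoring pondering hits and book moves). */
    int minutes[] = { 3, 15, 30, 120 };

    for (int i = 0; i < 4; i++) {
        double move_time = minutes[i] * 60.0 / 40.0;
        double bytes = nps * move_time * entry_size;
        printf("%3d min game: ~%.0f MB of hash\n",
               minutes[i], bytes / (1024.0 * 1024.0));
    }
    return 0;
}
For an engine that doesn't hash in the q-search, divide the results by 10 as described above. Note that at the longest control the raw formula can exceed the 8 GB in the machine, at which point you simply cap the hash at whatever leaves room for the operating system.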
I'd suggest, for any program, that you pick a set of typical positions, search them to the target time you plan on using, and run the test several times, increasing the hash size by 2x each time. Pick the hash size that gives the fastest time to reach a particular depth. And don't test with the SMP search. Test with one processor, then adjust the final answer by the ratio of SMP NPS / serial NPS...
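That last step, scaling the single-processor answer by the NPS ratio, is just a multiplication. The measured values below are placeholders for whatever your own tests produce.
Code:
#include <stdio.h>

int main(void)
{
    /* Placeholder measurements -- substitute your own test results. */
    double best_serial_hash_mb = 512.0;  /* best size found with one processor */
    double serial_nps          = 1.8e6;  /* NPS with one processor */
    double smp_nps             = 6.0e6;  /* NPS with all cores searching */

    /* Scale the serial answer by how much faster the SMP search runs. */
    double smp_hash_mb = best_serial_hash_mb * (smp_nps / serial_nps);

    printf("suggested SMP hash: about %.0f MB\n", smp_hash_mb);
    return 0;
}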
- Mike S.
- Posts: 1480
- Joined: Thu Mar 09, 2006 5:33 am
Re: Best Hashtable Settings
bob wrote: The downside is that big hash tables will thrash the TLB and increase memory latency.
This is interesting. I guess the TLB, and possible effects on it from very large hash, are different for different x86 CPUs? Does the impact of the effect depend on the memory speed (clock rate), too?
Is there a typical large hash size where that effect starts to happen significantly? Or, in reverse, up to which size is it safe to assume it doesn't happen?
(Unfortunately I understand nothing about CPU architecture.)
Regards, Mike
- bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Best Hashtable Settings
Different processors have different numbers of TLB entries. I believe I have seen one that goes up to 2048 entries, where AMD Opterons are typically 1024. My older PIV Xeon claims to have something like 66 TLB entries.
The effect also depends on how the page tables work. On AMD, a TLB miss basically multiplies memory latency by a factor of 5, since it takes 4 memory accesses to convert a virtual page number to a real page number if the TLB doesn't have the info. That is 4 extra memory accesses so you can do your one access. On 32-bit systems, the factor is about 3x, since there are just two levels of page tables (typically; it could be one if your system uses huge memory pages).
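A back-of-the-envelope version of that cost model is below; the latency and miss-rate numbers are made up purely for illustration.
Code:
#include <stdio.h>

int main(void)
{
    /* Illustrative numbers only -- real latencies vary by platform. */
    double mem_latency = 100.0;  /* ns for one memory access */
    double walk_levels = 4.0;    /* page-table levels on 64-bit (about 2 on 32-bit) */
    double miss_rate   = 0.30;   /* fraction of hash probes that miss the TLB */

    /* A TLB miss adds one extra memory access per page-table level. */
    double miss_cost = mem_latency * (1.0 + walk_levels);
    double average   = (1.0 - miss_rate) * mem_latency + miss_rate * miss_cost;

    printf("hit: %.0f ns, miss: %.0f ns, average probe: %.0f ns\n",
           mem_latency, miss_cost, average);
    return 0;
}
The point is simply that once the table is much larger than what the TLB can map, a sizeable fraction of probes pay the full page-walk penalty.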
That's why I suggest running the same set of positions multiple times, trying successively larger and larger hash sizes to see where the time to a specific depth starts to decay. Pick the largest hash size that does not start to slow things down and you are doing as well as you can for that search time limit and hardware configuration.
- Mike S.
- Posts: 1480
- Joined: Thu Mar 09, 2006 5:33 am
Re: Best Hashtable Settings
Thanks for the info. Again we find: maximum is not optimum. (I think similarly about how many tablebases to use in the search.)
Regards, Mike
- AdminX
- Posts: 6363
- Joined: Mon Mar 13, 2006 2:34 pm
- Location: Acworth, GA
Re: Best Hashtable Settings
Thanks for all of the info, Bob. So a lot depends on the engine you use, more than on the hardware. Not that hardware is not a factor.
"Good decisions come from experience, and experience comes from bad decisions."
__________________________________________________________________
Ted Summers
- bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Best Hashtable Settings
One of my future plans is to run some of these long runs on the cluster, and add in 3-, then 4-, then 5-piece endings. Can't do 6-piece tables, as those would sit on an NFS filesystem and then network traffic would become a big performance hit... But I will at least be able to say whether 3-4-5 piece tables actually help or hurt or have no influence on a program's overall performance level.
Also let me add, maximum _is_ optimal if there are no architectural details that get in the way. For example, if you use a Linux kernel that will use the huge page size (2 MB), then there are 512x fewer page table entries, which means 512x less load on the TLB. That might mean huge RAM sizes can be used with no penalty. I will also add that beyond some point bigger won't help: once you have enough hash to store any possible tree you can search, going bigger doesn't help. But it doesn't hurt unless the architecture dislikes large-memory programs because of things like the TLB issue.
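One way to see the effect is to count how many TLB entries it takes to map a large hash table at each page size; the 8 GB figure below is just an example.
Code:
#include <stdio.h>

int main(void)
{
    /* Example sizes -- adjust for your own hash size and page sizes. */
    double hash_bytes = 8.0 * 1024 * 1024 * 1024;  /* 8 GB hash table */
    double small_page = 4.0 * 1024;                /* 4 KB normal page */
    double huge_page  = 2.0 * 1024 * 1024;         /* 2 MB huge page */

    printf("4 KB pages: %.0f TLB entries to map the whole table\n",
           hash_bytes / small_page);
    printf("2 MB pages: %.0f TLB entries to map the whole table\n",
           hash_bytes / huge_page);
    return 0;
}
Against the 1024-2048 TLB entries mentioned above, 4 KB pages cannot come close to covering a table that size, while 2 MB pages bring the count within a small factor of what the TLB can hold.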
- bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Best Hashtable Settings
More discussion:
(1) Bigger hash sizes should never hurt a program, unless the program is guilty of clearing the hash after each move and the game is very fast, so that the time taken to clear the hash is a significant part of the time used to choose a move. In that case the big hash hurts performance and offers nothing if the game is that fast. (A rough estimate of that cost follows below.)
(2) Architectural details are the stumbling block. The TLB is one issue, although the huge/jumbo page size can help. This is beyond the program's ability to handle, however, and requires testing.
(3) Bigger hash sizes don't hurt directly: once you can store the entire tree, bigger hash is just wasted memory. But if you use EGTBs, then it can actually hurt to have excessive hash, because this steals memory that could be used for filesystem cache to avoid EGTB I/O.
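To put point (1) in perspective, here is a rough estimate of what clearing a large hash costs per move. The memory bandwidth figure is an assumption, not a measurement.
Code:
#include <stdio.h>

int main(void)
{
    /* Assumed write bandwidth -- real figures depend on the machine. */
    double bandwidth_gb_s = 5.0;    /* GB per second for a large memset */
    double hash_gb        = 2.0;    /* hash table size being cleared */
    double clear_seconds  = hash_gb / bandwidth_gb_s;

    /* In a 3 minute game there may be only a few seconds per move. */
    double move_seconds = 180.0 / 40.0;

    printf("clearing %.0f GB takes ~%.2f s, about %.0f%% of a %.1f s move\n",
           hash_gb, clear_seconds,
           100.0 * clear_seconds / move_seconds, move_seconds);
    return 0;
}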