objective: find deep ply checkmate
I have a large TT which allows for up to 500 collisions per key. Works fine until I get up to 20-ply and above. I used to have a LIFO stack after 500 collisions; however, the engine spent almost all of the time managing the stack instead of working on the solution. If I clear the TT table when it gets to the max, it finds the solution, but it still takes way too much time after losing the benefit of the prior history.
I assume most of you overwrite the hash table, most likely from the earliest entry since the latest entries are the ones most likely to be hit. When I tried that approach, I sometimes miss the solution at higher depth. Also, is a 500 key width separation too much. What size works best for speed and accuracy?
Any suggestions on a better way to manage the TT table on deep ply searches would be appreciated.
			
			
									
						
										
						TT management ?
Moderator: Ras
- 
				AAce3  
- Posts: 80
- Joined: Fri Jul 29, 2022 1:30 am
- Full name: Aaron Li
Re: TT management ?
I would not use a super large stack for TTs. A TT is not your traditional hash table, where you have to store every entry. A TT accepts that most entries will be overwritten at some point. So, rather than a queue, most people use a bucket aligned to 64 bytes (a cache-line) so that you can retrieve the entire bucket with one RAM access, minimizing the time spent querying the TT. Any larger means that multiple accesses are required, slowing everything down. 
Alternatively, always-replace is really fast if you have a sufficiently large TT. Even if higher depth entries get overwritten, e.g. depth 10 entry is overwritten by a depth 3 entry, you still have all the depth 9 entries so it's effectively a depth 1 search. There was a really good thread on this awhile back, forum3/viewtopic.php?f=7&t=80171&hilit=bucket
			
			
									
						
										
						Alternatively, always-replace is really fast if you have a sufficiently large TT. Even if higher depth entries get overwritten, e.g. depth 10 entry is overwritten by a depth 3 entry, you still have all the depth 9 entries so it's effectively a depth 1 search. There was a really good thread on this awhile back, forum3/viewtopic.php?f=7&t=80171&hilit=bucket
- 
				hgm  
- Posts: 28396
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: TT management ?
The TT will not be able to store more than what fits in memory. And your search tree will not fit in memory. So at some point you either will have to overwrite old entries, or stop storing new ones. Clearing the entire table when you reach some overflow condition, rather than just clearing a single entry to make room for the new one, is very stupid. It takes a lot of tome, and needlessly destroys tons of useful information.
Using a stack for each index is a very poor design for any hash table. The stack size is limited anyway, and when one stack gets filled, you must lose data when a new entry maps to that index, while there is still plenty of room at other indexes. Plus, that for large stack size the hasing degenerates to a linear search through the stack. Which is exactly what hashing is supposed to prevent. Add-the-hash rehash or quadratic rehash are vastly superior to it.
For a TT even just overwriting whenever you get an index collision is better. It is a mistake to rehash too many times for finding an empty slot; you would only need that when the table gets nearly 100% filled. The effort to utilize the last few percent of available space is more expensive than what you gain by the extra table size. Search speed only depends on TT size as the 12th root. So 12% better utilization of the available table size only speeds you up by 1%. Memory access is comparatively slow, and if you double the number of memory accesses needed to utilize that extra 12%, it might easily slow the speed by 10%.
			
			
									
						
										
						Using a stack for each index is a very poor design for any hash table. The stack size is limited anyway, and when one stack gets filled, you must lose data when a new entry maps to that index, while there is still plenty of room at other indexes. Plus, that for large stack size the hasing degenerates to a linear search through the stack. Which is exactly what hashing is supposed to prevent. Add-the-hash rehash or quadratic rehash are vastly superior to it.
For a TT even just overwriting whenever you get an index collision is better. It is a mistake to rehash too many times for finding an empty slot; you would only need that when the table gets nearly 100% filled. The effort to utilize the last few percent of available space is more expensive than what you gain by the extra table size. Search speed only depends on TT size as the 12th root. So 12% better utilization of the available table size only speeds you up by 1%. Memory access is comparatively slow, and if you double the number of memory accesses needed to utilize that extra 12%, it might easily slow the speed by 10%.
- 
				Chessnut1071
- Posts: 313
- Joined: Tue Aug 03, 2021 2:41 pm
- Full name: Bill Beame
Re: TT management ?
You are right about losing data when the entire TT is erased; however, "always replace" doesn't work for me either. Most of this board is focused on ELO maximization and always erase usually is the correct choice. I have a different issue. I can't prune and need to rely on the evaluation function to reduce the legal move set. If I overwrite even 1 success with a failure, I lose the solution. At deep depth odds are very good I'll see a collusion.hgm wrote: ↑Sun Sep 11, 2022 10:38 pm The TT will not be able to store more than what fits in memory. And your search tree will not fit in memory. So at some point you either will have to overwrite old entries, or stop storing new ones. Clearing the entire table when you reach some overflow condition, rather than just clearing a single entry to make room for the new one, is very stupid. It takes a lot of tome, and needlessly destroys tons of useful information.
Using a stack for each index is a very poor design for any hash table. The stack size is limited anyway, and when one stack gets filled, you must lose data when a new entry maps to that index, while there is still plenty of room at other indexes. Plus, that for large stack size the hasing degenerates to a linear search through the stack. Which is exactly what hashing is supposed to prevent. Add-the-hash rehash or quadratic rehash are vastly superior to it.
For a TT even just overwriting whenever you get an index collision is better. It is a mistake to rehash too many times for finding an empty slot; you would only need that when the table gets nearly 100% filled. The effort to utilize the last few percent of available space is more expensive than what you gain by the extra table size. Search speed only depends on TT size as the 12th root. So 12% better utilization of the available table size only speeds you up by 1%. Memory access is comparatively slow, and if you double the number of memory accesses needed to utilize that extra 12%, it might easily slow the speed by 10%.
I'm trying a different approach. If I push a move to the table and it has a different success/failure than the current record, I disable that hash key for good. At least I'm not trashing the entire table, but I have to avoid mixed messages. The only way to reduce the collusions is a very efficient evaluation function which I currently don't have. Some of the 18 - 24 ply mates claimed on this board search 10,000 x fewer nodes than my evaluation function on these deep ply searches. My engine takes way too much time to efficiently optimize the evaluation function right now.
What do you think about the disabled key approach? It may also be inefficient, but it still finds the solutions.
- 
				hgm  
- Posts: 28396
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: TT management ?
There must be something badly wrong with the way your program uses the TT, then. With correct usage, the only effect of losing an entry would only be a slowdown. Because you then search a second toime what you would otherwise have gotten from the table. But that would give you the same result.
You can only lose beyond-the-horizon mates that were discovered by grafting. But these are not 'solutions' anyway, because you have no guarantee the are the fastest mates. So why would you care if you lose them?
Whatever you think about erasing, a design that uses stacks with a fixed size at each index is a very inferior design. Even linear rehash would be vastly better. With a maximum stack size of 50 you would on average need to search through 25 entries to add a new one when the table is 50% filled. With linear rehash that would only be 2.
			
			
									
						
										
						You can only lose beyond-the-horizon mates that were discovered by grafting. But these are not 'solutions' anyway, because you have no guarantee the are the fastest mates. So why would you care if you lose them?
Whatever you think about erasing, a design that uses stacks with a fixed size at each index is a very inferior design. Even linear rehash would be vastly better. With a maximum stack size of 50 you would on average need to search through 25 entries to add a new one when the table is 50% filled. With linear rehash that would only be 2.
- 
				dangi12012
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: TT management ?
TT is a very indirect lookup from a hash to a position. 
I wish people started to research more into storing the tree in memory all the time.
Sounds wasteful at first glance - but pruning and walking the positions becomes much faster - and nothing is forgotten ever, making cooperation possible where alphabeta is limited.
It is only usable starting from 32gb of ram - and with the use of memory mapped IO and pcie 4.0 SSDs there is no reason to not go to 512GB or beyond.
Once you can do IO with a few GB/s you can grow and read a tree without performance loss.
			
			
									
						
							I wish people started to research more into storing the tree in memory all the time.
Sounds wasteful at first glance - but pruning and walking the positions becomes much faster - and nothing is forgotten ever, making cooperation possible where alphabeta is limited.
It is only usable starting from 32gb of ram - and with the use of memory mapped IO and pcie 4.0 SSDs there is no reason to not go to 512GB or beyond.
Once you can do IO with a few GB/s you can grow and read a tree without performance loss.
Worlds-fastest-Bitboard-Chess-Movegenerator 
Daniel Inführ - Software Developer
			
						Daniel Inführ - Software Developer
- 
				Chessnut1071
- Posts: 313
- Joined: Tue Aug 03, 2021 2:41 pm
- Full name: Bill Beame
Re: TT management ?
FYIhgm wrote: ↑Mon Sep 12, 2022 7:48 am There must be something badly wrong with the way your program uses the TT, then. With correct usage, the only effect of losing an entry would only be a slowdown. Because you then search a second toime what you would otherwise have gotten from the table. But that would give you the same result.
You can only lose beyond-the-horizon mates that were discovered by grafting. But these are not 'solutions' anyway, because you have no guarantee the are the fastest mates. So why would you care if you lose them?
Whatever you think about erasing, a design that uses stacks with a fixed size at each index is a very inferior design. Even linear rehash would be vastly better. With a maximum stack size of 50 you would on average need to search through 25 entries to add a new one when the table is 50% filled. With linear rehash that would only be 2.
My delete conflicting keys, keys with both success and failure codes, apparently worked! My best solution time deleting an overflowed bucked was 48,363 seconds. The delete conflicting keys clocked 1,285 seconds to solution. I'm searching for your recommendation "linear rehash" for some improvement. I never got to 22-ply before with a solution under 8 hours.
I still need millions of nodes to solve the 18-ply mate that Myriddin solved in 7 seconds examining only 75,000 + nodes. Not sure how he did that unless he's using a multiprocessor which I don't have right now.
I really appreciate all the suggestions, I'm a light year ahead of where I started over a year ago; however, everyone here else seems to be a light year ahead of me.
- 
				hgm  
- Posts: 28396
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: TT management ?
Using multi-processor usually only drives up the number of nodes to find a solution, since some processors will search the same nodes. It shortens the time, though.
If Myrddin is special in this respect I would not know. Myrddin is a regular chess engine developed for game play, so I suspect most other engines that are not too primitive would find the solution with a similar number of nodes. Have you tried any (e.g. Fruit 2.1)?
If the time to solution seems too long, it might be a problem of low speed (nodes/sec). But if the number of nodes to solution seems too high, is always means your search sucks. If it sucks very badly it is usually a matter of move ordering. But reductions or extensions can also speed up finding a solution. Many deep mate problems involve a lot of checks. So check extension might help to find the mating line at low nominal depth (without wasting too much time on the non-mating lines). After which refuting all the alternatives becomes comparatively easy, as these only have to prove they would not end in checkmate, but don't care how much you sacrificed to achieve that. Before you have found any mating line, the search will work very hard to prove alternative lines do not gain enough material.
			
			
									
						
										
						If Myrddin is special in this respect I would not know. Myrddin is a regular chess engine developed for game play, so I suspect most other engines that are not too primitive would find the solution with a similar number of nodes. Have you tried any (e.g. Fruit 2.1)?
If the time to solution seems too long, it might be a problem of low speed (nodes/sec). But if the number of nodes to solution seems too high, is always means your search sucks. If it sucks very badly it is usually a matter of move ordering. But reductions or extensions can also speed up finding a solution. Many deep mate problems involve a lot of checks. So check extension might help to find the mating line at low nominal depth (without wasting too much time on the non-mating lines). After which refuting all the alternatives becomes comparatively easy, as these only have to prove they would not end in checkmate, but don't care how much you sacrificed to achieve that. Before you have found any mating line, the search will work very hard to prove alternative lines do not gain enough material.
- 
				Chessnut1071
- Posts: 313
- Joined: Tue Aug 03, 2021 2:41 pm
- Full name: Bill Beame
Re: TT management ?
Most of the deep ply mates are check intensive. In fact, many 18 - 30 ply mates are all checks. My problem is some of the mates were deliberately structured to befuddle a computer. My engine was optimized on check, slider pieces, enemy king moves, enemy legal moves, enemy king safety and material balance. The mates I'm interested in are the hard-to-find, deep-ply mates. I suspect that the normal optimization of the evaluation function will fail ignominiously and only speed will make a significant difference. I really don't see how some of these top engines have much better evaluation functions unless they have an array of functions each running on separate cores.hgm wrote: ↑Mon Sep 12, 2022 8:39 pm Using multi-processor usually only drives up the number of nodes to find a solution, since some processors will search the same nodes. It shortens the time, though.
If Myrddin is special in this respect I would not know. Myrddin is a regular chess engine developed for game play, so I suspect most other engines that are not too primitive would find the solution with a similar number of nodes. Have you tried any (e.g. Fruit 2.1)?
If the time to solution seems too long, it might be a problem of low speed (nodes/sec). But if the number of nodes to solution seems too high, is always means your search sucks. If it sucks very badly it is usually a matter of move ordering. But reductions or extensions can also speed up finding a solution. Many deep mate problems involve a lot of checks. So check extension might help to find the mating line at low nominal depth (without wasting too much time on the non-mating lines). After which refuting all the alternatives becomes comparatively easy, as these only have to prove they would not end in checkmate, but don't care how much you sacrificed to achieve that. Before you have found any mating line, the search will work very hard to prove alternative lines do not gain enough material.
That "replace keys with offsetting entries" allowed me to start the hash at 3-ply instead of 9-ply. Until that change I had too many collisions for my small hash table size. I no longer have a bucket and that really sped up the process. I'm not sure if I can get that much improvement from here, so, I'm going to focus on the evaluation function. Unfortunately, puzzles designed to befuddle computer solution probably have unique parameters and probably can't be generalized.
Looking to download some of those public domain engines, but, not on my main computer, I don't trust anybody from awful experience.
- 
				AAce3  
- Posts: 80
- Joined: Fri Jul 29, 2022 1:30 am
- Full name: Aaron Li
Re: TT management ?
I'm not sure how successful that would be. Alpha beta relies a lot on previous lines that have been examined. With conditions different at every node, Move ordering, alpha beta bounds, reductions, etc. etc. all depend on the result of previous searches, so storing the tree might not actually help you because you might have to rewrite it again anyways due to different bounds. If you can make it work though I would be excited to see what you come up with.dangi12012 wrote: ↑Mon Sep 12, 2022 5:45 pm TT is a very indirect lookup from a hash to a position.
I wish people started to research more into storing the tree in memory all the time.
Sounds wasteful at first glance - but pruning and walking the positions becomes much faster - and nothing is forgotten ever, making cooperation possible where alphabeta is limited.
It is only usable starting from 32gb of ram - and with the use of memory mapped IO and pcie 4.0 SSDs there is no reason to not go to 512GB or beyond.
Once you can do IO with a few GB/s you can grow and read a tree without performance loss.