bob wrote:

Here's a question: suppose you use a bucket size of 4, which is typical although some are beginning to go with 8, with an 8-byte uber-compressed table entry (different topic). All four entries in that bucket represent the same draft, so how do you choose which one to replace? Age might be a usable alternative when draft alone isn't enough to break the tie. What if two entries in the table represent the same draft (which is over-populated and should cause a replacement)? How do you choose which one? Again, age might help. "Over-populated and produced by the previous search" would seem to be the better choice.

hgm wrote:

The point is that this is all pretty unlikely. If every draft occurs about equally often and fills, say, 5% of the table, the probability that all 4 entries in a bucket have the same draft is 1 in 8,000.

If you work with the 1/8000 idea, you will still have to contend with far more than 8,000 buckets. E.g., in the parallel-search testing I was reporting on a few months ago, I was using 64GB of hash, which is 4G entries, or 1G buckets. So I would expect to see this case a pretty significant number of times. But you only need two entries in a bucket with the same draft before the problem occurs, since that pair could be the over-populated draft positions, and you have to choose which of the two to replace. If you assume, say, a 32-ply search, you could see drafts from 1 to 32 (ignoring the edge case at the root). A given pair of entries shares a draft with probability 1/32 (whatever draft the first entry has, the second matches it with probability 1/32), and a 4-entry bucket contains 6 pairs, so roughly 6/32, or about 1 in 5, of the buckets will hold at least one same-draft pair. With 1G buckets, that is on the order of 200M buckets that meet that criterion.
The drafts higher than a certain N are not yet 'saturated', i.e. they occur less frequently than every draft below N, and in fact all drafts above N together occur hardly more often than any single draft below N, because the production rate decreases exponentially with draft: each next-higher draft is produced, say, 3 times less often than the previous one. So roughly each draft takes a 1/(N+1.5) part of the table (the geometric tail of all drafts above N adds up to about half of one draft's share). But the exponential production rate also applies to the lower drafts. If draft N was produced just enough to exactly hit the 1/(N+1.5) mark, draft N-1 would have been produced 3 times as much, so that 2/3 of it must already have been replaced. And draft N-3 would already have overloaded its share of the table by a factor of 27. So if you are at the 27th move of the game (excluding the opening moves), the number of draft N-3 entries the table can hold corresponds to a single move. And for draft N-4, an entry would typically already have been overwritten 3 times before the current search completes.
So when N=20, the drafts 1-16 would basically all be from the current search. Even at draft 17, only about 1/e of the entries from a previous search would have survived. Only for drafts 18, 19 and 20 would you have a good chance that there might be a stale entry around that you would preferably like to replace. But it is very unlikely that it would be in the same bucket. And on a global scale you gain very little: just a bit more efficiency in 3 of the 20 drafts.
It would seem that you need some rational way to choose, whether it be random or age or something else. Age is trivial to do.
