Fixing bug in TT-store function dropped playing strength...

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Fixing bug in TT-store function dropped playing strength...

Post by mvanthoor »

Fixing an inaccuracy/bug in my TT store function dropped my engine's Elo rating by about 30 points, unless I'm missing something here.

This is the old function for storing an entry in a bucket (ENTRIES_PER_BUCKET = 4):

OLD:

Code: Select all

pub fn store(&mut self, verification: u32, data: D, used_entries: &mut usize) {
	let mut idx_lowest_depth = 0;

	// Find the index of the entry with the lowest depth.
	for entry in 1..ENTRIES_PER_BUCKET {
		if self.bucket[entry].data.depth() < data.depth() {
			idx_lowest_depth = entry
		}
	}

	// If the verification was 0, this entry in the bucket was never
	// used before. Count the use of this entry.
	if self.bucket[idx_lowest_depth].verification == 0 {
		*used_entries += 1;
	}

	// Store.
	self.bucket[idx_lowest_depth] = Entry { verification, data }
}
This function starts at entry 1 and compares the depth of the data in each entry to the depth of the _incoming_ data. If the data in the entry has a lower depth than the incoming data, I save the index.

In the end this means that the incoming data will be saved in the last entry that happened to have a lower depth than the incoming data. For example, with bucket depths [9, 2, 4, 9] and an incoming depth of 5, entries 1 and 2 both qualify, so the data lands in entry 2 (depth 4) rather than in entry 1 (depth 2). If no entry has a lower depth, the incoming data will be saved in entry 0, regardless of the depth stored there.

This obviously wasn't the intention. I wanted to save the incoming data in the entry having the lowest depth. (I noticed this while refactoring.) So I changed the function:


NEW:

Code: Select all

pub fn store(&mut self, verification: u32, data: D, used_entries: &mut usize) {
	let mut low = 0;

	// Find the index of the entry with the lowest depth.
	for i in 1..ENTRIES_PER_BUCKET {
		if self.bucket[i].data.depth() < self.bucket[low].data.depth() {
			low = i
		}
	}

	// If the verification was 0, this entry in the bucket was never
	// used before. Count the use of this entry.
	if self.bucket[low].verification == 0 {
		*used_entries += 1;
	}

	// Store.
	self.bucket[low] = Entry { verification, data }
}
Apart from the variable name change, I now don't compare against the depth of the incoming data, but one entry against another: the data in entry "i" is compared to the data in entry "low", and if "i" has a lower depth, it becomes the new "low".

So in the end, the incoming data overwrites the entry with the lowest depth.

This seems to have dropped the playing strength of the engine by 30 points.

What am I missing here? It seems the second function is correct.

The only thing I can think of is that the hash table is filling up with entries of ever higher depth that at some point are of no use anymore, effectively rendering the buckets useless, because it's always the same entry being replaced. I would have to implement an aging system. The bug in the first function inadvertently seems to circumvent this (at least for some time), because it saves the incoming data into the entry that happens to be the last one with a lower depth than the incoming one, even if that is an entry with a very high depth, just 1 lower than the incoming one.

Am I correct here, or could it be something else?
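
If aging is indeed the answer, this is roughly the version I have in mind (a sketch only: it assumes a hypothetical `age` field in Entry and an `age` parameter that is bumped once per search; nothing like this exists in Rustic yet):

Code: Select all

pub fn store(&mut self, verification: u32, data: D, age: u8, used_entries: &mut usize) {
	let mut low = 0;

	// Prefer replacing stale entries (stored during an earlier search);
	// among equally stale (or equally fresh) entries, replace the one
	// with the lowest depth.
	for i in 1..ENTRIES_PER_BUCKET {
		let i_stale = self.bucket[i].age != age;
		let low_stale = self.bucket[low].age != age;

		if (i_stale && !low_stale)
			|| (i_stale == low_stale
				&& self.bucket[i].data.depth() < self.bucket[low].data.depth())
		{
			low = i;
		}
	}

	// If the verification was 0, this entry in the bucket was never
	// used before. Count the use of this entry.
	if self.bucket[low].verification == 0 {
		*used_entries += 1;
	}

	// Store.
	self.bucket[low] = Entry { verification, data, age }
}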
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Fixing bug in TT-store function dropped playing strength...

Post by mvanthoor »

I've been testing the current dev version of Rustic in some gauntlets. After the "fix" above, the engine was at -79 Elo against the field after 1000 of 4000 games. The previous "buggy" version had been tested at -35 Elo against the field. In the past I've noticed that the Elo rating hardly changes after the first 1000 games; it changes by maybe +/- 5 points while the error margin keeps decreasing.

I aborted the test for that version, and replaced the "Store" function with this:

Code: Select all

    pub fn store(&mut self, verification: u32, data: D, used_entries: &mut usize) {
        // See if the first or second entry is the lowest in the bucket.
        let first = self.bucket[0].data.depth() <= self.bucket[1].data.depth();
        let low = if first { 0 } else { 1 };

        // If the verification was 0, this entry in the bucket was never
        // used before. Count the use of this entry.
        if self.bucket[low].verification == 0 {
            *used_entries += 1;
        }

        // Store.
        self.bucket[low] = Entry { verification, data }
    }
So I basically dropped back to two entries per bucket, and I just replace the one with the lowest depth. (If they're equal, slot 0 will be used, because it is the slot found first when probing the table.) I've restarted the test, and up until now it's looking OK; similar to the run with the buggy version. A bucket could still have an entry stuck at a huge depth, but I assume that an aging system for only two entries per bucket will take a lot less time to update than one for 4-5 entries per bucket.
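
For reference, the probe side I am describing would look roughly like this (a sketch; it assumes the same bucket layout and that a verification of 0 means "never used"):

Code: Select all

    pub fn probe(&self, verification: u32) -> Option<&D> {
        // Return the data of the first entry whose verification matches.
        // With equal depths, the store above prefers slot 0, so that is
        // also the slot that is found first here.
        self.bucket
            .iter()
            .find(|e| e.verification == verification)
            .map(|e| &e.data)
    }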

In the paper "Replacement Schemes for Transposition Tables" (Breuker, Uiterwijk, and Van den Herik, from the early '90s) the authors mention that replacement schemes basically come to nothing ("difference will be at most 3% in extra TT hits") as soon as the TT is big enough (where "big enough" is >= 1024 kB...). I wonder if it's actually useful to have more than two entries per bucket if the TT is 32 MB at the very least, and probably 128 MB or bigger.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Fixing bug in TT-store function dropped playing strength...

Post by Joost Buijs »

When storing, you should first check whether an entry with the same key is already in the bucket and, if so, store there immediately; otherwise you can get the situation that a bucket contains more than one entry with the same key.
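
With the names from the code in the opening post, that check could look like this (a sketch; the early return guarantees at most one entry per position in a bucket):

Code: Select all

pub fn store(&mut self, verification: u32, data: D, used_entries: &mut usize) {
	// If this position is already in the bucket, overwrite that entry
	// immediately, so the bucket can never hold duplicates.
	for i in 0..ENTRIES_PER_BUCKET {
		if self.bucket[i].verification == verification {
			self.bucket[i] = Entry { verification, data };
			return;
		}
	}

	// Otherwise replace the entry with the lowest depth, as before.
	let mut low = 0;
	for i in 1..ENTRIES_PER_BUCKET {
		if self.bucket[i].data.depth() < self.bucket[low].data.depth() {
			low = i;
		}
	}

	if self.bucket[low].verification == 0 {
		*used_entries += 1;
	}

	self.bucket[low] = Entry { verification, data }
}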

Nowadays, with very fast multicore computers, you can store over a billion entries per minute. Compared to the '90s, memory and cache architecture have changed completely, so I think the article by Jaap van den Herik is not very valid anymore.

In practice I always use a TT of 1 to 4 GB for blitz and rapid; for longer time controls it has to be even larger. Of course it depends on whether you store in quiescence or not; if you don't store in quiescence, it can probably be smaller.

With TTs > 1 GB the speed drops severely because the TLB gets thrashed. You can solve this by using large pages, which is what I do by default.
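
On Linux this can be done by mapping the table with mmap and MAP_HUGETLB; on Windows the equivalent is VirtualAlloc with MEM_LARGE_PAGES. A rough sketch in Rust using the libc crate (the OS must have huge pages reserved, and the length should be a multiple of the huge-page size):

Code: Select all

fn alloc_tt(len: usize) -> *mut u8 {
	// SAFETY: anonymous mapping, not backed by a file.
	unsafe {
		let prot = libc::PROT_READ | libc::PROT_WRITE;
		let flags = libc::MAP_PRIVATE | libc::MAP_ANONYMOUS;

		// Try 2 MB huge pages first, to relieve TLB pressure.
		let p = libc::mmap(std::ptr::null_mut(), len, prot, flags | libc::MAP_HUGETLB, -1, 0);
		if p != libc::MAP_FAILED {
			return p as *mut u8;
		}

		// Fall back to regular 4 kB pages if no huge pages are reserved.
		libc::mmap(std::ptr::null_mut(), len, prot, flags, -1, 0) as *mut u8
	}
}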
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Fixing bug in TT-store function dropped playing strength...

Post by hgm »

One important thing to keep in mind is whether your testing conditions are such that the replacement scheme should have any influence at all. In the experiments I did, this was usually only the case when the typical tree size (node count) was at least 10 times larger than the number of TT entries. For a larger TT there will be almost no replacement.
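
To put rough numbers on that (the entry size is just an assumption for the example): a 256 MB table with 16-byte entries holds 16 million entries, so by this rule of thumb replacement only starts to matter for trees of some 160 million nodes. At 1 Mnps that is more than two and a half minutes of search on a single move.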

What Joost says is important, and from the code you show it is not obvious that you do it (but maybe you do it elsewhere): don't allow duplicates in the bucket; if the position is already there, always overwrite that entry.
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Fixing bug in TT-store function dropped playing strength...

Post by lithander »

hgm wrote: Mon Jul 12, 2021 10:39 am One important thing to keep in mind is whether your testing conditions are such that the replacement scheme should have any influence at all. In the experiments I did, this was usually only the case when the typical tree size (node count) was at least 10 times larger than the number of TT entries. For a larger TT there will be almost no replacement.
I have implemented the 2-bucket system you suggested to replace the simple "always replace" TT that I had in version 0.5 of MinimalChess. I think it works fine but it's only a definite improvement in my tests when I limit the hash size to 5 MB or 10 MB to compensate for the fast time controls (5s + 0.5s inc) that I use for testing. With default hash sizes of 50 MB no improvement is measurable. And now Joost is talking about gigabytes of memory...

Since, for example CCRL testers also use rather fast time controls but large standard hashsizes of 256 MB, I wonder if maybe my original always-replace scheme wasn't actually a very decent solution?
I collect the PV now in a simpler fashion and don't play any moves from the PV directly but instead store the PV in the TT before each iteration. So the PV moves are guaranteed to be in there when they are needed! And with a TT size measured in gigabytes like Joost suggests I can't see how the "overhead" of having buckets, a replacement scheme and aging would be justified. Could that simple always-replace scheme be valid even for competitive engines, too?
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
j.t.
Posts: 239
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Fixing bug in TT-store function dropped playing strength...

Post by j.t. »

lithander wrote: Mon Jul 12, 2021 1:20 pm Could that simple always-replace scheme be valid even for competitive engines, too?
I am not sure that you can call Nalwald competitive, but there, for non-PV nodes, I always replace, except if the old entry has the same Zobrist key and a higher depth. I found that this works nicely together with storing all PV nodes in a separate "non-lossy" hash table (where it doesn't matter if it isn't memory efficient, as there are only a few thousand PV nodes in total per search).
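
Expressed in the Rust style of the code earlier in this thread (Nalwald itself is not written in Rust, and the field names here are made up), the rule for non-PV nodes is roughly:

Code: Select all

pub fn store(&mut self, key: u64, data: D) {
	// Single entry per index: always replace, unless the slot already
	// holds the same position at a higher depth.
	let idx = key as usize % self.entries.len();
	let old = &self.entries[idx];

	if !(old.key == key && old.data.depth() > data.depth()) {
		self.entries[idx] = Entry { key, data };
	}
}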
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Fixing bug in TT-store function dropped playing strength...

Post by mvanthoor »

hgm wrote: Mon Jul 12, 2021 10:39 am What Joost says is important, and from the code you show it is not obvious that you do it (but maybe you do it elsewhere): don't allow duplicates in the bucket; if the position is already there, always overwrite that entry.
I didn't take that into account. I assume an existing entry with the same Zobrist key as the incoming one should only be overwritten if the incoming one has a higher depth?
lithander wrote: Mon Jul 12, 2021 1:20 pm I have implemented the 2-bucket system you suggested to replace the simple "always replace" TT that I had in version 0.5 of MinimalChess. I think it works fine but it's only a definite improvement in my tests when I limit the hash size to 5 MB or 10 MB to compensate for the fast time controls (5s + 0.5s inc) that I use for testing. With default hash sizes of 50 MB no improvement is measurable. And now Joost is talking about gigabytes of memory...
Same. As soon as I increase the TT above 8 MB, there is no difference if I use buckets or not. (Or, more correctly: there are a few Elo points of difference, but they're well within the margin of error.)
lithander wrote: Mon Jul 12, 2021 1:20 pm Since, for example CCRL testers also use rather fast time controls but large standard hashsizes of 256 MB, I wonder if maybe my original always-replace scheme wasn't actually a very decent solution?
I don't know about MinimalChess (it does not report the UCI "hashfull" stat to the GUI), but Rustic can easily fill all entries of a 256 MB TT (with 4 entries per bucket) on an i7-6700K when running at 2m+1s. Therefore, replacement schemes should come into play at some point.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Fixing bug in TT-store function dropped playing strength...

Post by hgm »

lithander wrote: Mon Jul 12, 2021 1:20 pm I have implemented the 2-bucket system you suggested to replace the simple "always replace" TT that I had in version 0.5 of MinimalChess. I think it works fine but it's only a definite improvement in my tests when I limit the hash size to 5 MB or 10 MB to compensate for the fast time controls (5s + 0.5s inc) that I use for testing. With default hash sizes of 50 MB no improvement is measurable. And now Joost is talking about gigabytes of memory...

Since, for example CCRL testers also use rather fast time controls but large standard hashsizes of 256 MB, I wonder if maybe my original always-replace scheme wasn't actually a very decent solution?
I collect the PV now in a simpler fashion and don't play any moves from the PV directly but instead store the PV in the TT before each iteration. So the PV moves are guaranteed to be in there when they are needed! And with a TT size measured in gigabytes like Joost suggests I can't see how the "overhead" of having buckets, a replacement scheme and aging would be justified. Could that simple always-replace scheme be valid even for competitive engines, too?
It depends on what your goal is. In an OTB tournament like the WCCC you might have several minutes per move, and that gives larger trees than when you want an engine that only works optimally during hyper-bullet testing, or on a blitz rating list.

But the original motivation was that you want the PV to be played first. The bucket-of-2 system is an almost trivial enhancement to achieve that, without the need for additional methods for collecting the PV, following PV moves in addition to hash moves, stuffing PVs back into the hash table, doing Internal Iterative Deepening, etc. All these other methods would probably require more code, or more complexity.
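
One possible implementation of such a bucket-of-2 (a sketch only, reusing the bucket layout from the code earlier in the thread): slot 0 is the depth-preferred slot, slot 1 the always-replace slot.

Code: Select all

pub fn store(&mut self, verification: u32, data: D) {
	// Slot 0 keeps the deepest entry seen, so deep (PV) entries survive;
	// everything shallower lands in slot 1, which is always replaced.
	if data.depth() >= self.bucket[0].data.depth() {
		// Demote the previous deep entry to the always-replace slot.
		self.bucket.swap(0, 1);
		self.bucket[0] = Entry { verification, data };
	} else {
		self.bucket[1] = Entry { verification, data };
	}
}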
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Fixing bug in TT-store function dropped playing strength...

Post by lithander »

hgm wrote: Mon Jul 12, 2021 2:36 pm But the original motivation was that you want the PV to be played first. The bucket-of-2 system is an almost trivial enhancement to achieve that, without the need for additional methods for collecting the PV, following PV moves in addition to hash moves, stuffing PVs back into the hash table, doing Internal Iterative Deepening, etc. All these other methods would probably require more code, or more complexity.
I want to play the PV first. But I also need to report it to the GUI.

I found that not even with the bucket-of-2 system was it guaranteed that the complete PV could be found in the table (or did I do it wrong?). When I extracted the PV from the TT (extra code I had to add) to report it to the GUI, the line was often cut short. I tried protecting it against overwrites, but found no really elegant way to do that.

So I looked into your and Marcel's suggestion of returning the PV line from the search directly. I now use C# tuples to return (score, pv) instead of just the score. The search is a bit more complicated because of the added responsibility, but how the PV line is created is actually easier to understand than when it was still done in the tri-table. And it's a lot less code in total: no tri-table, and no code to extract the PV from the TT for the GUI.
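
In sketch form the idea looks like this (written in the Rust style used earlier in the thread rather than C#; Position, Move, and the helper functions are stand-ins, not code from any of our engines, and Move is assumed to be a small Copy type):

Code: Select all

fn search(pos: &Position, depth: u8, mut alpha: i32, beta: i32) -> (i32, Vec<Move>) {
	if depth == 0 {
		return (evaluate(pos), Vec::new());
	}

	let mut pv: Vec<Move> = Vec::new();

	for m in generate_moves(pos) {
		let next = pos.make_move(m);
		let (child_score, child_pv) = search(&next, depth - 1, -beta, -alpha);
		let score = -child_score;

		if score >= beta {
			return (beta, Vec::new()); // Fail high: no PV to report.
		}
		if score > alpha {
			alpha = score;
			// New best move: our PV is this move plus the child's PV.
			pv = std::iter::once(m).chain(child_pv).collect();
		}
	}

	(alpha, pv)
}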

So, now that I had a PV line after each iteration, stuffing it back into the TT was just two more lines of code. With it I can be sure that the PV line always gets played first, regardless of how the TT is implemented. And if the TT doesn't *have* to remember the PV nodes anymore, I could simplify it again to a 1-bucket always-replace scheme. Back to the roots. Of course I would still return the PV from the search directly and not accumulate it in a tri-table, and I wouldn't play PV moves from the tri-table anymore, but as TT best moves. That means this version would be both faster and smaller/simpler than version 0.4.9 that plays in Zatour, or 0.5 that was tested by CCRL.
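
And the stuffing itself is just a walk along the PV (again a sketch with made-up helper names; store_best_move only needs to record the move, so move ordering picks it up first):

Code: Select all

fn stuff_pv_into_tt(tt: &mut TranspositionTable, root: &Position, pv: &[Move]) {
	let mut pos = root.clone();

	// Store each PV position's best move before starting the next
	// iteration, so the whole line is guaranteed to be searched first.
	for (i, &m) in pv.iter().enumerate() {
		let remaining = (pv.len() - i) as u8;
		tt.store_best_move(pos.zobrist_key(), m, remaining);
		pos = pos.make_move(m);
	}
}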

But the decision between always-replace and your 2-bucket system seems hard to make. As you said, it depends on the context. If you give the engine several minutes per move, or only a few megabytes of memory for the hash, then the more complicated TT would perform better. But if you give it a gigabyte of memory, or play at fast time controls, then the simple scheme would be better.

Do some engines make a choice like that once the context is known, e.g. only when the time control and hash size to be used in the current game are known? (Not a solution for a minimal engine, but maybe for a competitive one.)
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: Fixing bug in TT-store function dropped playing strength...

Post by Joost Buijs »

lithander wrote: Mon Jul 12, 2021 1:20 pm I have implemented the 2-bucket system you suggested to replace the simple "always replace" TT that I had in version 0.5 of MinimalChess. I think it works fine but it's only a definite improvement in my tests when I limit the hash size to 5 MB or 10 MB to compensate for the fast time controls (5s + 0.5s inc) that I use for testing. With default hash sizes of 50 MB no improvement is measurable. And now Joost is talking about gigabytes of memory...
The optimal TT size depends on a lot of things. For instance, I use 32 cores, which fills the TT approximately 32 times as fast. Another thing is whether you store in quiescence or not; this makes a huge difference. With lazy SMP all inter-thread communication is done via the TT, which has an influence as well.

It could very well be that, when you don't use more than one thread and don't store in quiescence, a TT size of 50 MB is sufficient.
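
As a ballpark (all numbers illustrative): 32 threads at roughly 1 Mnps each write on the order of 32 million entries per second, so a 50 MB table with 16-byte entries (about 3 million slots) is turned over many times per second, while even a 4 GB table (about 268 million slots) fills within ten seconds or so.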