lithander wrote: ↑Mon Jul 12, 2021 6:22 pm
I try to avoid changes that will only make an expected 10 ELO difference and can only be tested in actual play. At the level our engines are currently playing at there are still a lot of low hanging fruits that can be picked for rather large gains. (Add nullMove pruning if you want to see a huge jump in strength for example)
Sure enough, but at the very least I want any bugs or inaccuracies to be gone. I changed the store function to this:
Code: Select all
pub fn store(&mut self, verification: u32, data: D, used_entries: &mut usize) {
    let mut low = 0;
    let mut found: Option<usize> = None;

    // Find the index of the entry with the lowest depth.
    for (i, entry) in self.bucket.iter().enumerate() {
        // We found the position.
        if entry.verification == verification {
            found = Some(i);
            // So we can stop looking.
            break;
        }

        // We may have a new entry with the lowest depth.
        if entry.data.depth() < self.bucket[low].data.depth() {
            low = i;
        }
    }

    // If the incoming position was found...
    if let Some(f) = found {
        // ...then replace it if the incoming depth is greater.
        if data.depth() > self.bucket[f].data.depth() {
            self.bucket[f] = Entry { verification, data };
        }
    } else {
        // Position not found. Overwrite the entry with the lowest depth.
        if self.bucket[low].verification == 0 {
            *used_entries += 1;
        }
        self.bucket[low] = Entry { verification, data };
    }
}
Basically it runs through the entries in the bucket and finds the one with the lowest depth. It also keeps an eye out for the position itself; if it is found, the index is saved and the loop is broken. In the next part, the position is replaced if the incoming version of that position has a higher depth; otherwise, the incoming data is dropped, so there are no duplicates. If the position wasn't found, the incoming data is saved in the entry having the lowest depth. I have 4 buckets.
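To make the logic above compilable in isolation, here's a minimal sketch of the supporting types such a store() might assume: an entry with a verification field, a data type exposing depth(), and a four-entry bucket. The names and layout (HashData, Entry, Bucket) are illustrative guesses, not Rustic's actual definitions, and the replacement logic is restated so the sketch stands on its own:

```rust
// Illustrative stand-in types; NOT Rustic's real code.
#[derive(Clone, Copy, Default)]
struct HashData {
    depth: i8, // the search depth this entry was stored at
}

impl HashData {
    fn depth(&self) -> i8 {
        self.depth
    }
}

#[derive(Clone, Copy, Default)]
struct Entry {
    verification: u32, // part of the Zobrist key; 0 means "empty"
    data: HashData,
}

#[derive(Default)]
struct Bucket {
    bucket: [Entry; 4], // assuming four entries per bucket
}

impl Bucket {
    // Same replacement scheme as the store() in the post, restated
    // here so the sketch compiles on its own.
    fn store(&mut self, verification: u32, data: HashData, used_entries: &mut usize) {
        let mut low = 0;
        let mut found: Option<usize> = None;
        for (i, entry) in self.bucket.iter().enumerate() {
            if entry.verification == verification {
                found = Some(i);
                break;
            }
            if entry.data.depth() < self.bucket[low].data.depth() {
                low = i;
            }
        }
        if let Some(f) = found {
            // Same position: only replace on a deeper search.
            if data.depth() > self.bucket[f].data.depth() {
                self.bucket[f] = Entry { verification, data };
            }
        } else {
            // New position: overwrite the shallowest entry.
            if self.bucket[low].verification == 0 {
                *used_entries += 1;
            }
            self.bucket[low] = Entry { verification, data };
        }
    }
}
```

Storing the same position twice only replaces it when the new depth is higher, so a bucket never ends up holding duplicates of one position.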
Now test with an 8MB TT...
Code: Select all
Score of Rustic Alpha 3.2.106 8M vs Rustic Alpha 3.2.105 8M: 684 - 687 - 629 [0.499]
... Rustic Alpha 3.2.106 8M playing White: 392 - 300 - 308 [0.546] 1000
... Rustic Alpha 3.2.106 8M playing Black: 292 - 387 - 321 [0.453] 1000
... White vs Black: 779 - 592 - 629 [0.547] 2000
Elo difference: -0.5 +/- 12.6, LOS: 46.8 %, DrawRatio: 31.4 %
2000 of 2000 games finished.
LOL. The corrected code comes out at -0.5 Elo. Up until 1700 games or so, the 3.2.106 version had been 7-8 Elo ahead during the entire run. It seems that _HAVING_ a TT with two or more buckets is what matters: the replacement scheme doesn't seem to affect performance at all. The old function just put the incoming data into whichever entry happened to be the last one with a lower depth than the incoming data. The new function specifically updates an entry with a new higher-depth version of the position, or puts the data in the actual entry with the lowest depth. The first is almost a random replacement scheme; the second is the "correct" one, but with an 8MB TT (full at move 13...) they both seem to work the same.
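For contrast, here's a hedged reconstruction of that old scheme from the description above: the incoming data lands in whichever entry happened to be the last one seen with a lower depth than the incoming data. The types and the function name are illustrative only; this is a guess at the old code's shape, not the original:

```rust
// Minimal stand-in type so the sketch compiles on its own;
// illustrative, not Rustic's real definition.
#[derive(Clone, Copy, Default)]
struct OldEntry {
    verification: u32,
    depth: i8,
}

// Hedged reconstruction of the old scheme: overwrite the LAST entry
// whose depth is lower than the incoming depth. Which entry that is
// depends on what the bucket happens to contain, which is why it
// behaves almost like random replacement. Note it also never checks
// whether the position is already stored, so duplicates are possible.
fn store_old(bucket: &mut [OldEntry; 4], verification: u32, depth: i8) {
    let mut idx = 0;
    for (i, entry) in bucket.iter().enumerate() {
        if entry.depth < depth {
            idx = i; // keeps advancing, so the last shallower entry wins
        }
    }
    bucket[idx] = OldEntry { verification, depth };
}
```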
But your explicit goal to get the most out of every feature you include sets the bar very high. If that's what you need to prove, then oftentimes there won't be any good shortcuts. For example, these fast games that you run with only a 100ms increment to get thousands of games quickly are probably hurting your measurements more than they help! Especially when it's about something like the TT. The error bars go down quickly, but your tests are now based on unrealistic conditions, and you won't know how well they transfer to more realistic scenarios like the rating lists we all care so much about.
Most engines seem to test with 10s + 0.1s. I don't know if your 5s + 0.5s is better... those 5 seconds are gone in the blink of an eye, and then the game will basically be finished at 0.5 seconds/move. Maybe, to make sure the engine actually thinks for longer than 50ms per move, I'll test with 10s + 1s.