have you ever accepted a change knowing it's an ELO loss?

jswaff · Post by **jswaff** » Mon Sep 13, 2021 8:34 pm

For the engine authors- have you ever made a change, tested it and determined it's going to cost a few ELO, but accepted the change anyway? In other words, a change that would be (very) beneficial in some cases but hurt slightly in general. Assume you have verified the code is working as designed and you have a very high degree of confidence in the test results -- bugs and/or confidence intervals are not the issue.

I am potentially facing this dilemma now. I'll avoid the details until I do more verification / testing to be sure, but the specifics are kind of beside the point anyway. It just got me thinking about the philosophical difference between "every change must demonstrate an ELO increase" and "I'd be willing to accept a small hit to play better in specific types of positions," and where I stand on that spectrum.

xr_a_y · Post by **xr_a_y** » Mon Sep 13, 2021 11:15 pm

Yes! More than once I have fixed bugs and seen Elo decrease.

I have also simplified code at the cost of a small Elo loss.

And once I improved mate-finding performance while losing some overall Elo.

Desperado · Post by **Desperado** » Tue Sep 14, 2021 12:22 am

Hello.

One reason may be to implement a key feature on which other elements are built.
Another reason, in my case, was implementing a feature based on experience from older engines that the engine I was working on was not yet ready for (e.g. history heuristic, or null-move pruning with a higher or dynamic R).

Also, a system often adapts to an idea. By this I mean that a change that cost a little Elo in version x often can no longer be removed in version x+n without losing Elo.

Style issues can of course also be a criterion, in the code and in the gameplay of the engine.

In the end, you should just have a plausible reason and be careful with these kinds of decisions if you want the engine to be strong.

In contrast, bugs are always to be eliminated, even if it costs Elo.
In the long run, a more bug-free engine will be stronger.

Regards

mvanthoor · Post by **mvanthoor** » Tue Sep 14, 2021 11:45 am

jswaff wrote: ↑Mon Sep 13, 2021 8:34 pm For the engine authors- have you ever made a change, tested it and determined it's going to cost a few ELO, but accepted the change anyway?

Not yet, but I'm about to do that.

In the current Rustic version, the transposition table has a mistake. It has 4 entries per bucket. Instead of putting new data into the entry with the lowest depth, it puts it into the last entry that has a lower depth than the incoming data. The entry with the lowest depth is assumed to be entry 0 by default, so the comparison starts at entry 1 in the bucket, comparing 3 entries instead of 4. So, if it finds no entry with a lower depth than the incoming data in entries 1-3, it will put the data in entry 0.

The correct behavior would be to first find the entry with the lowest depth, and then replace this IF the incoming data has a higher depth.
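To make the difference concrete, here is a minimal sketch of the two slot-selection policies as read from the description above. It is in Python for brevity and is not Rustic's actual code; the function names and the plain list of depths standing in for a bucket are assumptions made for illustration only.

```python
from typing import List, Optional


def slot_current(depths: List[int], incoming: int) -> int:
    """Policy described as the mistake (hypothetical reconstruction).

    Entry 0 is the default target. Entries 1..n-1 are scanned, and the
    LAST one with a lower depth than the incoming data wins.
    """
    slot = 0
    for i in range(1, len(depths)):
        if depths[i] < incoming:
            slot = i
    return slot


def slot_lowest_depth(depths: List[int], incoming: int) -> Optional[int]:
    """Policy described as correct (hypothetical reconstruction).

    Find the entry with the lowest depth, then replace it only if the
    incoming data has a higher depth. Returns None when nothing is stored.
    """
    lowest = min(range(len(depths)), key=depths.__getitem__)
    return lowest if incoming > depths[lowest] else None


if __name__ == "__main__":
    bucket = [5, 1, 3, 2]  # stored depths of the 4 entries in one bucket
    print(slot_current(bucket, 4))       # overwrites a deeper entry than necessary
    print(slot_lowest_depth(bucket, 4))  # overwrites the shallowest entry
```

With the bucket `[5, 1, 3, 2]` and incoming depth 4, the first policy picks the last qualifying entry (the one holding depth 2), while the second picks the entry holding depth 1. The first policy also always stores something, even when the incoming depth is lower than everything in the bucket.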

If I make this change, the transposition table will use the entries better, but trawling through the entries becomes slower, which costs about 10 Elo, if I keep the entries per bucket at 4. Maybe I'll compensate for this by dropping the number of entries per bucket to 3.

I also don't know yet if I should refactor the TT; it now uses buckets (indexes) with 4 entries / bucket. Maybe I should use entries (indexes) with 4 buckets / entry. There doesn't seem to be a consensus.

In the near future I'll be testing a bit with the TT to see what works best, and maybe I'll add aging in the process.

R. Tomasi · Post by **R. Tomasi** » Tue Sep 14, 2021 11:53 am

I would be very interested in the results of your testing. I use the exact same strategy as you describe with "correct behaviour" and am wondering how many buckets is the optimal number. Unfortunately I cannot really test that at the moment, since my TT lookups have inefficiencies that dominate any effects that are associated with changing the number of buckets. Currently I am using two buckets.

mvanthoor · Post by **mvanthoor** » Tue Sep 14, 2021 2:44 pm

I ran some preliminary tests about two months ago.

Currently, the transposition table does work, it just uses its buckets "less correctly." It sometimes doesn't traverse all the entries in a bucket. The effect is that the TT is faster, but bucket usage isn't optimal. Fixing this made the TT correct, but slower, and the speed impact cost 100 Elo.

There's an old paper called "Replacement Schemes for Transposition Tables" (can be found online) by Jaap van den Herik and colleagues, in which they tested different replacement schemes in the early 1990s. When a "big" transposition table is used (>= 1 MB at that time), the difference between the best replacement scheme and no replacement scheme (just replace always) is less than 3 percent. There is only a measurable difference if the TT is tiny.

Therefore I'm going to make the replacement scheme as simple as possible, and probably use two buckets only, because my previous tests indicate that a faster TT is better than one with a better replacement scheme. I'll post the tests in the Progress on Rustic topic (in my sig). I don't see the point of including each and every test in my documentation on the website, which, by the way, also needs some more stuff written.

klx · Post by **klx** » Tue Sep 14, 2021 3:04 pm

For the specific situation you describe, no, not really. I'd typically focus on the overall Elo, or come up with a different measure that accurately reflects what I'm optimizing. An exception would be if you have a "niche" type of engine: let's say you're not trying to build the best engine overall, but are focusing on some style of play which you want to optimize. Though I'd question whether it wouldn't be possible to selectively apply your change to the situations where it'd be beneficial.

Where I'm typically OK with accepting a loss is when there's potential for a minor performance optimization, but it would either complicate or duplicate the code considerably, and thus introduce risks for bugs and add maintenance cost. For example, is it worth adding 1000 lines of code for a 5% gain overall? Likely not.

klx · Post by **klx** » Tue Sep 14, 2021 3:08 pm

mvanthoor wrote: ↑Tue Sep 14, 2021 11:45 am If I make this change, the transposition table will use the entries better, but trawling through the entries becomes slower, which costs about 10 Elo

mvanthoor wrote: ↑Tue Sep 14, 2021 2:44 pm The effect is that the TT is faster, but bucket usage isn't optimal. Fixing this made the TT correct, but slower, and the speed impact cost 100 Elo.

When measuring speed, I'd strongly recommend using NPS instead of Elo for much better accuracy. Or both, but at least include NPS.

There was another poster here who claimed that 5% speedup corresponds to 4 Elo. In which case your 10 Elo would be a ~12.5% slowdown, and 100 Elo would be something huge, which sounds suspect for that type of change. There is a risk then that the Elo changes you see are due to insufficient testing, or an actual weaker engine which might not have been what you expected.
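The arithmetic behind those numbers can be written out as a small sketch. It assumes the quoted rule of thumb (5% speed for 4 Elo) scales linearly, which is only a rough approximation and clearly breaks down for large changes; the constant and function names are hypothetical.

```python
# Rule of thumb quoted above: a 5% speedup corresponds to about 4 Elo.
# Linear scaling is an assumption, reasonable at best for small changes.
SPEEDUP_PERCENT = 5.0
ELO_PER_SPEEDUP = 4.0


def implied_slowdown_percent(elo_loss: float) -> float:
    """Slowdown (in percent) that would explain a given Elo loss."""
    return elo_loss * SPEEDUP_PERCENT / ELO_PER_SPEEDUP


def nps_change_percent(nps_before: float, nps_after: float) -> float:
    """Relative change in nodes per second, in percent (negative = slower)."""
    return (nps_after - nps_before) / nps_before * 100.0


def implied_elo_change(nps_before: float, nps_after: float) -> float:
    """Elo change predicted by the rule of thumb from measured NPS."""
    return nps_change_percent(nps_before, nps_after) * ELO_PER_SPEEDUP / SPEEDUP_PERCENT


if __name__ == "__main__":
    print(implied_slowdown_percent(10))    # the ~12.5% mentioned above
    print(implied_slowdown_percent(100))   # over 100%, i.e. not plausible
    print(implied_elo_change(1_000_000, 875_000))
```

Measuring NPS before and after the change and comparing the implied Elo with the Elo observed in games is a quick sanity check: a large gap suggests the loss is not explained by speed alone.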

mvanthoor · Post by **mvanthoor** » Tue Sep 14, 2021 3:41 pm

Oops. Typo. Sorry. The speed impact cost 10 Elo.

But that was still within the margin of error (~15 Elo), so the engine might actually not be weaker. Therefore I will indeed be running some more tests on positions, using NPS as the main measure. (This is also the reason why I haven't pulled this fix into the master branch yet.)
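For reference, a margin of error like this can be estimated from a match result roughly as follows. This is a sketch using the standard logistic Elo formula and a normal approximation for a 95% interval; it is not the method of any particular testing tool, and the function names and the example game counts are made up for illustration.

```python
import math
from typing import Tuple


def score_to_elo(score: float) -> float:
    """Elo difference implied by an expected score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int) -> Tuple[float, float]:
    """Return (elo_estimate, approximate 95% margin in Elo) for a match."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    std_error = math.sqrt(variance / games)
    # 1.96 standard errors: normal approximation of a 95% interval.
    low = score_to_elo(score - 1.96 * std_error)
    high = score_to_elo(score + 1.96 * std_error)
    return score_to_elo(score), (high - low) / 2.0


if __name__ == "__main__":
    # Hypothetical result: 1000 games, perfectly even.
    elo, margin = elo_with_margin(300, 400, 300)
    print(f"{elo:+.1f} Elo, +/- {margin:.1f}")
```

With game counts in this range the margin comes out in the mid-teens, so a measured difference of about 10 Elo cannot be distinguished from zero without playing considerably more games.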
