Can principal variation search be worth no Elo?

mvanthoor · Post by **mvanthoor** » Thu Sep 23, 2021 1:18 am

algerbrex wrote: ↑Thu Sep 23, 2021 12:55 am
mvanthoor wrote: ↑Wed Sep 22, 2021 10:12 pm At what time control do you test? If you play really fast games like I do, at 10s+0.1s (at this time), it could be that a 64 MB TT doesn't get filled up during a game. Then it doesn't actually matter how large it is. To make sure the TT and its replacement scheme are fully used, I test with only a 16 MB TT. That fills up in about about 30 moves on my computer. (That is intentional, as 30 moves is about half the average game length. On a faster computer that can see farther in the same time, it may fill up faster, so I'd make the TT a bit larger.)
I usually test at 3+0.08s and 15+0.1s time controls, so the larger TT size I'm using may be underperforming. I'll try reducing the size, although I remember the Elo gain was fine when I manually set the TT size before each game to 128 MB. I got around 55 Elo from the TT-cuts, 100 Elo from the move order, and 160 Elo from both combined.

That is in line with my experience and that of others who've recently implemented a TT. I got 41 Elo from the TT cuts, 102 Elo from the move ordering, and 140 from the combination. That was a bit lower than the 160 some other engines were getting... and now I know why. I lost about 20 points due to that mate handling bug.

When I implemented the TT in Rustic Alpha 2, I noticed that some of the utterly devastating mating attacks were gone. I just ascribed this to the TT and its possible instability, etc... but now that I fixed the mate handling, these are back again! I've been watching some of the test games, and Rustic sometimes plays as if it's 1851 again: sacrifice everything, and mate with the last piece that's left.

The reason is that Rustic is a very fast engine compared to most others in its current rating range. These other engines get their rating from an extensive evaluation (i.e., good positional play), where Rustic gets its rating from search depth. So if Rustic doesn't get positionally outplayed early on, there's a high chance that it will see a mating attack long before the opponent does.

To be honest, I'm more happy with Rustic regaining that part of its personality than with the 15-20 Elo points

Strangely enough, now that I've rewritten my search routine to no longer have a dedicated negamax function for the root, the Elo gain from enabling vs disabling the TT is pretty awful, like ~15 ELo, so I'll need to look into that too, as I think going forward it'll be easier to just have one negamax function.

Personally I never saw any use in a separate root search. The root is when the number of ply's is 0, so I just have a variable "let is_root = (ply == 0)". When I want something to happen (or not happen) in the root, I just guard it with an if-statement. For me, that's much less confusing than having two almost the same, but not quite identical search functions.

First get the search completely right, then the TT, and then try PVS.

Note: PVS should gain somewhere between 50-55 Elo. At least, it did for me. If you implement PVS, it can be that Aspiration Windows don't work (anymore), or they can interfere with PVS. If you have those, I'd remove them before implementing PVS, and then try re-adding them later, in different stages of development. Sometimes, adding function X will suddenly make function Y work, or work better, but it depends on the engine. (Speed, evaluation function, the specific implementation of functions...)

mvanthoor · Post by **mvanthoor** » Thu Sep 23, 2021 1:23 am

Mergi wrote: ↑Thu Sep 23, 2021 1:07 am I took a quick glance at your TT code, and while this probably has nothing to do with your root position troubles, you do have a bug in your TT. When probing the TT and comparing the stored score to your current alpha/beta values, you need to adjust the stored score for mate first, before comparing them (you only adjust afterwards).

That would be the cause of Blunder 5.0.0 missing mates in my tests. This will cause Elo loss in some games. I've seen Blunder 5... eh... blundering around with K+R vs K, but I didn't report it because I'm not running the newest version. (I'lll test against version 6 when it has a CCRL rating, so I can use it to gauge the strength of the current development version of Rustic.) I did run some preliminary tests against Blunder 6, and I think it has more features than needed to reach a 2150 rating. It is about the same strength as Rustic-dev, and my engine does not yet have null move pruning or any version of futility pruning.

If you fix this bug, it could gain you a sizable amount of rating points. I've missed it in my cursory check of the TT code. (This bug however is not the cause of PVS not working correctly. This bug only makes the engine miss mates in some endgames.)

mvanthoor · Post by **mvanthoor** » Thu Sep 23, 2021 1:33 am

The bug in the TT with regard to mate handling is that the adjustment of the score is in the wrong place.

All scores are adjusted, even the ones that were set in the if's with ALPHA and BETA flag. The mate score adjustment should only be done for the EXACT flag, because a mate score is an EXACT score: you can't be "approximately checkmate". You are checkmate, or not; and if not, it's not a mate score.

Mergi · Post by **Mergi** » Thu Sep 23, 2021 1:38 am

mvanthoor wrote: ↑Thu Sep 23, 2021 1:33 am The bug in the TT with regard to mate handling is that the adjustment of the score is in the wrong place.

All scores are adjusted, even the ones that were set in the if's with ALPHA and BETA flag. The mate score adjustment should only be done for the EXACT flag, because a mate score is an EXACT score: you can't be "approximately checkmate". You are checkmate, or not; and if not, it's not a mate score.

You can be checkmated in many different ways with different mate lengths. You need to adjust the alpha/beta scores as well, and they need to be adjusted before the comparison to the current values.

mvanthoor · Post by **mvanthoor** » Thu Sep 23, 2021 1:48 am

Mergi wrote: ↑Thu Sep 23, 2021 1:38 am
mvanthoor wrote: ↑Thu Sep 23, 2021 1:33 am The bug in the TT with regard to mate handling is that the adjustment of the score is in the wrong place.

All scores are adjusted, even the ones that were set in the if's with ALPHA and BETA flag. The mate score adjustment should only be done for the EXACT flag, because a mate score is an EXACT score: you can't be "approximately checkmate". You are checkmate, or not; and if not, it's not a mate score.
You can be checkmated in many different ways with different mate lengths. You need to adjust the alpha/beta scores as well, and they need to be adjusted before the comparison to the current values.

Are you sure about that? As far as I've understood, the mate score is always an EXACT score, and if a TT-entry is EXACT, you always return the score. You don't compare the saved score to the incoming score. You do this for alpha and beta however.

Here's my TT: (the relevant part is in one of the get() functions that actually extracts information from the TT, starting at line 138.)
https://github.com/mvanthoor/rustic/blo ... osition.rs

(It's a bit more complicated than the one in Blunder, because it is a generic that can also store PerftData, and later, PawnData.)

I'll try moving the mate adjustment to take the alpha/beta scores into account, but I'm not sure it gains anything.

edit: I've just changed the TT. I get the stored value, and adjust it for mate handling if it's under or over the checkmate threshold. I do so before comparing this value with the incoming alpha / beta values. I'm still not convinced it's going to change anything. A series of moves leading to checkmate, and certainly the checkmate move itself, should be part of the PV, which is always an EXACT score. So I can't see how alpha or beta can be under or over the checkmate threshold. We'll see; I have a 2000 game test running until tomorrow.

algerbrex · Post by **algerbrex** » Thu Sep 23, 2021 4:19 am

Mergi wrote: ↑Thu Sep 23, 2021 1:17 am
Code: Select all
TTEntrySize = 14
I'm not a go programmer, but are you sure this constant is correct? Usually structs would be memory-aligned regardless of programming language. The one you are using would be usually aligned to 16 bytes.

Oops. This completely slipped my mind. I need to check out the Go docs, but you're right, I'm pretty sure the struct will be memory aligned to 16 bytes. The bug I had adjusting mate scores also didn't occur to me as well. Thanks for pointing out both of these things. Looks like Blunder 6.1.0 will have quite a few bug fixes.

algerbrex · Post by **algerbrex** » Thu Sep 23, 2021 4:25 am

mvanthoor wrote: ↑Thu Sep 23, 2021 1:18 am That is in line with my experience and that of others who've recently implemented a TT. I got 41 Elo from the TT cuts, 102 Elo from the move ordering, and 140 from the combination. That was a bit lower than the 160 some other engines were getting... and now I know why. I lost about 20 points due to that mate handling bug.

When I implemented the TT in Rustic Alpha 2, I noticed that some of the utterly devastating mating attacks were gone. I just ascribed this to the TT and its possible instability, etc... but now that I fixed the mate handling, these are back again! I've been watching some of the test games, and Rustic sometimes plays as if it's 1851 again: sacrifice everything, and mate with the last piece that's left.

The reason is that Rustic is a very fast engine compared to most others in its current rating range. These other engines get their rating from an extensive evaluation (i.e., good positional play), where Rustic gets its rating from search depth. So if Rustic doesn't get positionally outplayed early on, there's a high chance that it will see a mating attack long before the opponent does.

To be honest, I'm more happy with Rustic regaining that part of its personality than with the 15-20 Elo points

Very nice to hear! I've always loved watching engines that can launch and successfully execute violent mate attacks. I'll have to take some more time to check out Rustic playing again during my testing.

mvanthoor wrote: ↑Thu Sep 23, 2021 1:18 am Personally I never saw any use in a separate root search. The root is when the number of ply's is 0, so I just have a variable "let is_root = (ply == 0)". When I want something to happen (or not happen) in the root, I just guard it with an if-statement. For me, that's much less confusing than having two almost the same, but not quite identical search functions.

Blunder having a separate root search function is an early artifact of reading about the method in the CPW and never trying anything different. At the time when I first started writing the engine, it seemed like the easiest method. You'll see I used it all the way back in Blunder 1 and never changed it.

mvanthoor wrote: ↑Thu Sep 23, 2021 1:18 am Note: PVS should gain somewhere between 50-55 Elo. At least, it did for me. If you implement PVS, it can be that Aspiration Windows don't work (anymore), or they can interfere with PVS. If you have those, I'd remove them before implementing PVS, and then try re-adding them later, in different stages of development. Sometimes, adding function X will suddenly make function Y work, or work better, but it depends on the engine. (Speed, evaluation function, the specific implementation of functions...)

Right, I remember seeing you reporting your test results on your website. That's the Elo I'm currently aiming for. I'll make those TT fixes you and Mergi pointed out and see if those fix my current TT issues, and then I'll move towards implementing PVS search.

algerbrex · Post by **algerbrex** » Thu Sep 23, 2021 4:40 am

mvanthoor wrote: ↑Thu Sep 23, 2021 1:23 am
Mergi wrote: ↑Thu Sep 23, 2021 1:07 am I took a quick glance at your TT code, and while this probably has nothing to do with your root position troubles, you do have a bug in your TT. When probing the TT and comparing the stored score to your current alpha/beta values, you need to adjust the stored score for mate first, before comparing them (you only adjust afterwards).
That would be the cause of Blunder 5.0.0 missing mates in my tests. This will cause Elo loss in some games. I've seen Blunder 5... eh... blundering around with K+R vs K,
If you fix this bug, it could gain you a sizable amount of rating points. I've missed it in my cursory check of the TT code. (This bug however is not the cause of PVS not working correctly. This bug only makes the engine miss mates in some endgames.)

I like the pun

I knew Blunder's name would be funny eventually...

And I noticed that Blunder would falter in certain endgames, but the reason never occurred to me until now.

mvanthoor wrote: ↑Thu Sep 23, 2021 1:23 am but I didn't report it because I'm not running the newest version. (I'lll test against version 6 when it has a CCRL rating, so I can use it to gauge the strength of the current development version of Rustic.) I did run some preliminary tests against Blunder 6, and I think it has more features than needed to reach a 2150 rating. It is about the same strength as Rustic-dev, and my engine does not yet have null move pruning or any version of futility pruning.

Before it gets a CCRL rating, I really need to release a bugfix, as I've realized Blunder has some pretty glaring bugs. I'm not too busy this week with Uni work, so I'm working towards releasing a bugfix sometime towards the end of this week. So far I know I need to fix how the TT adjusts mate scores, the size of tt entries, loading TT sizes from the UCI protocol. Additionally, I need to bound the history heuristics, as Amanj pointed out to me that during longer time controls, they overflowed and caused many issues.

And the amount of features I've needed to reach certain milestones has bugged me. I wanted to try to break 2000 without any sort of pruning, which I think I did since I estimated that null move pruning gave me ~50 Elo, and Blunder 5 is rated at 2055 Elo. But I largely agree with you, which is why for the time being I've stopped testing new features so I can (1) refractor the search routine and (2) try seeing what Elo gain I'm getting after fixing some bugs.

amanjpro · Post by **amanjpro** » Thu Sep 23, 2021 6:45 am

algerbrex wrote: ↑Thu Sep 23, 2021 4:40 am
mvanthoor wrote: ↑Thu Sep 23, 2021 1:23 am
Mergi wrote: ↑Thu Sep 23, 2021 1:07 am I took a quick glance at your TT code, and while this probably has nothing to do with your root position troubles, you do have a bug in your TT. When probing the TT and comparing the stored score to your current alpha/beta values, you need to adjust the stored score for mate first, before comparing them (you only adjust afterwards).
That would be the cause of Blunder 5.0.0 missing mates in my tests. This will cause Elo loss in some games. I've seen Blunder 5... eh... blundering around with K+R vs K,
If you fix this bug, it could gain you a sizable amount of rating points. I've missed it in my cursory check of the TT code. (This bug however is not the cause of PVS not working correctly. This bug only makes the engine miss mates in some endgames.)
I like the pun I knew Blunder's name would be funny eventually...

And I noticed that Blunder would falter in certain endgames, but the reason never occurred to me until now.

mvanthoor wrote: ↑Thu Sep 23, 2021 1:23 am but I didn't report it because I'm not running the newest version. (I'lll test against version 6 when it has a CCRL rating, so I can use it to gauge the strength of the current development version of Rustic.) I did run some preliminary tests against Blunder 6, and I think it has more features than needed to reach a 2150 rating. It is about the same strength as Rustic-dev, and my engine does not yet have null move pruning or any version of futility pruning.
Before it gets a CCRL rating, I really need to release a bugfix, as I've realized Blunder has some pretty glaring bugs. I'm not too busy this week with Uni work, so I'm working towards releasing a bugfix sometime towards the end of this week. So far I know I need to fix how the TT adjusts mate scores, the size of tt entries, loading TT sizes from the UCI protocol. Additionally, I need to bound the history heuristics, as Amanj pointed out to me that during longer time controls, they overflowed and caused many issues.

And the amount of features I've needed to reach certain milestones has bugged me. I wanted to try to break 2000 without any sort of pruning, which I think I did since I estimated that null move pruning gave me ~50 Elo, and Blunder 5 is rated at 2055 Elo. But I largely agree with you, which is why for the time being I've stopped testing new features so I can (1) refractor the search routine and (2) try seeing what Elo gain I'm getting after fixing some bugs.

I don't do any special handling for mate score in TT in Zahak, and I have never witnessed Zahak miss a single mate. Somehow whenever I try to "fix" this bug, I lose elo

Mergi · Post by **Mergi** » Thu Sep 23, 2021 8:25 am

mvanthoor wrote: ↑Thu Sep 23, 2021 1:48 am Are you sure about that? As far as I've understood, the mate score is always an EXACT score, and if a TT-entry is EXACT, you always return the score. You don't compare the saved score to the incoming score. You do this for alpha and beta however.

Now that you mention it, i'm actually not 100% certain. But there's an easy way to test this. When returning alpha/beta from TT, check whether it is a mating score and display some message. I'll try to do it later today.

Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?