Can principal variation search be worth no Elo?

algerbrex · Post by **algerbrex** » Fri Oct 15, 2021 8:17 pm

mvanthoor wrote: ↑Fri Oct 15, 2021 6:41 pm
algerbrex wrote: ↑Fri Oct 15, 2021 4:31 pm Update.

...

Although the second test hasn't finished, I think it's fair to say that it's clear the version with Rustic's move scoring code and PVS is a win. So now the challenge will be to figure out where I went wrong scoring moves in my original code, even though I'm sure I spent a good bit of time scanning over my scoring code weeks ago
Good to know you've narrowed down the bug to a single function

Rustic scores it's moves over (almost) the full range of an unsigned 16 bit int, so there should be enough room for heuristics to score below the killers. (That is what the MVV_LVA_OFFSET does: it shifts all the MVV_LVA values upwards, scores the TT-move right above that, and the killers right below.)

Thanks.

Since I'll be adding HH back into my engine eventually, and as it stands right now I might end up nicking part of your move scoring code, to bound the HH using Rusic's move scoring method, was your plan to make sure the maximum score a value from the history table could reach would be MVV_LVA_OFFSET - BiggestPossibleKillerValue? So you would always have the order of:

PV move
MvvLva/Captures
Killers
History Heuristics

This is essentially what I was doing before with HH's to bound them but in a bit of a different way.

mvanthoor · Post by **mvanthoor** » Fri Oct 15, 2021 9:32 pm

algerbrex wrote: ↑Fri Oct 15, 2021 8:17 pm Thanks.

Since I'll be adding HH back into my engine eventually, and as it stands right now I might end up nicking part of your move scoring code, to bound the HH using Rusic's move scoring method, was your plan to make sure the maximum score a value from the history table could reach would be MVV_LVA_OFFSET - BiggestPossibleKillerValue? So you would always have the order of:

PV move
MvvLva/Captures
Killers
History Heuristics

This is essentially what I was doing before with HH's to bound them but in a bit of a different way.

Yes.

Code: Select all

const MVV_LVA_OFFSET: u32 = u32::MAX - 256;

I reserved 256 values to stuff things in above the highest MVV-LVA value. At the moment, the only thing there is the TT-move / PV-move (if any). It gets there because its value is 60, and the highest MVV-LVA value is 55.

The lowest MVV-LVA value sits at MVV_LVA_OFFSET + 15. The 0's in the table are for "capturing" on an empty square (so I don't have to check for this before indexing), and "capturing the king", which should never happen. (Thus there is no value at (say) MVV_LVA_OFFSET, or MVV_LVA_OFFSET + 5.)

Then come the killers, which sit at MVV_LVA_OFFSET - (killer_number * 10). So the first killer sits 10 points below MVV_LVA_OFFSET, the second sits 20 poitns below.

And the heuristics will thus indeed be capped at MVV_LVA_OFFSET - (biggest_killer_nr * 10) - 10 or something. (So probably MVV_LVA_OFFSET - 30 is the highest value the history heuristic can reach.)

algerbrex · Post by **algerbrex** » Mon Oct 18, 2021 7:11 am

Update. And hopefully the last one in this "saga."

So it seems I've finally tracked down the very nasty bug that was causing PVS to fail in Blunder.

Anyhow, before I explain what the bug was, I have to admit to some dumbassery on my part. In my last several posts I reported that cloning Rustic's search code fixed PVS. What I didn't realize is that I copied Rustic's code incorrectly, particularly the move scoring part. In my haste to begin testing, I accidentally made sure killers were always scored above the hash move and good captures, which for whatever reason made PVS very effective.

Why? No clue. The point being that the supposed fix I got from Rustic was unfortunately artificial, and once I fixed this move scoring bug, PVS stopped working again. (So sadly it seems Marcel is still in a 20 Elo debit to me

)

The good news is that I eventually tracked down the actual bug over the weekend. The offending section of code was this:

Code: Select all

// Store the result of the search for this position only if we haven't run out of time.
if !search.Timer.Stop {
	bestMove := NullMove
	if ttFlag == ExactFlag {
		bestMove = pvLine.GetPVMove()
	}
	search.TT.Store(search.Pos.Hash, ply, depth, alpha, ttFlag, bestMove)
}

Specifically, the if statement where I check if the search time is up before saving a TT entry. Which made sense to me when I wrote the code at the time, but out of curiosity, I removed the if statement and tested PVS again. And to my surprise, PVS began working like a charm, and the PVS version was notably stronger than the non-PVS version. Not sure by how much Elo yet, have to finish running the full 2000 game test, but I estimate it's at least 50 Elo, probably more.

I originally added the if statement because I theorized the TT was being filled with useless entries once the time for the current search had run out, so it made no point to store them. But apparently, this hurt the performance of PVS to the point where it became an Elo loss. I'm still not entirely sure why yet, because on paper this shouldn't really matter. But I've learned at this point that like it or not, testing trumps theory.

Over the next couple of days, I'll be adding back in the bits and pieces I stripped from Blunder while debugging PVS, and seeing where it stands. And depending on the results of testing, I'll consider releasing Blunder 7.0.0 sometime soon. Thanks again to everyone that offered help and advice, it was all much appreicated.

mvanthoor · Post by **mvanthoor** » Mon Oct 18, 2021 8:42 am

algerbrex wrote: ↑Mon Oct 18, 2021 7:11 am Update. And hopefully the last one in this "saga."
...

Good going. In the past, I've had several things in Rustic's search where people said "you can remove this, because it doesn't make a difference", but when I removed it (such as storing the "best of the worst" move), the engine immediately lost 50-60 Elo.

After switching the search to fail-soft alpha-beta, there's no "strange" code in there anymore. It just stores the best move with its accompanying hash flag, whatever that move and flag may be. As said, it didn't cause any Elo gain or loss, but the search is (to me) more logical and readable.

algerbrex · Post by **algerbrex** » Mon Oct 18, 2021 3:22 pm

mvanthoor wrote: ↑Mon Oct 18, 2021 8:42 am
algerbrex wrote: ↑Mon Oct 18, 2021 7:11 am Update. And hopefully the last one in this "saga."
...
Good going. In the past, I've had several things in Rustic's search where people said "you can remove this, because it doesn't make a difference", but when I removed it (such as storing the "best of the worst" move), the engine immediately lost 50-60 Elo.

After switching the search to fail-soft alpha-beta, there's no "strange" code in there anymore. It just stores the best move with its accompanying hash flag, whatever that move and flag may be. As said, it didn't cause any Elo gain or loss, but the search is (to me) more logical and readable.

Thanks. I'm still considering whether I should just switch over to failsoft and what would make sense to me in the long term.

What's most surprising to me is the result of the test from last night:

Code: Select all

Score of Blunder_pvs vs Blunder_no_pvs: 1190 - 598 - 212  [0.648] 2000
...      Blunder_pvs playing White: 605 - 293 - 102  [0.656] 1000
...      Blunder_pvs playing Black: 585 - 305 - 110  [0.640] 1000
...      White vs Black: 910 - 878 - 212  [0.508] 2000
Elo difference: 106.0 +/- 15.0, LOS: 100.0 %, DrawRatio: 10.6 %
Finished match

Even understanding that Elo from self-play is inflated, 100+ Elo from PVS still seems to be quite a big gain. I've double-checked my code and testing to make sure I didn't introduce any bugs, and the test results seem legit. So I'll be interested to see what kind of gain Blunder gets in gauntlet testing once I add back in the other features. I suppose I should also note Blunder's evaluation now includes mobility (30 Elo), so that's probably also increasing the PVS gain.

mvanthoor · Post by **mvanthoor** » Mon Oct 18, 2021 4:34 pm

algerbrex wrote: ↑Mon Oct 18, 2021 3:22 pm Even understanding that Elo from self-play is inflated, 100+ Elo from PVS still seems to be quite a big gain. I've double-checked my code and testing to make sure I didn't introduce any bugs, and the test results seem legit. So I'll be interested to see what kind of gain Blunder gets in gauntlet testing once I add back in the other features. I suppose I should also note Blunder's evaluation now includes mobility (30 Elo), so that's probably also increasing the PVS gain.

A more accurate evaluation can make PVS stronger / more effective, just like a better move ordering does. However, 100 Elo does indeed seem a lot. What other features do you have enabled besides PVS in this test run? It could be that you are getting +55 from PVS, and +45 from the mobility feature, which would sound quite plausible.

algerbrex · Post by **algerbrex** » Mon Oct 18, 2021 4:44 pm

mvanthoor wrote: ↑Mon Oct 18, 2021 4:34 pm
algerbrex wrote: ↑Mon Oct 18, 2021 3:22 pm Even understanding that Elo from self-play is inflated, 100+ Elo from PVS still seems to be quite a big gain. I've double-checked my code and testing to make sure I didn't introduce any bugs, and the test results seem legit. So I'll be interested to see what kind of gain Blunder gets in gauntlet testing once I add back in the other features. I suppose I should also note Blunder's evaluation now includes mobility (30 Elo), so that's probably also increasing the PVS gain.
A more accurate evaluation can make PVS stronger / more effective, just like a better move ordering does. However, 100 Elo does indeed seem a lot. What other features do you have enabled besides PVS in this test run? It could be that you are getting +55 from PVS, and +45 from the mobility feature, which would sound quite plausible.

The only difference between the two engines of course is using PVS vs no PVS. But the other features both engines use are:

TT-cuts and move ordering
MVV-LVA move ordering
Tuned PST + Tapered Eval
Tuned mobility (from the most recent dev version)

And of course the basics (ab-pruning, qsearch, etc.)

So my evaluation is quite a bit more complex than Rustic 3.0.0, which if I recall correctly, only had one set of middle-game PST and recorded 56 Elo from PVS in self-play.

When I find the time, I may run another test of PVS vs no PVS, only the evaluation would match Rustic 3.0.0's evaluation. I'd venture to guess the results would be much closer to what you've found.

mvanthoor · Post by **mvanthoor** » Mon Oct 18, 2021 5:02 pm

algerbrex wrote: ↑Mon Oct 18, 2021 4:44 pm So my evaluation is quite a bit more complex than Rustic 3.0.0, which if I recall correctly, only had one set of middle-game PST and recorded 56 Elo from PVS in self-play.

Yes. I'm actively trying to forego evaluation improvements as long as possible (apart from tuning for new tables).

When I find the time, I may run another test of PVS vs no PVS, only the evaluation would match Rustic 3.0.0's evaluation. I'd venture to guess the results would be much closer to what you've found.

That'd be great

And I'll just redo half my XBoard implementation (again) because somehow, the engine now stalls in 3.5% of games played, even when running in UCI-mode

Can't fathom what I did wrong. I'll have to run tests after adding each command/engine feature.

algerbrex · Post by **algerbrex** » Mon Oct 18, 2021 6:37 pm

mvanthoor wrote: ↑Mon Oct 18, 2021 5:02 pm
algerbrex wrote: ↑Mon Oct 18, 2021 4:44 pm So my evaluation is quite a bit more complex than Rustic 3.0.0, which if I recall correctly, only had one set of middle-game PST and recorded 56 Elo from PVS in self-play.
Yes. I'm actively trying to forego evaluation improvements as long as possible (apart from tuning for new tables).

Any particular reason? Or you'd rather just stick to improving search for now?

Personally, I've always been drawn to seeing just how good I can make Blunder with minimal knowledge of chess. And PSTs + mobility seems to strike a pretty nice balance right now.

mvanthoor wrote: ↑Mon Oct 18, 2021 5:02 pm
When I find the time, I may run another test of PVS vs no PVS, only the evaluation would match Rustic 3.0.0's evaluation. I'd venture to guess the results would be much closer to what you've found.
That'd be great

Cool, I'll try to work that in some time this week then and report back

mvanthoor wrote: ↑Mon Oct 18, 2021 5:02 pm And I'll just redo half my XBoard implementation (again) because somehow, the engine now stalls in 3.5% of games played, even when running in UCI-mode Can't fathom what I did wrong. I'll have to run tests after adding each command/engine feature.

Ah, that sucks. Good luck with that. Note sure how the UCI mode could be affected by that? I just know at some point I'll need to implement the XBoard protocol for Blunder.

mvanthoor · Post by **mvanthoor** » Mon Oct 18, 2021 7:19 pm

algerbrex wrote: ↑Mon Oct 18, 2021 6:37 pm Any particular reason? Or you'd rather just stick to improving search for now?]

Personally, I've always been drawn to seeing just how good I can make Blunder with minimal knowledge of chess. And PSTs + mobility seems to strike a pretty nice balance right now.

Improving evaluation can be a rabbit-hole of tinkering with lots of parameters. I'd rather first have a tuner to retune my own PST's, and then add pruning to make the engine see deeper. I intend to add null move, history heuristic (improve move ordering) and probably futility pruning (to cut down on QSearch) first, so the engine reaches something like depth 10-12 in the middle game.

I think I may add a very simple mobility function after that, which just calculates available legal moves == mobility and see what that gives me... but I also may look into other pruning methods first. I don't know yet. I just want to stay away from the tuning parameter thing for some time yet, because each time you add a parameter to the evaluation, you need to retune it.

Cool, I'll try to work that in some time this week then and report back

Did you also test how much mobility-only, without PVS, gave you in strength? (Obviously I don't expect you to test all permutations of all function combinations...)

Ah, that sucks. Good luck with that. Note sure how the UCI mode could be affected by that? I just know at some point I'll need to implement the XBoard protocol for Blunder.

(Some information about Rustic's internal architecture; you can skip this if you want, but it could be useful with regard to making Blunder a multi-protocol engine.)

For XBoard, the engine needs to know about its state, such as "Observing", "Waiting", "Thinking", etc. One very big catch of a state machine is that you have to be very precise where, and when, you change states. I made it so that Rustic's XBoard handler only directly changes Rustic's state when the engine can't do it by itself. Those should only be the inactive states: from Observing to Waiting, and back again. When search is called (either with time/depth controls, or as analysis, in Infinite mode), the search can report that it is "Thinking" (when running with time/depth controls) or "Analyzing" (when running in "Infinte" mode). It can also report when the search has exited (without producing a move), or finished (produced a best move). Thus the engine can switch states automatically, without me having to do this manually.

I suspect I made a mistake somewhere in the state machine, or in one of the functions I added to Rustic to make the state machine possible. The state machine will actually run when in UCI mode, even though UCI doesn't use the information. Still, if the state machine stops the search (due to a bug, race condition, or even by accident, by sending the wrong command due to a copy/paste or auto-complete error), then the UCI mode is affected.

I reverted Rustic to the point just before it had a state machine, and I'll retest this version. If it doesn't stall, I'll have to re-implement the state machine and test after each function I add.

I'm adding functions to the engine, such as state switching, exiting the search without a best move, accepting a completely new position during analysis and restarting the analysis, etc, which can either be used by a protocol, or not. I'm not specifically implementing "UCI" or "XBoard"; the protocols are not intertwined with the engine itself, or the search; they are just commands to use the engine's functionality. Therefore, in Rustic, XBoard is "just" a different command set to make the engine do things. The engine doesn't know if it is running in UCI mode or XBoard mode; it only knows it loaded a communication module called "UCI" or "XBoard", and passes the incoming commands to the appropriate handler. This handler then uses the engine's internal functions to do what is expected.

Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?

Re: Can principal variation search be worth no Elo?