Can principal variation search be worth no Elo?

Discussion of chess software programming and technical issues.

Moderator: Ras

algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Can principal variation search be worth no Elo?

Post by algerbrex »

mvanthoor wrote: Wed Oct 20, 2021 11:37 am You save entries into the TT in alpha-beta, not in iterative deepening (I hope). I don't check if time is up before saving an entry into the TT; I check this at the beginning of alpha-beta. The problem you now have is this:

- You do alpha-beta
- After the move loop, you write into the TT
- Somewhere along the line, time's up
- And then the recursive alpha-beta unwinds
- But when you should be saving to the TT you don't, because the time is up.

If you are saving into the TT in the beta cutoff (beta-flag) and after the move loop (alpha-flag), then you're missing all the alpha-flag saves. If you are saving only once (like me, with fail-soft), you miss a huge number of TT-writes.
Right, I only save entries in alpha-beta. I'm not even sure how one would save them only in the ID loop.

And am I actually missing a huge number of TT entries? I didn't realize that. My thinking was that entries written after time is up weren't going to be worth anything, since I was simply returning a value of zero to unwind the alpha-beta call stack.

In other words, when I imagine my engine running alpha-beta at some depth D and realizing it's out of time, it returns zero and aborts the search early. That value from the aborted search propagates up to D-1, which may or may not finish its move loop, then returns to D-2, and so on. The important point to me is that whatever I return from the node at depth D when I realize I'm out of time changes the overall results of the search, which means we can't necessarily trust entries written to the TT once our time is up. A beta-cutoff might not actually be a beta-cutoff, and a node that fails low might not actually fail low had we finished searching.
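To be concrete, the situation I have in mind looks something like this (just a sketch, not Blunder's actual code; the types and names are made up for illustration):

Code: Select all

// Sketch only, with illustrative names: when time runs out, the node returns
// an artificial zero, and every parent above it builds its result on top of
// that fake score.
func (s *Search) alphaBeta(pos *Position, depth, alpha, beta int) int {
    if s.timer.TimeUp() {
        s.aborted = true
        return 0 // not a real score; it only unwinds the recursion
    }

    bestScore := -Infinity

    // ... normal move loop: make each move, recurse with -s.alphaBeta(...),
    // unmake, and raise bestScore / alpha, breaking on a beta cutoff ...

    // If s.aborted became true in any child, bestScore was derived from the
    // fake zero above, so a TT entry written here could record a bogus bound
    // or best move.
    return bestScore
}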
mvanthoor wrote: Wed Oct 20, 2021 11:37 am You shouldn't be checking if the time is up before you write to the TT. You need to allocate something like 50-200ms (depending on how fast your engine is) of headroom. So if you calculate a time slice of 800ms for a move, you subtract 100ms. Then you can spend the slice of 700ms for searching, and then you have 100ms to unwind alpha-beta (AND write to the TT) when time is up.
Hadn't necessarily thought about that scheme, but it makes sense to me and sounds like it would work well.
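If I understand the bookkeeping right, it's just something like this (a rough sketch with made-up names, using the standard time package):

Code: Select all

// Rough sketch of the headroom idea: hand the search slightly less than the
// allocated slice, so there is still time left to unwind alpha-beta (and
// finish the TT writes) before the move has to be sent.
const headroom = 100 * time.Millisecond // tune to the engine's speed, e.g. 50-200 ms

func searchBudget(slice time.Duration) time.Duration {
    if slice <= headroom {
        return slice / 2 // keep very short slices positive
    }
    return slice - headroom // e.g. an 800 ms slice gives 700 ms of actual search
}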
mvanthoor wrote: Wed Oct 20, 2021 11:37 am Strange. I'll do a time-up check before writing to the TT then, and see if my engine increases 170 Elo in playing strength. I still think you're doing something strange here. You _SHOULD_ check if you've broken iterative deepening because of time-up, because you will have to ignore that run and send the previously saved best move to the GUI.
Perhaps I am doing something odd here, but checking whether time is up, to keep garbage entries out of the TT, makes sense to me intuitively.
mvanthoor wrote: Wed Oct 20, 2021 11:37 am Couldn't you try my search and move ordering code again, and see if this works with PVS, including the time-up check in the same spot where it is in my code?
Sure, I could. But if I wrote things correctly this latest time, which I believe I did, then I made sure to copy your time-up check code too.

On a related note, I switched Blunder over to fail-soft last night, in both alpha-beta and qsearch, and it seems to have made a difference: PVS was a ~32 Elo win, even when checking if time was up before saving to the TT. Not sure why.
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Can principal variation search be worth no Elo?

Post by mvanthoor »

algerbrex wrote: Wed Oct 20, 2021 2:22 pm ...
Hm... now I'm not sure anymore with regard to losing TT-information if you block TT-writing on time-up. I put an "if not time up" guard around the TT-write, so it wouldn't do a TT-write on time-up anymore. It made absolutely no difference:

Code: Select all

Score of Rustic Alpha 3.18.100 TU vs Rustic Alpha 3.18.100 XB: 371 - 365 - 264 [0.503]
...      Rustic Alpha 3.18.100 TU playing White: 199 - 171 - 130  [0.528] 500
...      Rustic Alpha 3.18.100 TU playing Black: 172 - 194 - 134  [0.478] 500
...      White vs Black: 393 - 343 - 264  [0.525] 1000
Elo difference: 2.1 +/- 18.5, LOS: 58.8 %, DrawRatio: 26.4 %
1000 of 1000 games finished.
TU = time up version
XB = current development version (for the XBoard protocol)

There are lots of ways to structure alpha/beta, and several opinions on where to put things such as the return on draw, or the return on time-up. There are also many ways to structure / terminate alpha-beta in the wrong way, which will obviously either cost playing strength, or just break the program.

Personally, I found the current fail-soft version with the untangled beta and alpha parts the most readable one: no nested beta and alpha parts, only one value to return (best_score), and only one place to write into the TT (after the move loop, writing the best_score / best_move combination that was found in the move loop).
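Roughly, the shape is this (a sketch with illustrative names, not Rustic's actual code, and in Go-style rather than Rust):

Code: Select all

// Fail-soft sketch: one bestScore, one return statement, and one TT write
// after the move loop. Illustrative names; mate/stalemate handling omitted.
func (s *Search) alphaBeta(pos *Position, depth, alpha, beta int) int {
    if depth == 0 {
        return s.qsearch(pos, alpha, beta)
    }

    bestScore := -Infinity
    var bestMove Move
    flag := UpperBound // fails low unless proven otherwise

    for _, move := range s.orderedMoves(pos) {
        pos.MakeMove(move)
        score := -s.alphaBeta(pos, depth-1, -beta, -alpha)
        pos.UnmakeMove(move)

        if score > bestScore {
            bestScore = score
            bestMove = move
        }
        if bestScore > alpha {
            alpha = bestScore
            flag = ExactBound
        }
        if alpha >= beta {
            flag = LowerBound
            break
        }
    }

    // The only TT write in the function: whatever the move loop found.
    s.tt.Store(pos.Hash(), depth, bestScore, bestMove, flag)

    return bestScore // fail-soft: may lie outside [alpha, beta]
}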
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Can principal variation search be worth no Elo?

Post by algerbrex »

mvanthoor wrote: Wed Oct 20, 2021 4:46 pm Hm... now I'm not sure anymore with regard to losing TT-information if you block TT-writing on time-up. I put an "if not time up" guard around the TT-write, so it wouldn't do a TT-write on time-up anymore. It made absolutely no difference:

Code: Select all

Score of Rustic Alpha 3.18.100 TU vs Rustic Alpha 3.18.100 XB: 371 - 365 - 264 [0.503]
...      Rustic Alpha 3.18.100 TU playing White: 199 - 171 - 130  [0.528] 500
...      Rustic Alpha 3.18.100 TU playing Black: 172 - 194 - 134  [0.478] 500
...      White vs Black: 393 - 343 - 264  [0.525] 1000
Elo difference: 2.1 +/- 18.5, LOS: 58.8 %, DrawRatio: 26.4 %
1000 of 1000 games finished.
TU = time up version
XB = current development version (for the XBoard protocol)
I'd imagine that at the very least, adding in such code wouldn't cause too many issues.
mvanthoor wrote: Wed Oct 20, 2021 4:46 pm There are lots of ways to structure alpha/beta, and several opinions on where to put things such as the return on draw, or the return on time-up. There are also many ways to structure / terminate alpha-beta in the wrong way, which will obviously either cost playing strength, or just break the program.

Personally, I found the current fail-soft version with the untangled beta and alpha parts the most readable one: no nested beta and alpha parts, only one value to return (best_score), and only one place to write into the TT (after the move loop, writing the best_score / best_move combination that was found in the move loop).
Yeah, I've found that to be true as well. I've settled on fail-soft since it's clean and makes PVS work (for whatever reason). So I'll call it a wrap for now.

P.S: Is the current version of Rustic in your main branch up to date? I downloaded the latest version of Rustic from the main branch and compiled it myself; I thought it would be useful in gauntlet testing, since Rustic has always given Blunder some trouble.
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Can principal variation search be worth no Elo?

Post by mvanthoor »

algerbrex wrote: Wed Oct 20, 2021 6:33 pm Yeah, I've found that to be true as well. I've settled on fail soft right now since it's clean and makes PVS work (for whatever reason). So I'll call it a wrap for now.
Fail-soft is a tiny bit better than fail-hard, because it saves more accurate evaluation scores. The change is so small however, that it is completely obliterated by PVS, which just cuts out the less accurate scores. (That's the reason why PVS gets better when your evaluation gets better: it can prove faster that moves are bad, and thus the tree gets smaller.)
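In code, the whole difference is only in what a node hands back at the end; a sketch with illustrative names:

Code: Select all

// bestScore is whatever the move loop found; it may lie above beta (cutoff)
// or below alpha (fail low).

// Fail-hard clamps the result to the window, so the TT only ever learns
// "at least beta" or "at most alpha".
func returnFailHard(bestScore, alpha, beta int) int {
    if bestScore >= beta {
        return beta
    }
    if bestScore <= alpha {
        return alpha
    }
    return bestScore
}

// Fail-soft hands back the score that was actually found, which gives the
// TT a slightly tighter bound to store.
func returnFailSoft(bestScore, alpha, beta int) int {
    return bestScore
}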

The transition from fail-hard to fail-soft should not make PVS go from broken to working. The only thing I can imagine is that the transition fixed a bug in your fail-hard implementation. Somewhere...
algerbrex wrote: Wed Oct 20, 2021 6:33 pm P.S: Is the current version of Rustic in your main branch up to date? I downloaded the latest version of Rustic from the main branch and compiled it myself since I thought it would be useful in gauntlet testing since Rustic has always given Blunder some trouble.
Thanks for using Rustic in testing. The main branch (master) wasn't up to date. I've been using the main branch as a dev-branch, which isn't the best practice ever. I've just straightened this out.

- "main/master" is now at version Alpha 3.0.0, which is the latest release.
- "dev" tracks the latest development version (at this time, 3.18.100)
- "ref_*" are branches for reference versions. If the development version turns out to be stable (no crashes, no time forfeits, no stalls) and it doesn't lose Elo against the previous ref_* version, it gets its own "ref_" branch.
- "XBoard" is the latest bleeding edge branch in which I'm writing and testing the XBoard module. Each time when an addition is made there and tested working, I'll merge it into the dev version.

If you want to test the latest version, you can switch to the "dev" branch (which is now the same as ref_3.18.100) and compile that. First fetch the changes from the repository (or completely re-clone it) because I just changed all that before dinner.

Expected playing strength should be around 2150 - 2170. The performance is 2130 - 2230 depending on which engine Rustic is playing against.

The differences between Alpha 3.0.0 (the latest release) and 3.18.100 are:
- Tapered / Tuned evaluation (using MinimalChess tables for now), which gave +250 Elo
- Some code optimization, which gave +50 Elo.

So a 300 Elo improvement over Alpha 3.0.0.

The rest is all code refactoring, better Rust-style and less C-style, and XBoard implementation.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Can principal variation search be worth no Elo?

Post by algerbrex »

mvanthoor wrote: Wed Oct 20, 2021 6:59 pm Yeah, I've found that to be true as well. I've settled on fail soft right now since it's clean and makes PVS work (for whatever reason). So I'll call it a wrap for now.
Fail-soft is a tiny bit better than fail-hard, because it saves more accurate evaluation scores. The change is so small however, that it is completely obliterated by PVS, which just cuts out the less accurate scores. (That's the reason why PVS gets better when your evaluation gets better: it can prove faster that moves are bad, and thus the tree gets smaller.)

The transition from fail-hard to fail-soft should not make PVS go from broken to working. The only thing I can imagine is that the transition fixed a bug in your fail-hard implementation. Somewhere...

I agree, it was a mystery to me as well. Oh well, I suppose.
mvanthoor wrote: Wed Oct 20, 2021 6:59 pm Thanks for using Rustic in testing. The main branch (master) wasn't up to date. I've been using the main branch as a dev-branch, which isn't the best practice ever. I've just straightened this out.

- "main/master" is now at version Alpha 3.0.0, which is the latest release.
- "dev" tracks the latest development version (at this time, 3.18.100)
- "ref_*" are branches for reference versions. If the development version turns out to be stable (no crashes, no time forfeits, no stalls) and it doesn't lose Elo against the previous ref_* version, it gets its own "ref_" branch.
- "XBoard" is the latest bleeding edge branch in which I'm writing and testing the XBoard module. Each time when an addition is made there and tested working, I'll merge it into the dev version.

If you want to test the latest version, you can switch to the "dev" branch (which is now the same as ref_3.18.100) and compile that. First fetch the changes from the repository (or completely re-clone it) because I just changed all that before dinner.

Expected playing strength should be around 2150 - 2170. The performance is 2130 - 2230 depending on which engine Rustic is playing against.

The difference between Alpha 3.0.0 (latest release) and 3.18.100 are:
- Tapered / Tuned evaluation (using MinimalChess tables for now), which gave +250 Elo
- Some code optimization, which gave +50 Elo.

So a 300 Elo improvement over Alpha 3.0.0.

The rest is all code refactoring, better Rust-style and less C-style, and XBoard implementation.
Thanks. I'll make sure to switch over, download the dev branch, and recompile Rustic so I'm testing against the latest version. I was in the process of testing it against the latest dev version of Blunder, which I estimate is between 2250 and 2300. So from some preliminary testing, your estimate of Rustic's current strength sounds pretty accurate to me.

Once I wrap up testing to get a good estimate of Blunder's current strength and implement some other user-friendly features, I'll probably release Blunder 7.0.0. And for the next several versions after that, I'll probably release versions 7.1.0, 7.2.0, 7.3.0, etc. with each version having 1-2 new features.
Mergi
Posts: 127
Joined: Sat Aug 21, 2021 9:55 pm
Full name: Jen

Re: Can principal variation search be worth no Elo?

Post by Mergi »

The best (least error-prone) place to check for an aborted search would be right after you unmake a move in the main move loop. Then you don't need to worry about anything else. When you start unwinding the recursion, all the parent nodes will immediately perform the same check as well and return.
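Something along these lines, as a sketch with illustrative names:

Code: Select all

// The clock is polled once per move, right after unmake, and the node bails
// out before touching bestScore, alpha or the TT. Every parent then performs
// the same check and unwinds cleanly.
func (s *Search) searchMoves(pos *Position, depth, alpha, beta int) int {
    bestScore := -Infinity
    for _, move := range s.orderedMoves(pos) {
        pos.MakeMove(move)
        score := -s.alphaBeta(pos, depth-1, -beta, -alpha)
        pos.UnmakeMove(move)

        if s.stopped { // the only abort check in the node
            return 0 // this partial result is discarded on the way up
        }

        if score > bestScore {
            bestScore = score
        }
        if bestScore > alpha {
            alpha = bestScore
        }
        if alpha >= beta {
            break
        }
    }
    return bestScore
}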

Btw, a fail-soft AB implementation is quite nice for improving your static evaluation as well. If you store a score in the TT but it's not good enough for a cutoff (or the depth isn't sufficient), you can still use that score to make your static eval more precise. Not a huge difference, but it can help with some pruning techniques that use the static eval.
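For example, something like this (sketch, illustrative names):

Code: Select all

// Even when a TT hit can't cut the node (wrong bound or insufficient depth),
// its fail-soft score can sharpen the static eval used by pruning decisions.
func refinedEval(staticEval int, entry *TTEntry, hit bool) int {
    if !hit {
        return staticEval
    }
    // A lower bound already above the static eval, an upper bound already
    // below it, or an exact score is better information than the raw eval.
    if entry.Flag == ExactBound ||
        (entry.Flag == LowerBound && entry.Score > staticEval) ||
        (entry.Flag == UpperBound && entry.Score < staticEval) {
        return entry.Score
    }
    return staticEval
}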
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Can principal variation search be worth no Elo?

Post by mvanthoor »

algerbrex wrote: Wed Oct 20, 2021 7:16 pm Once I wrap up testing to get a good estimate of Blunder's current strength and implement some other user-friendly features, I'll probably release Blunder 7.0.0. And for the next several versions after that, I'll probably release versions 7.1.0, 7.2.0, 7.3.0, etc. with each version having 1-2 new features.
Did you write the tuner directly into Blunder, or as a separate project? You need the evaluation function, so to avoid writing it twice (and keeping two copies), I'll probably add a "--tuning <file.epd>" option to Rustic, write the tuner directly into the engine, and then move it to the "extra" module.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Can principal variation search be worth no Elo?

Post by algerbrex »

mvanthoor wrote: Wed Oct 20, 2021 9:23 pm
algerbrex wrote: Wed Oct 20, 2021 7:16 pm Once I wrap up testing to get a good estimate of Blunder's current strength and implement some other user-friendly features, I'll probably release Blunder 7.0.0. And for the next several versions after that, I'll probably release versions 7.1.0, 7.2.0, 7.3.0, etc. with each version having 1-2 new features.
Did you write the tuner directly into Blunder, or as a separate project? You need the evaluation function, so to avoid writing it twice (and keeping two copies), I'll probably add a "--tuning <file.epd>" option to Rustic, write the tuner directly into the engine, and then move it to the "extra" module.
I wrote the tuner directly into Blunder. It's under a folder called "tuner" in the main project folder. And you're correct: I had no intention of rewriting my evaluation, so I made sure the tuner was designed in such a way that it could use the evaluation function already in evaluation.go. So what you're planning sounds a lot like the approach I, and others, have taken.
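The key part is just that the tuner's error function calls the engine's normal evaluation. Simplified sketch (the names differ from the actual code; it uses the standard math package):

Code: Select all

// The tuner computes the mean squared error over the training positions by
// calling the same Evaluate the engine itself uses, so there is only one
// copy of the evaluation code.
func meanSquaredError(positions []Position, results []float64, k float64) float64 {
    sum := 0.0
    for i := range positions {
        score := float64(Evaluate(&positions[i])) // the engine's own eval
        sigmoid := 1.0 / (1.0 + math.Pow(10, -k*score/400.0))
        diff := results[i] - sigmoid
        sum += diff * diff
    }
    return sum / float64(len(positions))
}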

I was lucky that the PSTs I tuned using my tuner performed better than the ones I got from MinimalChess, but only because I had been using the wrong ones, which were tuned to work with a 13th, mobility-based table that Blunder doesn't have.

But overall I'm happy with what I have, even though the initial tuning session took about 8-12 hours, if I remember correctly. I think a lot of this is simply due to my crappy hardware. I'm currently running all of my projects on the Pixelbook I use for school, which has an i5 processor, 4 cores, and 8 GB of RAM. I'm hoping I can upgrade soon to speed up the tuning and testing process.
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Can principal variation search be worth no Elo?

Post by mvanthoor »

algerbrex wrote: Wed Oct 20, 2021 9:30 pm I wrote the tuner directly into Blunder. It's under a folder called "tuner" in the main project folder. And you're correct. I had no intention of rewriting my evaluation, so I made sure the tuner was designed in such a way that it could use the evaluation function already in evaluation.go. So what you're planning to do sounds about like the approach I, and others have taken.

I was lucky that the PST I tuned using my tuner performed better than the ones I got from MinimalChess, but only because I had been using the wrong ones, which were tuned to work with a 13th mobility-based table that Blunder doesn't have.
Hm. I'm also using the MinimalChess 0.5 tables, and Rustic also doesn't have a mobility table. So it would be great if a re-tune could add another 20-30 points to Rustic. I might even be able to hit 2200 on CCRL Blitz without adding any feature other than a tapered and tuned eval compared to Alpha 3.0.0 :lol: (When paired against the "correct" engines; against some engines, Rustic-dev performs at around 2130 Elo, while against others, it performs at around 2230.)
algerbrex wrote: Wed Oct 20, 2021 9:30 pm But over all I'm happy with what I have, even though the initial tuning session took about 8-12 hours if I remember correctly. I think a lot of this is simply due to my crappy hardware. I'm currently running all of my projects on my Pixelbook I use for school, which has an i5 processor, 4 cores, and 8 gigs of ram. I'm hoping I can upgrade this soon to speed up the tuning and testing process.
I intend to build a separate system for running tests. Something with an AMD 5700G (8-core), most likely. Or I might go for a Tuxedo notebook, but I'm always hesitant to run such workloads on laptops because of cooling and noise levels.

Rust has a library called "Rayon", which can parallelize iterators by replacing ".iter()" with ".par_iter()". If I can write the tuner using iterators, I may be able to use this library to parallelize it over all four cores of my oldish i7-6700K, which would cut tuning time to 25%. Maybe Go has something similar?

I assume you wrote the tuner according to the pseudo-code on CPW?
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Can principal variation search be worth no Elo?

Post by algerbrex »

mvanthoor wrote: Wed Oct 20, 2021 10:13 pm Hm. I'm also using the MinimalChess 0.5 tables; and Rustic also doesn't have a mobility table. So it would be great if a re-tune can add another 20-30 points to Rustic. I might even be able to hit 2200 on CCRL Blitz without adding any other features but a tapered and tuned eval compared to Alpha 3.0.0 :lol: (When paired against the "correct" engines; against some engines, Rustic-dev performs at around 2130 Elo, while against others, it performs at around 2230.)
I believe from retuning I got about 5-ish Elo, so not too much. But I only went through 1-2 tuning sessions, on the two halves of the Zurichess quiet position training set. So I'm sure 10-20 Elo from retuning is certainly possible.

And hitting 2200 on the CCRL would be quite nice with only Rustic's current feature set and better tables :D PSTs are surprisingly powerful (at least they were for me).
mvanthoor wrote: Wed Oct 20, 2021 10:13 pm I intend to build a separate system for running tests. Something with a an AMD 5700G (8-core) most likely. Or I might go for a Tuxedo notebook, but I'm always hesitant to run such workloads on laptops because of cooling and noise levels.

Rust has a library called "Rayon", which can parallelize iterators by replacing ".iter()" with ".par_iter()". If I can write the tuner using iterators, I may be able to use this library to parallelize it over all four cores of my oldish i7-6700K, which would cut tuning time to 25%. Maybe Go has something similar?

I assume you wrote the tuner according to the pseudo-code on CPW?
I'd eventually like to build something similar, so I can use my Pixelbook for programming and schoolwork, and the other system for running tests and parameter tuning.

And one nice thing about Go is that it's quite simple to use multithreading via goroutines, but that's a part of the language I'd need to learn a bit more about before using it in any extensive manner. But I'm sure I could eventually get it up and running to speed up the tuning process.
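Roughly what I have in mind (not code that exists in Blunder yet; it would use the standard sync and math packages, and the same hypothetical Evaluate as above):

Code: Select all

// Split the error sum across a fixed number of goroutines, each summing its
// own chunk of the training set, then combine the partial sums after Wait().
func parallelError(positions []Position, results []float64, k float64, workers int) float64 {
    var wg sync.WaitGroup
    partial := make([]float64, workers)
    chunk := (len(positions) + workers - 1) / workers

    for w := 0; w < workers; w++ {
        start, end := w*chunk, (w+1)*chunk
        if end > len(positions) {
            end = len(positions)
        }
        if start >= end {
            continue
        }
        wg.Add(1)
        go func(w, start, end int) {
            defer wg.Done()
            for i := start; i < end; i++ {
                score := float64(Evaluate(&positions[i]))
                sigmoid := 1.0 / (1.0 + math.Pow(10, -k*score/400.0))
                diff := results[i] - sigmoid
                partial[w] += diff * diff
            }
        }(w, start, end)
    }
    wg.Wait()

    total := 0.0
    for _, p := range partial {
        total += p
    }
    return total / float64(len(positions))
}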

And yup, I copied the naive implementation from CPW. So I already knew it'd be slow, which I wasn't too worried about for this initial version, since the goal was to get a working tuner, not a fast one. But a faster tuner will definitely be needed in the future to speed up the development process.
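For reference, the naive loop from CPW boils down to something like this (sketch with illustrative names):

Code: Select all

// Texel-style local search: nudge each parameter by +1 and -1 and keep the
// change only if the error over the training set goes down. Simple but slow,
// since every probe re-evaluates the whole training set.
func localSearch(params []int, computeError func([]int) float64) []int {
    bestErr := computeError(params)
    improved := true
    for improved {
        improved = false
        for i := range params {
            params[i]++
            if err := computeError(params); err < bestErr {
                bestErr = err
                improved = true
                continue
            }
            params[i] -= 2 // undo the +1 and try -1 instead
            if err := computeError(params); err < bestErr {
                bestErr = err
                improved = true
                continue
            }
            params[i]++ // neither direction helped; restore the original value
        }
    }
    return params
}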