Can principal variation search be worth no Elo?

mvanthoor · Post by **mvanthoor** » Thu Sep 23, 2021 9:27 am

Mergi wrote: ↑Thu Sep 23, 2021 8:25 am
mvanthoor wrote: ↑Thu Sep 23, 2021 1:48 am Are you sure about that? As far as I've understood, the mate score is always an EXACT score, and if a TT-entry is EXACT, you always return the score. You don't compare the saved score to the incoming score. You do this for alpha and beta however.
Now that you mention it, I'm actually not 100% certain. But there's an easy way to test this. When returning alpha/beta from TT, check whether it is a mating score and display some message. I'll try to do it later today.

I made your change of adjusting the TT-score before everything, instead of my version that only adjusts for mate scoring for the EXACT flag. There is no difference (90% of the test finished):

Score of Rustic Alpha 3.15.100 vs Rustic Alpha 3.14.100: 1282 - 1295 - 1059 [0.498]
...      Rustic Alpha 3.15.100 playing White: 711 - 588 - 519  [0.534] 1818
...      Rustic Alpha 3.15.100 playing Black: 571 - 707 - 540  [0.463] 1818
...      White vs Black: 1418 - 1159 - 1059  [0.536] 3636
Elo difference: -1.2 +/- 9.5, LOS: 39.9 %, DrawRatio: 29.1 %
3636 of 4000 games finished.

So I suspect I was correct in saying that a mate score is always EXACT.
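
As a side check on the match output above, the reported Elo difference can be reproduced from the win/loss/draw counts with the usual logistic rating formula. This is a minimal sketch; the function name `elo_from_score` is made up for illustration and is not part of cutechess-cli.

```python
import math


def elo_from_score(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match result under the logistic model.

    Undefined for a 0% or 100% score, where the formula diverges.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    return -400.0 * math.log10(1.0 / score - 1.0)


# Rustic Alpha 3.15.100 vs 3.14.100: 1282 wins, 1295 losses, 1059 draws
print(round(elo_from_score(1282, 1295, 1059), 1))  # -1.2
```

With a 0.498 score the difference sits well inside the reported +/- 9.5 margin, so the test cannot distinguish the two versions.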

mvanthoor · Post by **mvanthoor** » Thu Sep 23, 2021 10:16 am

algerbrex wrote: ↑Thu Sep 23, 2021 4:40 am Before it gets a CCRL rating, I really need to release a bugfix, as I've realized Blunder has some pretty glaring bugs. I'm not too busy this week with Uni work, so I'm working towards releasing a bugfix sometime towards the end of this week. So far I know I need to fix how the TT adjusts mate scores, the size of tt entries, loading TT sizes from the UCI protocol. Additionally, I need to bound the history heuristics, as Amanj pointed out to me that during longer time controls, they overflowed and caused many issues.

You released 6.0.0, so CCRL is possibly already testing. You could send Gabor on this forum a message to see if they can postpone the test. If they can't because the tournament already started, just wait until Blunder 6.0.0 appears in the list, and then release 6.1.0. (Or even 7.0.0 with bugfixes and PVS.)

I'm not going back and retro-fixing the TT in Alpha 3 and 2.

Had the same problems with history; I removed them to take a look at them later.

And the number of features I've needed to reach certain milestones has bugged me. I wanted to try to break 2000 without any sort of pruning, which I think I did since I estimated that null move pruning gave me ~50 Elo, and Blunder 5 is rated at 2055 Elo. But I largely agree with you, which is why for the time being I've stopped testing new features so I can (1) refactor the search routine and (2) try seeing what Elo gain I'm getting after fixing some bugs.

50 Elo for null move seems low. 100 Elo is often reported. Rustic doesn't have any pruning yet. It's still a very basic engine (you can see the feature-set on GitHub). Even before fixing the TT, I would be very disappointed if Rustic 4 (now without Alpha, as it should pass 2000 Elo) would score less than 2130 or even 2150 Elo in the CCRL Blitz list. That version was scoring 2100 - 2210, depending on the engines it tested against. I'll have to redo those tests with the fixed TT. (Running already; will be complete sometime this night.)

I understand that not everybody wants to write a chess engine doing it my way, where I try to get the last Elo-point from a feature before moving to the next one. Even so, here I thought all the code I had up to this point was bug-free, and then I'm accidentally missing a minus sign that costs like 20 Elo. Scatter a few of those mistakes around the engine, and it's 100 points weaker than it could have been.

mvanthoor · Post by **mvanthoor** » Thu Sep 23, 2021 4:12 pm

A test has been running throughout the day. It seems the TT mate handling fix doesn't give any advantage against one engine, but it nets the full +20 Elo (self-play) against another. So the gain is probably up to 20 Elo depending on which engine is the opponent.

As said, I also tested Mergi's assertion that every TT-score with any flag (Alpha, Beta or Exact) needed to be adjusted, as compared to my belief that only Exact scores need to be adjusted. This change did not make a difference; neither positive nor negative. Adjusting scores for mate handling for the Alpha or Beta flag types has no effect. I assume an Alpha or Beta flag score is never below or above the mate score threshold, so it'll actually never be adjusted. (It stands to reason, because mating the opponent should be part of the PV, which is an Exact score.)

Enough about the TT mate handling from me. Blunder needs to adjust this first too (and possibly other things such as bucket/entry memory size) before looking into PVS again.

algerbrex · Post by **algerbrex** » Thu Sep 23, 2021 6:30 pm

mvanthoor wrote: ↑Thu Sep 23, 2021 4:12 pm Blunder needs to adjust this first too (and possibly other things such as bucket/entry memory size) before looking into PVS again.

Right, that's the plan right now. I'll be running a test first to make sure the TT is working correctly, then I'll look back at PVS.

mvanthoor · Post by **mvanthoor** » Thu Sep 23, 2021 6:32 pm

algerbrex wrote: ↑Thu Sep 23, 2021 6:30 pm
mvanthoor wrote: ↑Thu Sep 23, 2021 4:12 pm Blunder needs to adjust this first too (and possibly other things such as bucket/entry memory size) before looking into PVS again.
Right, that's the plan right now. I'll be running a test first to make sure the TT is working correctly, then I'll look back at PVS.

Good luck; I'll take a look through the code tonight to see if I can spot something fishy... uh wait. This isn't Stockfish, is it? For this engine, I have to spot coding blunders...

Mergi · Post by **Mergi** » Thu Sep 23, 2021 7:38 pm

mvanthoor wrote: ↑Thu Sep 23, 2021 9:27 am
I made your change of adjusting the TT-score before everything, instead of my version that only adjusts for mate scoring for the EXACT flag. There is no difference (90% of the test finished):
Score of Rustic Alpha 3.15.100 vs Rustic Alpha 3.14.100: 1282 - 1295 - 1059 [0.498]
...      Rustic Alpha 3.15.100 playing White: 711 - 588 - 519  [0.534] 1818
...      Rustic Alpha 3.15.100 playing Black: 571 - 707 - 540  [0.463] 1818
...      White vs Black: 1418 - 1159 - 1059  [0.536] 3636
Elo difference: -1.2 +/- 9.5, LOS: 39.9 %, DrawRatio: 29.1 %
3636 of 4000 games finished.
So I suspect I was correct in saying that a mate score is always EXACT.

When doing mate puzzles, I'm saving many mate scores to my TT as alphas or betas, and I get vastly reduced nodes searched when adjusting for mate ply. The reason you are not seeing any Elo gains is probably because once there's a mating position, the game is won/lost either way.

I also took a peek at Stockfish's and Ethereal's code, both extremely similar when it comes to TT, and both adjust their scores for mate as well.

fen 2k4r/1r1q2pp/QBp2p2/1p6/8/8/P4PPP/2R3K1 w - - 1 0

// adjusting every stored score for mate
info depth 2 seldepth 6 nodes 336 time 1 nps 336000 hashfull 0 score cp -94 pv a6a8 b7b8
info depth 3 seldepth 7 nodes 1531 time 2 nps 765500 hashfull 0 score cp -94 pv a6a8 b7b8 c1c6
info depth 4 seldepth 6 nodes 1673 time 2 nps 836500 hashfull 0 score cp 340 pv a6a8 b7b8 c1c6 d7c6
info depth 5 seldepth 6 nodes 1987 time 2 nps 993500 hashfull 0 score cp 340 pv a6a8 b7b8 c1c6 d7c6 a8c6
info depth 6 seldepth 8 nodes 2547 time 3 nps 849000 hashfull 0 score cp 1388 pv a6a8 b7b8 c1c6 d7c7 c6c7 c8d8
info depth 7 seldepth 8 nodes 9784 time 4 nps 2446000 hashfull 0 score cp 1388 pv a6a8 b7b8 c1c6 d7c7 c6c7 c8d8 a8b8
info depth 8 seldepth 10 nodes 700579 time 151 nps 4639596 hashfull 12 score mate 4 pv a6a8 b7b8 c1c6 d7c7 c6c7 c8d8 a8b8
bestmove a6a8

// only adjusting when exact flag is encountered
info depth 2 seldepth 6 nodes 336 time 2 nps 168000 hashfull 0 score cp -94 pv a6a8 b7b8
info depth 3 seldepth 7 nodes 1531 time 2 nps 765500 hashfull 0 score cp -94 pv a6a8 b7b8 c1c6
info depth 4 seldepth 6 nodes 1673 time 3 nps 557666 hashfull 0 score cp 340 pv a6a8 b7b8 c1c6 d7c6
info depth 5 seldepth 6 nodes 1987 time 3 nps 662333 hashfull 0 score cp 340 pv a6a8 b7b8 c1c6 d7c6 a8c6
info depth 6 seldepth 8 nodes 2547 time 3 nps 849000 hashfull 0 score cp 1388 pv a6a8 b7b8 c1c6 d7c7 c6c7 c8d8
info depth 7 seldepth 8 nodes 9784 time 5 nps 1956800 hashfull 0 score cp 1388 pv a6a8 b7b8 c1c6 d7c7 c6c7 c8d8 a8b8
info depth 8 seldepth 10 nodes 1415502 time 254 nps 5572842 hashfull 12 score mate 4 pv a6a8 b7b8 c1c6 d7c7 c6c7 c8d8 a8b8
bestmove a6a8
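
The adjustment compared in the two runs above can be sketched as a pair of conversion functions. This is one common convention, not necessarily the exact code in Rustic, Blunder, Stockfish or Ethereal: inside the search a mate score is counted from the root, while in the TT it is stored relative to the node, so the same entry stays valid when the position is reached at a different ply. The constants `MATE`, `MAX_PLY` and `MATE_THRESHOLD` and both function names are assumptions made for this example.

```python
MATE = 32_000                    # hypothetical score for "mate at the root"
MAX_PLY = 128                    # hypothetical maximum search depth in plies
MATE_THRESHOLD = MATE - MAX_PLY  # scores beyond this are treated as mate scores


def score_to_tt(score: int, ply: int) -> int:
    """Convert a root-relative mate score to a node-relative one before storing."""
    if score >= MATE_THRESHOLD:
        return score + ply
    if score <= -MATE_THRESHOLD:
        return score - ply
    return score  # ordinary centipawn scores are stored unchanged


def score_from_tt(score: int, ply: int) -> int:
    """Convert a stored node-relative mate score back to a root-relative one."""
    if score >= MATE_THRESHOLD:
        return score - ply
    if score <= -MATE_THRESHOLD:
        return score + ply
    return score


# A mate found 3 plies below a node at ply 4 has search score MATE - 7.
stored = score_to_tt(MATE - 7, ply=4)        # MATE - 3, relative to the node
# Reaching the same position by transposition at ply 6 gives MATE - 9.
assert score_from_tt(stored, ply=6) == MATE - 9
```

Neither function looks at the entry's flag, so in this sketch the conversion applies equally to exact scores and to bounds.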

algerbrex · Post by **algerbrex** » Thu Sep 23, 2021 8:42 pm

mvanthoor wrote: ↑Thu Sep 23, 2021 6:32 pm
algerbrex wrote: ↑Thu Sep 23, 2021 6:30 pm
mvanthoor wrote: ↑Thu Sep 23, 2021 4:12 pm Blunder needs to adjust this first too (and possibly other things such as bucket/entry memory size) before looking into PVS again.
Right, that's the plan right now. I'll be running a test first to make sure the TT is working correctly, then I'll look back at PVS.
Good luck; I'll take a look through the code tonight to see if I can spot something fishy... uh wait. This isn't Stockfish, is it? For this engine, I have to spot coding blunders...

Thanks. Things are looking pretty good so far. As I mentioned in an earlier post, I rewrote the search and fixed TT bugs, and after the testing finished, it looks like the TT is giving me about 195 Elo, which is more than expected:

Score of Blunder 0.4.0 vs Blunder 0.3.0: 286 - 74 - 57  [0.754] 417
...      Blunder 0.4.0 playing White: 145 - 35 - 29  [0.763] 209
...      Blunder 0.4.0 playing Black: 141 - 39 - 28  [0.745] 208
...      White vs Black: 184 - 176 - 57  [0.510] 417
Elo difference: 194.8 +/- 35.1, LOS: 100.0 %, DrawRatio: 13.7 %
SPRT: llr 2.95 (100.2%), lbound -2.94, ubound 2.94 - H1 was accepted
Finished match

Of course, I left Blunder's tapered evaluation in, which is probably inflating the Elo a bit, but still the results are good, so I'll add in killers, test those, and then add in PVS.

As a side note, I first tried the bugfix you suggested, which was to only adjust the mate score of exact entries. That worked fine most of the time, but in sparse endgames it caused issues with TT hits and getting no PV move, which crashed Blunder. I then realized that Mergi's suggestion was what I needed: adjusting the scores before comparing them against alpha and beta or returning an exact score did the trick.
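
The ordering described here, adjust first and only then look at the flag, might look roughly like the probe below. It is a sketch under assumptions, not Blunder's actual code: the `Entry` layout, the `Flag` names and the constants are invented for the example, and it uses the fail-hard style discussed in this thread, where a bound hit returns alpha or beta.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

MATE = 32_000                    # hypothetical mate score
MATE_THRESHOLD = MATE - 128      # hypothetical cut-off for "this is a mate score"


class Flag(Enum):
    EXACT = 0
    ALPHA = 1  # upper bound: the true score is at most the stored score
    BETA = 2   # lower bound: the true score is at least the stored score


@dataclass
class Entry:
    depth: int
    flag: Flag
    score: int  # mate scores are stored relative to the node


def probe(entry: Entry, depth: int, ply: int, alpha: int, beta: int) -> Optional[int]:
    """Return a usable score from the entry, or None if the search must continue."""
    if entry.depth < depth:
        return None

    # Make mate scores root-relative BEFORE any comparison, for every flag type.
    score = entry.score
    if score >= MATE_THRESHOLD:
        score -= ply
    elif score <= -MATE_THRESHOLD:
        score += ply

    if entry.flag is Flag.EXACT:
        return score
    if entry.flag is Flag.ALPHA and score <= alpha:
        return alpha
    if entry.flag is Flag.BETA and score >= beta:
        return beta
    return None


# A lower-bound entry holding "mate in 3 plies from this node".
entry = Entry(depth=5, flag=Flag.BETA, score=MATE - 3)
# At ply 10 the adjusted score is MATE - 13, below beta = MATE - 12: no cutoff.
assert probe(entry, depth=5, ply=10, alpha=MATE - 20, beta=MATE - 12) is None
```

Without the adjustment, the raw stored value `MATE - 3` would compare as greater than beta in that last call and produce a cutoff that the real mate distance does not justify.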

mvanthoor · Post by **mvanthoor** » Fri Sep 24, 2021 12:47 am

algerbrex wrote: ↑Thu Sep 23, 2021 8:42 pm As a side note, I first tried the bugfix you suggested, which was to only adjust the mate score of exact entries. That worked fine most of the time, but in sparse endgames it caused issues with TT hits and getting no PV move, which crashed Blunder. I then realized that Mergi's suggestion was what I needed: adjusting the scores before comparing them against alpha and beta or returning an exact score did the trick.

Weird. In my engine, I don't have the problem of not having a PV. Adjusting for mate scoring for all TT flag types compared to adjusting only for the Exact type doesn't make any difference. But if adjusting for all flag types seems to be more correct (also demonstrated by Mergi's last test, which I'll try tomorrow), I'll change it.

PS: What is the SPRT command you use? That stuff is confusing. I seem to get the correct results, but I also got a PM from someone that I had things the wrong way around on my site... (but then how can I get the correct results...?)

algerbrex · Post by **algerbrex** » Fri Sep 24, 2021 2:04 am

mvanthoor wrote: ↑Fri Sep 24, 2021 12:47 am PS: What is the SPRT command you use? That stuff is confusing. I seem to get the correct results, but I also got a PM from someone that I had things the wrong way around on my site... (but then how can I get the correct results...?)

Right now my testing command looks like this:

cutechess-cli \
-srand $RANDOM \
-engine cmd=blunder-new \
-engine cmd=blunder-old \
-openings file=$HOME/2moves_v2a.pgn format=pgn order=random \
-each tc=inf/5+0.1 option.Hash=64 proto=uci \
-games 2 -rounds 10000 -repeat 2 \
-sprt elo0=1 elo1=5 alpha=0.05 beta=0.05 \
-concurrency 4 \
-ratinginterval 50

I looked around when I was first trying to understand SPRT testing, and after playing around a little myself, I actually ended up settling on the values you use in your test, as they made sense to me and seemed to be working well. So I'd be a little confused as well.
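
For readers puzzling over the `-sprt` line: `elo0` and `elo1` are the Elo differences assumed under the two hypotheses (H0 and H1), while `alpha` and `beta` are the accepted error rates, and those two alone fix the stopping bounds for the log-likelihood ratio. A small sketch of Wald's bounds follows; the function name `sprt_bounds` is invented for the example.

```python
import math
from typing import Tuple


def sprt_bounds(alpha: float, beta: float) -> Tuple[float, float]:
    """Wald's SPRT stopping bounds for the log-likelihood ratio (LLR).

    alpha: accepted probability of accepting H1 when H0 is true.
    beta:  accepted probability of accepting H0 when H1 is true.
    """
    lower = math.log(beta / (1.0 - alpha))   # LLR at or below this: accept H0
    upper = math.log((1.0 - beta) / alpha)   # LLR at or above this: accept H1
    return lower, upper


lower, upper = sprt_bounds(alpha=0.05, beta=0.05)
print(f"{lower:.2f} {upper:.2f}")  # -2.94 2.94
```

These are the `lbound -2.94, ubound 2.94` values in the Blunder 0.4.0 vs 0.3.0 output posted earlier, where an LLR of 2.95 crossed the upper bound and H1 was accepted.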

algerbrex · Post by **algerbrex** » Fri Sep 24, 2021 3:35 am

So far testing isn't looking good for PVS, and it's still showing up as no gain, even after the rewrite. At least it's not a loss anymore, so that's progress.

From some cursory observations, PVS seems to be reducing the node count and time-to-depth once again. For example using the original position from this thread, with no PVS:

info depth 1 score cp 29 time 5 nodes 109
info depth 2 score cp 9 time 53 nodes 710
info depth 3 score cp 22 time 13 nodes 6488
info depth 4 score cp 13 time 17 nodes 31757
info depth 5 score cp 34 time 81 nodes 176965
info depth 6 score cp 31 time 286 nodes 857950
info depth 7 score cp 37 time 1743 nodes 5194817
Bestmove: g1f3
Time: 7500ms

With PVS:

info depth 1 score cp 29 time 2 nodes 109
info depth 2 score cp 9 time 9 nodes 710
info depth 3 score cp 22 time 10 nodes 7026
info depth 4 score cp 13 time 24 nodes 31294
info depth 5 score cp 34 time 79 nodes 178970
info depth 6 score cp 31 time 285 nodes 835773
info depth 7 score cp 37 time 1662 nodes 5027167
Bestmove: g1f3
Time: 7500ms

I did notice in a couple of positions the node count would spike slightly higher at lower depths, so I'll try experimenting more with disabling PVS at those depths and see how that goes.
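
For reference, the difference between the two searches being compared can be shown on a toy game tree. This is a self-contained sketch, not Blunder's search: the tree is a nested list whose integer leaves are static scores from the point of view of the side to move, and `alphabeta`, `pvs` and `random_tree` are names chosen for the example. PVS searches the first move with the full window and every later move with a null window, re-searching only when the null-window result lands strictly inside (alpha, beta).

```python
import random

INF = 10**9


def alphabeta(node, alpha, beta, counter):
    """Plain fail-soft negamax alpha-beta; counter[0] counts visited nodes."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    best = -INF
    for child in node:
        score = -alphabeta(child, -beta, -alpha, counter)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff
    return best


def pvs(node, alpha, beta, counter):
    """Fail-soft principal variation search over the same kind of tree."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    best = -INF
    for i, child in enumerate(node):
        if i == 0:
            # First move: assumed to be the PV move, searched with the full window.
            score = -pvs(child, -beta, -alpha, counter)
        else:
            # Later moves: null-window search that only asks "better than alpha?".
            score = -pvs(child, -alpha - 1, -alpha, counter)
            if alpha < score < beta:
                # The move beat alpha without failing high: re-search it fully.
                score = -pvs(child, -beta, -alpha, counter)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff
    return best


def random_tree(depth, branching, rng):
    """Build a uniform tree with random integer leaf scores."""
    if depth == 0:
        return rng.randint(-100, 100)
    return [random_tree(depth - 1, branching, rng) for _ in range(branching)]


tree = random_tree(depth=5, branching=4, rng=random.Random(2021))
ab_nodes, pvs_nodes = [0], [0]
# Both searches must agree on the root score; only the node counts can differ.
assert alphabeta(tree, -INF, INF, ab_nodes) == pvs(tree, -INF, INF, pvs_nodes)
print(ab_nodes[0], pvs_nodes[0])
```

The saving depends almost entirely on move ordering. When the first move is usually the best one, the null-window searches rarely trigger a re-search; on a randomly ordered tree like this one, re-searches are frequent and PVS can visit as many nodes as plain alpha-beta or more, so the counts printed here say nothing about a real engine.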
