PV-move ordering necessary if you have TT-move ordering?

Discussion of chess software programming and technical issues.

Moderator: Ras

mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: PV-move ordering necessary if you have TT-move ordering?

Post by mvanthoor »

While testing some things with the TT buckets (having 2, 3, or 4 entries, etc.), I was looking through the code and saw something in my store() function:

Code: Select all

    pub fn store(&mut self, verification: u32, data: D, used_entries: &mut usize) {
        let mut low = 0;

        // Run through the bucket and find the index of the entry having
        // the lowest depth so it can be replaced.
        for i in 1..ENTRIES_PER_BUCKET {
            if self.bucket[i].data.depth() < self.bucket[low].data.depth() {
                low = i
            }
        }

        // If the verification in the entry is 0, then it was never used
        // before. Count the use of this entry.
        if self.bucket[low].verification == 0 {
            *used_entries += 1;
        }

        // Store.
        self.bucket[low] = Entry { verification, data }
    }
That is how it is now. Before, this line:

Code: Select all

if self.bucket[i].data.depth() < self.bucket[low].data.depth() { ... }
was this:

Code: Select all

if self.bucket[i].data.depth() < data.depth() { ... }
Difference:

in the first one (as it is now), we're looking through the entries in the bucket, and if the depth in an entry is lower than the one in the entry at index "low", we replace "low" with that new index. In the end, "low" holds the index of the entry with the lowest depth, so we can replace it.

In the second one (which is how it was in alpha 2 and alpha 3), the depth of each of the bucket's entries is compared to the depth of the _INCOMING_ entry / data... so we end up replacing the LAST entry we encounter that has a lower depth than the _INCOMING_ data. I was comparing the wrong things.
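To make the difference concrete, here is a stripped-down sketch of the two slot-picking rules (buckets reduced to plain arrays of depths; these are not Rustic's actual types):

```rust
const ENTRIES_PER_BUCKET: usize = 3;

// Buggy variant: compares each entry against the INCOMING depth, so it
// returns the LAST index whose depth is lower than the incoming depth.
// If no entry qualifies, index 0 is replaced by default.
fn pick_slot_buggy(depths: &[u8; ENTRIES_PER_BUCKET], incoming: u8) -> usize {
    let mut low = 0;
    for i in 1..ENTRIES_PER_BUCKET {
        if depths[i] < incoming {
            low = i;
        }
    }
    low
}

// Fixed variant: compares entries against each other, so it returns the
// index of the shallowest entry in the bucket.
fn pick_slot_fixed(depths: &[u8; ENTRIES_PER_BUCKET]) -> usize {
    let mut low = 0;
    for i in 1..ENTRIES_PER_BUCKET {
        if depths[i] < depths[low] {
            low = i;
        }
    }
    low
}
```

With depths [5, 3, 7] and an incoming entry of depth 2, the buggy rule leaves `low` at 0 (nothing in the bucket is shallower than 2) and overwrites the depth-5 entry, while the fixed rule correctly replaces the depth-3 entry.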

Shit.

I KNEW there had to be a bug somewhere, because I fixed the last bug in Rustic AGES ago. To loosely quote Highlander: "There will always be one." Or something like that.

A quick test that is running now points to an increase of about 20 Elo. (I have to run more games to actually verify; I will run an SPRT test tonight.) The newest version of Rustic is at least staying comfortably ahead of the previous one, even though this fix, which just improves the efficiency of the TT, is the single difference between them.

Now I'll have to retest with 2, 3, and 4 entries per bucket, and with using the TT in QSearch as well.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: PV-move ordering necessary if you have TT-move ordering?

Post by lithander »

mvanthoor wrote: Sat Jul 03, 2021 3:03 pm Darn. That took a lot of convincing. Again :evil: :P :lol:
Before I implement a feature I need to know why I need it. Just because you have 4 buckets doesn't convince me I need four buckets. Not in a minimal engine. You must admit that your tests also didn't show any real world benefit for buckets. (But now that you've found that bug they probably will) Sure I could have "faith" that so many chess programmers do it and that they are probably smarter than me. I take suggestions and critizism very seriously and usually believe that they have a point and that I'm wrong but then I *still* don't act on faith alone. I want to know where exactly I am wrong. As long as my PV-playing code was giving me an measurable benefit even though it shouldn't have I couldn't just remove it and ignore the issue. This is about my mental models as much as about the engine I'm building.

So, thanks a lot for making this thread! Now I understand much more. And I know now that I can make my TT just a little more complex (and why and how) and no longer have to maintain a tri-table for the PV during search. That's a huge simplification. It will reduce LOC and cost me nothing in terms of strength.

... and also, the two-bucket implementation HGM suggested is just beautiful! That simple but powerful stuff is exactly what I'm after for MinimalChess. :)
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
hgm
Posts: 28396
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: PV-move ordering necessary if you have TT-move ordering?

Post by hgm »

lithander wrote: Sat Jul 03, 2021 10:57 pm... and don't have to maintain a tri-table for the PV during search anymore. That's a huge simplification. This will reduce LOC and cost me nothing in terms of strength.
You would still have to do that if you want the engine to print the PV in UCI format. Micro-Max doesn't print a PV (just the best move of each iteration), and from that I learned that engines that do not print PVs are really very annoying.

Re: PV-move ordering necessary if you have TT-move ordering?

Post by lithander »

hgm wrote: Sat Jul 03, 2021 11:43 pm You would still have to do that if you want the engine to print the PV in UCI format. Micro-Max doesn't print a PV, (just the best move of each iteration), and from that I learned that engines that do not print PVs are really very annoying.
Can't I just extract the PV from my now-improved TT? Play the best root move (which I know without a tri-table) and then do a TT lookup for the next move. Rinse, repeat. If the main goal of my slightly more complex TT is to not overwrite the PV moves, this should allow me to retrieve the PV, no?
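That rinse-and-repeat loop could be sketched like this (toy types throughout: the TT is reduced to a map from a position hash to the best move plus the hash after that move; this is not MinimalChess's real TT):

```rust
use std::collections::HashMap;

type Hash = u64;
type Mv = &'static str;

// Toy transposition table: position hash -> (best move, hash after the move).
// A real engine would re-make the move on the board and rehash; here the
// successor hash is stored directly to keep the sketch tiny.
struct ToyTT {
    map: HashMap<Hash, (Mv, Hash)>,
}

impl ToyTT {
    // Follow stored best moves from `root` until a probe misses or
    // `max_len` moves have been collected (the cap guards against
    // repetition cycles in the table).
    fn extract_pv(&self, root: Hash, max_len: usize) -> Vec<Mv> {
        let mut pv = Vec::new();
        let mut pos = root;
        while pv.len() < max_len {
            match self.map.get(&pos) {
                Some(&(mv, next)) => {
                    pv.push(mv);
                    pos = next;
                }
                None => break,
            }
        }
        pv
    }
}
```

The walk stops as soon as a position is missing from the table, which is exactly why a replacement scheme that protects deep (PV) entries matters: otherwise the extracted PV gets truncated early.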

Re: PV-move ordering necessary if you have TT-move ordering?

Post by hgm »

You can do that, but I am not sure it is simpler than maintaining a triangular array.

Re: PV-move ordering necessary if you have TT-move ordering?

Post by lithander »

Simple is such an ambiguous term. I've been struggling with what exactly it's supposed to mean ever since I put "minimal" in the name of my first chess engine... ;)

My argument against using a triangular array would be this: the tri-table is a concept on top of the rest of the engine that takes a while to explain and understand. It's not much code, but you need the mental image of how the different subsections of the array relate to the different PVs that exist at different plies simultaneously, and how they get copied around, and when, and why.

Extracting the PV via the TT doesn't introduce any new concepts and even exemplifies how the TT stores the PV in the first place. So it's simpler, in a way.

*shrug*

Re: PV-move ordering necessary if you have TT-move ordering?

Post by hgm »

I suppose it depends on how you implement it. If you define a local variable pv[MAXPLY] as an array of moves, and pass a pointer to it to the child, so that it can be used to return the PV of that child, I don't think you need much mental imagery to understand what goes on. It would be just a thing returned from Search(), like the score.

Code: Select all

Search(Move *returnPV, ...)
{
  Move childPV[MAXPLY];
  returnPV[0] = ENDOFPV; // by default return empty PV
  ...
    MakeMove(move);
    score = -Search(childPV, ...);
    UnMake();
    if(score > alpha) {
      if(score >= beta) return beta;
      alpha = score;
      returnPV[0] = move; // construct new PV for this node
      int i=0; while((returnPV[i+1] = childPV[i]) != ENDOFPV) i++;
    }
  ...
}

Re: PV-move ordering necessary if you have TT-move ordering?

Post by mvanthoor »

Yesterday I found an article about replacement schemes:

https://www.researchgate.net/publicatio ... ion_Tables

It is an article in which different TT replacement schemes are tested. Because it's from the beginning of the '90s, the TTs are between 8K and 1024K, and testing took almost two months. In short: as your TT gets bigger, the replacement scheme you choose has less and less impact. I ran a test yesterday:

Code: Select all

Rank Name                          Elo     +/-   Games   Score    Draw 
   0 Rustic Alpha 3.1.114 4E C       1       9    4000   50.1%   28.1% 
   1 Rustic Alpha 3.1.114 2E         9      18    1000   51.3%   29.8% 
   2 Rustic Alpha 3.1.114 3E         7      18    1000   51.0%   27.9% 
   3 Rustic Alpha 3.1.114 1E        -7      18    1000   48.9%   28.7% 
   4 Rustic Alpha 3.1.113 4E B     -11      18    1000   48.4%   26.2% 

4000 of 4000 games finished.
Version "114 4E C" is the current development version. It fixes the bug mentioned above and uses 4 entries per bucket. 1E, 2E, and 3E use 1 to 3 entries and are also bug-fixed. "113 4E B" uses 4 entries per bucket and still has the replacement bug mentioned above.

The transposition table was 32 MB. (Maybe I should have made it even smaller; I may try that.) The replacement scheme is "replace lowest depth."

The differences are tiny, and after 1000 games per engine, the error margins are still too large. Still, fixing this bug should gain approximately 11 Elo (4E B scored -11 against 4E C), and moving from 4 entries to 2 could gain another 9 (2E scored +9 against 4E C). That would be a 20 Elo boost. Even 3E was better than 4E C; probably the extra entry costs too much time to go through.

I also want to test the TwoDeep implementation from the article, which works similarly to the killer slots: the first entry in the bucket is the most important.

Code: Select all

if incoming_data.depth() > bucket[0].depth() {
    bucket[1] = bucket[0];     // Shift entry 0 to entry 1
    bucket[0] = incoming_data; // Store new data at entry 0
} else {
    bucket[1] = incoming_data; // Shallower data goes to the always-replace slot
}
It ditches looping over multiple entries, but it has an extra data copy. I have no idea which is faster.
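Written out as compilable Rust, the TwoDeep-style store could look like the sketch below (the field names and entry layout are placeholders, not Rustic's actual types):

```rust
#[derive(Clone, Copy)]
struct Entry {
    verification: u32, // upper hash bits for hit verification
    depth: u8,         // score, best move, etc. would also live here
}

// TwoDeep-style store: slot 0 is depth-preferred, slot 1 is always-replace.
// No loop over the bucket is needed, at the cost of one extra entry copy
// whenever a deeper entry demotes the old slot-0 occupant.
fn store_two_deep(bucket: &mut [Entry; 2], incoming: Entry) {
    if incoming.depth > bucket[0].depth {
        bucket[1] = bucket[0]; // demote the old deep entry to slot 1
        bucket[0] = incoming;  // new deepest entry takes slot 0
    } else {
        bucket[1] = incoming;  // shallower entries land in slot 1
    }
}
```

A probe then checks slot 0 first and falls back to slot 1, mirroring how killer slots are scanned in order of importance.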
hgm wrote: Sun Jul 04, 2021 7:53 am I suppose it depends on how you implement it. If you define a local variable pv[MAXPLY] as an array of moves, and pass a pointer to it to the child, so that it can be used to return the PV of that child, I don't think you need much mental imagery to understand what goes on. It would be just a thing returned from Search(), like the score.
That's how I do it, but I use a vector instead of an array and pointers.

Re: PV-move ordering necessary if you have TT-move ordering?

Post by hgm »

No such thing as a 'vector' in C...

Running games is an incredibly noisy and thus inefficient way of measuring this, as it is basically just a speed improvement. It is much easier to measure average time to depth on a representative set of test positions. Then you can make plots like these:

[plots of measured time-to-depth omitted]
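A time-to-depth harness can be as simple as the sketch below; the search function here is a stand-in workload, where a real engine would run its actual fixed-depth search on each position:

```rust
use std::time::Instant;

// Stand-in for the engine's fixed-depth search; returns a fake node count
// so the sketch runs on its own. Replace with a real Search() call.
fn search_to_depth(position: &str, depth: u32) -> u64 {
    (position.len() as u64) * (depth as u64)
}

// Average wall-clock time to reach `depth` over a set of test positions.
// Far less noisy than playing games: the only variable being measured is
// search speed, not move quality or game outcomes.
fn avg_time_to_depth(positions: &[&str], depth: u32) -> f64 {
    let start = Instant::now();
    for pos in positions {
        search_to_depth(pos, depth);
    }
    start.elapsed().as_secs_f64() / positions.len() as f64
}
```

Running this for each TT configuration (1E, 2E, 4E, different replacement schemes) on the same position set gives directly comparable numbers without thousands of games.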

Re: PV-move ordering necessary if you have TT-move ordering?

Post by lithander »

hgm wrote: Sat Jul 03, 2021 12:31 pm You must decide for yourself which approach conforms best to your measure of simplicity:

1) Putting extra code in Search() to follow the PV from the triangular array in the first branch of the tree, next to probing the TT and using the TT move.
2) Stuffing the PV from the triangular array back into the TT after each search.
3) Using a TT with buckets of 2, and replace the least deep of those.
4) Just require a very large TT and ignore the problem.

I am biased in favor of (3), as it does not only solve the PV problem in the first branch, but increases the efficiency of the TT through the entire tree. (But, in the limit of very large TT, this advantage of course disappears entirely.) In itself 'shallowest-of-two' replacement is not very complex; it just requires two extra lines of code for manipulating the index to which you map a position:

Code: Select all

index = KEY_TO_INDEX(hashKey);
signature = KEY_TO_SIGNATURE(hashKey); // can, but doesn't have to be entire key
if(TT[index].signature != signature) index ^= 1; // try other entry in 'bucket' if not in first one
if(TT[index].signature == signature) {
  // treat hash hit
} else { // hash miss; prepare replacement
  if(TT[index].depth > TT[index^1].depth) index ^= 1; // prepare to replace least deep entry in bucket
}
So I tried solution (2) first, where I removed the code that was playing from the PV array directly. Instead, the PV of the last iteration is now stored in the hash table again before going into Search, to make sure it's available from the TT. Less code and a bit faster, too. In self-play against the previous version with 50 MB of hash, the Elo gain after 2000 games was +5 (+/- 13), so I reduced the hash size to 5 MB and got +15.1 (+/- 12.8). I also saved a few LOC.
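The "stuff the PV back into the TT" step could look roughly like this toy sketch (hashes, moves, and the flat map standing in for the TT are all placeholders, not MinimalChess code):

```rust
use std::collections::HashMap;

// Before starting the next iteration, re-store every PV position with its
// PV move, so move ordering can find it even if it was overwritten during
// the previous search. The i-th PV position was searched to roughly
// (root_depth - i) plies, so its entry is stored with that depth.
fn refresh_pv_in_tt(
    tt: &mut HashMap<u64, (&'static str, u8)>, // hash -> (move, depth)
    pv_positions: &[u64],
    pv_moves: &[&'static str],
    root_depth: u8,
) {
    for (i, (&hash, &mv)) in pv_positions.iter().zip(pv_moves).enumerate() {
        let depth = root_depth.saturating_sub(i as u8);
        tt.insert(hash, (mv, depth));
    }
}
```

Because the re-stored entries carry realistic depths, a depth-preferred replacement scheme will then protect them like any other deep entry.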

So that is version 0.5.1 now, and for 0.5.2 I wanted to test the 2-bucket implementation you suggested, where each entry of a bucket can also be directly indexed. It was super convenient to add, as I really just had to change the Index(ulong hash) method. Extracting the PV from the hash table recursively is also only a few lines, and I can get rid of all tri-table related code. This reduces the LOC from 710 in version 0.5 to 676. A nice reduction in size.

It also seems to perform really well. Fewer positions are getting kicked out, and the tree looks significantly smaller than before, too.

Code: Select all

  
1. [X] f6f7 g8f7 e1e4 e8e4 f3g5 f7g7 g5e4 d7c6 f1g2 c8e6 f2c2 = +282.00, 4130K nodes, 3605.1517ms
    TT Density: 27% TT Overwrites: 8%
   2. [X] c6f3 d1f3 h8h2 g2h2 e5f3 h2g2 f3d4 c3b5 e6e5 b5d4 e5d4 = -128.00, 1633K nodes, 1326.4345ms
    TT Density: 16% TT Overwrites: 4%
   3. [X] b3b5 c6b5 c7c8Q e8f7 c8e6 f7e6 d5c7 e6e7 c7b5 a7a5 b5d4 = +177.00, 2883K nodes, 2742.6361ms
    TT Density: 32% TT Overwrites: 9%
   4. [X] c8c5 b4c5 d5d3 d2d3 c4c5 g1h1 c5a7 d3d6 a7a4 f1c1 a4e4 = -292.00, 7099K nodes, 5789.4015ms
    TT Density: 46% TT Overwrites: 14%
   5. [X] c5f2 g1f2 f6g4 f2f1 g4h2 f1g1 c7g3 e2e7 d7e6 e7e8 g8g7 = -154.00, 18798K nodes, 15526.8088ms
    TT Density: 83% TT Overwrites: 35%
   6. [X] g5f6 d8f6 h6g4 f6f3 g2f3 e5g5 f3e4 g5g4 g1h1 g4e4 b2b4 = +30.00, 5083K nodes, 4296.778ms
    TT Density: 35% TT Overwrites: 10%
   7. [X] g1g7 f6g7 f5f7 g8h8 c1g1 b7g2 g1g2 e8g8 e5g6 = +615.00, 9733K nodes, 9292.0181ms
    TT Density: 53% TT Overwrites: 18%
   8. [X] c8g4 f3f2 d5d4 a1e1 d4e3 f1e3 f6d5 d3f1 h3h5 f1c4 g4f3 = -434.00, 9170K nodes, 7894.0597ms
    TT Density: 54% TT Overwrites: 18%
   9. [X] h5h3 b2b3 h3h8 d2f1 g5g4 f3f4 e5d3 f2e2 e6e5 f1e3 e5f4 = -267.00, 34484K nodes, 28186.6371ms
    TT Density: 94% TT Overwrites: 48%
  10. [X] g5c1 e1c1 c8c3 c1d2 c3c1 d2c1 d4e2 g1f1 e2c1 b7d5 c1d3 = -64.00, 2246K nodes, 1953.3546ms
    TT Density: 22% TT Overwrites: 6%
  11. [X] f6g4 f3g4 h8h4 g3f1 b8h8 c4e3 h4h2 f1h2 h8h2 g2f3 = -169.00, 59987K nodes, 45803.6773ms
    TT Density: 98% TT Overwrites: 56%
  12. [X] c6e7 g8g7 e7g6 h7g6 h4e7 g7g8 e7f8 g8f8 e5g6 f8f7 g6f4 = +332.00, 5736K nodes, 4793.2817ms
    TT Density: 41% TT Overwrites: 13%
  13. [X] f4d5 e7d5 f3f6 d5f6 c1h6 b7b5 c4d3 f6g4 h6f8 g8f8 e4e5 = +192.00, 6469K nodes, 5740.1143ms
    TT Density: 45% TT Overwrites: 14%
If you clear the TT after every move, that is all there is to it, and it is probably simpler and less bug-prone than solution (1). But if you don't clear the TT, the deepest entry in each bucket will eventually get filled with results from deep searches on positions earlier in the game that are no longer reachable, so your current search can effectively use only one entry of each bucket. Then you are back in the always-replace situation, but with only half the table size. Therefore, people that use such 'depth protection' in their replacement schemes usually combine it with 'aging', where they keep track (in the TT) of whether an entry is used in a search. If it has not been used for a number of searches, it gets cleared or replaced.
...let's say I want to keep 'age' out of my hash entries to keep them simple...
If you don't want to keep track of aging in the entry, you could erase entries 'stochastically', at such a low rate that on average they will survive for, say, 5 searches. Or just keep track of how many entries of each depth are in the table, and overwrite entries of a given depth when there get to be too many of them.
The positions are randomly placed all over the table, so do I really have to randomly delete them? Can't I just delete one fifth of the table after each search concludes? That way, after 5 searched moves, all positions have been cleared exactly once.
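That rotating one-fifth sweep could be sketched like this (the table reduced to a plain array of slots, with 0 meaning empty; not MinimalChess code):

```rust
// After each search, clear the next one-fifth of the table, so every slot
// is wiped exactly once per five searches. `cursor` persists between
// searches and wraps back to the start of the table.
fn clear_slice(table: &mut Vec<u64>, cursor: &mut usize, fraction: usize) {
    let chunk = table.len() / fraction;
    let start = *cursor;
    let end = (start + chunk).min(table.len());
    for slot in &mut table[start..end] {
        *slot = 0; // 0 = empty in this toy encoding
    }
    *cursor = if end >= table.len() { 0 } else { end };
}
```

Compared to per-entry aging, this needs no extra bits in the entries, only one persistent cursor, at the cost of sometimes clearing an entry that is still useful.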

...and another idea: after a move was searched to depth X, the opponent plays, and when it's my turn again, two plies have been played. So I can be pretty sure that all positions with depth >= X - 2 are not going to be useful anymore, and I could just explicitly purge those. Smaller-depth positions might not get purged that way, but could get overwritten organically. Would that be enough, or would these medium-depth positions still clog up the TT with unreachable positions?
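That depth-based purge could be sketched as follows (a toy layout storing only depths, with 0 marking an empty slot; whether the >= X - 2 cutoff is actually safe is the open question above):

```rust
// Purge entries that were searched at least as deep as the last root
// search minus the two plies that have since been played; per the idea
// above, those can only stem from positions that are no longer reachable.
fn purge_deep_entries(depths: &mut [u8], last_root_depth: u8) {
    let cutoff = last_root_depth.saturating_sub(2);
    for d in depths.iter_mut() {
        if *d != 0 && *d >= cutoff {
            *d = 0; // mark the slot as empty
        }
    }
}
```

This costs one linear pass over the table per move, and leaves the medium-depth entries to be displaced organically by the normal replacement scheme.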