My transposition tables make things slower

Discussion of chess software programming and technical issues.

Moderator: Ras

HemiMG
Posts: 4
Joined: Wed Dec 23, 2020 1:52 am
Full name: John Garrison

My transposition tables make things slower

Post by HemiMG »

Hi everyone.
I just typed a much longer explanation of this, but the site keeps logging me out and making me do Cloudflare captchas and I lost it trying to submit it. My transposition tables are slowing down the chess engine I'm working on. I've hacked them into the perft test to be sure they are the problem. Sure enough, the perft test to depth of 6 takes three times longer with the transposition tables than without them. And the result is wrong. I've tested to ensure that my board is keeping the zorbist keys accurate as it makes and unmakes moves. So I don't think that's the problem.

I've attached what I believe to be the relevant code below. If anyone sees any glaring errors that would cause slowdowns and inaccuracies, I'd love to hear about them.

Code: Select all

typedef struct {
    Key key; // 64-bits
    Move best_move; // 32-bits
    Score score; // 16-bits
    Depth depth; // 8-bits
    NodeInfo node_info; // 8-bits
} TTEntry;

typedef struct {
    TTEntry age_bucket;
    TTEntry depth_bucket;
} TTBucket;

TTEntry *TTable::GetEntry(uint64_t hash)
{
    int index = hash & hash_mask;

    if (table[index].age_bucket.key == hash)
        return &table[index].age_bucket;
    else if (table[index].depth_bucket.key == hash)
        return &table[index].depth_bucket;
    else
        return nullptr;
}

void TTable::StoreEntry(uint64_t hash, Move best_move, Score score, Depth depth, NodeInfo node_type)
{
    int index = hash & hash_mask;

    TTBucket& bucket = table[index];
    if (depth >= bucket.depth_bucket.depth || generation != GetNodeGeneration(bucket.depth_bucket.node_info))
    {
        TTEntry *entry = &bucket.depth_bucket;
        entry->key = hash;
        entry->best_move = best_move;
        entry->score = score;
        entry->depth = depth;
        entry->node_info = SetNodeType(entry->node_info, node_type);
        entry->node_info = SetNodeGeneration(entry->node_info, generation);
    } else {
        TTEntry *entry = &bucket.age_bucket;
        entry->key = hash;
        entry->best_move = best_move;
        entry->score = score;
        entry->depth = depth;
        entry->node_info = SetNodeType(entry->node_info, node_type);
        entry->node_info = SetNodeGeneration(entry->node_info, generation);
    }
}

uint64_t Game::Perft(int depth)
{
    TTEntry* entry = t_table->GetEntry(board.key);
    if (entry != nullptr && entry->depth >= depth)
    {
        return entry->score;
    }

    MoveList move_list;
    uint64_t nodes = 0;

    if (depth == 0)
        return 1ull;

    move_list = board.GeneratePseudoLegalMoves();

    for (auto move_info : move_list)
    {
        board.MakeMove(move_info.move);
        if (!board.IllegalMove())
        {
            nodes += Perft(depth-1);
        }
        board.UnmakeMove(move_info.move);
    }

    t_table->StoreEntry(board.key, nodes, nodes, depth, nodes);

    return nodes;
}

void Game::DividedPerft(int depth)
{
    MoveList move_list;
    nodes_searched = 0;

    move_list = board.GeneratePseudoLegalMoves();

    auto start1 = std::chrono::high_resolution_clock::now();

    for (auto move_info : move_list)
    {
        board.MakeMove(move_info.move);
        if (!board.IllegalMove())
        {
            uint64_t move_nodes = Perft(depth-1);
            nodes_searched += move_nodes;
            std::cout << BuildUCIMove(move_info.move) << " " << move_nodes << "\n";
        }

        board.UnmakeMove(move_info.move);
    }

    auto finish1 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish1 - start1;
    std::cout << "\nNodes searched: " << nodes_searched << " \n";
    std::cout << "Time taken: " << elapsed.count() << " \n\n";
}
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: My transposition tables make things slower

Post by hgm »

That is quite normal; probing the hash table is a very slow operation. Micro-Max 1.6, which has no hash table, has about 6 times larger nps than micro-Max 5.0, which has one.
User avatar
Bo Persson
Posts: 256
Joined: Sat Mar 11, 2006 8:31 am
Location: Malmö, Sweden
Full name: Bo Persson

Re: My transposition tables make things slower

Post by Bo Persson »

HemiMG wrote: Wed Sep 27, 2023 6:27 am My transposition tables are slowing down the chess engine I'm working on.
Yes, you are doing a lot more work for each node. That takes time.

The idea is that the transposition table should save you from researching branches, and thus save expanding a lot of nodes. When it works properly, it is a net win (for a search, not for perft).
User avatar
Ronald
Posts: 161
Joined: Tue Jan 23, 2018 10:18 am
Location: Rotterdam
Full name: Ronald Friederich

Re: My transposition tables make things slower

Post by Ronald »

If you want to use the hashtable for perft calculations you can only use a hashtable entry if the depth of the entry exactly matches the current depth, otherwise you will mix different depth counts for your position which will create wrong results.
if (entry != nullptr && entry->depth >= depth)
Using the hashtable will slow down your basic speed, but will speedup perft itself a lot, that is an import part of the idea of using hashtables!
User avatar
Ronald
Posts: 161
Joined: Tue Jan 23, 2018 10:18 am
Location: Rotterdam
Full name: Ronald Friederich

Re: My transposition tables make things slower

Post by Ronald »

When I got my Threadripper 3990x (64 cores/128 threads) a few years ago, I created a multithreaded version which used a hashtable, to test the speed of the processor just for fun. After some experimenting it produced some very impressive results. I also learned 2 things from it: The XOR trick works (XORing the hashkey with the data to make sure hashkey and data belong together) and I needed more than 64 bits for the hashkey to avoid different positions with identical hash. The nice thing is that if something is not working correctly you will always end up with a wrong perft result, with a full engine an error often only leads to worse play.

Triggered by this I just run the multi threaded perft on my 3990x again with 127 threads and hashtable of 16G. The results are still impressive to me, so I thought I'd share them (normal speed 150M nodes per second):
Time: 17.9495 in seconds
Perft results for depth 9: 2.439.530.234.167 Nodes per sec: 1.35911e+11

Time: 589.134 in seconds
Perft results for depth 10: 69.352.859.712.417 Nodes per sec: 1.1772e+11
So it runs perft 10 within 10 minutes, and it is always fun to see such big numbers match with the chessprogramming.org numbers :D

Good luck with the development of your chess engine and, mostly, enjoy!
syzygy
Posts: 5693
Joined: Tue Feb 28, 2012 11:56 pm

Re: My transposition tables make things slower

Post by syzygy »

hgm wrote: Wed Sep 27, 2023 10:36 am That is quite normal; probing the hash table is a very slow operation. Micro-Max 1.6, which has no hash table, has about 6 times larger nps than micro-Max 5.0, which has one.
If adding hashing to perft results in 3x longer running times AND an incorrect result, then clearly there is just a bug in the code.
syzygy
Posts: 5693
Joined: Tue Feb 28, 2012 11:56 pm

Re: My transposition tables make things slower

Post by syzygy »

HemiMG wrote: Wed Sep 27, 2023 6:27 am

Code: Select all

uint64_t Game::Perft(int depth)
{
    TTEntry* entry = t_table->GetEntry(board.key);
    if (entry != nullptr && entry->depth >= depth)
    {
        return entry->score;
    }
I don't think you want to use the hashed perft score for a different depth. But this does not explain why perft takes longer with hashing.

To make sure that your incrementally updated hashkey is correct, compare it in each node with the freshly recomputed value.
HemiMG
Posts: 4
Joined: Wed Dec 23, 2020 1:52 am
Full name: John Garrison

Re: My transposition tables make things slower

Post by HemiMG »

Thanks everyone! I've been furiously debugging things and have almost got it worked out.
Ronald wrote: Wed Sep 27, 2023 12:18 pm If you want to use the hashtable for perft calculations you can only use a hashtable entry if the depth of the entry exactly matches the current depth, otherwise you will mix different depth counts for your position which will create wrong results.
This was the first problem. That got my perft results correct at depth 5 from the starting position, where the transposition table was first messing them up. Thankfully I decided to revisit transposition table speed and use perft to check it because somewhere along the way a bug appeared in my castling code. After getting that fixed and getting past perft 7, I discovered a transposition table collision in perft 8. That represents 84 billion moves though, so I'm not sure one hash collision is worth worrying about. Other than that, it seems to have resulted in about double the speed at perft 7.

Now to move on to other stuff. I originally wrote the code a year or two ago, so this was supposed to just be a clean up and organization task. Fingers crossed the rest of it goes smoothly.
User avatar
Volker Annuss
Posts: 180
Joined: Mon Sep 03, 2007 9:15 am

Re: My transposition tables make things slower

Post by Volker Annuss »

You take the hashed node count from entry->score which is only 16 bits wide. So you will get overflows for high depths.
Chessnut1071
Posts: 313
Joined: Tue Aug 03, 2021 2:41 pm
Full name: Bill Beame

Re: My transposition tables make things slower

Post by Chessnut1071 »

hgm wrote: Wed Sep 27, 2023 10:36 am That is quite normal; probing the hash table is a very slow operation. Micro-Max 1.6, which has no hash table, has about 6 times larger nps than micro-Max 5.0, which has one.
FEN[226] = "6r1/p1pq1p1p/1p1p1Qnk/3PrR2/2n1P1PP/P1P5/4R3/6K1 w - - 0 1"; // f5h5 e5h5 g4g5 h5g5 h4g5 h6h5 e2h2 h5g4 h2g2 g4h5 f6f3 h5h4 f3g3 h4h5 g2h2 d7h3 g3h3 g6h4 h3h4 h5g6 h4h6

Using the FEN above, I tested the solution with and without the TT. Results as follows:

1) With TT: time to solution: 161.085 seconds; nodes searched = 310,019,642
2) Without TT: time to solution = 1244.05 seconds; nodes searched = 1,556,880,385

The TT table reduced time to solution by 80% and the number of nodes searched by 87%.

My TT table includes 1) hash key 2) success/failure 3) ply. If a collusion occurs, per your suggestion, I just throw ignore the hash key. Also, I have enough space so the keys can't overlap. I use checkmate solutions instead of perft, and have found at least 80% reduction in time and about 92 % reduction in nodes searched. However, the TT gets much more efficient at higher ply levels. The one thing I found is you need larger hash table for deep ply solutions. Without a TT the time to solution on some problems could be months or even years. I can't even imagine running without a TT unless the solution is only a few ply deep.