Nice speedup for Berserk with a simple eval hash

Dann Corbit · Post by **Dann Corbit** » Thu May 23, 2024 9:50 am

3040242 nodes 150947 nps // no eval hash
3040242 nodes 208980 nps // eval hash added
Sorry about the formatting, but you cannot format as code and also make your changes bold.
At least, I do not know how to do it.
// Berserk is a UCI compliant chess engine written in C
// Copyright (C) 2024 Jay Honnold

// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.

// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.

// You should have received a copy of the GNU General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.

#include "eval.h"

#include <stdio.h>

#include "attacks.h"
#include "bits.h"
#include "board.h"
#include "move.h"
#include "nn/accumulator.h"
#include "nn/evaluate.h"
#include "uci.h"
#include "util.h"

const int PHASE_VALUES[6] = {0, 3, 3, 5, 10, 0};
const int MAX_PHASE = 64;

void SetContempt(int* dest, int stm) {
int contempt = CONTEMPT;

dest[stm] = contempt;
dest[stm ^ 1] = -contempt;
}

#define USE_EVAL_HASH 1

#ifdef USE_EVAL_HASH
typedef struct eval_hash {
int value;
unsigned long long key;
} eval_hash;

static const size_t EVAL_HASH_MASK = 0x3FFFFFF;
static eval_hash eht[0x3FFFFFF + 1];
#endif
// Main evalution method
Score Evaluate(Board* board, ThreadData* thread) {
#ifdef USE_EVAL_HASH
// added evaluation hash table
uint64_t hash = board->zobrist;
eval_hash *e = &eht[hash & EVAL_HASH_MASK];
if (e->key == hash)
{
return e->value;
}
#endif
if (IsMaterialDraw(board))
{
#ifdef USE_EVAL_HASH
e->key = hash;
e->value = 0;
#endif
return 0;
}
Accumulator* acc = board->accumulators;
for (int c = WHITE; c <= BLACK; c++) {
if (!acc->correct[c]) {
if (CanEfficientlyUpdate(acc, c))
ApplyLazyUpdates(acc, board, c);
else
RefreshAccumulator(acc, board, c);
}
}

int score = board->stm == WHITE ? Propagate(acc, WHITE) : Propagate(acc, BLACK);

// scaled based on phase [1, 1.5]
score = (128 + board->phase) * score / 128;
score += board->phase * thread->contempt[board->stm] / 64;
int v = Min(EVAL_UNKNOWN - 1, Max(-EVAL_UNKNOWN + 1, score));
#ifdef USE_EVAL_HASH
e->key = hash;
e->value = v;
#endif
return v;
}
// removed method evaluatetrace for this post, since no changes were made to it.

Dann Corbit · Post by **Dann Corbit** » Thu May 23, 2024 10:04 am

I modified the makefile to build on Windows using gcc by default instead of clang (clang actually makes a better binary, but I can rarely get it to work on windows).
The binary in the source folder is avx2 built on threadripper.
build with:
make pgo ARCH=native

The eh in the name stands for 'eval hash'.

My modifications are here:

Dann Corbit · Post by **Dann Corbit** » Thu May 23, 2024 10:17 am

I could not contact Jay Honnold, but he might be interested.
He has instructions to use discord to contact him, but I don't use discord.

Dann Corbit · Post by **Dann Corbit** » Thu May 23, 2024 10:29 am

Much ado about nothing.
I had to static link to use the binary stand-alone on windows.
After that, my speed improvement went up in smoke.
There may also be interference from a tournament I have running.
It uses 30 of 32 cores, and maybe the bench uses more than 2 threads

Steve Maughan · Post by **Steve Maughan** » Thu May 23, 2024 12:34 pm

Hi Dann,

Isn't an evaluation hash for a NNUE engine moot? For every position you need to update the neural network, and in doing so you evaluate the position. What am I missing?

— Steve

Dann Corbit · Post by **Dann Corbit** » Thu May 23, 2024 5:23 pm

Steve Maughan wrote: ↑Thu May 23, 2024 12:34 pm Hi Dann,

Isn't an evaluation hash for a NNUE engine moot? For every position you need to update the neural network, and in doing so you evaluate the position. What am I missing?

— Steve

If eval() is called for a position already calculated, the value is in the eval hash table. Since evaluation of a neural network is expensive I would think you would save on calculation effort. The neural network does not care if it got the number by calculation or from the hash does it?
I have seen that LC0 uses an evaluation hash, which is what made me think it would work.

mar · Post by **mar** » Thu May 23, 2024 6:46 pm

looking at Berserk's transposition.h:

Code: Select all

typedef struct __attribute__((packed)) {
  uint16_t hash;
  uint8_t depth;
  uint8_t agePvBound;
  uint32_t evalAndMove;
  int16_t score;
} TTEntry;

shows that it stores eval in the TT, so the entire TT is in fact "eval cache"

also your eval cache code might break due to data races in smp:

Code: Select all

eval_hash *e = &eht[hash & EVAL_HASH_MASK];
if (e->key == hash)
{ // <<<< potential data race here, another thread may modify e->value before we return
return e->value;
}

Steve Maughan · Post by **Steve Maughan** » Thu May 23, 2024 7:23 pm

Dann Corbit wrote: ↑Thu May 23, 2024 5:23 pm If eval() is called for a position already calculated, the value is in the eval hash table. Since evaluation of a neural network is expensive I would think you would save on calculation effort. The neural network does not care if it got the number by calculation or from the hash does it?
I have seen that LC0 uses an evaluation hash, which is what made me think it would work.

But with NNUE you *must* update the NN after every move. I guess you could only update the first layer and pull the evaluation from the hash.

— Steve

JacquesRW · Post by **JacquesRW** » Thu May 23, 2024 8:16 pm

Steve Maughan wrote: ↑Thu May 23, 2024 7:23 pm
Dann Corbit wrote: ↑Thu May 23, 2024 5:23 pm If eval() is called for a position already calculated, the value is in the eval hash table. Since evaluation of a neural network is expensive I would think you would save on calculation effort. The neural network does not care if it got the number by calculation or from the hash does it?
I have seen that LC0 uses an evaluation hash, which is what made me think it would work.
But with NNUE you *must* update the NN after every move. I guess you could only update the first layer and pull the evaluation from the hash.

— Steve

You're saving the rest of the inference by using a hash, i.e. going from the first hidden layer, which is updated incrementally, to the output.
As already pointed out, Beserk does it already: https://github.com/jhonnold/berserk/blo ... rch.c#L468

EDIT: Also there's no requirement to update the accumulators after every move, you can do the updates lazily, which Beserk also does IIRC.

Dann Corbit · Post by **Dann Corbit** » Fri May 24, 2024 9:42 pm

mar wrote: ↑Thu May 23, 2024 6:46 pm looking at Berserk's transposition.h:
Code: Select all
typedef struct __attribute__((packed)) {
  uint16_t hash;
  uint8_t depth;
  uint8_t agePvBound;
  uint32_t evalAndMove;
  int16_t score;
} TTEntry;
shows that it stores eval in the TT, so the entire TT is in fact "eval cache"

also your eval cache code might break due to data races in smp:
Code: Select all
eval_hash *e = &eht[hash & EVAL_HASH_MASK];
if (e->key == hash)
{ // <<<< potential data race here, another thread may modify e->value before we return
return e->value;
}

Hyatt and Mann did a study on lockless hash tables and concluded that collisions don't matter.
"A lock-less transposition table implementation for parallel search chess engines"
However, with modern hardware and hundreds of threads, it may no longer be true.
Especially with a small hash table like the one in this post (67 million slots).

Nice speedup for Berserk with a simple eval hash

Nice speedup for Berserk with a simple eval hash

Re: Nice speedup for Berserk with a simple eval hash

Re: Nice speedup for Berserk with a simple eval hash

Re: Nice speedup for Berserk with a simple eval hash

Re: Nice speedup for Berserk with a simple eval hash

Re: Nice speedup for Berserk with a simple eval hash

Re: Nice speedup for Berserk with a simple eval hash

Re: Nice speedup for Berserk with a simple eval hash

Re: Nice speedup for Berserk with a simple eval hash

Re: Nice speedup for Berserk with a simple eval hash