Okay, that explains how you can be so fast with hash probing. About 85% of the nodes are QS, so if you only probe in the remaining 15%, the theoretical maximum speed imposed by hash probing is not 3Mnps but 20Mnps.
This is another aspect one should be well aware of when comparing speed of different engines.
Note that I always lost strength by refraining from hash probing in QS.
engine speed issues
Moderator: Ras
-
- Posts: 28353
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: engine speed issues
Leorik also reports an 'nps' in the range of 6M, like Rustic, and the way it achieves that is that it doesn't probe the TT in qsearch, and it does count stand-pat nodes as visited. Also, in my qsearch implementation only a subset of the evaluation is updated, specifically only the pawn-structure and material terms. I no longer update the zobrist hash or the halfmove clock, because I don't need them in qsearch. The mobility term is computed once before going into qsearch and then kept constant during qsearch. All this drives up the 'nps'...

hgm wrote: ↑Fri Jul 22, 2022 4:06 pm
Sure, I am not saying that one way is better than another. But it is important to know what exactly people are counting when you try to compare the raw speed of different engines. A stand-pat, especially based on a 'lazy eval', takes almost no time at all. So these nodes very much drive up the average speed. But when the engine uses futility pruning, it is exactly these nodes that are no longer visited.
I am still puzzled that you can achieve this speed with a hash table. I remember that when I added hash probing to micro-Max, the speed dropped from 6Mnps to 2Mnps, just because the hash probes were so slow. So it seems the hash probes took 2/3 of the time, suggesting that these alone would limit you to 3Mnps if nothing else had to be done. Micro-Max's hash table might not be entirely optimal, though (no alignment with cache lines).
Do you probe the TT in every counted node, or do you test for a stand-pat cutoff even before probing the hash?
If probing the TT in qsearch would cut the nps in half but increase the average search depth, then this change doesn't really make the engine slower. Still, going by the reported 'nps' value, you'd think it lost half of its speed. But on the other hand, I wouldn't know what metric other than 'nps' would be better for comparing engine speeds.
-
- Posts: 28353
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: engine speed issues
Well, you could count move generations instead of nodes, or evaluations, depending on which part of your engine consumes the most time. Nodes counted at the top of (q)search is a bit misleading if you have many nodes that don't do anything. But futility pruning usually makes such nodes go away.
Anyway, the conclusion seems to be that 1Mnps is sort of normal for a bitboard engine that has futility pruning and probes the TT in every node, while 6Mnps is normal for a bitboard engine that does not probe the TT, does not use futility pruning, and has an evaluation without expensive terms.
Interesting that you stop calculating mobility when you enter QS. I once tried something I called 'static mobility' in micro-Max. This calculated mobility per piece as a (nearly free) side effect of move generation, at d=1 and in the null-move reply. And then temporarily added the bonus for it to the piece values used in QS. It did not really improve playing strength, but that was probably because it was poorly tuned.
Mobility is rather easy to calculate incrementally during QS, though. Just calculate it from the number of moves of the moved piece in its new location (and the remembered number in its old location), and the path length of some occasionally unblocked slider moves.
-
- Posts: 66
- Joined: Thu Dec 09, 2021 8:26 pm
- Full name: Kyle Zhang
Re: engine speed issues
Sorry for the 10-day interval on the reply, I was on a trip without my computer.

mvanthoor wrote: ↑Fri Jul 22, 2022 12:09 am
To the OP: what language is the engine written in? I see the ".back()" method a lot. Are you using vectors for everything? If so, all of your stuff is stored on the heap, which is a lot slower to access than the stack. Personally I'd gladly use vectors (and some other nice/convenient structures) for things like move lists, but when I tried it, the engine speed dropped by 8-10% for each array that I replaced with a vector. The only place where I use a vector is for the PV, because it is very convenient and not slower than an array; if the array turns out to be too small (= PV gets too long for it), having to copy it manually is more of a hassle than to trust the compiler to extend the vector. (And I start it with a reserved space to begin with.)
So if you're now using vectors everywhere, the first thing I'd try is to replace those with arrays where possible.
So today I just got back and tried to implement your suggestion (my engine is written in C++, as it is the only language I know). I did change the castling and en-passant state into C-arrays and got a slight, but nice, speed boost!
However, my move-generation function takes a reference to the vector that serves as the move list, and if I change the vector to a C-style array, I don't know how to pass a reference or a pointer into the legal_moves function.
So any help would be nice.

Peacekeeper: https://github.com/Sazgr/peacekeeper/
-
- Posts: 608
- Joined: Sun May 30, 2021 5:03 am
- Location: United States
- Full name: Christian Dean
Re: engine speed issues
It’s been a while since I used C, but if I remember correctly, array names basically decay to pointers to the first element of the array. So you can just pass the name of the C-style array: you’re passing in a copy of the pointer to the first element, so you’re still accessing the same array.
-
- Posts: 28353
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: engine speed issues
I am now favoring a design where each node does not merely remember its own move list, but also the move lists of all its children. Nodes often search the same move multiple times (e.g. in the standard implementation of PVS or LMR, and for Internal Iterative Deepening). That way you can avoid duplicate move generation and sorting, and the second call to the node can simply continue where the first call left off (but at higher depth, or with a wider window). For PVS or LMR re-searches you would only have to remember the move list, but for IID you might have gone through all moves before revisiting those at higher depth.
I usually put moves on a global stack, in which case I would just have to pass the index in that stack where the move list for a node starts. But if you give each node a local array large enough to hold the move lists of all children, you could pass the pointer to the first element in that array (where you could reserve the first few elements to indicate the number of moves in the list, and other info you want to pass from one call to the child to another, such as up to which point the moves are already sorted).
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: engine speed issues
Just pass the pointer to the parent move along and store it there. It is generally an efficient idea, and you can avoid a lot of code for checking specific circumstances.

hgm wrote: ↑Tue Aug 02, 2022 9:09 am
I am now favoring a design where each node does not merely remember its own move list, but also the move lists of all its children. Nodes often search the same move multiple times (e.g. in the standard implementation of PVS or LMR, and for Internal Iterative Deepening). That way you can avoid duplicate move generation and sorting, and the second call to the node can simply continue where the first call left off.
For example, when the last move was a pawn move, you don't have to check for checks by knights.
A lot of other optimisations exist for other cases.
Remembering and incrementally updating generally is very strong.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 66
- Joined: Thu Dec 09, 2021 8:26 pm
- Full name: Kyle Zhang
Re: engine speed issues
Oops, yeah, I just forgot about that.

algerbrex wrote: ↑Tue Aug 02, 2022 1:06 am
It’s been a while since I used C, but if I remember correctly, array names are basically pointers to the first element of an array. So you can just pass the name of the C-style array, as you’re passing in a copy of the pointer to the first element, so you’re still accessing the same array.

At any rate, I have gotten everyone's suggestions into my engine and working, and I'm happy to say that its speed has increased to 3Mnps.

So since the raw speed seems more than OK, I should probably work on the search and evaluation now, I guess.
Peacekeeper: https://github.com/Sazgr/peacekeeper/