Joost Buijs wrote: ↑Thu May 31, 2018 7:52 am
At the root my engine stores every-PV move and every fail-high, and does this for every iteration. For the remaining part of the tree it only stores at the end of a node.
What is the advantage of that?
I did a quick test with my program and it showed -32 Elo with that change.
There is no special advantage at all, it just seems the natural way of doing this. That your engine loses 32 Elo just from changing the way you store the TT entries at the root is strange, or do you mean storing the entries only at the end of each node?
Anyway, it is difficult to compare these concepts between different engines: what is better for engine 'A' can be worse for engine 'B'. In practice it is better to rely on the outcome of automated tests. A gauntlet between your engine and a bunch of other engines of approximately the same strength works best.
hgm wrote: ↑Thu May 31, 2018 11:01 am
Isn't that very suspect? How can it make the slightest difference what you store in the TT during an iteration? It should be overwritten by what you store when you leave the node. And up to that point it should never have been probed, as any visits to the node during that period should have been repetitions.
It should not have any impact on a single-threaded search, but storing entries is usually a fairly slow operation, and all these redundant stores are something you would rather avoid. In a multi-threaded search it will certainly have an impact on your search as well.
I don't think it would be slow. You store in a bucket that was probed before, so it is very likely to still be in cache. To be flushed out of the cache (even L2) in the time of an iteration you would need a pretty deep search. And nodes with such a deep search are so rare that it doesn't have any impact, no matter how slow it is. A 40-Elo gain roughly corresponds to a 40% speedup...
In SMP the data could indeed be probed, when reached through another path. But was the test that reports this +40 Elo gain one with more than a single core?
hgm wrote: ↑Thu May 31, 2018 11:48 am
I don't think it would be slow. You store in a bucket that was probed before, so it is very likely to still be in cache. To be flushed out of the cache (even L2) in the time of an iteration you would need a pretty deep search. And nodes with such a deep search are so rare that it doesn't have any impact, no matter how slow it is. A 40-Elo gain roughly corresponds to a 40% speedup...
In SMP the data could indeed be probed, when reached through another path. But was the test that reports this +40 Elo gain one with more than a single core?
+40 Elo seems very high indeed; there was probably something else wrong or missing. I agree that these redundant stores can't have a 40% impact on speed, also because this only happens at PV nodes, which are scarce, but it is still something I try to avoid. When you try to get your engine as strong as possible, every percent of speed you can gain counts; doing redundant things that are easy to avoid is out of the question in my opinion.
Edit:
In the CPW code snippet shown before, an exact entry is stored at the end of each node even when it is a fail-low; that is probably the reason for the huge Elo gain after correcting this. Maybe that snippet is wrong, I don't know, I never looked at CPW.
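For reference, the usual end-of-node store pattern distinguishes three bound types; a fail-low must be stored as an upper bound, never as exact. A minimal sketch (the names `TTBound` and `bound_for` are illustrative, not CPW's or anyone's actual code):

```cpp
#include <cassert>
#include <cstdint>

// Bound type recorded with each TT entry (illustrative names).
enum class TTBound : uint8_t { EXACT, LOWER, UPPER };

// Given the final score of a node and the original window, pick the
// bound type to store at the end of the node:
//  - fail-high (score >= beta)       -> LOWER bound
//  - fail-low  (score <= alphaOrig)  -> UPPER bound (never EXACT!)
//  - otherwise (inside the window)   -> EXACT score
TTBound bound_for(int score, int alphaOrig, int beta) {
    if (score >= beta)      return TTBound::LOWER;
    if (score <= alphaOrig) return TTBound::UPPER;
    return TTBound::EXACT;
}
```

Note that `alphaOrig` is the alpha value at node entry, not the raised alpha after the move loop; comparing against the raised alpha would misclassify fail-lows as exact, which is the bug described above.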
hgm wrote: ↑Thu May 31, 2018 11:01 am
Isn't that very suspect? How can it make the slightest difference what you store in the TT during an iteration? It should be overwritten by what you store when you leave the node. And up to that point it should never have been probed, as any visits to the node during that period should have been repetitions.
A re-search may occur (aspiration windows), and IID for move sorting (in the absence of a TT move) deeper in the tree may encounter the same position, I think. I agree that so much Elo probably means something was wrong. Also, TT insertion when val > alpha was done for every move until time control set the stopFlag, so the last entry added to the TT this way may not be the "real" best move, because once stopFlag is set it forbids further insertions into the TT, in particular the one at the end of the move loop for the current deepening depth.
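The stopFlag issue described above can be expressed as a simple guard: only commit a TT entry for a node whose move loop actually completed before the flag was raised. A hypothetical sketch (names are mine, not from any engine in this thread):

```cpp
#include <atomic>
#include <cassert>

// Set asynchronously by time control in a real engine.
std::atomic<bool> stopFlag{false};

// Hypothetical guard: only commit a TT entry if the search of this
// node actually completed. A node aborted mid-move-loop must not
// overwrite the entry left by the previous, completed iteration,
// because its "best move" may just be the last move searched.
bool should_store(bool nodeCompleted) {
    return nodeCompleted && !stopFlag.load(std::memory_order_relaxed);
}
```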
Sorry for the late reaction, I was not able to react sooner.
The only way to calculate the exact value or upper bound of a position is to search all the moves in the position and then determine which move produced the highest score: if it is inside the alpha-beta window it is exact, if it is below alpha it is an upper bound. Therefore it is useless to store intermediate results in the hash table, because if the first store is not the best, you do stores for nothing. Even though a store might be cheap, doing it for nothing is still an unnecessary cost.
In addition, it might introduce errors in your search. In a single thread it might have no influence if your engine checks for a single repetition in alpha-beta and quiescence; my engine does not check for repetitions in quiescence but uses the hash table instead, so when I tested with my engine it did have an effect on the search tree.
When running multi-threaded with lazy SMP, different threads will (hopefully) reach a lot of identical positions, and then intermediate hash values will be found and used, creating errors in your search. So there are only reasons not to do it.
I think Joost's pseudo code describes the right way to do it; I don't understand the CPW code, it looks completely wrong to me.
Which depth to use when storing the position in the hash table depends on how you do your extensions/reductions. My engine only applies extensions/reductions to a subset of the moves in a position, and then it doesn't seem right to store the position with a corrected depth. I hadn't looked closely enough at the source of Topplechess to realize that there every move is extended by 1 ply. In that case it might even be better to store the corrected depth, because then the search really is based on the higher depth.
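The distinction drawn here can be sketched in one line: the draft recorded in the TT entry may only be credited with an extension that was applied uniformly to every move of the node, because only then was the whole subtree really searched that much deeper. A minimal illustration (the helper name and parameters are assumptions, not code from either engine):

```cpp
#include <cassert>

// Hypothetical helper: which draft (remaining depth) to record in the
// TT entry. If *every* move at this node was extended by `uniformExt`
// plies, the whole subtree was effectively searched that much deeper
// and the corrected depth is a faithful draft. Extensions applied to
// only a subset of moves must NOT be credited to the entry.
int tt_draft(int depth, int uniformExt, bool extensionAppliedToAllMoves) {
    return extensionAppliedToAllMoves ? depth + uniformExt : depth;
}
```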
I agree that saving a hash value during the move loop isn't very useful. I've moved it to the end of the function as Joost recommends in their pseudocode.
I've also added simple move ordering, based on hash, SEE, killers and history.
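A hash/SEE/killer/history ordering like the one described can be collapsed into a single scoring function; a minimal sketch of the idea, where the score bands and constants are my assumptions, not Topple's actual values:

```cpp
#include <cassert>

// Illustrative move-ordering score bands (higher = searched earlier):
// hash move > winning/equal captures (by SEE) > killers > quiet history
// > losing captures. The constants are arbitrary, only their order matters.
constexpr int HASH_MOVE_SCORE     =  1000000;
constexpr int GOOD_CAPTURE_BASE   =   100000;  // + SEE value
constexpr int KILLER_SCORE        =    90000;
constexpr int LOSING_CAPTURE_BASE =  -100000;  // + (negative) SEE value

int order_score(bool isHashMove, bool isCapture, int see,
                bool isKiller, int history) {
    if (isHashMove) return HASH_MOVE_SCORE;
    if (isCapture)  return see >= 0 ? GOOD_CAPTURE_BASE + see
                                    : LOSING_CAPTURE_BASE + see;
    if (isKiller)   return KILLER_SCORE;
    return history;  // assumes history scores stay below KILLER_SCORE
}
```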
For some reason, the search is still as slow as before, even though everything about the transposition table looks like it is implemented correctly: the hashfull function shows that the table is definitely filling up, and the hash hit rate (successful probes / total probes) is almost 100%, but the search still takes far longer than expected and the engine doesn't find a winning score.
Is this usually caused by hash cutoffs not working correctly, or is it an issue with move ordering? Everything seems like it's working but the search still doesn't find the win.
I've updated the repository with my new code: https://github.com/konsolas/ToppleChess
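One thing worth double-checking when the hit rate is high but the search stays slow is whether a hit actually produces a cutoff: a probe only terminates the node when the stored draft is sufficient and the stored bound is conclusive for the current window. A sketch of those conditions, with illustrative names (not ToppleChess's actual code):

```cpp
#include <cassert>
#include <cstdint>

enum class TTBound : uint8_t { EXACT, LOWER, UPPER };

// A TT hit may only terminate the node when the entry was searched at
// least as deep as we need now, AND its bound is conclusive for the
// current (alpha, beta) window.
bool tt_cutoff(TTBound bound, int ttScore, int ttDepth,
               int depth, int alpha, int beta) {
    if (ttDepth < depth) return false;  // entry too shallow: no cutoff
    switch (bound) {
        case TTBound::EXACT: return true;                 // exact score
        case TTBound::LOWER: return ttScore >= beta;      // fail-high holds
        case TTBound::UPPER: return ttScore <= alpha;     // fail-low holds
    }
    return false;
}
```

If entries are stored with the wrong bound type or a too-shallow draft, nearly every probe can "hit" while almost none of them cut, which matches the symptom described.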
This is what it looks like with an experimental version of my engine.
Conditions are: single core, evaluation: material and PSQT only, move ordering: hash move first and history, TT size: 1024MB.
It goes to depth 24 almost instantly (51 msec.); after this, promotions appear on the board and it slows down. It also finds the correct move at depth 24. The evaluation function has no knowledge about passed pawns at all; when I add this it will probably find the correct move even earlier.
I tried to understand why your engine is not working as expected, so I turned off as much as possible in my engine. What is left is a staged move generator (hash, captures, quiets including promotions), which I cannot turn off, with a hash table. No move ordering (MVV/SEE/killer/history etc.), only a "clean" alpha-beta, and my engine's evaluation is PSQT only; it still finds the correct move very fast.
It only has a problem finding the solution if it doesn't use the hash move. I'm not saying that is your problem, but you might check whether you have hash moves and, if so, that they are searched as the first move in the position, because the other move ordering and alpha-beta tricks seem to have a lot less influence on this position.
Here is the output of the clean version of my engine.