Hybrid replacement strategy worse than always-replace

hgm · Post by **hgm** » Sun Apr 28, 2024 10:51 am

You seem to have the strange notion that I should do the testing to validate your engine. But my criticism concerns the testing methodology, and your engine only features in this as an example, which you yourself claim to be representative of the development of 'modern engines'. For the benefit of the reader I illustrate the reasons for my concern by examples, using numbers that are typical, where you did not provide actual numbers. If you really knew "what goes on under the hood", you would be able to correct those to numbers that apply to your engine off the top of your head; it beats me how anyone could need two hours to determine how many entries there typically are in the 'current generation' of TT entries.

But let us take a step back rather than losing ourselves in nitpicking over details:

Fact: the shape of your engine's search tree is completely different when you have no hash hits (i.e. no TT) from what it is when you have a number of hash hits that is as good as you can ever expect (TT much larger than the search tree and an advanced replacement scheme). In the former case you report ~4 times fewer nodes for the same nominal depth (which typically is used to indicate the length of the PV in the full-width part of the search).

Fact: By your own reporting, you only ever tested under conditions where the TT is much larger than the search tree.

Fact: it is well known that tree shaping through reductions and pruning can have a dramatic effect on playing strength.

Fact: it would require insanely large amounts of memory (i.e. orders of magnitude more than you would ever need for any other application) to have the same ratio of tree size to hash size as is used by the rating testers when using the engine for analysis. (e.g. 8MB at 0.2 sec/move would correspond to 144GB for a 1 hour analysis).

Fact: In real-life applications the quality of an engine is determined by the accuracy of its analysis per dollar spent on hardware; it is not relevant for users that the engine would be the strongest in the world when running on a supercomputer, when it sucks on the best hardware they can afford.

I leave it up to the readers to decide whether they consider it a legitimate point of concern that, under the conditions in which they need to use it, an engine runs a totally different, basically untested search from the one it was tested with. (A parallel with the car industry comes to mind, though...)
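The 144GB figure in the fourth fact above can be checked with a few lines of arithmetic. This is only a sketch: the function name `equivalent_hash_mb` is made up for illustration, and it assumes that tree size grows linearly with search time (constant nodes per second), so that keeping the tree-to-hash ratio fixed means scaling the hash by the ratio of the search times.

```python
def equivalent_hash_mb(test_hash_mb, test_seconds, target_seconds):
    """Hash size (MB) that keeps the tree-size-to-hash-size ratio unchanged.

    Assumption: tree size is proportional to search time, so the hash
    must scale by target_seconds / test_seconds.
    """
    return test_hash_mb * target_seconds / test_seconds


# Numbers from the post: 8MB at 0.2 sec/move, scaled to a 1 hour analysis.
analysis_mb = equivalent_hash_mb(8, 0.2, 3600)
print(f"{analysis_mb / 1000:.0f} GB")  # 144 GB
```

The scale factor is 3600 / 0.2 = 18000, and 8MB times 18000 is 144000MB.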

connor_mcmonigle · Post by **connor_mcmonigle** » Sun Apr 28, 2024 3:53 pm

hgm wrote: ↑Sun Apr 28, 2024 10:51 am ...

Fact: it would require insanely large amounts of memory (i.e. orders of magnitude more than you would ever need for any other application) to have the same ratio of tree size to hash size as is used by the rating testers when using the engine for analysis. (e.g. 8MB at 0.2 sec/move would correspond to 144GB for a 1 hour analysis).
...

At least for actual tournaments, something on the order of hundreds of GiB of memory is the norm. It's not as preposterous as you seem to think.

It is conceivable that hash pressure would have some bearing on replacement scheme optimality. However, you've neglected to provide any evidence supporting that claim. Even if we accept on faith that hash pressure has a bearing on replacement scheme optimality, that still leaves us with the problem of choosing some hash pressure target to optimize for. Whether that's 1MiB@0.2s per move or 8MiB@0.2s per move is not particularly relevant and does not somehow invalidate the state-of-the-art chess engine testing methodology.

Regardless of what dubious surrogate metric you choose to optimize your TT replacement scheme for, you'll run into the same problem under this assumption.

Viz · Post by **Viz** » Sun Apr 28, 2024 4:15 pm

This is the main problem.
You make vague statements with zero practical evidence that they are true, and in fact a lot of them have practical counterexamples.
But you keep standing by them.
"If theory isn't supported by reality - just choose another reality" I guess.
