JacquesRW wrote: ↑Fri Apr 26, 2024 12:58 am
I was going to point this out in my original post, but didn't see any reason, because your whole point of doing A + B over SPRT was for patches that interact, and I had hoped that you would act in good enough faith to not point this out as some kind of "gotcha" (I notice that you didn't make any argument for if they do interact, why not?).

No, that was not the whole point, not the main point, and not even one of the important points. The point of doing this is that it could require fewer games than sequential SPRT on individual patches, by using the same set of games to measure the effect of several patches simultaneously. That it also allows you to detect interaction between the patches is just a free bonus, which I only mentioned to rebut the remark that this would not work when patches interact.
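To make the idea of measuring two patches from one set of games concrete, here is a minimal sketch of a 2x2 factorial estimate. It assumes four builds (base, A only, B only, A+B) have each played the same number of games against a common reference opponent; the function name `factorial_effects` and the score table are hypothetical, made up for illustration, and not taken from any testing framework.

```python
def factorial_effects(score):
    """Estimate patch effects from a 2x2 design.

    `score` maps (has_a, has_b) -> mean score per game in [0, 1]
    against a common reference opponent (an assumed test setup).
    Returns (effect_a, effect_b, interaction) in score units.
    """
    base = score[(0, 0)]
    a_only = score[(1, 0)]
    b_only = score[(0, 1)]
    both = score[(1, 1)]

    # Each main effect averages two differences, so the games of all
    # four builds contribute to both estimates.
    effect_a = ((a_only - base) + (both - b_only)) / 2
    effect_b = ((b_only - base) + (both - a_only)) / 2

    # Interaction: how much the gain from A changes when B is present.
    interaction = (both - b_only) - (a_only - base)
    return effect_a, effect_b, interaction


# Hypothetical mean scores, chosen only to exercise the arithmetic.
example = {(0, 0): 0.5, (1, 0): 0.625, (0, 1): 0.5625, (1, 1): 0.75}
effect_a, effect_b, interaction = factorial_effects(example)
```

An interaction estimate near zero (relative to its statistical error) would suggest the two patches are roughly additive. The sketch leaves out the error bars and any stopping rule, which a real test would need.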
BTW, how many entries were there in the TT for the akimbo bench results that you presented?
It can be advantageous to separate the issues. Unfortunately, I know of no way to assess move quality other than playing games. But that is a very noisy method, and excessively many games are needed to detect small differences. If you determine an upper bound for the effect of grafting by testing with a generously large TT and a good replacement scheme (e.g. shallowest of four with aging), so that essentially no overwriting takes place at all, you might need very many games, but you only need to do that once. You can then test various replacement schemes with a small TT by measuring how much they reduce the node count of the same search tree on a set of positions. The reduced number of TT hits you would get in the small table should decrease the effect of grafting on move quality even further. That way you avoid having to play the huge number of games for each replacement scheme that you test.
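As an illustration of what "shallowest of four with aging" could look like, here is a toy sketch. It is one possible reading of the scheme, not code from any particular engine: the class `ToyTT`, its fields, and the exact tie-breaking (stale entries from earlier searches are evicted before current ones, then the lowest draft goes) are all assumptions. The `overwrites` counter is added only so that one can check whether a table is large enough that essentially no overwriting happens.

```python
from dataclasses import dataclass

BUCKET_SIZE = 4  # "shallowest of four": four entries share one bucket


@dataclass
class Entry:
    key: int
    depth: int
    age: int
    value: int


class ToyTT:
    """Toy transposition table; layout and policy details are assumed."""

    def __init__(self, num_buckets):
        self.buckets = [[None] * BUCKET_SIZE for _ in range(num_buckets)]
        self.age = 0         # incremented once per root search
        self.overwrites = 0  # evictions of a different position

    def new_search(self):
        self.age += 1

    def probe(self, key):
        for entry in self.buckets[key % len(self.buckets)]:
            if entry is not None and entry.key == key:
                return entry
        return None

    def store(self, key, depth, value):
        bucket = self.buckets[key % len(self.buckets)]
        victim = None
        # Prefer updating the same position or filling an empty slot.
        for i, entry in enumerate(bucket):
            if entry is None or entry.key == key:
                victim = i
                break
        if victim is None:
            # Evict stale entries first (False sorts before True),
            # and among equals the one with the shallowest depth.
            victim = min(
                range(BUCKET_SIZE),
                key=lambda i: (bucket[i].age == self.age, bucket[i].depth),
            )
            self.overwrites += 1
        bucket[victim] = Entry(key, depth, self.age, value)
```

Running the same fixed-depth searches on a set of positions with different `store` policies and comparing total node counts would then give the kind of game-free comparison described above.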
I could be wrong, but I expect the effect of grafting in the middle-game to be very small. I base this on Bob Hyatt's observation that he could replace an amazingly large fraction of the evaluations in the leaves (like close to a percent) without affecting the move choice in the root. And most grafts would not really cause a dramatic change of the score; the score usually changes only slightly with depth. Only in cases where the extra depth sees something dramatic, such as a checkmate, is there a huge change. But that would only be the case in a very small fraction of the grafts.