Hybrid replacement strategy worse than always-replace

Rebel · Post by **Rebel** » Fri Apr 26, 2024 12:29 am

pgg106 wrote: ↑Thu Apr 25, 2024 11:56 pm Time and time again devs that have opted for alternative testing solutions (mostly because they didn't know sprt existed)

False.

pgg106 wrote: ↑Thu Apr 25, 2024 11:56 pm have ended up with feature bloated engines that could barely play decent chess, hundreds if not thousands of Elo weaker that they should've been.

False.

pgg106 wrote: ↑Thu Apr 25, 2024 11:56 pm Edit: it's insane that a conclusion so obvious such as "use the testing every engine in the top 100 CCRL uses" needs 3 pages of arguments, this is why talkchess is a cesspit and the only useful interactions are because actual devs take pity on people being mislead.

Not sure what you are trying to establish, but this forum was created in 1997 and will probably outlive Discord.

And BTW, please fill in your real name in your profile; that is required here.

Thank you.

Pali · Post by **Pali** » Fri Apr 26, 2024 12:33 am

hgm wrote: ↑Thu Apr 25, 2024 10:58 pm
Pali wrote: ↑Thu Apr 25, 2024 9:34 pmHere is a comparison for Black Marlin (using nodes as NPS is the same for each):
Current replacement scheme takes 31981317 nodes to reach depth 30
Always replace takes 33255810 nodes to reach depth 30
Preferring lower depth entries takes 18510403 nodes to reach depth 30
As an extra, removing TT entirely takes 7805004 nodes to reach depth 30
I would say you have an engine that lies very much about its depth, to get fewer nodes without TT. It cannot have search all branches to the same depth, with fewer nodes and no hash cutoffs to save nodes.

"It cannot have search all branches to the same depth"
What you are referring to is called pruning, reductions and extensions. If you consider this lying about depth, sure, go ahead. But keep in mind that no engine is honest about depth by your definition, so you shouldn't give this advice to anyone without first reviewing their code and making sure that they perform no reductions, pruning or extensions at any point.
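For readers following the node counts quoted above, here is a minimal sketch of the two textbook replacement policies being compared. It is not Black Marlin's code; the names `Entry`, `always_replace`, `depth_preferred` and `TranspositionTable` are hypothetical, and a single-slot table is assumed for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    key: int    # full hash key, used to detect index collisions
    depth: int  # remaining depth the stored result was searched to
    score: int


# A policy answers: may `new` overwrite `old` in the same slot?
Policy = Callable[[Entry, Entry], bool]


def always_replace(old: Entry, new: Entry) -> bool:
    # Whatever was stored last wins.
    return True


def depth_preferred(old: Entry, new: Entry) -> bool:
    # Keep the deeper entry, unless the new result is for the same position.
    return new.key == old.key or new.depth >= old.depth


class TranspositionTable:
    def __init__(self, size: int, policy: Policy):
        self.slots = [None] * size
        self.policy = policy

    def store(self, entry: Entry) -> None:
        i = entry.key % len(self.slots)
        old = self.slots[i]
        if old is None or self.policy(old, entry):
            self.slots[i] = entry

    def probe(self, key: int) -> Optional[Entry]:
        e = self.slots[key % len(self.slots)]
        return e if e is not None and e.key == key else None
```

Which policy saves more nodes depends on the rest of the search, which is what the disagreement here is about; the sketch only shows where the two schemes differ.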

Pali · Post by **Pali** » Fri Apr 26, 2024 12:41 am

Rebel wrote: ↑Fri Apr 26, 2024 12:29 am
pgg106 wrote: ↑Thu Apr 25, 2024 11:56 pm Time and time again devs that have opted for alternative testing solutions (mostly because they didn't know sprt existed)
False.

pgg106 wrote: ↑Thu Apr 25, 2024 11:56 pm have ended up with feature bloated engines that could barely play decent chess, hundreds if not thousands of Elo weaker that they should've been.
False.

pgg106 wrote: ↑Thu Apr 25, 2024 11:56 pm Edit: it's insane that a conclusion so obvious such as "use the testing every engine in the top 100 CCRL uses" needs 3 pages of arguments, this is why talkchess is a cesspit and the only useful interactions are because actual devs take pity on people being mislead.
Not sure what you are trying to establish, but this forum was created in 1997 and probably will survive discord.

And BTW, please fill in your real name in your profile, that is obliged here.

Thank you.

This is not about which platform survives; it's about where engine development takes place. All Stockfish developers, all OpenBench developers, all Swe developers, and at least a decent number of Leela developers do engine development almost exclusively on Discord and post here only to prevent misinformation or call out clones.

pgg106 · Post by **pgg106** » Fri Apr 26, 2024 12:45 am

Rebel wrote: ↑Fri Apr 26, 2024 12:29 am
pgg106 wrote: ↑Thu Apr 25, 2024 11:56 pm Time and time again devs that have opted for alternative testing solutions (mostly because they didn't know sprt existed)
False.

pgg106 wrote: ↑Thu Apr 25, 2024 11:56 pm have ended up with feature bloated engines that could barely play decent chess, hundreds if not thousands of Elo weaker that they should've been.
False.

pgg106 wrote: ↑Thu Apr 25, 2024 11:56 pm Edit: it's insane that a conclusion so obvious such as "use the testing every engine in the top 100 CCRL uses" needs 3 pages of arguments, this is why talkchess is a cesspit and the only useful interactions are because actual devs take pity on people being mislead.
Not sure what you are trying to establish, but this forum was created in 1997 and probably will survive discord.

And BTW, please fill in your real name in your profile, that is obliged here.

Thank you.

Not sure what all the "false" mean when I can think of several examples of exactly that happening in the last year alone (how many new developers have you talked to in the last year? How many new engines have you been following?).
What I'm trying to establish is that talkchess has been an empty shell for years at this point, and most of the conversations happening here are either empty fluff or, even worse, actively detrimental for someone new who wants to work on a hobby. See the recent claim from a former moderator that "800 fixed games tests" are a silver bullet for basically everything.
I know about the real name rule; I intentionally removed it. Feel free to deactivate this account, but after having done so, try to police blatant trolls and people spreading outright fake or outdated information like it's still 1997.

JacquesRW · Post by **JacquesRW** » Fri Apr 26, 2024 12:58 am

hgm wrote: ↑Thu Apr 25, 2024 10:43 pm
JacquesRW wrote: ↑Thu Apr 25, 2024 9:02 pmYou can just SPRT B without A against your new master (that contains A).
Sure, you can. This was not what was presented as 'the ultimate method', though. It becomes even more interesting when you have to test 5 patches, and are allergic to the idea that any pair of those might interact.

You can't soundly combine your A, B and A+B test results, but suppose we ignore that, could you give some solid examples of **how** you can practically select pairs of patches where this method is actually beneficial over sequentially SPRTing stuff.
Not sure why you say that. For patches that do not interact it seems straightforward enough.

I was going to point this out in my original post but didn't see any reason to, because your whole point of doing A + B over SPRT was for patches that interact, and I had hoped that you would act in good enough faith not to point this out as some kind of "gotcha". (I notice that you didn't make any argument for the case where they do interact. Why not?)
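For anyone unfamiliar with the test being argued about, a rough sketch of an SPRT stopping rule follows. It uses a normal approximation to the log-likelihood ratio on win/draw/loss counts; that is an assumption made here for brevity, not a description of what any particular testing framework computes. All function names are hypothetical.

```python
import math


def expected_score(elo: float) -> float:
    # Logistic Elo model: expected score for an Elo advantage of `elo`.
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    # Normal-approximation log-likelihood ratio of H1 (elo1) against H0 (elo0).
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - mean * mean  # per-game score variance
    if var <= 0.0:
        return 0.0
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)


def sprt_bounds(alpha: float = 0.05, beta: float = 0.05):
    # Wald's stopping thresholds for the LLR.
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def sprt_decision(wins, draws, losses, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05) -> str:
    lower, upper = sprt_bounds(alpha, beta)
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"
```

Testing B on top of a master that already contains A, as suggested above, would simply mean running this on games between master+B and master.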

JacquesRW · Post by **JacquesRW** » Fri Apr 26, 2024 1:15 am

hgm wrote: ↑Thu Apr 25, 2024 10:43 pm I expect it to be the most important metric, because apart from simple end-games hash grafting is pretty rare. If you are really concerned about this, it would be better to test the improvement of move quality separately. Playing games beteen a version of the engine that only accepts exact-depth hits with one that also accepts over-deep hits is a suitable method for this. As you are only interested in determining an upper limit, to ascertain that differences in search speed that you measured are not significantly affected by it.

I am concerned about it, and so now your advice is to play games? Wouldn't it have been simpler to just play games from the start?

Rebel · Post by **Rebel** » Fri Apr 26, 2024 2:10 am

pgg106 wrote: ↑Fri Apr 26, 2024 12:45 am
Rebel wrote: ↑Fri Apr 26, 2024 12:29 am
pgg106 wrote: ↑Thu Apr 25, 2024 11:56 pm Time and time again devs that have opted for alternative testing solutions (mostly because they didn't know sprt existed)
False.

pgg106 wrote: ↑Thu Apr 25, 2024 11:56 pm have ended up with feature bloated engines that could barely play decent chess, hundreds if not thousands of Elo weaker that they should've been.
False.

pgg106 wrote: ↑Thu Apr 25, 2024 11:56 pm Edit: it's insane that a conclusion so obvious such as "use the testing every engine in the top 100 CCRL uses" needs 3 pages of arguments, this is why talkchess is a cesspit and the only useful interactions are because actual devs take pity on people being mislead.
Not sure what you are trying to establish, but this forum was created in 1997 and probably will survive discord.

And BTW, please fill in your real name in your profile, that is obliged here.

Thank you.
Not sure what all the "false" mean

I was speaking about my own engine.

There are other programmers here who do fine using SPRT and SPSA, so you are wrong.

pgg106 · Post by **pgg106** » Fri Apr 26, 2024 2:35 am

I can count so many who don't do just fine that the idea of presenting speculative, unsound testing as a valid alternative is simply inane. I'm sure there are 1-2-10 exceptions to this who can stare at gauntlets and hand-tune their search heuristics, but suggestions shouldn't be made on the basis that outliers exist, and let me assure you that you are definitely an outlier.

Viz · Post by **Viz** » Fri Apr 26, 2024 2:57 am

hgm wrote: ↑Thu Apr 25, 2024 10:58 pm
Pali wrote: ↑Thu Apr 25, 2024 9:34 pmHere is a comparison for Black Marlin (using nodes as NPS is the same for each):
Current replacement scheme takes 31981317 nodes to reach depth 30
Always replace takes 33255810 nodes to reach depth 30
Preferring lower depth entries takes 18510403 nodes to reach depth 30
As an extra, removing TT entirely takes 7805004 nodes to reach depth 30
I would say you have an engine that lies very much about its depth, to get fewer nodes without TT. It cannot have search all branches to the same depth, with fewer nodes and no hash cutoffs to save nodes.

Well, sorry, "lies about depth" is just bizarre.
What we have there...
Singular extensions (some 100+ Elo)
Other positive extensions (1-2 Elo)
Razoring (1-2 Elo)
IIR (~10 Elo)
Do deeper/shallower search after LMR (~2-5 Elo)
Multicut (2-5 Elo)
Negative extensions (10 Elo)
Full depth search reductions (~10 Elo)
Reducing after a fail middle (~2 Elo)
Maybe I forgot something, of course.
So, as we can see, to not "lie about depth" you just need one simple thing: remove 150 Elo worth of search improvements and you will be just fine! That will enable the definitely robust and not at all bogus "look at time to depth and conclude something" methodology.
Also, do you even know that any engine more complicated than basic minimax doesn't search all branches to the same depth? Even simple alpha-beta wouldn't do so.
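The alpha-beta point can be checked on a toy tree. The sketch below is an illustration under assumed parameters (a random uniform tree, negamax convention, no move ordering) and has nothing to do with any real engine: both searches report the same nominal depth and the same root value, but alpha-beta skips whole subtrees.

```python
import random

INF = 10**9


def build_tree(depth, branching, rng):
    # Leaves are integer scores from the point of view of the side to move.
    if depth == 0:
        return rng.randint(-100, 100)
    return [build_tree(depth - 1, branching, rng) for _ in range(branching)]


def minimax(node, counter):
    # Plain negamax: visits every node of the tree.
    counter[0] += 1
    if isinstance(node, int):
        return node
    return max(-minimax(child, counter) for child in node)


def alphabeta(node, alpha, beta, counter):
    # Fail-soft negamax alpha-beta: stops searching siblings after a cutoff.
    counter[0] += 1
    if isinstance(node, int):
        return node
    best = -INF
    for child in node:
        score = -alphabeta(child, -beta, -alpha, counter)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # remaining children are never visited
    return best


if __name__ == "__main__":
    tree = build_tree(depth=4, branching=3, rng=random.Random(1))
    mm_nodes, ab_nodes = [0], [0]
    print(minimax(tree, mm_nodes), mm_nodes[0])
    print(alphabeta(tree, -INF, INF, ab_nodes), ab_nodes[0])
```

Every reduction or extension in the list above moves node counts further away from what the nominal depth suggests.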

Rebel · Post by **Rebel** » Fri Apr 26, 2024 7:18 am

pgg106 wrote: ↑Fri Apr 26, 2024 2:35 am I can count so many who don't do just fine that the idea of presenting speculative, unsound testing as a valid alternative is simply inane. I'm sure there are 1-2-10 exceptions to this who can stare at gauntlents and hand-tune their search heuristics, suggestions shouldn't be made on the basis that outliers exist, and let me assure you that you are definitely an outlier

I seldom go to Discord, maybe once a month. It's an obscure, cluttered chat box with a bad search function. I never found anything useful there. Make it an organized forum like here.

And BTW, if you keep posting, fill in your name.

And out of curiosity, what is your engine called?
