Multiple change testing

shawn · Post by **shawn** » Mon Jul 22, 2024 4:52 pm

Try SPRT: https://www.chessprogramming.org/Sequen ... Ratio_Test
This is the better testing method, and the one everyone uses nowadays. The inverted results you got with those changes are just an effect of small sample sizes.

hgm · Post by **hgm** » Mon Jul 22, 2024 5:55 pm

Indeed, if you have infinite computer power available so that you won't have to worry much about efficiency that is a very reliable method.

Viz · Post by **Viz** » Mon Jul 22, 2024 7:47 pm

hgm wrote: ↑Mon Jul 22, 2024 5:55 pm Indeed, if you have infinite computer power available so that you won't have to worry much about efficiency that is a very reliable method.

This is THE most efficient method of testing regardless of the amount of computer power you have, period. Everything else is just bad, more or less.

hgm · Post by **hgm** » Mon Jul 22, 2024 8:05 pm

But the question is: "for what, and under which conditions?". If you would pick patches from a pool that in 75% of the cases would increase the strength by 1 Elo, and in 25% would decrease it by 1 Elo, it would be infinitely faster to just pick 100 patches and accept them all without any testing whatsoever, than to test any of those with SPRT and only accept those that pass. You would have already +50 Elo before the SPRT methodology even gave you +1 Elo...
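For reference, the arithmetic behind the +50 Elo figure can be sketched like this. It is only an illustration: it assumes the effects of separate patches are independent and add up linearly, and the function name `untested_gain` is made up for this example.

```python
import math

def untested_gain(n_patches, p_good, step_elo):
    """Expected total Elo and its standard deviation when every patch
    is accepted without testing.

    Assumption (not from the thread): patch effects are independent
    and simply add up.
    """
    # A patch gains step_elo with probability p_good, else loses step_elo.
    mean_per_patch = p_good * step_elo - (1.0 - p_good) * step_elo
    # The outcome is always +step or -step, so E[x^2] = step^2.
    var_per_patch = step_elo ** 2 - mean_per_patch ** 2
    total_mean = n_patches * mean_per_patch
    total_sd = math.sqrt(n_patches * var_per_patch)
    return total_mean, total_sd

mean, sd = untested_gain(100, 0.75, 1.0)
print(f"expected gain: {mean:+.1f} Elo, standard deviation: {sd:.2f} Elo")
```

Under those assumptions the spread is small next to the expected gain, which is the point of the example. It says nothing about how one would know the 75% figure in the first place.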

Viz · Post by **Viz** » Mon Jul 22, 2024 8:57 pm

hgm wrote: ↑Mon Jul 22, 2024 8:05 pm But the question is: "for what, and under which conditions?". If you would pick patches from a pool that in 75% of the cases would increase the strength by 1 Elo, and in 25% would decrease it by 1 Elo, it would be infinitely faster to just pick 100 patches and accept them all without any testing whatsoever, than to test any of those with SPRT and only accept those that pass. You would have already +50 Elo before the SPRT methodology even gave you +1 Elo...

And how do you conclude that a patch has a 75% probability of increasing strength by 1 Elo and a 25% probability of decreasing it by 1 Elo? From your astral spirit vibes?
Even this really specific task is better handled with, surprise, SPRT.
Set the bounds to [-1;1] and stop at an LLR of ln 4. Voila, you have an SPRT that concludes EXACTLY what you describe, and it does so with the minimum number of games played.
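A rough sketch of such a test, for illustration only. It uses a normal approximation to the log-likelihood ratio rather than the exact formula of any particular testing framework, and all function names here are made up. One possible reading of the ln 4 stopping value is Wald's threshold ln((1 - beta) / alpha) with alpha = beta = 0.2, but that interpretation is an assumption.

```python
import math

def elo_to_score(elo):
    """Expected score for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def llr_normal_approx(wins, draws, losses, elo0, elo1):
    """Approximate LLR of H1 (elo = elo1) against H0 (elo = elo0).

    Treats per-game scores (1, 0.5, 0) as normally distributed with
    the sample variance; this is a simplification.
    """
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - mean ** 2
    if var <= 0.0:
        return 0.0  # all games had the same result; no usable estimate
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)

def sprt_decision(wins, draws, losses, elo0=-1.0, elo1=1.0,
                  bound=math.log(4.0)):
    """Stop when the LLR leaves (-bound, +bound), otherwise keep playing."""
    llr = llr_normal_approx(wins, draws, losses, elo0, elo1)
    if llr >= bound:
        return "accept H1"
    if llr <= -bound:
        return "accept H0"
    return "continue"

# Hypothetical game counts, not real test results.
print(sprt_decision(30000, 41000, 29000))
```

In practice the decision would be re-evaluated as results come in, stopping at the first game count where the LLR crosses either bound.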

AndrewGrant · Post by **AndrewGrant** » Mon Jul 22, 2024 9:00 pm

hgm wrote: ↑Mon Jul 22, 2024 8:05 pm But the question is: "for what, and under which conditions?". If you would pick patches from a pool that in 75% of the cases would increase the strength by 1 Elo, and in 25% would decrease it by 1 Elo, it would be infinitely faster to just pick 100 patches and accept them all without any testing whatsoever, than to test any of those with SPRT and only accept those that pass. You would have already +50 Elo before the SPRT methodology even gave you +1 Elo...

If you can pick patches that are winners at a 3-to-1 rate, then you don't need to test at all...
So it is hardly a counterargument to doing things with meaningful statistical power.

Ciekce · Post by **Ciekce** » Mon Jul 22, 2024 10:13 pm

if it was not obvious, hgm's opinion here is horribly wrong - another vote for literally the only good option in SPRT

Whiskers · Post by **Whiskers** » Tue Jul 23, 2024 2:07 am

Ciekce wrote: ↑Mon Jul 22, 2024 10:13 pm if it was not obvious, hgm's opinion here is horribly wrong - another vote for literally the only good option in SPRT

hi ciekce

Viz · Post by **Viz** » Tue Jul 23, 2024 6:05 am

Ciekce wrote: ↑Mon Jul 22, 2024 10:13 pm if it was not obvious, hgm's opinion here is horribly wrong - another vote for literally the only good option in SPRT

There is quite literally no "voting" at this topic.
If any dev claims he can spot gainers with 75% probability he is either a liar or a Jesus.

AndrewGrant · Post by **AndrewGrant** » Tue Jul 23, 2024 8:40 am

Viz wrote: ↑Tue Jul 23, 2024 6:05 am
Ciekce wrote: ↑Mon Jul 22, 2024 10:13 pm if it was not obvious, hgm's opinion here is horribly wrong - another vote for literally the only good option in SPRT
There is quite literally no "voting" at this topic.
If any dev claims he can spot gainers with 75% probability he is either a liar or a Jesus.

75% chance is easy... if you start a new project and know the current state of the art.
Otherwise, impossible.
