TalkChess.com

Posted: **Sat Feb 01, 2020 1:57 am**

I wonder if this could lead to some paradigm shift.

For over a decade it was assumed that small, incremental changes which are functionally independent are generally additive.

Revert 5 patches which were merged, but led to a regression test that showed negative Elo gain:

http://tests.stockfishchess.org/tests/v ... d58394fdb9

This was discussed in depth in:

https://github.com/official-stockfish/S ... ssues/2531

Posted: **Sat Feb 01, 2020 2:08 am**

Deberger wrote: ↑Sat Feb 01, 2020 1:57 am I wonder if this could lead to some paradigm shift.

For over a decade it was assumed that small, incremental changes which are functionally independent are generally additive.

Revert 5 patches which were merged, but led to a regression test that showed negative Elo gain:

http://tests.stockfishchess.org/tests/v ... d58394fdb9

This was discussed in depth in:

https://github.com/official-stockfish/S ... ssues/2531

Yes, people abuse the notion of "simplification" to commit anything. Combine that with pervasive p-hacking, and that's no surprise.

Posted: **Sat Feb 01, 2020 1:41 pm**

The reverted patches passed as "elo gainers".

Posted: **Sat Feb 01, 2020 2:19 pm**

The 5 reverted patches were all developmental, for a future Version 12.

Today a 6th patch was reverted, the final LMR which was included in Version 11.

https://github.com/official-stockfish/S ... 261b26ac3a

Posted: **Sun Feb 02, 2020 8:03 am**

Today a 7th patch was reverted.

(Simplify away king infiltration:)
https://github.com/official-stockfish/S ... 0d916f447c

A patch which was committed twelve days before Version 11 was released.

(Introduce king infiltration bonus:)
https://github.com/official-stockfish/S ... e025bf8f46

Posted: **Sun Feb 02, 2020 8:57 am**

Alayan wrote: ↑Sat Feb 01, 2020 1:41 pm The reverted patches passed as "elo gainers".

Are we sure this pentanomial test is correct? When I look at these [0-2] results, I'm very surprised by how low the stopping time is compared to what you'd expect for SPRT(0,2). And considering that SPRT is asymptotically optimal, something doesn't make sense...

Another problem is the bounds used for STC. They provide almost no filtering. Previously, we had 0-5 for both STC and LTC, so p-hacking was much reduced.

Posted: **Sun Feb 02, 2020 10:53 am**

The validity of the pentanomial model can be verified by simulation.
https://github.com/vdbergh/pentanomial
Concerning short tests: there are various things to consider, notably:
- Fishtest Elo bounds are no longer BayesElo.
- The stopping time distribution for an SPRT has long tails.
- The great majority of patches submitted to Fishtest are at best neutral (the Elo prior was measured some time ago to be ~ N(-1,1)).
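The long-tail point can be checked with a toy simulation. The sketch below is not the pentanomial simulator linked above. It is a much simpler trinomial (win/draw/loss per game) SPRT with a fixed draw ratio, logistic Elo, and simple hypotheses, all of which are assumptions for illustration. The names `outcome_probs`, `sprt_run` and `stopping_times` are hypothetical.

```python
import math
import random

def outcome_probs(elo, draw_ratio):
    """(win, draw, loss) probabilities for a logistic Elo and a fixed draw ratio."""
    score = 1.0 / (1.0 + 10.0 ** (-elo / 400.0))
    win = score - draw_ratio / 2.0
    loss = 1.0 - draw_ratio - win
    if win <= 0.0 or loss <= 0.0:
        raise ValueError("draw_ratio is too high for this Elo")
    return win, draw_ratio, loss

def sprt_run(true_elo, elo0, elo1, draw_ratio=0.6, alpha=0.05, beta=0.05,
             rng=None, max_games=1_000_000):
    """Play games until the LLR leaves (lower, upper); return (passed, games)."""
    rng = rng or random.Random()
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    p_true = outcome_probs(true_elo, draw_ratio)
    p0 = outcome_probs(elo0, draw_ratio)
    p1 = outcome_probs(elo1, draw_ratio)
    # Exact per-game LLR increment for each outcome (a draw contributes 0 here).
    steps = [math.log(q1 / q0) for q1, q0 in zip(p1, p0)]
    win_cut = p_true[0]
    draw_cut = p_true[0] + p_true[1]
    llr, games = 0.0, 0
    while lower < llr < upper and games < max_games:
        u = rng.random()
        outcome = 0 if u < win_cut else (1 if u < draw_cut else 2)
        llr += steps[outcome]
        games += 1
    return llr >= upper, games

def stopping_times(n_runs, true_elo, elo0, elo1, draw_ratio=0.6, seed=0):
    """Sorted stopping times (in games) of n_runs independent SPRTs."""
    rng = random.Random(seed)
    return sorted(sprt_run(true_elo, elo0, elo1, draw_ratio, rng=rng)[1]
                  for _ in range(n_runs))
```

For example, `times = stopping_times(200, -1.0, 0.0, 2.0)` followed by comparing `times[len(times) // 2]` with `times[-1]` contrasts the median with the longest run. Narrow bounds such as (0, 2) need tens of thousands of games per run in this model, so pure Python will take a while.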

Posted: **Sun Feb 02, 2020 11:28 pm**

I wrote a simple multi-threaded C version of the pentanomial simulator.

https://github.com/vdbergh/simul

Everything is in a single C file. As it is much, much faster than the Python version, one can see better how accurate the implementation is.

Posted: **Mon Feb 03, 2020 12:04 pm**

Michel wrote: ↑Sun Feb 02, 2020 11:28 pm I wrote a simple multi-threaded C version of the pentanomial simulator.

https://github.com/vdbergh/simul

Everything in a single C file. As it is much much faster than the Python version one can see better how accurate the implementation is.

Now with a decent README.md!

Posted: **Mon Feb 03, 2020 8:27 pm**

SF is so strong that today every change needs an almost astronomical number of games to pass. In the good old days it took 100-200 games and the engine was better.

Stockfish Reverts 5 Recent Patches