Future of NNUE is Dimension 3072 network!

Viren · Post by **Viren** » Tue Mar 19, 2024 9:12 am

It's used after every passed patch, yes: https://github.com/vondele/matetrack
There is also a zugzwang test suite for patches targeting those positions.

There is no utility in running other test suites: their goal is too similar to real play, so the result will be noise. If talkchess can prove otherwise using valid statistical methods, it can be explored further. So far, they have simply shown complete incompetence in understanding sample size, and have posted positions in which multiple moves are winning.
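To put rough numbers on the sample-size point, here is a minimal sketch using the normal approximation for a proportion. The function names and the example figures (100 positions, 70 solved, a 1% difference) are made up for illustration, and the model assumes independent positions, which real test suites do not strictly satisfy.

```python
import math

def solve_rate_interval(solved, total, z=1.96):
    """Normal-approximation confidence interval for a suite solve rate.

    Assumes each position is an independent trial (a simplification).
    Returns (rate, lower, upper); z=1.96 corresponds to roughly 95%.
    """
    p = solved / total
    se = math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

def positions_needed(p, diff, z=1.96):
    """Rough number of positions per engine for a solve-rate difference
    `diff` to exceed z standard errors, treating the two runs as
    independent samples with a solve rate near p."""
    # Variance of the difference of two independent proportions
    # is approximately 2 * p * (1 - p) / n.
    return math.ceil(2 * p * (1 - p) * (z / diff) ** 2)

# Hypothetical suite: 100 positions, 70 solved.
rate, low, high = solve_rate_interval(70, 100)
print(f"solve rate {rate:.2f}, ~95% interval [{low:.3f}, {high:.3f}]")

# Positions needed to resolve a 1% solve-rate difference around 70%.
print(positions_needed(0.70, 0.01))
```

Under these assumptions a 100-position suite has an interval about nine percentage points wide on either side, so a patch that changes the solve rate by a point or two cannot be told apart from noise.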

Draude · Post by **Draude** » Tue Mar 19, 2024 9:13 am

smatovic wrote: ↑Tue Mar 19, 2024 9:02 am
Viren wrote: ↑Tue Mar 19, 2024 8:53 am @talkchess guys:

Maybe start to use your brain? We already have a test suite for mates that is used to reject patches:
https://github.com/official-stockfish/S ... 2002504682
Do you use this mate .epd regression test after every passed patch on fish-test?

Viren wrote: ↑Tue Mar 19, 2024 8:53 am Test suites can't be used to measure small differences accurately because they have a small sample size of positions, which may also be biased. They can also be wrong: all moves could actually draw, or multiple moves could actually win.
Then be smart*: it's not meant to measure Elo gain; for that you have your SPRT self-play.

*testsuites are a moving target.
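For readers unfamiliar with SPRT, the following is a minimal sketch of Wald's sequential probability ratio test applied to decisive games only. It is a simplification for illustration: draws are ignored, the function names and default values (Elo bounds 0 and 5, alpha = beta = 0.05) are assumptions, and real testing frameworks use a more elaborate model that accounts for draws.

```python
import math

def elo_to_score(elo):
    """Expected score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt(wins, losses, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05):
    """Wald SPRT on decisive games only (draws ignored for simplicity).

    H0: the patch is worth elo0; H1: the patch is worth elo1.
    Returns (log-likelihood ratio, decision string).
    """
    p0, p1 = elo_to_score(elo0), elo_to_score(elo1)
    # Log-likelihood ratio of H1 versus H0 for Bernoulli outcomes.
    llr = wins * math.log(p1 / p0) + losses * math.log((1 - p1) / (1 - p0))
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    if llr >= upper:
        return llr, "accept H1"
    if llr <= lower:
        return llr, "accept H0"
    return llr, "continue"

# Hypothetical win/loss counts, for illustration only.
print(sprt(1000, 500))
print(sprt(510, 500))
```

The test stops as soon as the accumulated evidence crosses either bound, which is why it suits measuring small Elo differences over many games rather than over a fixed set of positions.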

--
Srdja

Yes, you are completely correct! Multiple times I have refused to merge SF patches into my engines because they perform very badly on my test suite!

See my latest post to download

smatovic · Post by **smatovic** » Tue Mar 19, 2024 9:28 am

Draude wrote: ↑Tue Mar 19, 2024 9:13 am ...

Well done kiddo, you will make it.

--
Srdja

smatovic · Post by **smatovic** » Tue Mar 19, 2024 9:46 am

Viren wrote: ↑Tue Mar 19, 2024 9:12 am It's used after every passed patch, yes: https://github.com/vondele/matetrack
There is also a zugzwang test suite for patches targeting those positions.

There is no utility in running other test suites: their goal is too similar to real play, so the result will be noise. If talkchess can prove otherwise using valid statistical methods, it can be explored further. So far, they have simply shown complete incompetence in understanding sample size, and have posted positions in which multiple moves are winning.

As you probably already know, creating sound test suites is an art in itself. There are people maintaining them, so maybe give them a try? IIRC there was, e.g., STS 1-15 in different iterations. And sure, you have to re-evaluate them periodically, both the positions and the best moves/scores. My point is, if you tune for Elo (e.g. aggressive pruning), you might lose out in some other area, as you should be aware (SF derivatives): winning games, solving puzzles, playing styles and the like are different goals.

--
Srdja
