You are hilarious ... and were from the beginning ... this is a dead horse for a deaf man.
amchess wrote: ↑Fri Sep 29, 2023 9:01 pm No! This is not a three-coin toss because the probabilities of the outcomes are not equidistributed.
Indeed, with NNUE it is increasingly difficult to find positions whose outcome alternates between a decisive result for one color and a draw.
This is where chess knowledge comes in.
In general, domain knowledge improves the testing strategy, a standard point in software engineering.
I didn't allow myself to insult you, partly because I don't know you, but more importantly, I would never do that.
Since you think otherwise, this will be my last post in this thread,
because it has become totally unconstructive.
I ask the moderators to intervene and hope to be heard this time.
It should be possible to discuss a topic civilly, without boorish personal attacks on an interlocutor one doesn't even know.
Stockfish randomicity
Moderator: Ras
-
- Posts: 4718
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: Stockfish randomicity
-
- Posts: 544
- Joined: Sun Sep 06, 2020 4:40 am
- Full name: Connor McMonigle
Re: Stockfish randomicity
This is why I wrote "weighted coin toss". The whole point is that the coin toss is unfair/weighted, so I must presume that you've failed to understand my simplified explanation. The goal of a test is to uncover those weights (estimate the parameters of the underlying trinomial distribution), and doing so requires many samples. Even if you somehow had 200 test positions for which an engine's performance on said 200 positions corresponded perfectly to performance on all chess positions, you'd still need many samples to estimate the parameters, due to the very "randomicity" you've independently re-discovered. The exact math behind the current state-of-the-art GSPRT testing methodology can be found here: https://stat.columbia.edu/~jcliu/paper/GSPRT_SQA3.pdf.
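To make the sample-size point concrete, here is a small Python sketch. It is purely illustrative: it is not the GSPRT from the linked paper, just plain standard-error shrinkage for a fixed, hidden ("weighted") win/draw/loss distribution, and the probabilities in simulate_match are made-up numbers.

```python
# Illustration only: how the uncertainty of an estimated score shrinks with the
# number of games drawn from a fixed, hidden ("weighted") W/D/L distribution.
import math
import random

def simulate_match(n_games, p_win=0.36, p_draw=0.30, seed=0):
    """Sample n_games results from a fixed win/draw/loss distribution (made-up numbers)."""
    rng = random.Random(seed)
    wins = draws = losses = 0
    for _ in range(n_games):
        r = rng.random()
        if r < p_win:
            wins += 1
        elif r < p_win + p_draw:
            draws += 1
        else:
            losses += 1
    return wins, draws, losses

def score_and_error(wins, draws, losses):
    """Estimated match score and its standard error from observed W/D/L counts."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Sample variance of the per-game score (each game scores 0, 0.5 or 1).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    return score, math.sqrt(var / n)

for n in (200, 2000, 20000):
    w, d, l = simulate_match(n, seed=n)
    s, se = score_and_error(w, d, l)
    # 95% interval; at 200 games it is wide enough to flip a close result.
    print(f"{n:6d} games: score = {s:.3f} +/- {1.96 * se:.3f}")
```

With only a couple of hundred games the 95% interval on the match score is wide enough to flip a close result either way, which is exactly the "randomicity" under discussion.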
If you want to take such great offense at the mere suggestion that your math skills are rusty, that's your problem.
amchess wrote: ↑Fri Sep 29, 2023 9:01 pm ...
Indeed, with NNUE it is increasingly difficult to find positions whose outcome alternates between a decisive result for one color and a draw.
This is where chess knowledge comes in.
In general, domain knowledge improves the testing strategy, a standard point in software engineering.
I didn't allow myself to insult you, partly because I don't know you, but more importantly, I would never do that.
Since you think otherwise, this will be my last post in this thread,
because it has become totally unconstructive.
I ask the moderators to intervene and hope to be heard this time.
It should be possible to discuss a topic civilly, without boorish personal attacks on an interlocutor one doesn't even know.
-
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: Stockfish randomicity
He's becoming self-aware
dangi12012 wrote: No one wants to touch anything you have posted. That proves you now have a negative reputation, since everyone already knows you are a forum troll.
Maybe you copied your stockfish commits from someone else too?
I will look into that.
-
- Posts: 356
- Joined: Tue Dec 05, 2017 2:42 pm
Re: Stockfish randomicity
Certainly, my testing strategy is far from perfect and we are refining it, but at the longer time control (25+10) it certainly makes more sense than 15,000 ultra-rapid games, which only push selectivity techniques further at the expense of depth, especially with the advent of NNUE.
I have tried to explain it in every possible way.
In the CCRL, a version of ShashChess has long been ahead of the equivalent Stockfish version, and many games are played there.
Because of this, on Discord, I was warned not to release any more engine versions under threat of being banned!
Someone even stated, "It is unbearable to see ShashChess at the top of the rankings." I saved the posts as proof.
Obviously, for the sake of truth and because I hate any form of totalitarianism, I preferred to be banned: the trolls are the ones causing trouble.
I'm in good company, though, and by now it is sadly well known:
https://github.com/mcostalba/Stockfish/ ... 1741688516
https://groups.google.com/g/fishcooking/c/e05Imm5ivqE
I don't want to deal with the behavior of hypertrophic egos that absolutely refuse to question themselves. I am surprised that there is no netiquette of civility, but so be it.
Furthermore, the highest Elo in the Stockfish community apparently does not reach master level.
If they want to continue using a brute-force approach in their testing strategy, they are free to make a rod for their own backs. I'm stepping away from it, especially since the goal of ShashChess is NOT to be a bullet monster, but a versatile tool useful for players of all levels and types, as demonstrated by its (real) handicap mode.
Why do many strong correspondence GMs (apart from me, an IM) prefer ShashChess to Stockfish, or at least consider it an equally viable and indispensable alternative?
I deeply respect the Stockfish community, but I demand the same in return and, above all, I do not tolerate personal attacks that are outside the scope of programming.
It is deeply offensive to consider every engine a clone and to censor those who do not think as we do, even when they are otherwise respectful of the community's netiquette:
"I do not agree with what you have to say, but I'll defend to the death your right to say it." (Evelyn Beatrice Hall)
-
- Posts: 544
- Joined: Sun Sep 06, 2020 4:40 am
- Full name: Connor McMonigle
Re: Stockfish randomicity
amchess wrote: ↑Tue Oct 03, 2023 4:15 pm Certainly, my testing strategy is far from perfect and we are refining it, but at the longer time control (25+10) it certainly makes more sense than 15,000 ultra-rapid games, which only push selectivity techniques further at the expense of depth, especially with the advent of NNUE.
I have tried to explain it in every possible way.
In the CCRL, a version of ShashChess has long been ahead of the equivalent Stockfish version, and many games are played there.
Because of this, on Discord, I was warned not to release any more engine versions under threat of being banned!
Someone even stated, "It is unbearable to see ShashChess at the top of the rankings." I saved the posts as proof.
Obviously, for the sake of truth and because I hate any form of totalitarianism, I preferred to be banned: the trolls are the ones causing trouble.
I'm in good company, though, and by now it is sadly well known:
https://github.com/mcostalba/Stockfish/ ... 1741688516
https://groups.google.com/g/fishcooking/c/e05Imm5ivqE
I don't want to deal with the behavior of hypertrophic egos that absolutely refuse to question themselves. I am surprised that there is no netiquette of civility, but so be it.
Furthermore, the highest Elo in the Stockfish community apparently does not reach master level.
If they want to continue using a brute-force approach in their testing strategy, they are free to make a rod for their own backs. I'm stepping away from it, especially since the goal of ShashChess is NOT to be a bullet monster, but a versatile tool useful for players of all levels and types, as demonstrated by its (real) handicap mode.
Why do many strong correspondence GMs (apart from me, an IM) prefer ShashChess to Stockfish, or at least consider it an equally viable and indispensable alternative?
I deeply respect the Stockfish community, but I demand the same in return and, above all, I do not tolerate personal attacks that are outside the scope of programming.
It is deeply offensive to consider every engine a clone and to censor those who do not think as we do, even when they are otherwise respectful of the community's netiquette:
"I do not agree with what you have to say, but I'll defend to the death your right to say it." (Evelyn Beatrice Hall)
You still seemingly fail to comprehend the purpose of those "15,000 games", so I can only presume you either didn't read the paper I linked or failed to understand it. I'm happy to explain the math behind Stockfish's testing methodology (generalized sequential probability ratio testing) if you're actually curious. However, you seem more interested in congratulating yourself and bashing Stockfish developers, which is especially ironic given that ShashChess is >99% Stockfish with various bits of code ripped out of other Stockfish forks.
ShashChess is weaker and has always been weaker than the development version of Stockfish in every test of playing strength. The fact that ShashChess has appeared above Stockfish in rating lists is a consequence of the fact that you copy over all the latest Stockfish patches and release more frequently than the Stockfish team (+SSS). If you really believed Stockfish's testing methodology was broken and the patches written by a bunch of "low-Elo non-masters", it's unclear why you'd copy every Stockfish patch into ShashChess. It's very unfortunate that testers waste their time testing your low-quality clone, as it seems to just further inflate your ego. It is evident that you are a novice programmer. The following alone pretty much invalidates any authority you claim to have on the subject of engine programming: https://github.com/amchess/ShashChess/b ... h.cpp#L481. ShashChess would be nothing without Stockfish.
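For anyone actually curious, here is a much-simplified sketch of what such a sequential test looks like, assuming the normal-approximation form of the GSPRT applied to per-game scores. The real fishtest implementation works on game pairs (pentanomial statistics) and differs in detail; the bounds, draw rate and Elo values below are illustrative assumptions only.

```python
# A much-simplified sequential test on per-game scores (normal-approximation
# GSPRT), NOT the actual fishtest code. All bounds and Elo numbers are assumptions.
import math
import random

def elo_to_score(elo):
    """Expected score for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def gsprt_verdict(scores, elo0=0.0, elo1=2.0, alpha=0.05, beta=0.05):
    """Return 'accept H1', 'accept H0' or 'continue' given per-game scores (0/0.5/1)."""
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    n = len(scores)
    mean = sum(scores) / n
    var = sum((x - mean) ** 2 for x in scores) / n
    if var == 0.0:
        return "continue"
    # Log-likelihood ratio of H1 (elo1) vs H0 (elo0) under a normal approximation,
    # with the variance estimated from the data (the "generalized" part).
    llr = n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"

# Toy run: an engine that is truly +10 Elo, tested with deliberately wide
# bounds H0: +0 Elo vs H1: +10 Elo so the example finishes quickly.
rng = random.Random(1)
true_score = elo_to_score(10.0)
p_win, p_draw = true_score - 0.30, 0.60     # 60% draws; expected score = true_score
scores, verdict = [], "continue"
while verdict == "continue" and len(scores) < 100_000:
    r = rng.random()
    scores.append(1.0 if r < p_win else (0.5 if r < p_win + p_draw else 0.0))
    if len(scores) % 500 == 0:              # re-check the stopping rule periodically
        verdict = gsprt_verdict(scores, elo0=0.0, elo1=10.0)
print(verdict, "after", len(scores), "games")
```

Even with these generously wide [0, 10] Elo bounds the toy run typically needs a few thousand games to stop; with realistic bounds like [0, 2] Elo and a top engine's draw rates, the expected game count climbs into the tens of thousands, which is the regime those "15,000 games" live in.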
-
- Posts: 5693
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Stockfish randomicity
Are you saying your engine is not a clone? If so, then you are deluded.
-
- Posts: 356
- Joined: Tue Dec 05, 2017 2:42 pm
Re: Stockfish randomicity
The underlying idea of ShashChess is to apply Alexander Shashin's theory to a modern chess engine in order to make it more flexible for longer time controls: at least equally strong in matches and a better solver of complex positions where selectivity techniques penalize it.
Shashin, besides being a renowned player, is also a nuclear physicist, and an interview with him inspired the idea of applying his thinking system to a chess engine.
I chose Stockfish because it is open-source and the strongest in short time controls.
For this reason, simply put, it consists of the Stockfish code plus the additional part you linked. It didn't make sense to reinvent the wheel.
Could the code be written better? Of course, nobody is perfect, and I am always ready to welcome suggestions.
So, I appreciate the work that has been done and I don't systematically spew mud.
The article you sent, for which I thank you, provides a testing strategy for short time controls, but due to selectivity techniques, the indicated methodology doesn't make sense for longer time controls.
This is also why the strong correspondence players I know don't particularly like this engine.
Since I also play correspondence chess, another motivation was to provide a more suitable tool.
The tests I have conducted for matches and complex positions have ALWAYS been done with the same base Stockfish engine version.
In any case, banning those who think differently demonstrates an inability or refusal to engage in civil discussion. That says a lot about certain individuals and strips them of any authority they may have.
-
- Posts: 1953
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: Stockfish randomicity
This thread is a gold mine.
-
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: Stockfish randomicity
We resort to mockery because it's just impossible to have a discussion with you. You deal in [practically] unfalsifiable statements, handwave away every argument with anecdotal remarks, or just plainly deny/lie about easily verifiable facts. Not to mention you contradict yourself all the time. I can't take you seriously at this point.
dangi12012 wrote: No one wants to touch anything you have posted. That proves you now have a negative reputation, since everyone already knows you are a forum troll.
Maybe you copied your stockfish commits from someone else too?
I will look into that.
-
- Posts: 10788
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Stockfish randomicity
I understand that you claim the other engine is stronger than Stockfish at long time control.
amchess wrote: ↑Tue Sep 26, 2023 2:00 pm Chess knowledge is not only about playing the best or worst moves, but also about finding test samples that are as comprehensive and significant as possible.
I tested Stockfish at long time control (25+10) against an engine of similar strength over 200 games. The test took more than 10 days and I ran it twice. In the first run, Stockfish won with 3 games to spare; in the other, the result was the opposite.
As for the tests at ultra-fast time controls over thousands of games, they favor patches that prune very aggressively. In fact, some derivatives such as Crystal and ShashChess show that they are able to solve many more hard positions than base Stockfish at non-ultra-fast time controls, precisely because they do not make too much use of selectivity techniques.
So, in my opinion, testing over thousands of games is only good if you want to get a bullet monster.
I think that is an interesting claim, but you clearly need more games to find out if you are correct.
If you believe the engines are equal at 25+10 while the other engine is stronger at longer time controls, then it may be interesting to test at a time control of 250+100 and use SPRT to show that the other engine is stronger.
You may need something on the order of 20,000 days of computer time for it, but it can take less if a lot of people donate computer time for the test.
Alternatively, you can use only the 25+10 time control and try to prove by SPRT not that the engine is better, but that there is no regression in playing strength at that time control.
In that case you can perhaps expect conclusions after 1/10 of the time.
I think this type of test is interesting, but for some reason people who donate computer time prefer to donate it only for testing at bullet or blitz time controls.
I wonder if you can convince some of them to change their choice; there is no problem if one test takes some months instead of one day.
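As a rough back-of-the-envelope companion to those game counts, here is a small Python sketch; the 70% draw rate and the evenly-matched assumption are illustrative choices, not measurements from any particular test.

```python
# Rough 95% error margins in Elo for a match, as a function of game count,
# assuming an evenly matched pair and a 70% draw rate (illustration only).
import math

def elo_margin(n_games, draw_rate=0.70):
    """Approximate 95% confidence margin in Elo for an evenly matched pair."""
    p_win = p_loss = (1.0 - draw_rate) / 2.0
    # Per-game score variance around the mean score of 0.5.
    var = p_win * 0.25 + p_loss * 0.25          # draws contribute nothing at score 0.5
    se = math.sqrt(var / n_games)               # standard error of the match score
    # Near a 50% score, 1 unit of score is worth about 400/ln(10)/0.25 ~= 695 Elo.
    return 1.96 * se * (400.0 / math.log(10.0)) / 0.25

for n in (200, 1000, 15000, 60000):
    print(f"{n:6d} games: about +/- {elo_margin(n):4.1f} Elo")
```

Under these assumptions a 200-game match carries an error margin on the order of +/- 25 Elo, which is consistent with a narrow 200-game result flipping on a rerun, while tens of thousands of games are needed to resolve differences of a few Elo.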