You are hilarious ... and were from the beginning ... this is a dead horse for a deaf man.
amchess wrote: ↑Fri Sep 29, 2023 9:01 pm No! This is not a three-coin toss because the probabilities of the outcomes are not equidistributed.
Indeed, with NNUE it is increasingly difficult to find positions whose outcome alternates between a decisive result for one color and a draw.
This is where chess knowledge comes in.
In general, domain knowledge improves the testing strategy, a standard point in software engineering.
I didn't allow myself to insult you, partly because I don't know you, but more importantly, I would never do that.
Since you think otherwise, this will be my last post in this thread,
because it has become totally unconstructive.
I ask the moderators to intervene and hope to be heard this time.
It should be possible to discuss a topic civilly, without boorish personal attacks on an interlocutor one doesn't even know.
Stockfish randomicity
Moderator: Ras
-
- Posts: 4718
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: Stockfish randomicity
-
- Posts: 544
- Joined: Sun Sep 06, 2020 4:40 am
- Full name: Connor McMonigle
Re: Stockfish randomicity
This is why I wrote "weighted coin toss". The whole point is that the coin toss is unfair/weighted, so I must presume that you've failed to understand my simplified explanation. The goal of a test is to uncover those weights (estimate the parameters of the underlying trinomial distribution), and doing so requires many samples. Even if you somehow had 200 test positions for which an engine's performance on said 200 positions corresponded perfectly to performance on all chess positions, you'd still need many samples to estimate the parameters, due to the very "randomicity" you've independently re-discovered. The exact math behind the current state-of-the-art GSPRT testing methodology can be found here: https://stat.columbia.edu/~jcliu/paper/GSPRT_SQA3.pdf.
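To make the sample-size point concrete, here is a small Python sketch. It is purely illustrative: it is not the GSPRT from the linked paper, just plain standard-error shrinkage for a fixed, hidden ("weighted") win/draw/loss distribution, and the probabilities in simulate_match are made-up numbers.

```python
# Illustration only: how the uncertainty of an estimated score shrinks with the
# number of games drawn from a fixed, hidden ("weighted") W/D/L distribution.
import math
import random

def simulate_match(n_games, p_win=0.36, p_draw=0.30, seed=0):
    """Sample n_games results from a fixed win/draw/loss distribution (made-up numbers)."""
    rng = random.Random(seed)
    wins = draws = losses = 0
    for _ in range(n_games):
        r = rng.random()
        if r < p_win:
            wins += 1
        elif r < p_win + p_draw:
            draws += 1
        else:
            losses += 1
    return wins, draws, losses

def score_and_error(wins, draws, losses):
    """Estimated match score and its standard error from observed W/D/L counts."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Sample variance of the per-game score (each game scores 0, 0.5 or 1).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    return score, math.sqrt(var / n)

for n in (200, 2000, 20000):
    w, d, l = simulate_match(n, seed=n)
    s, se = score_and_error(w, d, l)
    # 95% interval; at 200 games it is wide enough to flip a close result.
    print(f"{n:6d} games: score = {s:.3f} +/- {1.96 * se:.3f}")
```

With only a couple of hundred games the 95% interval on the match score is wide enough to flip a close result either way, which is exactly the "randomicity" under discussion.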
If you want to take such great offense at the mere suggestion that your math skills are rusty, that's your problem.
amchess wrote: ↑Fri Sep 29, 2023 9:01 pm ...
Indeed, with NNUE it is increasingly difficult to find positions whose outcome alternates between a decisive result for one color and a draw.
This is where chess knowledge comes in.
In general, domain knowledge improves the testing strategy, a standard point in software engineering.
I didn't allow myself to insult you, partly because I don't know you, but more importantly, I would never do that.
Since you think otherwise, this will be my last post in this thread,
because it has become totally unconstructive.
I ask the moderators to intervene and hope to be heard this time.
It should be possible to discuss a topic civilly, without boorish personal attacks on an interlocutor one doesn't even know.
-
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: Stockfish randomicity
He's becoming self-aware
dangi12012 wrote: No one wants to touch anything you have posted. That proves you now have a negative reputation, since everyone already knows you are a forum troll.
Maybe you copied your stockfish commits from someone else too?
I will look into that.
-
- Posts: 356
- Joined: Tue Dec 05, 2017 2:42 pm
Re: Stockfish randomicity
Certainly, my testing strategy is far from perfect and we are refining it, but at the longer time control (25+10) it certainly makes more sense than 15,000 ultra-rapid games, which only push selectivity techniques further at the expense of depth, especially with the advent of NNUE.
I have tried to explain it in every possible way.
In the CCRL, a version of ShashChess has long been ahead of the equivalent Stockfish version, and many games are played there.
Because of this, on Discord, I was warned not to release any more engine versions under threat of being banned!
Someone even stated, "It is unbearable to see ShashChess at the top of the rankings." I saved the posts as proof.
Obviously, for the sake of truth and because I hate any form of totalitarianism, I preferred to be banned: the trolls are the ones causing trouble.
I'm in good company, though, and by now it is sadly well known:
https://github.com/mcostalba/Stockfish/ ... 1741688516
https://groups.google.com/g/fishcooking/c/e05Imm5ivqE
I don't want to deal with the behavior of hypertrophic egos that absolutely refuse to question themselves. I am surprised that there is no netiquette of civility, but so be it.
Furthermore, the highest Elo in the Stockfish community apparently does not reach master level.
If they want to continue using a brute-force approach in their testing strategy, they are free to make a rod for their own backs. I'm stepping away from it, especially since the goal of ShashChess is NOT to be a bullet monster, but a versatile tool useful for players of all levels and types, as demonstrated by its (real) handicap mode.
Why do many strong correspondence GMs (apart from me, an IM) prefer ShashChess to Stockfish, or at least consider it an equally viable and indispensable alternative?
I deeply respect the Stockfish community, but I demand the same in return and, above all, I do not tolerate personal attacks that are outside the scope of programming.
It is deeply offensive to consider every engine a clone and to censor those who do not think as we do, even when they are otherwise respectful of the community's netiquette:
"I do not agree with what you have to say, but I'll defend to the death your right to say it." (Evelyn Beatrice Hall)
-
- Posts: 544
- Joined: Sun Sep 06, 2020 4:40 am
- Full name: Connor McMonigle
Re: Stockfish randomicity
amchess wrote: ↑Tue Oct 03, 2023 4:15 pm Certainly, my testing strategy is far from perfect and we are refining it, but at the longer time control (25+10) it certainly makes more sense than 15,000 ultra-rapid games, which only push selectivity techniques further at the expense of depth, especially with the advent of NNUE.
I have tried to explain it in every possible way.
In the CCRL, a version of ShashChess has long been ahead of the equivalent Stockfish version, and many games are played there.
Because of this, on Discord, I was warned not to release any more engine versions under threat of being banned!
Someone even stated, "It is unbearable to see ShashChess at the top of the rankings." I saved the posts as proof.
Obviously, for the sake of truth and because I hate any form of totalitarianism, I preferred to be banned: the trolls are the ones causing trouble.
I'm in good company, though, and by now it is sadly well known:
https://github.com/mcostalba/Stockfish/ ... 1741688516
https://groups.google.com/g/fishcooking/c/e05Imm5ivqE
I don't want to deal with the behavior of hypertrophic egos that absolutely refuse to question themselves. I am surprised that there is no netiquette of civility, but so be it.
Furthermore, the highest Elo in the Stockfish community apparently does not reach master level.
If they want to continue using a brute-force approach in their testing strategy, they are free to make a rod for their own backs. I'm stepping away from it, especially since the goal of ShashChess is NOT to be a bullet monster, but a versatile tool useful for players of all levels and types, as demonstrated by its (real) handicap mode.
Why do many strong correspondence GMs (apart from me, an IM) prefer ShashChess to Stockfish, or at least consider it an equally viable and indispensable alternative?
I deeply respect the Stockfish community, but I demand the same in return and, above all, I do not tolerate personal attacks that are outside the scope of programming.
It is deeply offensive to consider every engine a clone and to censor those who do not think as we do, even when they are otherwise respectful of the community's netiquette:
"I do not agree with what you have to say, but I'll defend to the death your right to say it." (Evelyn Beatrice Hall)
You still seemingly fail to comprehend the purpose of those "15,000 games", so I can only presume you either didn't read the paper I linked or failed to understand it. I'm happy to explain the math behind Stockfish's testing methodology (generalized sequential probability ratio testing) if you're actually curious. However, you seem more interested in congratulating yourself and bashing Stockfish developers, which is especially ironic given that ShashChess is >99% Stockfish with various bits of code ripped out of other Stockfish forks.
ShashChess is weaker and has always been weaker than the development version of Stockfish in every test of playing strength. The fact that ShashChess has appeared above Stockfish in rating lists is a consequence of the fact that you copy over all the latest Stockfish patches and release more frequently than the Stockfish team (+SSS). If you really believed Stockfish's testing methodology was broken and the patches written by a bunch of "low-Elo non-masters", it's unclear why you'd copy every Stockfish patch into ShashChess. It's very unfortunate that testers waste their time testing your low-quality clone, as it seems to just further inflate your ego. It is evident that you are a novice programmer. The following alone pretty much invalidates any authority you claim to have on the subject of engine programming: https://github.com/amchess/ShashChess/b ... h.cpp#L481. ShashChess would be nothing without Stockfish.
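For anyone actually curious, here is a much-simplified sketch of what such a sequential test looks like, assuming the normal-approximation form of the GSPRT applied to per-game scores. The real fishtest implementation works on game pairs (pentanomial statistics) and differs in detail; the bounds, draw rate and Elo values below are illustrative assumptions only.

```python
# A much-simplified sequential test on per-game scores (normal-approximation
# GSPRT), NOT the actual fishtest code. All bounds and Elo numbers are assumptions.
import math
import random

def elo_to_score(elo):
    """Expected score for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def gsprt_verdict(scores, elo0=0.0, elo1=2.0, alpha=0.05, beta=0.05):
    """Return 'accept H1', 'accept H0' or 'continue' given per-game scores (0/0.5/1)."""
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    n = len(scores)
    mean = sum(scores) / n
    var = sum((x - mean) ** 2 for x in scores) / n
    if var == 0.0:
        return "continue"
    # Log-likelihood ratio of H1 (elo1) vs H0 (elo0) under a normal approximation,
    # with the variance estimated from the data (the "generalized" part).
    llr = n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"

# Toy run: an engine that is truly +10 Elo, tested with deliberately wide
# bounds H0: +0 Elo vs H1: +10 Elo so the example finishes quickly.
rng = random.Random(1)
true_score = elo_to_score(10.0)
p_win, p_draw = true_score - 0.30, 0.60     # 60% draws; expected score = true_score
scores, verdict = [], "continue"
while verdict == "continue" and len(scores) < 100_000:
    r = rng.random()
    scores.append(1.0 if r < p_win else (0.5 if r < p_win + p_draw else 0.0))
    if len(scores) % 500 == 0:              # re-check the stopping rule periodically
        verdict = gsprt_verdict(scores, elo0=0.0, elo1=10.0)
print(verdict, "after", len(scores), "games")
```

Even with these generously wide [0, 10] Elo bounds the toy run typically needs a few thousand games to stop; with realistic bounds like [0, 2] Elo and a top engine's draw rates, the expected game count climbs into the tens of thousands, which is the regime those "15,000 games" live in.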
-
- Posts: 5693
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Stockfish randomicity
Are you saying your engine is not a clone? If so, then you are deluded.
-
- Posts: 356
- Joined: Tue Dec 05, 2017 2:42 pm
Re: Stockfish randomicity
The underlying idea of ShashChess is to apply Alexander Shashin's theory to a modern chess engine in order to make it more flexible for longer time controls: at least equally strong in matches and a better solver of complex positions where selectivity techniques penalize it.
Shashin, besides being a renowned player, is also a nuclear physicist, and an interview with him inspired the idea of applying his thinking system to a chess engine.
I chose Stockfish because it is open-source and the strongest in short time controls.
For this reason, simply put, it consists of the Stockfish code plus the additional part you linked. It didn't make sense to reinvent the wheel.
Could the code be written better? Of course, nobody is perfect, and I am always ready to welcome suggestions.
So, I appreciate the work that has been done and I don't systematically spew mud.
The article you sent, for which I thank you, provides a testing strategy for short time controls, but due to selectivity techniques, the indicated methodology doesn't make sense for longer time controls.
This is also why the strong correspondence players I know don't particularly like this engine.
Since I also play correspondence chess, another motivation was to provide a more suitable tool.
The tests I have conducted for matches and complex positions have ALWAYS been done with the same base Stockfish engine version.
In any case, banning those who think differently demonstrates an inability or refusal to engage in civil discussion. That says a lot about certain individuals and strips them of any authority they may have.
-
- Posts: 1953
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: Stockfish randomicity
This thread is a gold mine.
-
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: Stockfish randomicity
We resort to mockery because it's just impossible to have a discussion with you. You deal in [practically] unfalsifiable statements, handwave away every argument with anecdotal remarks, or just plainly deny/lie about easily verifiable facts. Not to mention you contradict yourself all the time. I can't take you seriously at this point.
dangi12012 wrote: No one wants to touch anything you have posted. That proves you now have a negative reputation, since everyone already knows you are a forum troll.
Maybe you copied your stockfish commits from someone else too?
I will look into that.
-
- Posts: 10788
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Stockfish randomicity
I understand that you claim the other engine is stronger than Stockfish at long time control.
amchess wrote: ↑Tue Sep 26, 2023 2:00 pm Chess knowledge is not only about playing the best or worst moves, but also about finding test samples that are as comprehensive and significant as possible.
I tested Stockfish at long time control (25+10) against an engine of similar strength over 200 games. The test took more than 10 days and I ran it twice. In the first run, Stockfish won with 3 games to spare; in the other, the result was the opposite.
As for the tests at ultra-fast time controls over thousands of games, they favor patches that prune very aggressively. In fact, some derivatives such as Crystal and ShashChess show that they are able to solve many more hard positions than base Stockfish at non-ultra-fast time controls, precisely because they do not make too much use of selectivity techniques.
So, in my opinion, testing over thousands of games is only good if you want to get a bullet monster.
I think that is an interesting claim, but you clearly need more games to find out if you are correct.
If you believe the engines are equal at 25+10 while the other engine is stronger at longer time controls, then it may be interesting to test at a time control of 250+100 and use SPRT to show that the other engine is stronger.
You may need something on the order of 20,000 days of computer time for it, but it can take less if a lot of people donate computer time for the test.
Alternatively, you can use only the 25+10 time control and try to prove by SPRT not that the engine is better, but that there is no regression in playing strength at that time control.
In that case you can perhaps expect conclusions after 1/10 of the time.
I think this type of test is interesting, but for some reason people who donate computer time prefer to donate it only for testing at bullet or blitz time controls.
I wonder if you can convince some of them to change their choice; there is no problem if one test takes some months instead of one day.
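As a rough back-of-the-envelope companion to those game counts, here is a small Python sketch; the 70% draw rate and the evenly-matched assumption are illustrative choices, not measurements from any particular test.

```python
# Rough 95% error margins in Elo for a match, as a function of game count,
# assuming an evenly matched pair and a 70% draw rate (illustration only).
import math

def elo_margin(n_games, draw_rate=0.70):
    """Approximate 95% confidence margin in Elo for an evenly matched pair."""
    p_win = p_loss = (1.0 - draw_rate) / 2.0
    # Per-game score variance around the mean score of 0.5.
    var = p_win * 0.25 + p_loss * 0.25          # draws contribute nothing at score 0.5
    se = math.sqrt(var / n_games)               # standard error of the match score
    # Near a 50% score, 1 unit of score is worth about 400/ln(10)/0.25 ~= 695 Elo.
    return 1.96 * se * (400.0 / math.log(10.0)) / 0.25

for n in (200, 1000, 15000, 60000):
    print(f"{n:6d} games: about +/- {elo_margin(n):4.1f} Elo")
```

Under these assumptions a 200-game match carries an error margin on the order of +/- 25 Elo, which is consistent with a narrow 200-game result flipping on a rerun, while tens of thousands of games are needed to resolve differences of a few Elo.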