Shashin theory

Graham Banks · Post by **Graham Banks** » Wed Sep 25, 2024 11:08 pm

Viz wrote: ↑Wed Sep 25, 2024 10:05 pm I recall stockfish dev vs some previous stockfish from startposition played on fishtest.
And it had the most insane variance from any other test, literally some workers showing +100 elo and some -100 elo and this was stable for these workers.
Problem was that some were underclocked and some were overclocked, so some ran at 10.1+0.101 and some were running at 9.9+0.099.
and this was the sole reason why it would get from -100 to +100 elo from the same position at the single core.
And there you show some "wow, look at this, I made a checkbox and games are completely different, must be a lot of work" - hell, no.
You can achieve the same if not bigger result by not changing engines at all but changing time / game (from my example change doesn't even need to be big), hash, threads and other stuff.
This is constantly shown at so-called alt-finals at navratil, this guy just replays TCEC finals on more powerful hardware but with the same nps ratio. Game pair win for stockfish at TCEC can change to game pair win for leela and game pair win for Leela at TCEC can change to game pair win for SF. Not even talking about such "minor" things as double win becoming double draws and double draws becoming double wins - this also happens a lot.
And trust me this would also happen if he used exactly the same hardware for both engines as TCEC does.

I guess I was thinking that it was similar to being able to make personalities in Chessmaster by changing the values of aspects like material values, king safety, pawn structure, passed pawn awareness, mobility, etc.

Rebel · Post by **Rebel** » Wed Sep 25, 2024 11:13 pm

Peter Berger wrote: ↑Wed Sep 25, 2024 8:10 pm I had read this one, but I didn't think it necessarily amounted to much. If you look at the strange README on Shashin in Shash, where do these +-, += etc even come from to put you into Major Petrosian or Minor Tal mode? From Stockfish would be my guess.
So I assumed Shashchess and Stockfish might be completely identical at depth=1 without this meaning anything too interesting or new.
And then maybe it changes some search parameters based on low level depths at deeper plies. This can't be too much of a deal anyway, as in my personal tests Shashchess and Stockfish behave in a very similar way when it is about chess moves chosen also at higher depths - not to forget Shashchess loses nearly no strength compared to Stockfish.
I didn't look at the source code at all as I am no programmer, so I don't expect to be able to detect anything the ICGA wouldn't have seen anyway during their two month investigation of things.
We share the same impression on the implementation of the "Shashin theory" here - my personal bet is that this is mostly bullshit - but now me, I am out, as I simply lack the knowledge to do more than "suspecting".

My opinion as well, simple reason - if Shashin theory is so special it would have been the default setting.

Rebel · Post by **Rebel** » Wed Sep 25, 2024 11:22 pm

Graham Banks wrote: ↑Wed Sep 25, 2024 11:08 pm
Viz wrote: ↑Wed Sep 25, 2024 10:05 pm I recall stockfish dev vs some previous stockfish from startposition played on fishtest.
And it had the most insane variance from any other test, literally some workers showing +100 elo and some -100 elo and this was stable for these workers.
Problem was that some were underclocked and some were overclocked, so some ran at 10.1+0.101 and some were running at 9.9+0.099.
and this was the sole reason why it would get from -100 to +100 elo from the same position at the single core.
And there you show some "wow, look at this, I made a checkbox and games are completely different, must be a lot of work" - hell, no.
You can achieve the same if not bigger result by not changing engines at all but changing time / game (from my example change doesn't even need to be big), hash, threads and other stuff.
This is constantly shown at so-called alt-finals at navratil, this guy just replays TCEC finals on more powerful hardware but with the same nps ratio. Game pair win for stockfish at TCEC can change to game pair win for leela and game pair win for Leela at TCEC can change to game pair win for SF. Not even talking about such "minor" things as double win becoming double draws and double draws becoming double wins - this also happens a lot.
And trust me this would also happen if he used exactly the same hardware for both engines as TCEC does.
I guess I was thinking that it was similar to being able to make personalities in Chessmaster by changing the values of aspects like material values, king safety, pawn structure, passed pawn awareness, mobility, etc.

That technique is not possible any longer with NNUE because you only get an eval value from the NN with no strings attached. Of course you can fiddle the score with HCE code and make it a personality, a bit of the world upside down.

peter · Post by **peter** » Wed Sep 25, 2024 11:24 pm

Why not simply compare output at single positions?
Tactical single-best-move game changers from Tal positions (with a clear advantage for the side to move) that show a better time to solution with High Tal checked are numerous and well known to everyone who tests such positions now and then, singly or in suites. Most of the "classical" collections consist mainly of such more or less difficult winners (difficult in terms of the hardware time needed to solve them), so the easiest way would be to show better results on such suites, especially those that also contain composed studies. Regardless of how much that means for game playing, differences become visible most quickly and clearly that way.

And there are also positions with single best moves for defending a disadvantage for the side to move. Just one example of such a Petrosian position from correspondence chess, found in two games from 2021, one of them Petrov M.-Sikorsky H., bm 19...Qd7:

Code: Select all

ShashChess 36 by A. Manzo, F. Ferraguti, K. Kiniama and Stockfish developers (see AUTHORS file)
position fen r1bq2rk/ppp5/3p1nnb/P1PPp1pp/1P2Pp2/2N2P2/3BBNPP/R2Q2RK b - - 0 19
setoption name hash value 2048
setoption name High Petrosian value true
go depth 40
...
info depth 32 seldepth 39 multipv 1 score cp -66 wdl 6 738 256 upperbound nodes 28074653 nps 1115135 hashfull 90 tbhits 0 time 25176 pv g8g7 a1a2
...
info depth 32 seldepth 44 multipv 1 score cp -61 wdl 7 768 225 lowerbound nodes 38169000 nps 1115922 hashfull 130 tbhits 0 time 34204 pv d8d7
...
info depth 32 seldepth 44 multipv 1 score cp -58 wdl 7 784 209 nodes 39897910 nps 1115120 hashfull 136 tbhits 0 time 35779 pv d8d7 a1c1 g5g4 c5d6 c7d6 c3b5 g4g3 h2g3 d7d8 f2h3 g8g7 e2f1 a7a6 b5a3 f6g8 g3f4 g6f4 d1e1 d8f6 d2f4 h6f4
...
info depth 40 seldepth 52 multipv 1 score cp -64 wdl 6 749 245 nodes 119065944 nps 1116302 hashfull 400 tbhits 0 time 106661 pv d8d7 a1c1 g5g4 c5d6 c7d6 c3b5 g4g3 h2g3 d7d8 f2h3 g8g7 c1c8 a8c8 b5a7 c8a8 a7c6 b7c6 d5c6 f6h7 b4b5 h7g5 b5b6 g5h3 g2h3 d8c8 g3g4 c8c6 g4g5 c6d7 d1f1 g6h4 a5a6 h6g5 e2b5 d7f7 b6b7
bestmove d8d7 ponder a1c1

So at depth 32, after about 34 seconds, the best move is found and then stays stable in the output.
And default:

Code: Select all

ShashChess 36 by A. Manzo, F. Ferraguti, K. Kiniama and Stockfish developers (see AUTHORS file)
position fen r1bq2rk/ppp5/3p1nnb/P1PPp1pp/1P2Pp2/2N2P2/3BBNPP/R2Q2RK b - - 0 19
setoption name hash value 2048
go depth 40
...
info depth 40 seldepth 62 multipv 1 score cp -65 wdl 6 746 248 nodes 273285222 nps 1074656 hashfull 712 tbhits 0 time 254300 pv g8g7 a1a2 c8d7 a5a6 b7a6 e2a6 g5g4 c5c6 d7c8 a6c8 d8c8 f2d3 a7a6 d1e2 g6h4 d2e1 h4g6 b4b5 a6b5 a2a8 c8a8 c3b5 h6g5 e1f2 h8h7 f2a7 h7h6 g1b1 g4f3 e2f3 a8c8 b1a1 c8g8 f3e2 g5h4 a7g1
bestmove g8g7 ponder a1a2

It's just one example of a move that is maybe not surely game changing, yet it's a clear single best move. And that isn't even the point to be discussed here: seeing a clear difference in the output lines over pondering time, single-threaded on the same hardware with the same amount of hash, proves a difference in search and eval between the two settings, doesn't it?
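The time-to-solution reading used above (first moment the best move leads the PV and stays there) can be extracted from UCI output mechanically. The following is a minimal sketch, not part of ShashChess or any GUI: the helper names `parse_info` and `time_to_solution` are made up for illustration, and the log is the High Petrosian output from above with the PVs shortened.

```python
import re

# Abbreviated "info" lines from the High Petrosian run above (PVs shortened).
LOG = """\
info depth 32 seldepth 39 multipv 1 score cp -66 wdl 6 738 256 upperbound nodes 28074653 nps 1115135 hashfull 90 tbhits 0 time 25176 pv g8g7 a1a2
info depth 32 seldepth 44 multipv 1 score cp -61 wdl 7 768 225 lowerbound nodes 38169000 nps 1115922 hashfull 130 tbhits 0 time 34204 pv d8d7
info depth 32 seldepth 44 multipv 1 score cp -58 wdl 7 784 209 nodes 39897910 nps 1115120 hashfull 136 tbhits 0 time 35779 pv d8d7 a1c1
info depth 40 seldepth 52 multipv 1 score cp -64 wdl 6 749 245 nodes 119065944 nps 1116302 hashfull 400 tbhits 0 time 106661 pv d8d7 a1c1
"""

# Captures the search depth, the elapsed time in ms and the first PV move.
# \b keeps "seldepth" from matching as "depth".
INFO_RE = re.compile(r"\bdepth (\d+) .*?\btime (\d+) pv (\S+)")


def parse_info(line):
    """Return (depth, time_ms, first_pv_move) or None for non-matching lines."""
    match = INFO_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


def time_to_solution(log, best_move):
    """Return (depth, time_ms) where best_move starts its final, unbroken run
    as first PV move, or None if the log does not end on best_move."""
    first_hit = None
    for line in log.splitlines():
        parsed = parse_info(line)
        if parsed is None:
            continue
        depth, time_ms, move = parsed
        if move != best_move:
            first_hit = None          # solution was dropped again, start over
        elif first_hit is None:
            first_hit = (depth, time_ms)
    return first_hit


if __name__ == "__main__":
    print(time_to_solution(LOG, "d8d7"))
```

Note that this sketch counts `lowerbound` lines as hits, which matches the "about 34 seconds" reading above; a stricter variant could skip bound lines and would then report the later exact-score line instead.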

noobpwnftw · Post by **noobpwnftw** » Thu Sep 26, 2024 1:10 am

Nobody said there is no difference, people are just saying there is less than 3% difference between ShashChess and Stockfish regardless of settings while there is more than 30% difference between Stockfish's own versions under the same conditions.

Not to mention that across various other engines such a ratio is still consistent. So whatever difference there is, it is pretty insignificant. Even the code that makes such differences isn't really original, it is taken from Crystal; I suspect using just Crystal would solve those positions even better. And what does any of that have anything to do with Shashin theory?

Hai · Post by **Hai** » Thu Sep 26, 2024 4:42 pm

noobpwnftw wrote: ↑Thu Sep 26, 2024 1:10 am Nobody said there is no difference, people are just saying there is less than 3% difference between ShashChess and Stockfish regardless of settings while there is more than 30% difference between Stockfish's own versions under the same conditions.

Not to mention that across various other engines such a ratio is still consistent. So whatever difference there is, it is pretty insignificant. Even the code that makes such differences isn't really original, it is taken from Crystal; I suspect using just Crystal would solve those positions even better. And what does any of that have anything to do with Shashin theory?

Try the Top Chess Engines Testsuite 2024 v2.
https://www.mediafire.com/file/cypaz2t0 ... 2.pgn/file
Stockfish 16.1 (20%) 23/115
Stockfish 17 (44%) 51/115
ShashChess 35 High Tal + MultiPV=4 + MCTS ON + MCTSThreads = 2 (80%) 93/115

It's very clear that ShashChess is much better suited to finding something when the position on the board has something in it.
If the position on the board has nothing in it, then Stockfish will have the more precise evaluation, in +0.01 steps.
People must decide for themselves which they prefer to use.

noobpwnftw · Post by **noobpwnftw** » Thu Sep 26, 2024 5:21 pm

Not only is it a sample of just 115 positions, but your definition of "suitable" is also entirely unfounded: even basics like threads and MultiPV are inconsistent across your results, and whether the move they find counts as right is entirely subjective. Even so, why not compare it with Crystal, where the relevant code was yoinked from?

None of that addresses the issue that there is not much difference of any significance, though, and again what does it have anything to do with Shashin theory?
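To put a number on the sample-size objection: a solved count out of 115 positions can be given a rough confidence interval. The sketch below uses the Wilson score interval for a binomial proportion; treating each position as an independent pass/fail trial is an assumption, and the function name `wilson_interval` is illustrative only. The counts are the ones posted above.

```python
import math


def wilson_interval(solved, total, z=1.96):
    """Wilson score interval for a solve rate (z=1.96 is roughly 95%).

    Assumes every position is an independent pass/fail trial, which a
    hand-picked test suite does not really guarantee.
    """
    p = solved / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Solved counts as posted for the 115-position suite.
    posted = {
        "Stockfish 16.1": 23,
        "Stockfish 17": 51,
        "ShashChess 35 High Tal": 93,
    }
    for name, solved in posted.items():
        low, high = wilson_interval(solved, 115)
        print(f"{name}: {solved}/115, interval {low:.1%} to {high:.1%}")
```

Such an interval only covers sampling noise. It says nothing about the settings differing between the engines (threads, MultiPV, MCTS) or about how the positions and their solutions were chosen.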

chrisw · Post by **chrisw** » Thu Sep 26, 2024 8:29 pm

Powerful take on Shashin to add to the discussion ...

peter · Post by **peter** » Fri Sep 27, 2024 10:45 am

noobpwnftw wrote: ↑Thu Sep 26, 2024 1:10 am Nobody said there is no difference, people are just saying there is less than 3% difference between ShashChess and Stockfish regardless of settings while there is more than 30% difference between Stockfish's own versions under the same conditions.

That depends on the conditions, and for those you have to relate the "differences" in performance, or the similarities, to the test-specific error bars. Each and every test has its inherent statistical confidence, depending on the openings (if it's about game playing), the hardware TC and the pool of engines running.

Not to mention that across various other engines such a ratio is still consistent. So whatever difference there is, it is pretty insignificant. Even the code that makes such differences isn't really original, it is taken from Crystal; I suspect using just Crystal would solve those positions even better. And what does any of that have anything to do with Shashin theory?

The ratios are as consistent as the tests are. That doesn't mean they are transitive to each other at all. You may have consistent tests that are statistically significant on their own but cannot be compared to other ones that are also consistent and significant, e.g. you may have perfectly consistent positional tests with exactly determined error bars of their own and still (of course) cannot compare them in any way to game-playing tests with certain openings, hardware TC and engine pool. You cannot even compare game-playing tests to one another if the openings, hardware TCs and/or engine pools differ too much.
That all said, now to your question about

< what does any of that have anything to with Shashin theory? >

We are (I am) here and now talking about Shashin theory as Andrea Manzo translated it into Stockfish code and made it usable through UCI parameters; its core is shown on the ShashChess GitHub site in the table of correlations between position evals and the Tal-to-Petrosian classification.
What I wanted to show with the single position above was the difference in output over time with and without the Petrosian option checked, for a position that its eval classifies as a Petrosian position.

To show some more difference with more positions, I have 1111 of them here, most of which are Tal positions, and ran them as a suite in MEA with a TC of 1"/pos., 4 threads of a 16x3.5GHz CPU and 32Mb hash. Two settings of ShashChess took part, once with High Tal false (default) and once with High Tal true, together with SF dev. (240917), SF17 and SF16.1. The error bar for this particular suite, engine pool and hardware TC is about 1.5% of Total Rate. For Lc0, the Hash column gives the NN cache instead of the hash of the A-B engines, and this one engine also uses an Nvidia 3070 Ti GPU.

Code: Select all

    EPD  : 1111.epd
    Time : 1000 ms
                                                     Max   Total   Time   Hash
    Engine                Score   Found  Pos   ELO  Score   Rate    ms     Mb  Cpu
 1  ShashChess36HTon      29051    980  1111  3568  36615  79.3%   1000    32    4
 2  Crystal240503         28659    961  1111  3523  36615  78.3%   1000    32    4
 3  Stockfish0917         28253    971  1111  3474  36615  77.2%   1000    32    4
 4  ShashChess36          28011    961  1111  3442  36615  76.5%   1000    32    4
 5  Stockfish17           27754    959  1111  3411  36615  75.8%   1000    32    4
 6  SF16.1                27639    945  1111  3379  36815  75.1%   1000    32    4
 7  Lc0v0.31.1-6147500PT  27056    939  1111  3325  36615  73.9%   1000   100    2
 8  Dragon3.3             24568    898  1111  3001  36815  66.7%   1000    32    4

                                    Created with MEA
                                          by
                                       Ferdinand
                                         Mosca
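As a reading aid for the table: Total Rate is Score divided by Max Score, and a gap between two engines can be held against the stated error bar. The sketch below covers a subset of the rows; it assumes that "about 1.5% of Total Rate" means 1.5 percentage points and that a gap larger than one error bar counts as visible. Both are simplifying assumptions, and the function names are made up for illustration.

```python
# (Score, Max Score) for a subset of the rows in the MEA table above.
RESULTS = {
    "ShashChess36HTon": (29051, 36615),
    "Crystal240503": (28659, 36615),
    "Stockfish0917": (28253, 36615),
    "ShashChess36": (28011, 36615),
    "Stockfish17": (27754, 36615),
}

# Assumption: the quoted error bar is read as 1.5 percentage points.
ERROR_BAR = 1.5


def total_rate(score, max_score):
    """Total Rate in percent, as MEA reports it: Score / Max Score."""
    return 100.0 * score / max_score


def exceeds_error_bar(engine_a, engine_b, error_bar=ERROR_BAR):
    """True if the Total Rate gap between two engines is larger than the error bar."""
    gap = total_rate(*RESULTS[engine_a]) - total_rate(*RESULTS[engine_b])
    return abs(gap) > error_bar


if __name__ == "__main__":
    for other in ("Crystal240503", "Stockfish0917", "ShashChess36", "Stockfish17"):
        verdict = "outside" if exceeds_error_bar("ShashChess36HTon", other) else "within"
        print(f"ShashChess36HTon vs {other}: {verdict} the error bar")
```

A stricter comparison would require the gap to exceed the combined error of both results, not just one error bar.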

These are the positions, collected from Eret, the Arasan 21 suite, HTC and some composed endgame studies that are easy to solve at this hardware TC:

https://drive.google.com/file/d/1kVmfmK ... sp=sharing

Regards

peter · Post by **peter** » Fri Sep 27, 2024 11:53 am

Edit- time over.

peter wrote: ↑Fri Sep 27, 2024 10:45 am That all said, now to your question about

< what does any of that have anything to with Shashin theory? >

That should read <anything to do with> instead; I mistyped and left out one <do>, although in the direct quote from the posting I answered it was correct anyway.
And in the list of suites the 1111 positions were taken from, I forgot to mention STS (Strategic Test Suite): about 590 positions come from it, "only" those that don't have too many multiple solutions and thus fit the single-best-move solutions of the other sources. The suite can therefore also be used in GUIs instead of MEA; multiple solutions, listed as more than one move in bm syntax, are then simply all counted equally as solved, and only in MEA does the points-per-solution principle work, of course.
