SPCC: Testruns of Stockfish 220917 finished

pohl4711 · Post by **pohl4711** » Wed Sep 21, 2022 5:28 pm

Ratinglist- and regression-testruns of Stockfish 220917 finished.

https://www.sp-cc.de

Also take a look at the EAS-Ratinglist, the world's first engine ratinglist that measures not the strength of engines but their style of play:
https://www.sp-cc.de/eas-ratinglist.htm

(You may have to clear your browser cache or reload the website)

mehmet123 · Post by **mehmet123** » Wed Sep 21, 2022 8:09 pm

The Elo difference in the Rating List and in the VLTC Regression Test is very different.
I don't think it is related to the patch (simplify trend and optimism), because there isn't any radical change in the code. The NCM test shows only a -1.3 Elo difference, and this is a very small Elo change. Some new values in the search code are very effective at long time controls, and the time control at VLTC is more than 3x that of the Rating List test.

pohl4711 · Post by **pohl4711** » Thu Sep 22, 2022 9:22 am

mehmet123 wrote: ↑Wed Sep 21, 2022 8:09 pm The Elo difference in the Rating List and in the VLTC Regression Test is very different.
I don't think it is related to the patch (simplify trend and optimism), because there isn't any radical change in the code. The NCM test shows only a -1.3 Elo difference, and this is a very small Elo change. Some new values in the search code are very effective at long time controls, and the time control at VLTC is more than 3x that of the Rating List test.

IMHO the "problem" is that the new SF plays a little more "drawish". In my ratinglist testrun, the draw rate of SF 220917 is 62% (SF 220907, against the same opponents, had "only" 60.3% draws). This reduces the number of wins for SF 220917, and with it the score and the Elo.
In my VLTC regression testruns, I use my UHO openings (otherwise there would be 95%+ draws). Using these openings reduces the draw rate massively, and this "hides" the fact that one engine version plays more drawish than another. So in this test setup we see progress, and in the ratinglist test setup we see a regression. My 2 cents...
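
To illustrate the arithmetic behind this, here is a minimal sketch using the standard logistic Elo formula. The win/draw/loss counts are purely hypothetical round numbers chosen to give roughly 60% and 62% draws; they are not taken from the actual testruns, and the helper names `score_fraction` and `elo_from_score` are made up for this example. It assumes the extra draws come out of the wins while the losses stay the same.

```python
import math


def score_fraction(wins, draws, losses):
    """Score as a fraction of games played: win = 1, draw = 0.5, loss = 0."""
    total = wins + draws + losses
    return (wins + 0.5 * draws) / total


def elo_from_score(score):
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# Hypothetical results over 1000 games each (NOT the real testrun data).
# The only change: 20 wins turn into draws, losses stay the same.
older = score_fraction(wins=390, draws=600, losses=10)  # 60% draws
newer = score_fraction(wins=370, draws=620, losses=10)  # 62% draws

print(f"older: score {older:.3f}, Elo {elo_from_score(older):+.1f}")
print(f"newer: score {newer:.3f}, Elo {elo_from_score(newer):+.1f}")
```

Under these assumptions the version with more draws ends up with the lower score and therefore the lower Elo, even though it does not lose a single extra game.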

Lazy_Frank · Post by **Lazy_Frank** » Thu Sep 22, 2022 1:43 pm

Let me give you an analogy with bridge (a card game that is also an interactive two-player game, or in bridge, two pairs).

Let's assume you and your partner always get deals where you can bid 3NT and play such contracts (the balance and suit structure are well suited for that).

After thousands or millions of games you can say: I know everything about how to bid and play 3NT.
Fine. Sounds great and true.

But you do not have much of a clue how to defend 3NT,
because you always get the deals where you can bid 3NT (believe me, in defense everything looks much more complicated than from the declarer's side).

You also have no clue (in a practical sense) about contracts other than 3NT, such as small or grand slam contracts (for example 6 clubs), apart from general game principles.

Jouni · Post by **Jouni** » Sat Sep 24, 2022 7:54 pm

Sorry, but I find EAS testing a total waste of time. The only thing that matters is the result. Remember SF's 200-move shuffling wins at TCEC. 1-0!
