SPCC: Testruns of Stockfish 220917 finished

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

pohl4711
Posts: 2821
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Post by pohl4711 »

The rating-list and regression test runs of Stockfish 220917 have finished.


https://www.sp-cc.de

Also take a look at the EAS rating list, the world's first engine rating list that measures not the strength of engines but their style of play:
https://www.sp-cc.de/eas-ratinglist.htm

(You may have to clear your browser cache or reload the website.)
mehmet123
Posts: 692
Joined: Sun Jan 26, 2020 10:38 pm
Location: Turkey
Full name: Mehmet Karaman

Re: SPCC: Testruns of Stockfish 220917 finished

Post by mehmet123 »

The Elo differences in the rating list and in the VLTC regression test are very different.
I don't think this is related to the patch (simplify trend and optimism), because there is no radical change in the code. The NCM test shows only a -1.3 Elo difference, which is a very small change. Some new values in the search code are very effective at long time controls, and the time control in the VLTC test is more than 3x that of the rating-list test.
pohl4711
Posts: 2821
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: SPCC: Testruns of Stockfish 220917 finished

Post by pohl4711 »

mehmet123 wrote: Wed Sep 21, 2022 8:09 pm The Elo differences in the rating list and in the VLTC regression test are very different.
I don't think this is related to the patch (simplify trend and optimism), because there is no radical change in the code. The NCM test shows only a -1.3 Elo difference, which is a very small change. Some new values in the search code are very effective at long time controls, and the time control in the VLTC test is more than 3x that of the rating-list test.
IMHO the "problem" is that the new SF plays a little more "drawish". In my rating-list test run, the draw rate of SF 220917 is 62% (SF 220907, against the same opponents, had "only" 60.3% draws). This reduces the number of wins for SF 220917, and with it the score and the Elo.
In my VLTC regression test runs, I use my UHO openings (otherwise there would be 95%+ draws). These openings reduce the draw rate massively, which "hides" the fact that one engine version plays more drawish than another. So in this test setup we see progress, and in the rating-list test setup we see a regression. My 2 cents...
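To make the draw-rate argument concrete, here is a rough Python sketch of how a higher draw rate lowers the score and the Elo of the stronger engine. It uses the standard logistic Elo formula. The two draw rates (60.3% and 62%) are taken from the post above; the share of decisive games won (WIN_SHARE = 0.9) is a made-up value for illustration only, not a figure from the test runs, and holding it fixed between the two versions is a simplifying assumption.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (standard logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_from_draw_rate(draw_rate: float, win_share: float) -> float:
    """Score fraction when `draw_rate` of all games are drawn and
    `win_share` of the remaining decisive games are won."""
    decisive = 1.0 - draw_rate
    return decisive * win_share + 0.5 * draw_rate


# Hypothetical: fraction of decisive games won by Stockfish (not from the post).
WIN_SHARE = 0.9

# Draw rates as reported in the post for the rating-list test run.
for label, draw_rate in (("SF 220907", 0.603), ("SF 220917", 0.620)):
    score = score_from_draw_rate(draw_rate, WIN_SHARE)
    elo = elo_from_score(score)
    print(f"{label}: draws {draw_rate:.1%}, score {score:.4f}, Elo {elo:+.1f}")
```

With the win share held fixed, every extra draw replaces some games that would mostly have been wins, so the score and the implied Elo drop as the draw rate rises. In the real test runs the win share may of course differ between the two versions.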
Lazy_Frank
Posts: 74
Joined: Mon Jul 23, 2018 10:56 pm
Location: Latvia
Full name: Raivis Baumanis

Re: SPCC: Testruns of Stockfish 220917 finished

Post by Lazy_Frank »

Let me give you an analogy with bridge (a card game, also an interactive two-player game; in bridge, two pairs).

Let's assume you and your partner always get deals where you can bid 3NT and play such contracts (the balance and suit structure are well suited for that).

After thousands or millions of games you can say: I know everything about how to bid and play 3NT.
Fine. Sounds great and true.

But you do not have much of a clue how to defend 3NT,
because you always get deals where you can bid 3NT (believe me, in defense everything looks much more complicated than from the declarer's side).

You also have no clue (in a practical sense) about contracts other than 3NT, such as small or grand slam contracts (for example 6 clubs), apart from general game principles.
Jouni
Posts: 3715
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Re: SPCC: Testruns of Stockfish 220917 finished

Post by Jouni »

Sorry, but I find EAS testing a total waste of time. The only thing that matters is the result. Remember SF's 200-move shuffling wins at TCEC. 1-0!