Stockfish 2.3.1 weaker than 2.2.2?


melajara
Posts: 213
Joined: Thu Dec 16, 2010 4:39 pm

Stockfish 2.3.1 weaker than 2.2.2?

Post by melajara »

Very disappointing IPON results so far (1672 played games from 2550)
with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2

Something wrong in testing or the last tweaks just didn't make it.

Just wondering
Per ardua ad astra
Modern Times
Posts: 3546
Joined: Thu Jun 07, 2012 11:02 pm

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Modern Times »

At CCRL blitz it is about the same.

At CCRL chess960 it is a little stronger, but with nowhere near as many games as IPON.
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by ZirconiumX »

Stockfish uses self-play for Elo measuring, and it looks like this has backfired. Shame really.

Matthew:out
Some believe in the almighty dollar.

I believe in the almighty printf statement.
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by gladius »

Agreed, the results are very disappointing. The improvements were tested against Stockfish 2.2.2. It seems that while they were good in head-to-head matches, they made things worse against weaker opponents and didn't help against stronger ones.
Uri Blass
Posts: 10269
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Uri Blass »

ZirconiumX wrote:Stockfish uses self-play for Elo measuring, and it looks like this has backfired. Shame really.

Matthew:out
I am not sure. The question is whether the Stockfish team also used self-play at time controls other than bullet.

Maybe Stockfish is better at bullet time control but not at blitz time control.
bupalo
Posts: 82
Joined: Fri Mar 16, 2012 2:04 pm

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by bupalo »

We should wait for Graham's 8-core tournament next weekend.
There Stockfish is number 1 or 2.
zamar
Posts: 613
Joined: Sun Jan 18, 2009 7:03 am

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by zamar »

melajara wrote:Very disappointing IPON results so far (1672 played games from 2550)
with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2

Something wrong in testing or the last tweaks just didn't make it.

Just wondering
All published results so far are easily within error bars.

IPON: -10 elo
CCRL, FRC: +15 elo
CCRL, 40/4: +2 elo
CEGT 40/20 (1 CPU): -1 elo

Based on the results so far, it looks like these versions are around the same strength (as was expected).
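
To put a rough number on "within error bars", here is a minimal sketch of the usual normal-approximation margin for a rating measured over a given number of games. The function names, the 40% draw ratio and the 50% score are assumptions for illustration, not IPON's actual figures or method.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an average score in (0, 1) under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin_95(games, draw_ratio=0.4, score=0.5):
    """Approximate 95% margin (in Elo) for a rating measured over `games` games.

    draw_ratio and score are assumed values, not measured ones.
    """
    win = score - draw_ratio / 2.0          # probability of a win
    loss = 1.0 - score - draw_ratio / 2.0   # probability of a loss
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (win * (1.0 - score) ** 2
           + draw_ratio * (0.5 - score) ** 2
           + loss * (0.0 - score) ** 2)
    std_err = math.sqrt(var / games)
    return elo_from_score(score + 1.96 * std_err) - elo_from_score(score)

if __name__ == "__main__":
    # 1672 is the number of IPON games played so far, as quoted in the thread.
    print(round(elo_margin_95(1672), 1))
```

Under these assumptions the margin for 1672 games comes out at roughly 13 Elo, and the difference between two separately measured ratings is less certain still, so a 10 Elo gap is not distinguishable from zero.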
Joona Kiiski
Uri Blass
Posts: 10269
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Uri Blass »

zamar wrote:
melajara wrote:Very disappointing IPON results so far (1672 played games from 2550)
with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2

Something wrong in testing or the last tweaks just didn't make it.

Just wondering
All published results so far are easily within error bars.

IPON: -10 elo
CCRL, FRC: +15 elo
CCRL, 40/4: +2 elo
CEGT 40/20 (1 CPU): -1 elo

Based on the results so far, it looks like these versions are around the same strength (as was expected).
As was expected?
I read that people expected Stockfish 2.3.1 to be stronger than 2.2.2,
not the same strength.

Edit: I remember reading that Kai claimed a 26 Elo improvement at the ultra-short time control of 2.5 s + 0.04 s and expected a 10-15 Elo improvement at blitz time control.

I do not remember reading, before the tests by IPON and others, that we should expect no improvement.
styx
Posts: 338
Joined: Tue Mar 13, 2012 9:59 pm
Location: Germany

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by styx »

Changes:

ELO increase is very limited, it is mainly a minor release to flush
the accumulated work. From the user point of view the biggest thing is
that SF should not crash anymore even with many threads because a
nasty SMP bug causing a rare but repetitive crash has been fixed.

Marco.
--> http://talkchess.com/forum/viewtopic.php?t=45163

Some rating lists show an improvement. IPON has very good testing conditions, but it is not the final truth.
Jouni
Posts: 3281
Joined: Wed Mar 08, 2006 8:15 pm

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Jouni »

I found one improvement :)

[D]2n5/kP6/8/K7/4B3/8/8/8 w - - 0 1

Here 2.3.1 knows that only 1. bxc8=N+ wins. 2.2.2 failed to find it.
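
For anyone reading without the diagram, a small sketch that expands the FEN above into a text board (the helper name `fen_to_rows` is made up for this example). As far as I can tell, capturing with a queen or rook leaves Black stalemated, and a bishop would sit on the same colour as the one on e4, which is why only the knight promotion wins.

```python
def fen_to_rows(fen):
    """Expand the piece-placement field of a FEN string into 8 rows of 8 characters."""
    placement = fen.split()[0]
    rows = []
    for rank in placement.split("/"):
        row = ""
        for ch in rank:
            # A digit means that many empty squares; anything else is a piece letter.
            row += "." * int(ch) if ch.isdigit() else ch
        rows.append(row)
    return rows

FEN = "2n5/kP6/8/K7/4B3/8/8/8 w - - 0 1"

if __name__ == "__main__":
    # Ranks are listed from 8 down to 1, as in the FEN itself.
    for rank_no, row in zip(range(8, 0, -1), fen_to_rows(FEN)):
        print(rank_no, " ".join(row))
    print("  a b c d e f g h")
```

Uppercase letters are White's pieces and lowercase are Black's, so the board shows the white pawn on b7 next to the black king on a7, with the black knight on c8.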
Jouni