Stockfish 2.3.1 weaker than 2.2.2?

melajara
Posts: 213
Joined: Thu Dec 16, 2010 3:39 pm

Stockfish 2.3.1 weaker than 2.2.2?

Post by melajara » Tue Sep 25, 2012 1:53 pm

Very disappointing IPON results so far (1672 played games from 2550)
with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2

Something wrong in testing or the last tweaks just didn't make it.

Just wondering
Per ardua ad astra

Modern Times
Posts: 2410
Joined: Thu Jun 07, 2012 9:02 pm

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Modern Times » Tue Sep 25, 2012 1:55 pm

At CCRL blitz it is about the same

At CCRL chess960 it is a little stronger

but nowhere near as many games as IPON.

ZirconiumX
Posts: 1327
Joined: Sun Jul 17, 2011 9:14 am

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by ZirconiumX » Tue Sep 25, 2012 2:02 pm

Stockfish uses self-play for Elo measuring, and it looks like this has backfired. Shame really.

Matthew:out
Some believe in the almighty dollar.

I believe in the almighty printf statement.

gladius
Posts: 538
Joined: Tue Dec 12, 2006 9:10 am

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by gladius » Tue Sep 25, 2012 2:12 pm

Agreed, the results are very disappointing. The improvements were tested against Stockfish 2.2.2. It seems that while they were good in head-to-head matches, they made things worse against weaker opponents and didn't help against stronger ones.

Uri Blass
Posts: 8593
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Uri Blass » Tue Sep 25, 2012 2:28 pm

ZirconiumX wrote:Stockfish uses self-play for Elo measuring, and it looks like this has backfired. Shame really.

Matthew:out
I am not sure. The question is whether the Stockfish team used self-play
at anything other than bullet time control.

Maybe Stockfish is better at bullet time control but not at blitz time control.

bupalo
Posts: 82
Joined: Fri Mar 16, 2012 1:04 pm

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by bupalo » Tue Sep 25, 2012 2:30 pm

We should wait for Graham's 8 core tournament next weekend.
There Stockfish is number 1 or 2

zamar
Posts: 613
Joined: Sun Jan 18, 2009 6:03 am

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by zamar » Tue Sep 25, 2012 2:45 pm

melajara wrote:Very disappointing IPON results so far (1672 played games from 2550)
with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2

Something wrong in testing or the last tweaks just didn't make it.

Just wondering
All published results so far are easily within error bars.

IPON: -10 elo
CCRL, FRC: +15 elo
CCRL, 40/4: +2 elo
CEGT 40/20 (1 CPU): -1 elo

Based on the results so far, it looks like these versions are around the same strength (as was expected).
Joona Kiiski
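
To make the "within error bars" point concrete, here is a rough sketch of how wide the uncertainty is after 1672 games. It uses the standard logistic Elo formula and a normal approximation. The 35% draw ratio and the 50% overall score are assumptions chosen for illustration, not figures from IPON, and the function names are hypothetical.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an expected score in (0, 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin_95(games, draw_ratio, score=0.5):
    """Approximate 95% half-width, in Elo, of a rating measured over `games` games.

    ASSUMPTION: games are independent and the score is roughly normal,
    which is a simplification of how rating lists compute their error bars.
    """
    win_ratio = score - draw_ratio / 2.0
    # Per-game variance of the result, where a game yields 1, 0.5 or 0 points.
    variance = win_ratio * 1.0 + draw_ratio * 0.25 - score ** 2
    stderr = math.sqrt(variance / games)
    upper = elo_from_score(score + 1.96 * stderr)
    lower = elo_from_score(score - 1.96 * stderr)
    return (upper - lower) / 2.0

# ASSUMPTION: 35% draws and a 50% score; only the 1672 games come from the thread.
margin = elo_margin_95(games=1672, draw_ratio=0.35)
# Comparing two ratings that each carry this margin widens it by about sqrt(2).
margin_of_difference = margin * math.sqrt(2.0)
print(f"single rating: +/- {margin:.1f} Elo")
print(f"difference of two such ratings: +/- {margin_of_difference:.1f} Elo")
```

Under these assumptions a single rating is uncertain by roughly 13 Elo either way, and the gap between two such ratings by more than that, so a measured -10 does not separate the two versions.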

Uri Blass
Posts: 8593
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Uri Blass » Tue Sep 25, 2012 3:20 pm

zamar wrote:
melajara wrote:Very disappointing IPON results so far (1672 played games from 2550)
with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2

Something wrong in testing or the last tweaks just didn't make it.

Just wondering
All published results so far are easily within error bars.

IPON: -10 elo
CCRL, FRC: +15 elo
CCRL, 40/4: +2 elo
CEGT 40/20 (1 CPU): -1 elo

Based on the results so far, it looks like these versions are around the same strength (as was expected).
As was expected?
I read that people expected Stockfish 2.3.1 to be stronger than 2.2.2,
not the same strength.

Edit: I remember reading that Kai claimed a 26 Elo improvement at an ultra-short time control (2.5 s + 0.04 s) and expected a 10-15 Elo improvement at blitz time control.

I do not remember reading, before the IPON and other tests, that we should expect no improvement.

styx
Posts: 338
Joined: Tue Mar 13, 2012 8:59 pm
Location: Germany

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by styx » Tue Sep 25, 2012 4:36 pm

Changes:

ELO increase is very limited, it is mainly a minor release to flush
the accumulated work. From the user point of view the biggest thing is
that SF should not crash anymore even with many threads because a
nasty SMP bug causing a rare but repetitive crash has been fixed.

Marco.
--> http://talkchess.com/forum/viewtopic.php?t=45163

Some rating lists show an increase. IPON has very good testing conditions, but it's not the final truth.

Jouni
Posts: 2007
Joined: Wed Mar 08, 2006 7:15 pm

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Jouni » Tue Sep 25, 2012 5:58 pm

I found one improvement :)

[D]2n5/kP6/8/K7/4B3/8/8/8 w - - 0 1

Here 2.3.1 knows that only 1. bxc8=N wins. 2.2.2 failed.
Jouni
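
For anyone wondering why only the knight promotion works here, the following is a small self-contained sketch that sets up the position after 1. bxc8=X for each promotion piece and checks whether Black's king has a legal move. It is not a chess engine: it handles only this position, where Black has a bare king, and the helper names are made up for illustration.

```python
# Position after 1. bxc8=X from FEN 2n5/kP6/8/K7/4B3/8/8/8 w - - 0 1.
# Squares are (file, rank) pairs with a1 = (0, 0).

def sq(name):
    return (ord(name[0]) - ord("a"), int(name[1]) - 1)

KING_STEPS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
KNIGHT_STEPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
DIAGONALS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
STRAIGHTS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def on_board(s):
    return 0 <= s[0] < 8 and 0 <= s[1] < 8

def attacks(piece, origin, target, blockers):
    """True if a white `piece` standing on `origin` attacks `target`."""
    step = (target[0] - origin[0], target[1] - origin[1])
    if piece == "K":
        return step in KING_STEPS
    if piece == "N":
        return step in KNIGHT_STEPS
    rays = {"B": DIAGONALS, "R": STRAIGHTS, "Q": DIAGONALS + STRAIGHTS}[piece]
    for sx, sy in rays:
        cur = (origin[0] + sx, origin[1] + sy)
        while on_board(cur):
            if cur == target:
                return True
            if cur in blockers:
                break
            cur = (cur[0] + sx, cur[1] + sy)
    return False

def classify(promoted):
    """Describe Black's situation after the b-pawn promotes to `promoted` on c8."""
    white = {sq("a5"): "K", sq("e4"): "B", sq("c8"): promoted}
    black_king = sq("a7")

    def is_attacked(target):
        # A white piece on `target` would be captured, so it is ignored.
        # The black king is never a blocker, so rays pass through its square.
        others = {s: p for s, p in white.items() if s != target}
        return any(attacks(p, s, target, set(others) - {s}) for s, p in others.items())

    in_check = is_attacked(black_king)
    candidates = [(black_king[0] + dx, black_king[1] + dy) for dx, dy in KING_STEPS]
    escapes = [c for c in candidates if on_board(c) and not is_attacked(c)]
    if not escapes:
        return "checkmate" if in_check else "stalemate"
    return "check, play continues" if in_check else "play continues"

def is_light(s):
    # a1 = (0, 0) is a dark square, so an odd file + rank sum means light.
    return (s[0] + s[1]) % 2 == 1

for piece in "QRBN":
    print(piece, classify(piece))
print("e4 and c8 same colour:", is_light(sq("e4")) == is_light(sq("c8")))
```

The queen and rook promotions leave Black stalemated, and a bishop on c8 would stand on the same colour as the bishop on e4, which cannot force mate. The knight promotion gives check, the king can step to b8, and the game goes on with bishop and knight against a bare king, which is a known theoretical win.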
