
Stockfish 2.3.1 weaker than 2.2.2?

Posted: Tue Sep 25, 2012 3:53 pm
by melajara
Very disappointing IPON results so far (1672 played games from 2550)
with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2

Something wrong in testing or the last tweaks just didn't make it.

Just wondering

Re: Stockfish 2.3.1 weaker than 2.2.2?

Posted: Tue Sep 25, 2012 3:55 pm
by Modern Times
At CCRL blitz it is about the same

At CCRL chess960 it is a little stronger

but nowhere near as many games as IPON.

Re: Stockfish 2.3.1 weaker than 2.2.2?

Posted: Tue Sep 25, 2012 4:02 pm
by ZirconiumX
Stockfish uses self-play for Elo measuring, and it looks like this has backfired. Shame really.

Matthew:out

Re: Stockfish 2.3.1 weaker than 2.2.2?

Posted: Tue Sep 25, 2012 4:12 pm
by gladius
Agreed, the results are very disappointing. The improvements were tested against Stockfish 2.2.2. It seems that while they were good in head-to-head matches, they made things worse against weaker opponents and didn't help against stronger ones.

Re: Stockfish 2.3.1 weaker than 2.2.2?

Posted: Tue Sep 25, 2012 4:28 pm
by Uri Blass
ZirconiumX wrote:Stockfish uses self-play for Elo measuring, and it looks like this has backfired. Shame really.

Matthew:out
I am not sure. The question is whether the Stockfish team used self-play at anything other than bullet time control.

Maybe Stockfish 2.3.1 is better at bullet time control but not at blitz time control.

Re: Stockfish 2.3.1 weaker than 2.2.2?

Posted: Tue Sep 25, 2012 4:30 pm
by bupalo
We should wait for Graham's 8 core tournament next weekend.
There Stockfish is number 1 or 2.

Re: Stockfish 2.3.1 weaker than 2.2.2?

Posted: Tue Sep 25, 2012 4:45 pm
by zamar
melajara wrote:Very disappointing IPON results so far (1672 played games from 2550)
with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2

Something wrong in testing or the last tweaks just didn't make it.

Just wondering
All published results so far are easily within error bars.

IPON: -10 elo
CCRL, FRC: +15 elo
CCRL, 40/4: +2 elo
CEGT 40/20 (1 CPU): -1 elo

Based on the results so far, it looks like these versions are around the same strength (as was expected).
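
For a rough sense of what "within error bars" means at this sample size, here is a minimal sketch of the usual normal-approximation margin for a result near 50%. The 35% draw rate is an assumption (the thread does not give IPON's draw rate), and the function names are illustrative only. Real rating lists compute their error bars across many opponents, and comparing two separately rated versions compounds both uncertainties, so treat the output as an order-of-magnitude check rather than IPON's own figure.

```python
import math

def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin_95(games, draw_rate, score=0.5):
    """Approximate 95% half-width, in Elo, of a result measured over `games` games."""
    win_rate = score - draw_rate / 2.0
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = win_rate + 0.25 * draw_rate - score ** 2
    std_err = math.sqrt(variance / games)
    upper = elo_from_score(score + 1.96 * std_err)
    lower = elo_from_score(score - 1.96 * std_err)
    return (upper - lower) / 2.0

# ASSUMPTION: 35% draws; only the game count (1672) comes from the thread.
margin = elo_margin_95(games=1672, draw_rate=0.35)
print(f"approx. +/- {margin:.1f} Elo")
```

Under these assumptions the margin comes out on the same order as the -10 to +15 Elo differences listed above, which is consistent with calling the versions roughly equal.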

Re: Stockfish 2.3.1 weaker than 2.2.2?

Posted: Tue Sep 25, 2012 5:20 pm
by Uri Blass
zamar wrote:
melajara wrote:Very disappointing IPON results so far (1672 played games from 2550)
with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2

Something wrong in testing or the last tweaks just didn't make it.

Just wondering
All published results so far are easily within error bars.

IPON: -10 elo
CCRL, FRC: +15 elo
CCRL, 40/4: +2 elo
CEGT 40/20 (1 CPU): -1 elo

Based on the results so far, it looks like these versions are around the same strength (as was expected).
As was expected?
I read that people expected Stockfish 2.3.1 to be stronger than 2.2.2,
not the same strength.

Edit: I remember reading that Kai claimed a 26 Elo improvement at the ultra-short time control of 2.5 s + 0.04 s and expected a 10-15 Elo improvement at blitz time control.

I do not remember reading, before the IPON tests and other people's tests, that we should expect no improvement.

Re: Stockfish 2.3.1 weaker than 2.2.2?

Posted: Tue Sep 25, 2012 6:36 pm
by styx
Changes:

ELO increase is very limited, it is mainly a minor release to flush
the accumulated work. From the user point of view the biggest thing is
that SF should not crash anymore even with many threads because a
nasty SMP bug causing a rare but repetitive crash has been fixed.

Marco.
--> http://talkchess.com/forum/viewtopic.php?t=45163

Some rating lists show an increase. IPON has very good testing conditions, but it's not the final truth.

Re: Stockfish 2.3.1 weaker than 2.2.2?

Posted: Tue Sep 25, 2012 7:58 pm
by Jouni
I found one improvement :)

[D]2n5/kP6/8/K7/4B3/8/8/8 w - - 0 1

Here 2.3.1 knows that only 1. bxc8=N wins. 2.2.2 failed.
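
For anyone reading this without the board diagram, the FEN above can be expanded into a text board with a few lines of Python. This is only a sketch for viewing the position (the helper name fen_to_rows is made up here, and it does no legality checking or move generation):

```python
def fen_to_rows(fen):
    """Expand the piece-placement field of a FEN string into eight 8-character rows."""
    placement = fen.split()[0]
    rows = []
    for rank in placement.split("/"):
        row = ""
        for ch in rank:
            # A digit means that many empty squares; a letter is a piece.
            row += "." * int(ch) if ch.isdigit() else ch
        rows.append(row)
    return rows

fen = "2n5/kP6/8/K7/4B3/8/8/8 w - - 0 1"
# FEN lists ranks from 8 down to 1; uppercase = White, lowercase = Black.
for rank_number, row in zip(range(8, 0, -1), fen_to_rows(fen)):
    print(rank_number, " ".join(row))
print("  a b c d e f g h")
```

It shows the black king on a7 and knight on c8, with the white king on a5, pawn on b7 and bishop on e4.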