TalkChess.com

Posted: **Tue Sep 25, 2012 3:53 pm**

Very disappointing IPON results so far (1672 of 2550 games played),
with a provisional rating of 2962 Elo, or 10 Elo LESS than Stockfish 2.2.2.

Either something went wrong in testing or the last tweaks just didn't work out.

Just wondering

Posted: **Tue Sep 25, 2012 3:55 pm**

At CCRL blitz it is about the same

At CCRL chess960 it is a little stronger

but nowhere near as many games as IPON.

Posted: **Tue Sep 25, 2012 4:02 pm**

Stockfish uses self-play for Elo measuring, and it looks like this has backfired. Shame really.

Matthew:out

Posted: **Tue Sep 25, 2012 4:12 pm**

Agreed, the results are very disappointing. The improvements were tested against Stockfish 2.2.2. It seems that while they were good in head-to-head matches, they made things worse against weaker opponents and didn't help against stronger ones.

Posted: **Tue Sep 25, 2012 4:28 pm**

ZirconiumX wrote:Stockfish uses self-play for Elo measuring, and it looks like this has backfired. Shame really.

Matthew:out

I am not sure. The question is whether the Stockfish team used self-play
at anything other than bullet time control.

Maybe Stockfish 2.3.1 is better at bullet time control but not at blitz time control.

Posted: **Tue Sep 25, 2012 4:30 pm**

We should wait for Graham's 8 core tournament next weekend.
There Stockfish is number 1 or 2.

Posted: **Tue Sep 25, 2012 4:45 pm**

melajara wrote:Very disappointing IPON results so far (1672 played games from 2550)
with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2

Something wrong in testing or the last tweaks just didn't make it.

Just wondering

All published results so far are easily within error bars.

IPON: -10 elo
CCRL, FRC: +15 elo
CCRL, 40/4: +2 elo
CEGT 40/20 (1 CPU): -1 elo

Based on the results so far, it looks like these versions are around the same strength (as was expected).
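
For a rough sense of how wide such error bars are, here is a minimal sketch that estimates the 95% margin on an Elo difference from the number of games played. It is not how IPON, CCRL, or CEGT compute their ratings; the function names and the 35% draw rate are assumptions made only for illustration, and it treats the games as one head-to-head match at roughly even strength.

```python
import math

def elo_from_score(score):
    """Convert a score fraction (0 < score < 1) into an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin(games, score=0.5, draw_rate=0.35, z=1.96):
    """Approximate half-width of the confidence interval, in Elo.

    draw_rate=0.35 is an assumed value, not taken from any rating list.
    z=1.96 corresponds to a two-sided 95% interval under a normal approximation.
    """
    win = score - draw_rate / 2.0
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = win * 1.0 + draw_rate * 0.25 - score ** 2
    std_err = math.sqrt(variance / games)
    low = elo_from_score(score - z * std_err)
    high = elo_from_score(score + z * std_err)
    return (high - low) / 2.0

if __name__ == "__main__":
    for n in (1672, 2550):
        print(n, "games: about +/-", round(elo_margin(n), 1), "Elo")
```

Under these assumptions, 1672 games give a margin of roughly 13 Elo for a single engine, and the uncertainty on the difference between two separately rated versions is larger still, so a 10 Elo gap is not distinguishable from zero.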

Posted: **Tue Sep 25, 2012 5:20 pm**

zamar wrote:
melajara wrote:Very disappointing IPON results so far (1672 played games from 2550)
with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2

Something wrong in testing or the last tweaks just didn't make it.

Just wondering
All published results so far are easily within error bars.

IPON: -10 elo
CCRL, FRC: +15 elo
CCRL, 40/4: +2 elo
CEGT 40/20 (1 CPU): -1 elo

Based on the results so far, it looks like that these versions are around the same strength (as was expected).

As was expected?
I read that people expected Stockfish 2.3.1 to be stronger than 2.2.2,
not the same strength.

Edit: I remember reading that Kai claimed a 26 Elo improvement at the ultra-short time control of 2.5 s + 0.04 s and expected a 10-15 Elo improvement at blitz time control.

I do not remember reading, before the IPON tests and those of other people, that we should expect no improvement.

Posted: **Tue Sep 25, 2012 6:36 pm**

Changes:

ELO increase is very limited, it is mainly a minor release to flush
the accumulated work. From the user point of view the biggest thing is
that SF should not crash anymore even with many threads because a
nasty SMP bug causing a rare but repetitive crash has been fixed.

Marco.

--> http://talkchess.com/forum/viewtopic.php?t=45163

Some rating lists show an increase. IPON has very good testing conditions, but it is not the final truth.

Posted: **Tue Sep 25, 2012 7:58 pm**

I found one improvement

[D]2n5/kP6/8/K7/4B3/8/8/8 w - - 0 1

Here 2.3.1 knows that only 1. bxc8=N wins. 2.2.2 fails to find it.

Stockfish 2.3.1 weaker than 2.2.2?
