Very disappointing IPON results so far (1672 played games from 2550)
with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2
Something wrong in testing or the last tweaks just didn't make it.
Just wondering
Stockfish 2.3.1 weaker than 2.2.2?
-
- Posts: 213
- Joined: Thu Dec 16, 2010 4:39 pm
Stockfish 2.3.1 weaker than 2.2.2?
Per ardua ad astra
-
- Posts: 3550
- Joined: Thu Jun 07, 2012 11:02 pm
Re: Stockfish 2.3.1 weaker than 2.2.2?
At CCRL blitz it is about the same
At CCRL chess960 it is a little stronger
but nowhere near as many games as IPON.
-
- Posts: 1334
- Joined: Sun Jul 17, 2011 11:14 am
Re: Stockfish 2.3.1 weaker than 2.2.2?
Stockfish uses self-play for Elo measuring, and it looks like this has backfired. Shame really.
Matthew:out
Some believe in the almighty dollar.
I believe in the almighty printf statement.
-
- Posts: 568
- Joined: Tue Dec 12, 2006 10:10 am
- Full name: Gary Linscott
Re: Stockfish 2.3.1 weaker than 2.2.2?
Agreed, the results are very disappointing. The improvements were tested against Stockfish 2.2.2. It seems that while they were good in head-to-head matches, they made things worse against weaker opponents and didn't help against stronger ones.
-
- Posts: 10296
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Stockfish 2.3.1 weaker than 2.2.2?
ZirconiumX wrote:
> Stockfish uses self-play for Elo measuring, and it looks like this has backfired. Shame really.
> Matthew:out
I am not sure, and the question is whether the Stockfish team used self-play at anything other than bullet time control.
Maybe Stockfish is better at bullet time control but not at blitz time control.
-
- Posts: 82
- Joined: Fri Mar 16, 2012 2:04 pm
Re: Stockfish 2.3.1 weaker than 2.2.2?
We should wait for Graham's 8 core tournament next weekend.
There Stockfish is number 1 or 2.
-
- Posts: 613
- Joined: Sun Jan 18, 2009 7:03 am
Re: Stockfish 2.3.1 weaker than 2.2.2?
melajara wrote:
> Very disappointing IPON results so far (1672 played games from 2550)
> with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2
> Something wrong in testing or the last tweaks just didn't make it.
> Just wondering
All published results so far are easily within error bars.
IPON: -10 elo
CCRL, FRC: +15 elo
CCRL, 40/4: +2 elo
CEGT 40/20 (1 CPU): -1 elo
Based on the results so far, it looks like these versions are around the same strength (as was expected).
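For anyone who wants to check the "within error bars" point, here is a rough Python sketch of how wide a 95% margin is for a given number of games. It is not how IPON, CCRL or CEGT compute their lists: the function names are made up for illustration, and the 40% draw ratio and 50% score are assumptions, not figures from this thread. It uses a plain normal approximation on the score and converts both ends of the interval to Elo with the logistic formula.

```python
import math


def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_error_margin(games, score=0.5, draw_ratio=0.4, z=1.96):
    """Approximate half-width of the Elo confidence interval after `games` games.

    score      -- average score per game (assumed 0.5 here, i.e. an even field)
    draw_ratio -- fraction of drawn games (assumed value, not measured)
    z          -- normal quantile, 1.96 for roughly 95% confidence
    """
    win = score - draw_ratio / 2.0
    loss = 1.0 - draw_ratio - win
    if games <= 0 or win < 0.0 or loss < 0.0:
        raise ValueError("inconsistent games/score/draw_ratio")

    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (win * (1.0 - score) ** 2
                + draw_ratio * (0.5 - score) ** 2
                + loss * (0.0 - score) ** 2)
    stderr = math.sqrt(variance / games)

    # Map both ends of the score interval to Elo and take half the spread.
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return (high - low) / 2.0


if __name__ == "__main__":
    for n in (1672, 2550):
        print(n, "games: about +/-", round(elo_error_margin(n), 1), "Elo")
```

Under these assumptions the margin after 1672 games comes out at more than 10 Elo, so a 10 Elo gap between two versions cannot be told apart from noise at that sample size. A comparison between two rated engines carries the uncertainty of both ratings, so the real margin is, if anything, wider.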
Joona Kiiski
-
- Posts: 10296
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Stockfish 2.3.1 weaker than 2.2.2?
zamar wrote:
> melajara wrote:
> > Very disappointing IPON results so far (1672 played games from 2550)
> > with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2
> > Something wrong in testing or the last tweaks just didn't make it.
> > Just wondering
> All published results so far are easily within error bars.
> IPON: -10 elo
> CCRL, FRC: +15 elo
> CCRL, 40/4: +2 elo
> CEGT 40/20 (1 CPU): -1 elo
> Based on the results so far, it looks like these versions are around the same strength (as was expected).
As was expected?
I read that people expected Stockfish 2.3.1 to be stronger than 2.2.2, not the same strength.
Edit: I remember reading that Kai claimed a 26 elo improvement at an ultra-short time control (2.5 s + 0.04 s) and expected a 10-15 elo improvement at blitz time control.
I do not remember reading, before the IPON and other tests came out, that we should expect no improvement.
-
- Posts: 338
- Joined: Tue Mar 13, 2012 9:59 pm
- Location: Germany
Re: Stockfish 2.3.1 weaker than 2.2.2?
--> http://talkchess.com/forum/viewtopic.php?t=45163
> Changes:
> ELO increase is very limited, it is mainly a minor release to flush
> the accumulated work. From the user point of view the biggest thing is
> that SF should not crash anymore even with many threads because a
> nasty SMP bug causing a rare but repetitive crash has been fixed.
> Marco.
Some rating lists show an increment. IPON has very good testing conditions, but it's not the final truth.
-
- Posts: 3291
- Joined: Wed Mar 08, 2006 8:15 pm
Re: Stockfish 2.3.1 weaker than 2.2.2?
I found one improvement
[D]2n5/kP6/8/K7/4B3/8/8/8 w - - 0 1
Here 2.3.1 knows that only 1. bxc8=N wins. 2.2.2 failed.
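Since the diagram tag does not render here, this is a small Python sketch that expands the FEN above into a text board and lists where the pieces stand. It only reads the piece-placement field and does no move generation or engine work. The helper names are made up for illustration.

```python
FEN = "2n5/kP6/8/K7/4B3/8/8/8 w - - 0 1"


def fen_to_rows(fen):
    """Expand the piece-placement field of a FEN into 8 strings, rank 8 first."""
    placement = fen.split()[0]
    rows = []
    for rank in placement.split("/"):
        row = ""
        for ch in rank:
            # A digit means that many empty squares; anything else is a piece letter.
            row += "." * int(ch) if ch.isdigit() else ch
        if len(row) != 8:
            raise ValueError("bad rank in FEN: " + rank)
        rows.append(row)
    if len(rows) != 8:
        raise ValueError("FEN must describe 8 ranks")
    return rows


def piece_squares(fen):
    """Return a dict mapping square names (e.g. 'b7') to piece letters."""
    squares = {}
    for r, row in enumerate(fen_to_rows(fen)):
        for f, ch in enumerate(row):
            if ch != ".":
                squares["abcdefgh"[f] + str(8 - r)] = ch
    return squares


if __name__ == "__main__":
    for number, row in zip(range(8, 0, -1), fen_to_rows(FEN)):
        print(number, row)
    print("  abcdefgh")
    print(piece_squares(FEN))
```

Uppercase letters are White's pieces and lowercase are Black's, so the position is White king a5, bishop e4 and pawn b7 against Black king a7 and knight c8.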
Jouni