Stockfish 2.3.1 weaker than 2.2.2?

IWB
Posts: 1539
Joined: Thu Mar 09, 2006 1:02 pm

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by IWB » Tue Sep 25, 2012 7:39 pm

zamar wrote:
All published results so far are easily within error bars.

IPON: -10 elo
....
This has to be taken very carefully! The Elo basis is Bayeselo, while the calculation during the match uses the plain Elo formula. The final result will be a little bit different, and I don't know yet if it will be higher or lower ...
But I am sure that everything will be within the error bars, and it seems that there is no clear improvement over 2.2.2.
But that plateau has been seen in quite a few recent releases of top engines. The reasons why might be interesting, but for now any explanation is just speculation.

But we are complaining at a very high level.


Bye
Ingo
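For readers unfamiliar with the "pure Elo formula" mentioned above, here is a minimal sketch of the standard logistic Elo relationship between a match score and a rating difference. It is only an illustration: the function names are made up for this example, and it does not reproduce what Bayeselo or the IPON list actually compute (Bayeselo fits a different model, which is why the final number can shift).

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for a player rated elo_diff points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Inverse of expected_score: rating difference implied by a match score."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # A 10 Elo deficit corresponds to a score only slightly below 50%.
    print(f"expected score at -10 Elo: {expected_score(-10.0):.4f}")
    print(f"Elo implied by a 52% score: {elo_from_score(0.52):+.1f}")
```

The point of the sketch is that a difference of about 10 Elo moves the expected score by only a percent or two, which is why such results are hard to separate from noise.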

melajara
Posts: 213
Joined: Thu Dec 16, 2010 3:39 pm

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by melajara » Tue Sep 25, 2012 8:05 pm

Indeed, the plateau is clearly visible from Rybka 4 to 4.1, Houdini 1.5 to 2, Critter 1.4 to 1.6, Komodo 4 to 5, and now Stockfish 2.2.2 to 2.3.1.

Unfortunately, this will not entice Mr Houdart to release Houdini 3. Why would he cannibalize himself? :(
Per ardua ad astra

melajara
Posts: 213
Joined: Thu Dec 16, 2010 3:39 pm

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by melajara » Tue Sep 25, 2012 8:16 pm

On the other hand, some programs are still improving forcefully, see e.g. Bouquet, seemingly +116 from 1.4 to 1.5, could you test it?
Per ardua ad astra

IWB
Posts: 1539
Joined: Thu Mar 09, 2006 1:02 pm

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by IWB » Tue Sep 25, 2012 8:18 pm

melajara wrote:On the other hand, some programs are still improving forcefully, see e.g. Bouquet, seemingly +116 from 1.4 to 1.5, could you test it?
Yes, it is easy to improve by going from somewhere to Ippo ... but only very few improve on that.

No, I will not test that.

Bye
Ingo

MM
Posts: 766
Joined: Sun Oct 16, 2011 9:25 am

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by MM » Tue Sep 25, 2012 8:45 pm

Uri Blass wrote:
zamar wrote:
melajara wrote:Very disappointing IPON results so far (1672 played games from 2550)
with a 2962 ELO provisional rating or 10 ELO LESS than Stockfish 2.2.2

Something wrong in testing or the last tweaks just didn't make it.

Just wondering
All published results so far are easily within error bars.

IPON: -10 elo
CCRL, FRC: +15 elo
CCRL, 40/4: +2 elo
CEGT 40/20 (1 CPU): -1 elo

Based on the results so far, it looks like that these versions are around the same strength (as was expected).
As was expected?
I read that people expected Stockfish 2.3.1 to be stronger than 2.2.2
and not the same strength.

Edit: I remember reading that Kai claimed a 26 Elo improvement at the ultra-short time control of 2.5 s + 0.04 s and expected a 10-15 Elo improvement at blitz time control.

I do not remember reading that we should expect no improvement before the tests of the IPON and other people.
+1
MM

Izak Pretorius
Posts: 34
Joined: Wed May 11, 2011 5:44 am

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Izak Pretorius » Wed Sep 26, 2012 5:56 am

Perhaps you are asking the wrong question?!
If so, it could lead to wrong conclusions when answering the incorrect question.

I suggest that another scenario could be a faulty compile, as indeed the first compile of Stockfish 2.3.1 was faulty, and we don't know if they tested that version or the "corrected" version, if indeed it was corrected.

I would suggest testing the Quocvuong compile of the latest Stockfish source, as I have seen people getting better results with this compile.

I am not suggesting that JA or Quocvuong has tampered with the code, but the gcc compiler or optimization settings may or may not have caused a bug or two in the code. It is open for anyone willing to verify this, before we jump to wrong questions and conclusions.

And congratulations to the Stockfish team for the best open source program available to mankind, one that even Houdart shamelessly used and uses to improve his commercial Houdini.

Thank you for your time reading this :)

Werner
Posts: 2416
Joined: Wed Mar 08, 2006 9:09 pm

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by Werner » Wed Sep 26, 2012 8:49 am

There is quite a difference between the last JA compile and the Qi compile. Here is a position calculated with 1CPU versions:

r3k1nr/pp1b1ppp/2n1p3/1B4B1/3N4/2q5/P1P2PPP/R2Q1RK1 w kq -

Engine: Stockfish 2.3.1JA x64 4CPU (256 MB)
22/42 0:50 0.00 1.Le3 Sf6 2.Tb1 Td8 3.Le2 Sxd4 4.Lxd4 Dc7 5.Lxf6 gxf6 6.Dd4 Lc6 7.Dxf6 Tg8 8.Lf3 Lxf3 9.Dxf3 Dc6 10.Dxc6+ bxc6 11.Tb7 a6 12.f4 Td2 13.Tb8+ Td8 14.Tb7 (83.086.557) 1637

23/42 0:59 +0.08++ 1.Sf5 (97.375.230) 1638
23/42 1:00 +0.16++ 1.Sf5 exf5 2.Dd6 Le6 3.Tad1 Sf6 4.Lxf6 gxf6 5.Tfe1 Dxc2 6.Te2 Dc3 7.Txe6+ fxe6 8.Dxe6+ Kf8 9.Td7 (98.961.073) 1639
23/42 1:01 +0.28++ 1.Sf5 exf5 2.Dd6 Le6 3.Tad1 Sf6 4.Lxf6 gxf6 5.Tfe1 Dxc2 6.Te2 Dc3 7.Txe6+ fxe6 8.Dxe6+ Kf8 9.Td7 (100.205.769) 1639
23/42 1:02 +0.46++ 1.Sf5 exf5 2.Dd6 Le6 3.Tad1 Sf6 4.Lxf6 gxf6 5.Tfe1 Dxc2 6.Te2 Dc3 7.Txe6+ fxe6 8.Dxe6+ Kf8 9.Td7 (103.250.706) 1639
23/42 1:13 +0.73++ 1.Sf5 exf5 2.Te1+ Le6 3.Dd6 a6 4.Ld2 Dxc2 5.Lb4 axb5 6.Df8+ Kd7 7.Tad1+ Kc7 8.Dxa8 Da4 9.Ld6+ Kb6 10.De8 Ka7 11.Lc5+ b6 12.Dxc6 (120.742.513) 1642
23/42 1:15 +1.14++ 1.Sf5 exf5 2.Te1+ Le6 3.Dd6 a6 4.Ld2 Dxc2 5.Lb4 axb5 6.Df8+ Kd7 7.Tad1+ Kc7 8.Dxa8 Da4 9.Ld6+ Kb6 10.De8 Ka7 11.Lc5+ b6 12.Dxc6 (124.379.587) 1642
23/42 1:18 +1.75++ 1.Sf5 exf5 2.Te1+ Le6 3.Dd6 a6 4.Ld2 Dxc2 5.Lb4 axb5 6.Df8+ Kd7 7.Tad1+ Kc7 8.Dxa8 Da4 9.Ld6+ Kb6 10.De8 Ka7 11.Lc5+ b6 12.Dxc6 (129.630.553) 1643
23/42 1:36 +2.14 1.Sf5 Dc5 2.Sd6+ Kf8 3.Le3 Dd5 4.Dxd5 exd5 5.Sxb7 Sf6 6.Sc5 Ke7 7.Tfd1 Kd6 8.Sxd7 Kxd7 9.c4 Kd6 10.cxd5 Se5 11.Tab1 Sxd5 12.Le2 Thb8 13.Txb8 Txb8 14.Lxa7 (159.057.785) 1653
24/42 1:43 +2.06-- 1.Sf5 Dc5 2.Sd6+ Kf8 3.Le3 Dd5 4.Dxd5 exd5 5.Sxb7 Sf6 6.Sc5 Se5 7.Sxd7+ Sexd7 8.Tfd1 Ke7 9.Te1 a6 10.Ld3 The8 11.Tab1 Kf8 12.Tb7 Se5 13.Lc5+ Kg8 (171.146.120) 1655
24/42 1:49 +2.22++ 1.Sf5 Dc5 2.Sd6+ Kf8 3.Le3 Dd5 4.c4 Dxd1 5.Tfxd1 Sf6 6.Sxb7 Se5 7.f4 Lxb5 8.fxe5 Lxc4 9.exf6 gxf6 10.Sd6 Ld5 11.Lc5 Kg7 12.Sf5+ (181.553.225) 1654
24/42 1:52 +1.97-- 1.Sf5 Dc5 2.Sd6+ Kf8 3.Le3 Dd5 4.c4 Dxd1 5.Tfxd1 a6 6.Sxf7 Kxf7 7.Txd7+ Sge7 8.La4 Thb8 9.c5 Se5 10.Tc7 Kf6 11.h4 h6 12.Tb1 Sf5 (186.870.740) 1656
24/48 1:58 +2.06 1.Sf5 Dc5 2.Sd6+ Kf8 3.Le3 Dd5 4.c4 Dxd1 5.Tfxd1 a6 6.Sxf7 Kxf7 7.Txd7+ Sge7 8.Lxc6 bxc6 9.Tad1 The8 10.g4 c5 11.Kg2 Tec8 12.Tb7 Tcb8 13.Tc7 Tc8 14.Tdd7 (196.015.681) 1660
Bester Zug: Sd4-f5 Zeit: 2:10.916 min K/s: 1.660.439 Knoten: 212.521.291


r3k1nr/pp1b1ppp/2n1p3/1B4B1/3N4/2q5/P1P2PPP/R2Q1RK1 w kq -

Engine: Stockfish 2.3.1Qi x64 1CPU (256 MB)


26/58 1:27 +0.08++ 1.Lxc6 Lxc6 2.Te1 Sf6 3.Sxc6 Dxc6 4.Dd4 Td8 5.De5 Ke7 6.Dg3 h6 7.Lf4 Kf8 8.Le5 Dxc2 9.Da3+ Kg8 10.Tac1 Dd2 11.Dxa7 Sg4 12.Lc7 Td3 13.h3 (152.827.277) 1738

26/58 1:53 +0.16++ 1.Sf5 exf5 2.Dd6 Sf6 3.Tae1+ Le6 4.Txe6+ fxe6 5.Dxe6+ Kf8 6.Lc4 Sd8 7.Dd6+ Ke8 8.Dc5 (195.255.212) 1720

26/58 1:59 +0.28++ 1.Sf5 (204.810.084) 1713

Bester Zug: Sd4-f5 Zeit: 2:10.916 min K/s: 1.713.619 Knoten: 204.810.084

The Qi compile shows a deeper search but needs longer to find the solution Sf5. I will try some more positions.
Yes - found another position where the Qi compile was a bit faster. The solutions were found at different search depths with different evaluations. I think the source code must be different too.
Werner
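The analysis output above comes from a German-language GUI: pieces are written as D (queen), T (rook), L (bishop), S (knight), and node counts use dots as thousands separators. The following is a small sketch, with hypothetical helper names, for converting such lines to English piece letters and plain integers so they are easier to compare or feed into other tools. It assumes the text passed in contains only move notation, not the German labels such as "Bester Zug".

```python
# German piece letters -> English piece letters. K (king) is the same in both,
# and file letters are lowercase, so they are not affected.
GERMAN_TO_ENGLISH = str.maketrans({"D": "Q", "T": "R", "L": "B", "S": "N"})


def to_english_notation(moves: str) -> str:
    """Translate German piece letters in a move list to English ones."""
    return moves.translate(GERMAN_TO_ENGLISH)


def parse_node_count(text: str) -> int:
    """Parse a node count such as '(83.086.557)' into an integer."""
    return int(text.strip("()").replace(".", ""))


if __name__ == "__main__":
    print(to_english_notation("1.Sf5 exf5 2.Dd6 Le6 3.Tad1 Sf6"))
    print(parse_node_count("(83.086.557)"))
```

With this mapping, the key move Sf5 in both logs reads as Nf5 in English notation.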

lkaufman
Posts: 3724
Joined: Sun Jan 10, 2010 5:15 am
Location: Maryland USA

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by lkaufman » Wed Sep 26, 2012 2:52 pm

gladius wrote:Agreed, the results are very disappointing. The improvements were tested against Stockfish 2.2.2. It seems while they were good in heads up matches, they made things worse against weaker opponents, and didn't help against stronger ones.
My tests indicate that 2.3.1 is a clear improvement even against foreign opponents at hyperspeed levels. So the problem, if there is one, is not the choice of opponents but the time control of the tests. My guess is that the change involving lateral attacks on pawns, being a tactical term, is great at speeds like game/10" but pretty useless at IPON levels.

gladius
Posts: 538
Joined: Tue Dec 12, 2006 9:10 am

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by gladius » Wed Sep 26, 2012 3:29 pm

lkaufman wrote:
gladius wrote:Agreed, the results are very disappointing. The improvements were tested against Stockfish 2.2.2. It seems while they were good in heads up matches, they made things worse against weaker opponents, and didn't help against stronger ones.
My tests indicate that 2.3.1 is a clear improvement even against foreign opponents at hyperspeed levels. So the problem, if there is one, is not the choice of opponents but the time control of the tests. My guess is that the change involving lateral attacks on pawns, being a tactical term, is great at speeds like game/10" but pretty useless at IPON levels.
Interesting, thanks Larry. A few of the evaluation changes were more tactical terms (pinned piece penalty, undefended pieces, and rook-pawn-rank bonus). So, that could be an explanation.

I'm going back now and applying each eval change to 2.2.2, and testing against a wider set of opponents (still at hyperblitz, 4s+0.05). It will be interesting to see how the changes do there. If they do the same, I guess testing at longer TC is the only way to go.
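To give a rough feel for how many games per-change testing requires, here is a sketch of the usual normal-approximation error margin for an Elo estimate from a match. The win/draw/loss fractions in the example are purely hypothetical, the function names are invented for illustration, and the calculation uses the plain logistic Elo formula rather than whatever the rating lists or Bayeselo actually do.

```python
import math


def per_game_variance(win: float, draw: float, loss: float) -> float:
    """Variance of a single game's score given win/draw/loss probabilities."""
    if abs(win + draw + loss - 1.0) > 1e-9:
        raise ValueError("win, draw and loss must sum to 1")
    mean = win + 0.5 * draw
    return win * (1.0 - mean) ** 2 + draw * (0.5 - mean) ** 2 + loss * mean ** 2


def elo_slope(score: float) -> float:
    """Derivative of the logistic Elo difference with respect to the score."""
    return 400.0 / (math.log(10.0) * score * (1.0 - score))


def elo_margin(n_games: int, win: float, draw: float, loss: float, z: float = 1.96) -> float:
    """Approximate +/- Elo margin (z = 1.96 for about 95%) after n_games."""
    score = win + 0.5 * draw
    std_err = math.sqrt(per_game_variance(win, draw, loss) / n_games)
    return z * std_err * elo_slope(score)


def games_needed(target_margin: float, win: float, draw: float, loss: float, z: float = 1.96) -> int:
    """Smallest number of games whose margin is at most target_margin Elo."""
    score = win + 0.5 * draw
    sigma = math.sqrt(per_game_variance(win, draw, loss))
    return math.ceil((z * sigma * elo_slope(score) / target_margin) ** 2)


if __name__ == "__main__":
    # Hypothetical evenly matched engines with a 30% draw rate.
    w, d, l = 0.35, 0.30, 0.35
    print(f"margin after 2550 games: +/- {elo_margin(2550, w, d, l):.1f} Elo")
    print(f"games for a +/- 5 Elo margin: {games_needed(5.0, w, d, l)}")
```

Because the margin shrinks only with the square root of the number of games, halving it takes four times as many games, which is what makes isolating individual small eval changes so expensive.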

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 7:17 pm

Re: Stockfish 2.3.1 weaker than 2.2.2?

Post by mcostalba » Wed Sep 26, 2012 5:45 pm

gladius wrote: I'm going back now and applying each eval change to 2.2.2, and testing against a wider set of opponents (still at hyperblitz, 4s+0.05).
Hi Gary,

What you are going to do is really dirty grunt work; I know because I have done it myself in the past. There is no joy and a lot of frustration ahead. You really deserve praise for being willing to do this!

All in all, my personal opinion is that the best thing 2.3 brought to the table is your active contribution to the project.

I would also like to thank Ingo, Werner and the CEGT, Ray, and all the other people who are testing this release: I know I made your job a tad difficult due to the small Elo increase and the different releases. I promise, also to myself, that the next one will be better prepared.

Thanks
Marco
