Stable and/or accurate eval

mclane · Post by **mclane** » Wed Jul 04, 2007 3:19 pm

there is no accurate evaluation.
reason is the positions HAVE no accurate evaluation.

only mate scores can be accurate.
everything far from mate is unclear. therefore an engine cannot tell you WHICH value is right. each engine makes a thesis.
and the engines let their thesis fight each other.

to come back to my topic, LoopMP 12.32 makes a very good thesis about the position. it says:

Schachprogrammierer - Gegner
[d] r2q1rk1/2p2p1p/p3P1p1/1p2P1b1/n7/1B2BQ2/PP3PPP/R5K1 w - - 0 1

Analysis by LoopMP 12.32:

21.Rd1 fxe6 22.Rxd8 Raxd8 23.Qc6 Bxe3 24.fxe3 Kh8 25.Qxa6 Rd2 26.Qxb5 Rxb2 27.h4 Nb6 28.Qc6 Rff2
± (1.26) depth: 18 00:01:23 95510kN

AGove · Post by **AGove** » Wed Jul 04, 2007 4:06 pm

mclane wrote:
there is no accurate evaluation.
reason is the positions HAVE no accurate evaluation.

only mate scores can be accurate.
everything far from mate is unclear. therefore an engine cannot tell you WHICH value is right.

Tablebase draws can be accurately evaluated as 0.00, but I'd like to agree with what you say. When one engine says +0.32 and another +0.68 how can it be proven which is right? In fact Lev Alburt's book Test and Improve Your Chess suggested a way of evaluating positions based on the results obtained when the positions are played out - which of course isn't how engines analyse at all, but could be a way of checking and objectifying their numerical evaluations.
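
To make the results-based idea concrete, here is a minimal sketch of how played-out results could be turned into a number comparable to an engine score. It is not taken from Alburt's book. The logistic mapping and the `scale` constant of 4.0 pawns are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def expected_score(wins, draws, losses):
    """Fraction of points scored by White over a set of played-out games."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("need at least one game")
    return (wins + 0.5 * draws) / games


def pawns_to_score(pawns, scale=4.0):
    """Map an eval in pawns to an expected score (assumed logistic curve)."""
    return 1.0 / (1.0 + 10 ** (-pawns / scale))


def score_to_pawns(score, scale=4.0):
    """Inverse of pawns_to_score: expected score back to a pawn-like number."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return scale * math.log10(score / (1.0 - score))


# Hypothetical playout results for one position (not real data).
observed = expected_score(wins=12, draws=6, losses=2)
print("observed score:", observed)
print("pawn equivalent under the assumed curve:", score_to_pawns(observed))
```

With something like this, +0.32 and +0.68 stop being opinions and become two different predictions of the long-run score, which playouts of the position could in principle check.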

maxchgr · Post by **maxchgr** » Wed Jul 04, 2007 8:00 pm

Obviously if the engines are strong they are not 'stupid'. I'm not going to care what a weak engine thinks about a position because it's more likely to be 'wrong' about it.

this link is indeed interesting

I'm wondering if it's better to use engines with similar evals or opposite points of view for a better analysis. I suppose an argument can be made for each one.

AGove · Post by **AGove** » Wed Jul 04, 2007 8:03 pm

maxchgr wrote:I'm wondering if it's better to use engines with similar evals or opposite points of view for a better analysis. I suppose an argument can be made for each one.

What's the argument for using similar engines?

maxchgr · Post by **maxchgr** » Wed Jul 04, 2007 8:20 pm

I can't think of one actually, you're right.

btw, I found it interesting that in that correlations page, Fritz 10 was neither very similar nor very different to any other top 10 non-Fritz engine.

bedouin · Post by **bedouin** » Wed Jul 04, 2007 9:01 pm

Which opening book does 9. d4?? feature in?

[Event "?"]
[Site "?"]
[Date "2007.07.04"]
[Round "?"]
[White "Schachprogrammierer"]
[Black "Gegner"]
[Result "*"]
[WhiteElo "2587"]
[BlackElo "2592"]
[PlyCount "40"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 O-O 8. c3
d5 9. d4?? exd4 10. e5 Ne4 11. cxd4 Bb4 12. Nc3 Nxc3 13. Qc2 Na4 14. Ng5 g6 15.
Qxc6 Bd7 16. Qxd5 Bxe1 17. Qf3 Be6 18. Be3 Bb4 19. d5 Be7 20. dxe6 Bxg5 *

Why not take the position that the 64 CPU Freestyler resigned in and see which engine can see as far without such hardware?

maxchgr · Post by **maxchgr** » Wed Jul 04, 2007 9:13 pm

I think that generally I trust Rybka's evaluation most since it is the strongest and has an extremely stable eval. I think adding another strong engine that can cover Rybka's flaws, like Fritz or HIARCS for understanding king attacks, makes for a complete package to be able to evaluate positions with confidence.

I was wondering if anybody else noticed that Spike seems to always completely disagree with most other strong engines, or is that just me?

smirobth · Post by **smirobth** » Wed Jul 04, 2007 10:34 pm

bedouin wrote:
smirobth wrote:
I think Fritz and Hiarcs are both pretty good. Junior is OK too most of the time. Rybka as someone else noted is a little bit conservative compared to the aforementioned engines, but one can easily adjust for Rybka's slight conservatism if you know it is there. Shredder is on the other extreme, often showing huge advantages where the advantage is actually fairly small. Of all the engines I mention here I think only Shredder's eval is really way out of whack. But you can still try just dividing its numbers in half (or maybe even divide by closer to 3 as they get larger!) and be about right most of the time. The numbers don't really matter as long as you know the quirks of the particular program you are using.
This is a bit off the cuff. Illustrating this with a few positions would be helpful as it would help others arrive at the same conclusions. Some claim that this engine or the other is not good at tactics or good at analysis but until you have something other than your word for it, you cannot be taken seriously.

I am not saying that the engines I mentioned are not good for analysis. I use them all. But when analyzing, Shredder's evaluations sometimes cannot be taken too seriously and if you do you will be misled. And Rybka is often a bit more conservative in its evaluations than most engines, so just be aware of this. These are just my opinions, based on a lot of analysis and thousands of different positions in my games and the games of others. If you do not want to take my opinion seriously that's OK with me. Feel free to learn all this for yourself if you like, using your own positions. Almost any set of positions will work for this purpose as long as they are positions that are both not too tactical and also not too obviously equal.
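
The "divide by 2, or closer to 3 as the numbers get larger" rule of thumb can be sketched as a small rescaling function. This is only an illustration: the linear ramp between the two divisors and the `large_at` threshold of 3.0 pawns are assumptions, not anything measured, and `rescale_inflated` is a hypothetical name.

```python
def rescale_inflated(score, small_div=2.0, large_div=3.0, large_at=3.0):
    """Shrink an optimistic engine score toward a more typical scale.

    The divisor grows linearly from small_div (near 0.00) to large_div
    (at |score| >= large_at pawns). The ramp shape is an assumption.
    """
    magnitude = min(abs(score), large_at)
    divisor = small_div + (large_div - small_div) * magnitude / large_at
    return score / divisor


# A few sample scores in pawns, positive for White, negative for Black.
for raw in (0.40, 1.50, 3.00, -2.00):
    print(raw, "->", round(rescale_inflated(raw), 2))
```

The exact curve matters less than applying the same correction every time, so that one engine's numbers can be read on roughly the same scale as the others'.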

smirobth · Post by **smirobth** » Wed Jul 04, 2007 10:47 pm

maxchgr wrote:I think that generally I trust rybka's evaluation most since it is the strongest + has an extremely stable eval.

Yes, Rybka is strongest, and has a very stable eval. These are both desirable features for analysis. But beware that I think Rybka sometimes shows smaller advantages than other engines. For example, in this position from one of my postal games:
[d] 1r2nrk1/6pp/1b1p1pP1/3RnP2/7P/2B5/1PN5/1K3B1R w - - 0 32
Rybka shows only a tiny bit more than a 1/3 pawn advantage for White, and yet Black is close to being lost, and perhaps is already completely lost with best play by both sides. Most other programs see closer to a 1 pawn advantage, which I think is more accurate. Best for analysis is to use several engines. I usually like to use at least 4.

AGove · Post by **AGove** » Wed Jul 04, 2007 11:54 pm

At first glance the following reinforces the general impression that Rybka slightly understates while Shredder greatly overstates an advantage. Having said that, Robin Smith claims "Black is close to being lost, and perhaps is already completely lost " - so perhaps in this case Shredder is actually right and Rybka not. That's the problem with single examples.

[d] 1r2nrk1/6pp/1b1p1pP1/3RnP2/7P/2B5/1PN5/1K3B1R w - - 0 32

Analysis by Anaconda 2.0.1:

32.h5 h6 33.Bg2 Kh8 34.Rdd1 Bf2 35.Rhf1 Bg3 36.Ne3 Rb6 37.Nd5 Ra6 38.b4 Ra7 39.Bd4
+/- (0.81) Depth: 16/56 00:04:12 103mN, tb=59

Analysis by Fritz 10:

32.h5 Bc5 33.Kc1 h6 34.b4 Bf2 35.Bg2 Kh8 36.Rh3 Nc7 37.Bxe5 fxe5 38.Rxd6 Nb5 39.Rd5 Rxf5
+/- (0.79) Depth: 18/41 00:03:56 342mN

Analysis by HIARCS 11.1 UCI:

32.h5 Bc5 33.b4 h6 34.Ka2 Bf2 35.Be2 Nc7 36.Rxd6 Nb5 37.Bxb5 Rxb5 38.Nd4 Rbb8 39.Kb3 Rfc8
+/= (0.55) Depth: 17/38 00:03:43 80994kN

Analysis by Ktulu 8:

32.h5 h6 33.Rd2 Nf3 34.Bc4+ Kh8 35.Rdd1 Ne5 36.Be2 Rd8 37.Rh4 Rd7 38.Bb4 Re7 39.Bxd6 Nxd6 40.Rxd6 Bc5 41.Rd5 Rc8 42.Ra4
+/= (0.57) Depth: 17 00:05:12 369mN

Analysis by ProDeo 1.5:

32.Nb4 Nc7 33.Rxd6 Bc5 34.Bxe5 fxe5 35.Rc6 Bxb4 36.Rxc7 Bd6 37.Rc2 Rxf5 38.gxh7+ Kh8 39.Bd3 Rf4
+/= (0.53) Depth: 14 00:03:39 360mN

Analysis by Rybka 2.3.2a 32-bit:

32.b4 hxg6 33.fxg6 f5 34.Bxe5 dxe5 35.Bc4 Kh8 36.Rxe5 Nd6 37.Bb3 Rf6 38.h5 f4
+/= (0.45) Depth: 18 00:05:29 29407kN

Analysis by Shredder 10.1 UCI:

32.h5 Bc5 33.Rd2 h6 34.Kc1 Rc8 35.Ba6 Rb8 36.Be2 Bb6 37.Rhd1
+- (1.72) Depth: 19/41 00:03:57 143mN

Analysis by Zap!Chess Zanzibar:

32.h5 h6 33.Rd1 Rd8 34.Rh2 Nc7 35.Rhd2 d5 36.Bg2 Rfe8 37.b4 Nc4 38.Bxd5+ Nxd5 39.Rxd5 Rxd5 40.Rxd5 Bf2
+/- (0.77) Depth: 16/39 00:02:56 70245kN
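
One way to read a multi-engine panel like the one above is to look at the median and the spread rather than any single number. The sketch below uses the eight scores exactly as posted. The outlier threshold of 0.5 pawns is an arbitrary assumption, and `consensus` is a hypothetical helper name.

```python
from statistics import median

# Scores in pawns for White, copied from the analysis output above.
evals = {
    "Anaconda 2.0.1": 0.81,
    "Fritz 10": 0.79,
    "HIARCS 11.1 UCI": 0.55,
    "Ktulu 8": 0.57,
    "ProDeo 1.5": 0.53,
    "Rybka 2.3.2a 32-bit": 0.45,
    "Shredder 10.1 UCI": 1.72,
    "Zap!Chess Zanzibar": 0.77,
}


def consensus(scores, outlier_gap=0.5):
    """Return (median, spread, outliers) for a dict of engine -> score.

    An engine counts as an outlier when it is more than outlier_gap pawns
    away from the median. The gap is an assumed, adjustable threshold.
    """
    mid = median(scores.values())
    spread = max(scores.values()) - min(scores.values())
    outliers = [name for name, s in scores.items() if abs(s - mid) > outlier_gap]
    return mid, spread, outliers


mid, spread, outliers = consensus(evals)
print("median:", round(mid, 2), "spread:", round(spread, 2), "outliers:", outliers)
```

With these numbers only Shredder falls outside the band, while Rybka is the lowest but still within it. That fits the general impression, though it says nothing about which engine is actually right about this position.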
