Stable and/or accurate eval

maxchgr

Stable and/or accurate eval

Post by maxchgr »

There are many strong engines, but I know that some are not as useful as others for analyzing your games and evaluating positions. As Vas once put it in a post on rybkaforum, "engines are better at playing moves than evaluating".

I was wondering if anyone knows or has an opinion on which engines are best for analyzing games or getting a reasonably accurate evaluation of a position. For example, I have noticed that although Loop is a very strong engine, its evaluations constantly jump up and down by a large margin. Spike is not nearly as bad in my opinion but, despite its strength, always seems to disagree with other engines on the evaluation, for better or for worse :).

In my experience, a combination of Rybka and Fritz is best for an eval, but I cannot be sure because I don't have many engines. I was told that Zanzibar often gives an eval somewhere in between those given by Fritz and Rybka (in my opinion that is probably an accurate eval, but again, I can't be sure).

Engine strength is easily determined from rating lists, but this type of thing I believe I need to ask about here. I hope I can get some of your observations or opinions.

thanks
max
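
One way to make "jumping up and down" concrete is to log the score an engine reports at each search depth and measure how far it moves from one depth to the next. The following is only a minimal sketch: the function name `eval_volatility` and both score lists are made up for illustration and are not output from Loop, Spike, or any other engine.

```python
def eval_volatility(scores):
    """Mean absolute change between consecutive scores (in pawns).

    `scores` is a hypothetical list of evaluations reported at
    successive search depths for the same position.
    """
    if len(scores) < 2:
        return 0.0
    steps = [abs(later - earlier) for earlier, later in zip(scores, scores[1:])]
    return sum(steps) / len(steps)


# Invented numbers, purely for illustration.
jumpy = [0.30, 0.90, 0.10, 1.00, 0.20]
steady = [0.30, 0.35, 0.30, 0.40, 0.35]

print(eval_volatility(jumpy) > eval_volatility(steady))
```

A low value only says the score is stable, not that it is accurate; an engine can be steadily wrong.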
bedouin

Re: Stable and/or accurate eval

Post by bedouin »

Do you have an example position as well as output from those engines? There are things like SinglePV/MultiPV but those who know these things can tell you more.
AGove

Re: Stable and/or accurate eval

Post by AGove »

People have got used to Fritz's evaluations down the years and the way it assesses positions. It's hardly the strongest engine, but it's been going long enough and sold enough copies (and reached the PCs of almost all GMs and writers) to have become something of an industry standard.

Rybka is very conservative in its evaluations. When Rybka assesses a small advantage (+/=) for itself, its opponent is often already clearly worse.
maxchgr

Re: Stable and/or accurate eval

Post by maxchgr »

How do Hiarcs's evaluations compare to Fritz's? Do they usually agree?
smirobth
Posts: 2307
Joined: Wed Mar 08, 2006 8:41 pm
Location: Brownsville Texas USA

Re: Stable and/or accurate eval

Post by smirobth »

I think Fritz and Hiarcs are both pretty good. Junior is OK too most of the time. Rybka, as someone else noted, is a little bit conservative compared to the aforementioned engines, but you can easily adjust for Rybka's slight conservatism if you know it is there. Shredder is at the other extreme, often showing huge advantages where the advantage is actually fairly small. Of all the engines I mention here, I think only Shredder's eval is really way out of whack. But you can still try just dividing its numbers in half (or maybe even dividing by something closer to 3 as they get larger!) and be about right most of the time. The numbers don't really matter as long as you know the quirks of the particular program you are using.
- Robin Smith
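
The rule of thumb above (halve Shredder's numbers, and divide by something closer to 3 as they get larger) can be written down as a small rescaling function. This is a rough sketch only: the post gives no thresholds, so the points where the divisor starts and stops growing (`low` and `high`, in pawns) and the linear ramp between them are assumptions picked for illustration.

```python
def shredder_divisor(score, low=1.0, high=4.0):
    """Divisor that grows from 2 to 3 as |score| grows from `low` to `high`.

    `low` and `high` are hypothetical thresholds in pawns; the post only
    says "divide by closer to 3 as they get larger".
    """
    magnitude = abs(score)
    if magnitude <= low:
        return 2.0
    if magnitude >= high:
        return 3.0
    # Linear ramp between the two assumed thresholds.
    return 2.0 + (magnitude - low) / (high - low)


def adjust_eval(score, low=1.0, high=4.0):
    """Rescale a raw score (in pawns) using the divisor above."""
    return score / shredder_divisor(score, low, high)


print(adjust_eval(0.8))   # small advantage: halved
print(adjust_eval(6.0))   # large advantage: divided by 3
```

The sign is preserved, so the same adjustment applies whichever side the engine prefers.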
bedouin

Re: Stable and/or accurate eval

Post by bedouin »

This is a bit off the cuff. Illustrating this with a few positions would be helpful as it would help others arrive at the same conclusions. Some claim that this engine or the other is not good at tactics or good at analysis but until you have something other than your word for it, you cannot be taken seriously.
mclane
Posts: 18900
Joined: Thu Mar 09, 2006 6:40 pm
Location: US of Europe, germany
Full name: Thorsten Czub

Re: Stable and/or accurate eval

Post by mclane »

I do have a position that illustrates the "problem":
[d] r1bq1rk1/2p2ppp/p1n5/1p1pP3/nb1P4/1B3N2/PPQ2PPP/R1B1R1K1 w - - 0 14

White played 14.Ng5 and the game continued:

[Event "?"]
[Site "?"]
[Date "2007.07.04"]
[Round "?"]
[White "Schachprogrammierer"]
[Black "Gegner"]
[Result "*"]
[WhiteElo "2587"]
[BlackElo "2592"]
[PlyCount "40"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 O-O 8. c3
d5 9. d4 exd4 10. e5 Ne4 11. cxd4 Bb4 12. Nc3 Nxc3 13. Qc2 Na4 14. Ng5 g6 15.
Qxc6 Bd7 16. Qxd5 Bxe1 17. Qf3 Be6 18. Be3 Bb4 19. d5 Be7 20. dxe6 Bxg5 *

Which program really evaluates the position as good for White?
Which program would play 14.Ng5 (instead of 14.Qxc6)?
CSTAL2.03 plays 14.Ng5 with >0.51 for White after ~2700 seconds on my Intel6300.

http://www.abload.de/image.php?img=ng5y3f.jpg

Schachprogrammierer - Gegner
r1bq1rk1/2p2ppp/p1n5/1p1pP3/nb1P4/1B3N2/PPQ2PPP/R1B1R1K1 w - - 0 1

Analysis by HIARCS 11.1 UCI:

14.Qxc6 Bd7 15.Qc2 Bxe1 16.Bg5 Bxf2+ 17.Qxf2 Qe8 18.Bxd5 c6 19.Bb3 h6 20.Qc2 Be6 21.Bxa4 bxa4 22.Bd2 Rb8 23.Bc3 a3 24.bxa3
=/+ (-0.61) Depth: 16/42 00:15:48 68598kN

Hiarcs says Black is better.

Schachprogrammierer - Gegner
r1bq1rk1/2p2ppp/p1n5/1p1pP3/nb1P4/1B3N2/PPQ2PPP/R1B1R1K1 w - - 0 1

Analysis by Rybka 2.3 32-bit :

14.Qxc6 Bd7 15.Qxd5 Bxe1 16.Nxe1 Bf5 17.Qf3 Qd7 18.Qf4 Rfd8 19.Nf3 Nc5 20.h3 Nxb3
= (-0.09) Depth: 20 00:22:03 23525kN

Rybka says the position is about equal, very slightly in Black's favor.


More questions:
2. Which program plays 16.Qxd5, and with what evaluation?
3. Which program plays 17.Qf3?
4. Which program plays 18.Be3, and how do the programs evaluate the final position after 20...Bxg5?
[d] r2q1rk1/2p2p1p/p3P1p1/1p2P1b1/n7/1B2BQ2/PP3PPP/R5K1 w - - 0 21
Chess is complicated, especially when the material is unbalanced and there is a king attack.

Chess System Tal of course is made to deal with the "problem".
Last edited by mclane on Wed Jul 04, 2007 2:41 pm, edited 3 times in total.
AGove

Re: Stable and/or accurate eval

Post by AGove »

Maybe this correlation page will lead to the kinds of answers desired:

http://computerchess.org.uk/ccrl/4040/correlation.html
mclane

Re: Stable and/or accurate eval

Post by mclane »

If 2 engines are similar, it does not tell us ANYTHING about the evaluation.
They could both be stupid and would still have a high correlation.
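
The point can be shown with a toy calculation. In this sketch two hypothetical engines share the same distortion (every score doubled and shifted by about a pawn), so they correlate almost perfectly with each other while both are far from the reference values. All numbers, including the "reference" scores, are invented for illustration; nothing here comes from the CCRL page or from any real engine.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_abs_error(scores, reference):
    """Average distance (in pawns) between scores and reference values."""
    return sum(abs(s - r) for s, r in zip(scores, reference)) / len(scores)


# Hypothetical "correct" scores for five positions (made up).
reference = [-1.0, -0.5, 0.0, 0.5, 1.0]
# Two hypothetical engines with the same kind of bias.
engine_a = [2 * r + 1.0 for r in reference]
engine_b = [2 * r + 1.2 for r in reference]

print(pearson(engine_a, engine_b))          # very close to 1
print(mean_abs_error(engine_a, reference))  # about a pawn off on average
```

High agreement between two engines therefore says nothing by itself about whether either one is right.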
AGove

Re: Stable and/or accurate eval

Post by AGove »

mclane wrote:
"If 2 engines are similar, it does not tell us ANYTHING about the evaluation. They could both be stupid and would still have a high correlation."

Are any of us talking about stupid engines? I suppose someone, somewhere, someday, will actually ask, "Which stupid engines are best for accurate evaluations?"