Engine evaluation consistency/stability

gordonr
Posts: 137
Joined: Thu Aug 06, 2009 6:04 pm
Location: UK

Engine evaluation consistency/stability

Post by gordonr » Wed Oct 09, 2019 11:15 am

Hi,

Sometimes when I'm analysing a position with Stockfish, the evaluation can vary quite significantly while stepping into the PV. Of course, I don't expect the eval to stay the same, since after all Stockfish may then be searching the subposition to a different depth, etc. However, my question is: do some engines tend to have a more consistent/stable evaluation than others when stepping one move into the PV?

cheers
Gordon

Ferdy
Posts: 4077
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: Engine evaluation consistency/stability

Post by Ferdy » Wed Oct 09, 2019 1:18 pm

gordonr wrote:
Wed Oct 09, 2019 11:15 am
Hi,

Sometimes when I'm analysing a position with Stockfish, the evaluation can vary quite significantly while stepping into the PV. Of course, I don't expect the eval to stay the same, since after all Stockfish may then be searching the subposition to a different depth, etc. However, my question is: do some engines tend to have a more consistent/stable evaluation than others when stepping one move into the PV?

cheers
Gordon
I did a test some months ago. Search for EARS, or Engine Analysis Reliability Score. I think I did a 6-ply comparison. The source code is available, so you can run some experiments with the latest engines.
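
One rough way to put a number on "how much the eval bounces around" when stepping into the PV is the mean absolute change between consecutive evaluations. The sketch below is only an illustration of that idea, not the EARS method itself; the function name `meanAbsEvalChange` and the convention that all scores are in centipawns from White's point of view are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper, not part of any engine or of the EARS tool.
// evalsCp[i] is the score in centipawns, always from White's point of
// view, that the engine reports after stepping i plies into the PV.
double meanAbsEvalChange(const std::vector<int>& evalsCp)
{
    if (evalsCp.size() < 2)
        return 0.0;  // fewer than two evaluations: nothing to compare

    long long total = 0;
    for (std::size_t i = 1; i < evalsCp.size(); ++i)
        total += std::llabs(static_cast<long long>(evalsCp[i]) - evalsCp[i - 1]);

    // Average jump per step; lower means a more stable evaluation.
    return static_cast<double>(total) / static_cast<double>(evalsCp.size() - 1);
}
```

Running each engine over the same set of positions at the same depth or time per step and comparing the averages would give a crude stability ranking.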

jdart
Posts: 3816
Joined: Fri Mar 10, 2006 4:23 am
Location: http://www.arasanchess.org

Re: Engine evaluation consistency/stability

Post by jdart » Wed Oct 09, 2019 1:33 pm

I prefer Houdini for most analysis because its eval doesn't bounce around so much.

Jouni
Posts: 1979
Joined: Wed Mar 08, 2006 7:15 pm

Re: Engine evaluation consistency/stability

Post by Jouni » Wed Oct 09, 2019 2:05 pm

In TCEC, SF has many games (as White and as Black) with a 0.00 evaluation for the entire game! Some games are over 100 moves. Will chess soon be solved?
Jouni

Uri
Posts: 418
Joined: Thu Dec 27, 2007 8:34 pm

Re: Engine evaluation consistency/stability

Post by Uri » Wed Oct 09, 2019 3:27 pm

Jouni wrote:
Wed Oct 09, 2019 2:05 pm
In TCEC, SF has many games (as White and as Black) with a 0.00 evaluation for the entire game! Some games are over 100 moves. Will chess soon be solved?
I believe that chess is still very far away from being solved. Even in the year 4020 (we are now in the year 2020) chess would still not be completely solved.

You see, chess is so complex that chess engines still have many weaknesses in their understanding and knowledge of the game compared to humans.

gordonr
Posts: 137
Joined: Thu Aug 06, 2009 6:04 pm
Location: UK

Re: Engine evaluation consistency/stability

Post by gordonr » Wed Oct 09, 2019 9:09 pm

Thanks everyone for their help. Ferdy, I found your excellent post. Very interesting and useful indeed.

viewtopic.php?f=2&t=70151&p=792684&hilit=ears#p792684

Dann Corbit
Posts: 9994
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Engine evaluation consistency/stability

Post by Dann Corbit » Wed Oct 09, 2019 9:19 pm

gordonr wrote:
Wed Oct 09, 2019 11:15 am
Hi,

Sometimes when I'm analysing a position with Stockfish, the evaluation can vary quite significantly while stepping into the PV. Of course, I don't expect the eval to stay the same, since after all Stockfish may then be searching the subposition to a different depth, etc. However, my question is: do some engines tend to have a more consistent/stable evaluation than others when stepping one move into the PV?

cheers
Gordon
You can fix the Stockfish sewing machine with this simple change:

In ucioption.cpp (set it to false from the GUI):

    o["Show Fail High and Fail Low"] << Option(true);
In search.cpp:

    bool bSewingMachine = Options["Show Fail High and Fail Low"];
Then do this:

    // When failing high/low give some update (without cluttering
    // the UI) before a re-search.
    if (   mainThread
        && multiPV == 1
        && (bestValue <= alpha || bestValue >= beta)
        && Time.elapsed() > 3000
        && bSewingMachine)
        sync_cout << UCI::pv(rootPos, rootDepth, alpha, beta) << sync_endl;
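
To see the effect of the extra flag in isolation, the condition above can be pulled into a small standalone predicate. This is only a sketch: the struct `BoundUpdateState` and the function `shouldPrintBoundUpdate` are made-up names for illustration and do not exist in Stockfish; the fields simply mirror the variables used in the snippet.

```cpp
// Stand-in for the variables used in the search.cpp condition.
// Every name in this block is illustrative, not Stockfish code.
struct BoundUpdateState {
    bool      mainThread;       // only the main thread reports to the GUI
    int       multiPV;          // number of PV lines requested
    int       bestValue;        // score returned by the last search
    int       alpha;            // lower bound of the aspiration window
    int       beta;             // upper bound of the aspiration window
    long long elapsedMs;        // time spent on this search so far
    bool      showFailHighLow;  // the proposed "Show Fail High and Fail Low" option
};

// True when a fail-high/fail-low line would be sent before the re-search.
bool shouldPrintBoundUpdate(const BoundUpdateState& s)
{
    const bool failedLowOrHigh = s.bestValue <= s.alpha || s.bestValue >= s.beta;

    return s.mainThread
        && s.multiPV == 1
        && failedLowOrHigh
        && s.elapsedMs > 3000
        && s.showFailHighLow;
}
```

With `showFailHighLow` set to false the predicate is never true, so the intermediate bound lines are suppressed and only the completed iterations are reported.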

Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

Ovyron
Posts: 2505
Joined: Tue Jul 03, 2007 2:30 am

Re: Engine evaluation consistency/stability

Post by Ovyron » Thu Oct 10, 2019 8:06 am

Jouni wrote:
Wed Oct 09, 2019 2:05 pm
In TCEC, SF has many games (as White and as Black) with a 0.00 evaluation for the entire game! Some games are over 100 moves. Will chess soon be solved?
Chess isn't close at all to being solved. People have claimed that they can produce perfect chess moves on the fly, but if this were true, they could make an opening book in which their moves were played up to the point where an unassisted engine could draw the game from there at bullet chess. That this hasn't been done, and that bullet chess is fine draw-wise, means those people still have to work hard to produce "perfect chess", and the only reason they haven't lost yet is that they haven't played enough games.
Great spirits have always encountered violent opposition from mediocre minds.

Laskos
Posts: 9410
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Engine evaluation consistency/stability

Post by Laskos » Thu Oct 10, 2019 9:26 am

Hmmm, no word about Leela?
It is stable both over time (search) and along the PV, as long as there aren't a lot of tactics. It is much more stable than the AB engines I know. And for tactics, AB engines are complementary to Leela: they find them and stick to them.

Ovyron
Posts: 2505
Joined: Tue Jul 03, 2007 2:30 am

Re: Engine evaluation consistency/stability

Post by Ovyron » Thu Oct 10, 2019 2:15 pm

Laskos wrote:
Thu Oct 10, 2019 9:26 am
Hmmm, no word about Leela?
It is stable both over time (search) and along the PV, as long as there aren't a lot of tactics.
Or one big tactic. But you never know, the analysis will become inconsistent once she sees it, so it's not stable.
Great spirits have always encountered violent opposition from mediocre minds.
