Engine evaluation consistency/stability

gordonr · Post by **gordonr** » Wed Oct 09, 2019 1:15 pm

Hi,

Sometimes when I'm analysing a position with Stockfish, the evaluation can vary quite significantly as I step into the PV. Of course, I don't expect the eval to stay the same, since after all Stockfish may then be searching the subposition to a different depth, etc. However, my question is: do some engines tend to have a more consistent/stable evaluation than others when taking one step into the PV?

cheers
Gordon
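
One rough way to put a number on this, as a sketch only: collect the score the engine reports after each step into the PV, convert every score to the root side's point of view (UCI scores are given from the side to move, so odd plies are negated), and take the largest jump between consecutive steps. The names `toRootPerspective` and `maxSwing` are made up for illustration, and this is not necessarily how any existing tool measures stability.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper, not part of any engine or GUI.
// scores[i] is the evaluation in centipawns after stepping i plies into the
// PV, reported from the side to move. Odd plies are negated so that every
// score is seen from the root side's point of view.
std::vector<int> toRootPerspective(const std::vector<int>& scores) {
    std::vector<int> out;
    out.reserve(scores.size());
    for (std::size_t i = 0; i < scores.size(); ++i)
        out.push_back(i % 2 == 0 ? scores[i] : -scores[i]);
    return out;
}

// Largest absolute change between consecutive plies, in centipawns.
// Returns 0 when fewer than two scores are given.
int maxSwing(const std::vector<int>& scores) {
    const std::vector<int> s = toRootPerspective(scores);
    int worst = 0;
    for (std::size_t i = 1; i < s.size(); ++i)
        worst = std::max(worst, std::abs(s[i] - s[i - 1]));
    return worst;
}
```

For example, scores of 30, -25, 80, -35 centipawns at plies 0 to 3 become 30, 25, 80, 35 from the root side's view, so the largest swing is 55 centipawns.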

Ferdy · Post by **Ferdy** » Wed Oct 09, 2019 3:18 pm

gordonr wrote: ↑Wed Oct 09, 2019 1:15 pm Hi,

Sometimes when I'm analysing a position with Stockfish, the evaluation can vary quite significantly as I step into the PV. Of course, I don't expect the eval to stay the same, since after all Stockfish may then be searching the subposition to a different depth, etc. However, my question is: do some engines tend to have a more consistent/stable evaluation than others when taking one step into the PV?

cheers
Gordon

I did a test some months ago. Search for EARS (Engine Analysis Reliability Score). I think I did a 6-ply comparison. The source code is available, so you can run some experiments with the latest engines.

jdart · Post by **jdart** » Wed Oct 09, 2019 3:33 pm

I prefer Houdini for most analysis because its eval doesn't bounce around so much.

Jouni · Post by **Jouni** » Wed Oct 09, 2019 4:05 pm

In TCEC, SF has many games (as both White and Black) with a 0.00 evaluation for the entire game.

Some games are over 100 moves. Will chess soon be solved?

Uri · Post by **Uri** » Wed Oct 09, 2019 5:27 pm

Jouni wrote: ↑Wed Oct 09, 2019 4:05 pm In TCEC, SF has many games (as both White and Black) with a 0.00 evaluation for the entire game. Some games are over 100 moves. Will chess soon be solved?

I believe that chess is still very far away from being solved. Even in the year 4020 (we are now in the year 2020) chess would still not be completely solved.

You see, chess is so complex that chess engines still have many weaknesses in their understanding and knowledge of the game compared to humans.

gordonr · Post by **gordonr** » Wed Oct 09, 2019 11:09 pm

Thanks everyone for their help. Ferdy, I found your excellent post. Very interesting and useful indeed.

http://talkchess.com/forum3/viewtopic.p ... rs#p792684

Dann Corbit · Post by **Dann Corbit** » Wed Oct 09, 2019 11:19 pm

gordonr wrote: ↑Wed Oct 09, 2019 1:15 pm Hi,

Sometimes when I'm analysing a position with Stockfish, the evaluation can vary quite significantly as I step into the PV. Of course, I don't expect the eval to stay the same, since after all Stockfish may then be searching the subposition to a different depth, etc. However, my question is: do some engines tend to have a more consistent/stable evaluation than others when taking one step into the PV?

cheers
Gordon

You can fix the Stockfish "sewing machine" (the score bouncing up and down as fail highs and fail lows are printed) with this simple change:

In ucioption.cpp, add the option (it defaults to true; set it to false from the GUI):

    o["Show Fail High and Fail Low"] << Option(true);

In search.cpp:

    bool bSewingMachine = Options["Show Fail High and Fail Low"];

Then add the flag to the condition that prints the fail-high/fail-low update:

    // When failing high/low give some update (without cluttering
    // the UI) before a re-search.
    if (   mainThread
        && multiPV == 1
        && (bestValue <= alpha || bestValue >= beta)
        && Time.elapsed() > 3000
        && bSewingMachine)
        sync_cout << UCI::pv(rootPos, rootDepth, alpha, beta) << sync_endl;
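
To see what the extra flag does in isolation, here is a stand-alone sketch of the same condition as a pure function. `FailReport` and `shouldPrintFailUpdate` are hypothetical names that are not in Stockfish; the fields just mirror the variables used in the snippet above.

```cpp
// Hypothetical stand-alone mirror of the patched condition; not Stockfish code.
struct FailReport {
    bool mainThread;        // only the main thread prints
    int multiPV;            // number of PV lines requested
    int bestValue;          // score returned by the last search
    int alpha;              // lower bound of the aspiration window
    int beta;               // upper bound of the aspiration window
    long long elapsedMs;    // time spent searching so far, in milliseconds
    bool showFailHighLow;   // value of "Show Fail High and Fail Low"
};

// True when the intermediate fail-high/fail-low line would be printed.
bool shouldPrintFailUpdate(const FailReport& r) {
    const bool failed = r.bestValue <= r.alpha || r.bestValue >= r.beta;
    return r.mainThread
        && r.multiPV == 1
        && failed
        && r.elapsedMs > 3000
        && r.showFailHighLow;
}
```

With the option set to false, the function returns false for every input, so only completed iterations would reach the GUI and the intermediate bound scores are never shown.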

Ovyron · Post by **Ovyron** » Thu Oct 10, 2019 10:06 am

Jouni wrote: ↑Wed Oct 09, 2019 4:05 pm In TCEC, SF has many games (as both White and Black) with a 0.00 evaluation for the entire game. Some games are over 100 moves. Will chess soon be solved?

Chess isn't close at all to being solved. People have claimed that they can produce perfect chess moves on the fly, but if this were true, they could make an opening book in which their moves were played up to the point where an unassisted engine could draw the game from there at bullet chess. That this hasn't been done, and that bullet chess is fine draw-wise, means those people still have to work hard to produce "perfect chess", and the only reason they haven't lost yet is that they haven't played enough games.

Laskos · Post by **Laskos** » Thu Oct 10, 2019 11:26 am

Hmmm, no word about Leela?
It is stable both over time (during the search) and along the PV, as long as there are not a lot of tactics. Compared to the AB (alpha-beta) engines I know, it is much more stable. And for tactics, AB engines are complementary to Leela: they find the tactic and stick to it.

Ovyron · Post by **Ovyron** » Thu Oct 10, 2019 4:15 pm

Laskos wrote: ↑Thu Oct 10, 2019 11:26 am Hmmm, no word about Leela?
It is stable both over time (during the search) and along the PV, as long as there are not a lot of tactics.

Or one big tactic. But you never know, the analysis will become inconsistent once she sees it, so it's not stable.
