Hello,
I was following some of the discussion about the Ivanov cheating case and I have a couple of questions for engine programmers.
In the discussions about Ivanov you often read that his moves matched the "first choice of Houdini" in the given position. However, it is rarely defined how the "first choice of Houdini" is selected. I suppose you'd get different moves more often than not if you run Houdini on a slow computer for a few seconds versus running Houdini on a fast computer for several minutes.
First question: assuming you start an engine on infinite analysis, how often would the engine on average change its mind about the best move (assuming it would stabilize on a selection in the long term)?
The discussion about Ivanov for some reason uses Houdini as the default benchmark, but he might have used any other engine or a combination of engines.
Second question: assuming you compare two different engines (for example Houdini and Stockfish) and run them for a good amount of time on decent hardware (so as to avoid the initial noise at low depth), how often would you expect the best moves of the two engines to be the same?
I tried to look at this correlation in the last Houdini/Stockfish game from TCEC, comparing the move predicted by each engine in its PV with the actual move played by the other engine, and I got a relatively low match of about 45%. I wonder if this estimate has any general validity.
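The PV match-rate measurement described above boils down to comparing, position by position, the reply one engine predicted in its PV with the move its opponent actually played. A minimal sketch with made-up moves (the move strings below are purely illustrative):

```python
def pv_match_rate(predicted_moves, actual_moves):
    """Fraction of positions where the predicted reply equals the move played.

    predicted_moves, actual_moves: parallel lists of moves in any consistent
    notation (e.g. UCI strings such as "e2e4").
    """
    if not actual_moves:
        return 0.0
    matches = sum(1 for p, a in zip(predicted_moves, actual_moves) if p == a)
    return matches / len(actual_moves)

# Toy example: 2 of 4 predicted replies match the moves actually played.
predicted = ["e2e4", "g1f3", "d2d4", "f1c4"]
played    = ["e2e4", "b1c3", "d2d4", "c2c3"]
print(pv_match_rate(predicted, played))  # 0.5
```

In a real measurement the two lists would be extracted from the game's analysis output, one entry per position.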
For all examples, assume results are averaged over a large number of positions of different natures (like all positions from a number of recent GM games, so a mix of tactical/forced positions and more strategic positions).
I wonder if anyone has calculated similar stats based on real games; otherwise, the "best guess" of engine programmers would be welcome.
Thanks.
How often does an engine change its best move?
Re: How often does an engine change its best move?
I did my own study of move frequency matches with strong programs and came up with some numbers. It's posted here somewhere, but I don't remember when or where. It was less than a year ago.
Re: How often does an engine change its best move?
There have been several studies done, and papers published. Monty and I did one, Heinz did one, and there was at least one more. We all found that if you search one ply deeper, a program will change its best move somewhere around 15% of the time. Note that this kind of test would not include tactical positions with just one best move, like a mate in N.
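The per-ply change rate the studies measured can be computed from a record of the best move reported at each iterative-deepening depth: count how many depth increments changed the best move, over all positions. A minimal sketch with hypothetical data (the moves and positions are made up for illustration):

```python
def change_rate(best_by_depth):
    """Fraction of depth increments where the reported best move changed.

    best_by_depth: one list per position, holding the best move reported
    at successive iterative-deepening depths.
    """
    changes = total = 0
    for moves in best_by_depth:
        for prev, cur in zip(moves, moves[1:]):
            total += 1
            if prev != cur:
                changes += 1
    return changes / total if total else 0.0

# Hypothetical per-depth best moves for two positions:
data = [["e2e4", "e2e4", "d2d4"],   # one change in two depth increments
        ["g1f3", "g1f3", "g1f3"]]   # no change
print(change_rate(data))  # 0.25
```

Feeding this with real per-depth analysis output over many positions would reproduce the kind of statistic the papers report.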
Re: How often does an engine change its best move?
bob wrote: We all found that if you search one ply deeper, a program will change its best move somewhere around 15% of the time.

Is this true for any depth? It would seem that the percentage would be much higher for lower search depths. I also assume the tests for the paper were done with an empty hash, starting the search from scratch, right?
jm
Re: How often does an engine change its best move?
A few months ago I made the following test, which may provide some data for your second question.

casaschi wrote: how often would you expect the best move of the two engines to be the same?

Traversing a PGN file and skipping the first 10 plies of each game, I built a set of positions. Note that this set is not filtered for tactics, and there are probably more endgame positions than anything else...
Then I evaluated this set with the following engines: Komodo 3, Ivanhoe 999949j and Fruit 2.3.1.
In a first set of 10,000 positions, the agreement rate was:

Code: Select all

depth 1 -> 3950 agreements
depth 2 -> 3089
depth 3 -> 2659
depth 4 -> 2353
depth 5 -> 2151
depth 6 -> 1976

With a second set of 10,000 positions:

depth 1 -> 4019 agreements
depth 2 -> 3123
depth 3 -> 2659
depth 4 -> 2334
depth 5 -> 2149
depth 6 -> 1973

Then a set of 100,000 positions:

depth 1 -> 39582 agreements
depth 2 -> 30849
depth 3 and up -> beyond the user's patience ;-)

At depth 1, when the recaptures (quiescence search) and static evaluation dominate the result, the rate of agreement was 39-40%.
As the search progresses to greater depths, the agreement rate falls below 20%.
Unfortunately, I cannot say what happens at greater depths, but I hope this data helps you get some insight.
Best regards.
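For reference, the counts from the first 10,000-position set convert directly into agreement rates (a quick sketch; the counts are copied from the post above):

```python
# Agreement counts from the first 10,000-position set, per search depth.
counts = {1: 3950, 2: 3089, 3: 2659, 4: 2353, 5: 2151, 6: 1976}
total = 10_000

# Convert counts to agreement rates.
rates = {depth: n / total for depth, n in counts.items()}
for depth, rate in rates.items():
    print(f"depth {depth}: {rate:.1%}")
```

This prints 39.5% at depth 1 falling to 19.8% at depth 6, consistent with the 39-40% and "below 20%" figures quoted in the post.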