Pairwise Analysis of Chess Engine Move Selections


michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Pairwise Analysis of Chess Engine Move Selections

Post by michiguel »

Adam Hair wrote: Here is a graph of the 3403 data points. There is a perceptible negative tilt to the data as the Elo difference per pair increases. I will try to upload all of the data as soon as possible.

Y axis = Percentage of moves matched
X axis = Difference in Elo

[Scatter plot of the 3403 data points: percentage of moves matched (Y) vs. difference in Elo (X)]
What is the pair that is ~550 Elo apart and has >60% matches?

Miguel
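
The "negative tilt" described above can be quantified with a simple least-squares fit of match percentage against Elo difference. The sketch below is illustrative only: it assumes the data is available as two arrays, and the numbers shown are made-up placeholders rather than Adam's 3403 actual data points.

Code: Select all

# Illustrative only: quantifying the downward trend of match % vs. Elo difference.
# The (elo_diff, match_pct) values are made-up placeholders, not the real data.
import numpy as np

elo_diff  = np.array([  10,   55,  120,  210,  300,  420,  550])   # |Elo(A) - Elo(B)| per pair
match_pct = np.array([54.0, 52.5, 50.1, 48.7, 46.3, 44.9, 61.0])   # % of moves matched

# Ordinary least-squares fit: match_pct ~ slope * elo_diff + intercept
slope, intercept = np.polyfit(elo_diff, match_pct, 1)
print(f"slope = {slope:.4f} percentage points per Elo, intercept = {intercept:.1f}%")

# A negative slope supports the visually perceived tilt; an outlier such as the
# ~550 Elo / >60% pair Miguel asks about would sit far above the fitted line.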
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Pairwise Analysis of Chess Engine Move Selections

Post by michiguel »

Adam Hair wrote:
Jan Brouwer wrote:
Adam Hair wrote: Yes. Though many of the other engines I have tested are actually earlier versions of engines listed above.

My main focus at this point is to see whether every group of engines that tends to choose similar moves at a higher rate includes an open source engine that preceded the other engines.
It may be interesting to include Rebel in your comparison, because it is an engine whose ideas are described in detail on Ed's web pages but whose source code is not accessible. If there happens to be an engine with a high similarity to Rebel, that may indicate a lower bound on the similarity achievable without copying code.
ProDeo is one of the engines that I am trying to include. Many WinBoard engines are proving difficult to test, but I think I know a way around the problems I am having.

If a case arose in which only closed source engines showed a high level of similarity to each other, it would be an interesting result that would highlight an additional limitation of this sort of comparison. Unfortunately, every such case that I have found so far also involves an older open source engine.
Question: what is the highest similarity between two engines (open source or not) that is higher than the similarity of either of those two engines with any open source engine?

e.g. engines A and B share 60% similarity, and no open source engine shares more than 60% with either A or B (see the sketch at the end of this post).

Miguel
Jan Brouwer wrote: PS: there are methods which allow "cloning" of software where it can be proven to a certain degree that no copyright violation occurred, such as clean room design. It is possible that a (commercial) engine is developed by carefully observing the behaviour of a specific competitor without looking at its source code or without the source code even being available. This would almost by definition optimize the similarity of move selection without any copyright violation.
Could this have happened in the case of the Rybka clones?

Jan
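
Miguel's question above amounts to a query over a pairwise similarity matrix. A minimal sketch, assuming the matrix is stored as a dict keyed by engine pairs and that the set of open source engines is known; every engine name and number below is a hypothetical placeholder, and the pair itself may include an open source engine, per the "(open source or not)" wording.

Code: Select all

# Sketch of the query: find the pair (A, B) with the highest similarity such that
# no open source engine (other than A or B themselves) matches either one at that
# level or above. `sim` and `open_source` are hypothetical placeholders.
from itertools import combinations

sim = {
    ("EngineA", "EngineB"): 60.0,
    ("EngineA", "OpenX"):   48.0,
    ("EngineB", "OpenX"):   51.0,
}
open_source = {"OpenX"}

def similarity(a, b):
    return sim.get((a, b)) or sim.get((b, a)) or 0.0

engines = {name for pair in sim for name in pair}

best = None
for a, b in combinations(sorted(engines), 2):
    s = similarity(a, b)
    # No open source engine other than a or b themselves may be at least
    # this similar to either member of the pair.
    others = open_source - {a, b}
    if all(similarity(x, o) < s for x in (a, b) for o in others):
        if best is None or s > best[0]:
            best = (s, a, b)

print(best)   # e.g. (60.0, 'EngineA', 'EngineB')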
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Pairwise Analysis of Chess Engine Move Selections

Post by michiguel »

Adam Hair wrote:
Dann Corbit wrote:
Laskos wrote:
Dann Corbit wrote:
CRoberson wrote:
Dann Corbit wrote: I would also like to see a correlation based upon strength. For instance, suppose that engine A is 50 Elo stronger than engine B... Then run an experiment where A has one thread and B has two threads (or some similar way to try to match impedance). Perhaps correlations are *largely* a function of strength, and perhaps not.
It is not strength that matters in how deterministic a program is. You sent me an email sometime around 2000, give or take a year, about how deterministic NoonianChess is. While it is very deterministic, it is not very strong compared to Rybka and the like.
The reason I posed that response is that I suspect super GMs will make similar moves to each other, GMs will make similar moves to each other, and IMs will make similar moves to each other.

Suppose there is some very difficult and esoteric move in a position that arises in a game. Paul Morphy finds it, Capablanca finds it, Kasparov finds it. But maybe other {slightly lesser} players won't.

Consider fairly difficult test sets... The best engines get similar (good) scores. So I do not think we can rule out strength as a basis (at least to some degree) for similarity in the moves chosen until it is tested.
This is a worn-out discussion from several months ago which I had with Miguel; strange that you missed it. Yes, there is a correlation with strength, and there is also a correlation of self-similarity, and even plain similarity, with the time control. Adam is using time control adjustments to compensate for strength differences, using the formula

Time = 10ms * 2^(Elo diff/100)

I have only two small objections:
1. 10ms for Houdini 1.5 seems too little to me. On Windows the standard C clock() function only has a resolution of 16ms.
2. 100 Elo points per doubling seems a little too much, maybe 70-80.

These things are pretty much irrelevant; the main thing is interpreting the results.

Kai
I agree, and I am also not sure what will happen when you test various engines against themselves. But I think that the experiment will be very valuable as a control. If we do not run the experiment, then we have no idea how an identical engine would perform. Without that information, I don't think we have any real idea what "these engines make similar moves" even means.
I can't really agree with your last statement, but I have the advantage of seeing much more data from sim testing than you have. If the majority of the engines chose the same moves at 42% to 52% in pairs, but some pairs chose the same moves at 60% to 70% or more, and there are enough pairs to make some normality assumptions, then we can start forming an idea about what "similarity in move selections" means. It in no way proves anything about the code of each engine, but it does give information regarding the characteristics of playing style.
I have seen and analyzed data from 4 different people, from different sets of positions, and I have no doubt whatsoever in my mind that these types of tests give results that are in no way the product of coincidence. How to interpret the results may be up for debate, but an extremely high number of matches is not a random thing. Strength does not explain it.

I believe certain elements of the evaluation have a higher impact in causing the similarities or differences, but this is speculation.

Miguel
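
The time-scaling formula Kai quotes, Time = 10ms * 2^(Elo diff/100), is easy to express as a small helper. A minimal sketch, assuming the Elo difference is the weaker engine's deficit relative to the reference engine; the 10 ms base and 100 Elo per doubling are the values under discussion (Kai would prefer a larger base and roughly 70-80 Elo per doubling), and the example deficits are made up.

Code: Select all

# Time-control adjustment as quoted: Time = base_ms * 2^(elo_deficit / elo_per_doubling).
# base_ms = 10 and elo_per_doubling = 100 are the values reportedly used;
# Kai's objections argue for a larger base and ~70-80 Elo per doubling.

def adjusted_time_ms(elo_deficit: float, base_ms: float = 10.0,
                     elo_per_doubling: float = 100.0) -> float:
    """Thinking time per position given to an engine that is elo_deficit weaker."""
    return base_ms * 2.0 ** (elo_deficit / elo_per_doubling)

# Hypothetical examples:
print(adjusted_time_ms(0))                            # 10.0 ms for the strongest engine
print(adjusted_time_ms(300))                          # 80.0 ms: three doublings at 100 Elo each
print(adjusted_time_ms(300, elo_per_doubling=75.0))   # 160.0 ms with ~75 Elo per doubling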
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Pairwise Analysis of Chess Engine Move Selections

Post by Adam Hair »

michiguel wrote:
Adam Hair wrote: Here is a graph of the 3403 data points. There is a perceptible negative tilt to the data as the Elo difference per pair increases. I will try to upload all of the data as soon as possible.

Y axis = Percentage of moves matched
X axis = Difference in Elo

[Scatter plot of the 3403 data points: percentage of moves matched (Y) vs. difference in Elo (X)]
What is the pair that is ~550 Elo apart and has >60% matches?

Miguel
That would be Philou 3.5.1 and Stockfish.
Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Pairwise Analysis of Chess Engine Move Selections

Post by Dann Corbit »

Have you done a Toga versus Toga similarity test?
The reason I ask is that Fruit versus Fruit gives extremely high similarities (95% IIRC), and I wonder if a definite Fruit clone like Toga has a similarly high profile.
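
For reference, the number behind any of these similarity or self-similarity percentages boils down to counting, over a fixed set of positions, how often two runs chose the same move. The sketch below is a generic illustration of that counting step, not the sim tool itself, and the move lists are invented.

Code: Select all

# Generic illustration: percentage of positions where two runs picked the same move.
# This is not the sim tool; it only shows the counting step conceptually.

def match_percentage(moves_a: list[str], moves_b: list[str]) -> float:
    """Percentage of positions where both runs selected the same move."""
    assert len(moves_a) == len(moves_b), "both runs must cover the same positions"
    matches = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * matches / len(moves_a)

# Invented moves from two runs of the same engine at a fixed time per position.
run1 = ["e2e4", "g1f3", "d2d4", "c2c4", "f1c4"]
run2 = ["e2e4", "g1f3", "d2d4", "b1c3", "f1c4"]
print(f"{match_percentage(run1, run2):.2f}%")   # 80.00% for these made-up lists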
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Pairwise Analysis of Chess Engine Move Selections

Post by Adam Hair »

Dann Corbit wrote: Have you done a Toga versus Toga similarity test?
The reason I ask is that Fruit versus Fruit gives extremely high similarities (95% IIRC), and I wonder if a definite Fruit clone like Toga has a similarly high profile.
I will do that this weekend. I do know that at 2 seconds per position, Toga 1.41 SE had a self-similarity of 98.03%.

And this is what I have for Toga at 100 ms per position:

Code: Select all

C:\2sec_1\Similarity_Tests(01_13_2011)>sim03w64.exe -r 1

sim version 3
------ Grapefruit 1.0 (time: 100 ms  scale: 1.0) ------
 98.01  Grapefruit 1.0_ (time: 100 ms  scale: 1.0)
 60.88  Toga II 3.1.2SE JA (time: 100 ms  scale: 1.0)
 60.85  Toga II 3.1.2SE JA_ (time: 100 ms  scale: 1.0)
 60.69  Toga II 1.3 Beta1_ (time: 100 ms  scale: 1.0)
 60.66  Toga II 1.3 Beta1 (time: 100 ms  scale: 1.0)
 60.60  Toga II 1.4.1SE (time: 100 ms  scale: 1.0)
 60.50  Toga II 1.4.1SE_ (time: 100 ms  scale: 1.0)


C:\2sec_1\Similarity_Tests(01_13_2011)>sim03w64.exe -r 2

sim version 3
------ Grapefruit 1.0_ (time: 100 ms  scale: 1.0) ------
 98.01  Grapefruit 1.0 (time: 100 ms  scale: 1.0)
 60.69  Toga II 1.3 Beta1_ (time: 100 ms  scale: 1.0)
 60.68  Toga II 3.1.2SE JA (time: 100 ms  scale: 1.0)
 60.65  Toga II 1.3 Beta1 (time: 100 ms  scale: 1.0)
 60.63  Toga II 3.1.2SE JA_ (time: 100 ms  scale: 1.0)
 60.37  Toga II 1.4.1SE (time: 100 ms  scale: 1.0)
 60.32  Toga II 1.4.1SE_ (time: 100 ms  scale: 1.0)


C:\2sec_1\Similarity_Tests(01_13_2011)>sim03w64.exe -r 3

sim version 3
------ Toga II 1.3 Beta1 (time: 100 ms  scale: 1.0) ------
 98.57  Toga II 1.3 Beta1_ (time: 100 ms  scale: 1.0)
 79.12  Toga II 3.1.2SE JA_ (time: 100 ms  scale: 1.0)
 78.62  Toga II 3.1.2SE JA (time: 100 ms  scale: 1.0)
 64.86  Toga II 1.4.1SE_ (time: 100 ms  scale: 1.0)
 64.65  Toga II 1.4.1SE (time: 100 ms  scale: 1.0)
 60.66  Grapefruit 1.0 (time: 100 ms  scale: 1.0)
 60.65  Grapefruit 1.0_ (time: 100 ms  scale: 1.0)


C:\2sec_1\Similarity_Tests(01_13_2011)>sim03w64.exe -r 4

sim version 3
------ Toga II 1.3 Beta1_ (time: 100 ms  scale: 1.0) ------
 98.57  Toga II 1.3 Beta1 (time: 100 ms  scale: 1.0)
 79.07  Toga II 3.1.2SE JA_ (time: 100 ms  scale: 1.0)
 78.65  Toga II 3.1.2SE JA (time: 100 ms  scale: 1.0)
 65.04  Toga II 1.4.1SE_ (time: 100 ms  scale: 1.0)
 64.83  Toga II 1.4.1SE (time: 100 ms  scale: 1.0)
 60.69  Grapefruit 1.0_ (time: 100 ms  scale: 1.0)
 60.69  Grapefruit 1.0 (time: 100 ms  scale: 1.0)


C:\2sec_1\Similarity_Tests(01_13_2011)>sim03w64.exe -r 5

sim version 3
------ Toga II 1.4.1SE (time: 100 ms  scale: 1.0) ------
 98.15  Toga II 1.4.1SE_ (time: 100 ms  scale: 1.0)
 65.38  Toga II 3.1.2SE JA (time: 100 ms  scale: 1.0)
 65.10  Toga II 3.1.2SE JA_ (time: 100 ms  scale: 1.0)
 64.83  Toga II 1.3 Beta1_ (time: 100 ms  scale: 1.0)
 64.65  Toga II 1.3 Beta1 (time: 100 ms  scale: 1.0)
 60.60  Grapefruit 1.0 (time: 100 ms  scale: 1.0)
 60.37  Grapefruit 1.0_ (time: 100 ms  scale: 1.0)


C:\2sec_1\Similarity_Tests(01_13_2011)>sim03w64.exe -r 6

sim version 3
------ Toga II 1.4.1SE_ (time: 100 ms  scale: 1.0) ------
 98.15  Toga II 1.4.1SE (time: 100 ms  scale: 1.0)
 65.61  Toga II 3.1.2SE JA (time: 100 ms  scale: 1.0)
 65.26  Toga II 3.1.2SE JA_ (time: 100 ms  scale: 1.0)
 65.04  Toga II 1.3 Beta1_ (time: 100 ms  scale: 1.0)
 64.86  Toga II 1.3 Beta1 (time: 100 ms  scale: 1.0)
 60.50  Grapefruit 1.0 (time: 100 ms  scale: 1.0)
 60.32  Grapefruit 1.0_ (time: 100 ms  scale: 1.0)


C:\2sec_1\Similarity_Tests(01_13_2011)>sim03w64.exe -r 7

sim version 3
------ Toga II 3.1.2SE JA (time: 100 ms  scale: 1.0) ------
 98.09  Toga II 3.1.2SE JA_ (time: 100 ms  scale: 1.0)
 78.65  Toga II 1.3 Beta1_ (time: 100 ms  scale: 1.0)
 78.62  Toga II 1.3 Beta1 (time: 100 ms  scale: 1.0)
 65.61  Toga II 1.4.1SE_ (time: 100 ms  scale: 1.0)
 65.38  Toga II 1.4.1SE (time: 100 ms  scale: 1.0)
 60.88  Grapefruit 1.0 (time: 100 ms  scale: 1.0)
 60.68  Grapefruit 1.0_ (time: 100 ms  scale: 1.0)


C:\2sec_1\Similarity_Tests(01_13_2011)>sim03w64.exe -r 8

sim version 3
------ Toga II 3.1.2SE JA_ (time: 100 ms  scale: 1.0) ------
 98.09  Toga II 3.1.2SE JA (time: 100 ms  scale: 1.0)
 79.12  Toga II 1.3 Beta1 (time: 100 ms  scale: 1.0)
 79.07  Toga II 1.3 Beta1_ (time: 100 ms  scale: 1.0)
 65.26  Toga II 1.4.1SE_ (time: 100 ms  scale: 1.0)
 65.10  Toga II 1.4.1SE (time: 100 ms  scale: 1.0)
 60.85  Grapefruit 1.0 (time: 100 ms  scale: 1.0)
 60.63  Grapefruit 1.0_ (time: 100 ms  scale: 1.0)
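
The report format above is regular enough to parse mechanically. A minimal sketch, assuming the per-engine sections have been concatenated into one text file (the name sim_report.txt is a placeholder), turning them into a nested dict of pairwise match percentages:

Code: Select all

# Sketch: parse sim03 report text (formatted as in the listing above) into
# a nested dict of pairwise match percentages.
import re

HEADER = re.compile(r"^------ (.+?) \(time: .*\) ------$")
ROW    = re.compile(r"^\s*([\d.]+)\s+(.+?) \(time: .*\)$")

def parse_sim_report(text: str) -> dict[str, dict[str, float]]:
    matrix: dict[str, dict[str, float]] = {}
    current = None
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            current = m.group(1)          # reference engine for this section
            matrix[current] = {}
            continue
        m = ROW.match(line)
        if m and current is not None:
            matrix[current][m.group(2)] = float(m.group(1))
    return matrix

# Hypothetical usage:
# with open("sim_report.txt") as f:
#     matrix = parse_sim_report(f.read())
# print(matrix["Toga II 1.3 Beta1"]["Toga II 3.1.2SE JA_"])   # -> 79.12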
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Pairwise Analysis of Chess Engine Move Selections

Post by Adam Hair »

Here is the data for the 83 engines:

http://www.mediafire.com/file/p6vj9k7d3 ... engines.7z