Clone detection test

Discussion of chess software programming and technical issues.

Moderator: Ras

Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Clone detection test

Post by Milos »

Don wrote:
Milos wrote:The number on the left in fact represents "average ELO of both engines" - k*"actual difference between engines", and not "actual similarity between engines" as you think.
The number on the left is the count of the number of positions where the programs agreed on the best move. It's not an ELO calculation.
Of course I know that; I was just stating what the number is really correlated to.
It is correlated mostly not to a level of "cloneness" but to the average ELO level of the engines in question.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Clone detection test

Post by Don »

Milos wrote:
Don wrote:
Milos wrote:The number on the left in fact represents "average ELO of both engines" - k*"actual difference between engines", and not "actual similarity between engines" as you think.
The number on the left is the count of the number of positions where the programs agreed on the best move. It's not an ELO calculation.
Of course I know that; I was just stating what the number is really correlated to.
It is correlated mostly not to a level of "cloneness" but to the average ELO level of the engines in question.
I'm pretty sure that this is not a measurement of strength differences between programs.

Both Larry and I tried tests where we raised the level of a program significantly, and surprisingly it did not change the number much. For example, strong_sf is Stockfish 1.6 running 2.5 times longer than I tested at, which means it would play stronger than Rybka or Robbolito. And yet its number is very close to the Stockfish 1.6 number.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Clone detection test

Post by Milos »

Don wrote:I'm pretty sure that this is not a measurement of strength differences between programs.

Both Larry and I tried tests where we raised the level of a program significantly, and surprisingly it did not change the number much. For example, strong_sf is Stockfish 1.6 running 2.5 times longer than I tested at, which means it would play stronger than Rybka or Robbolito. And yet its number is very close to the Stockfish 1.6 number.
Perhaps I didn't phrase it clearly.
So, let me try to explain.
You have engine A and engine B.
What I am saying is that the number on the left side is directly correlated to both:
sqrt(Elo(A)*Elo(B)), and the similarity of engines A and B in terms of search and evaluation. However, I think the first correlation is the stronger one.
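The claim above amounts to a two-term model. A toy version in Python (purely illustrative; the coefficients k1 and k2, the linear form, and the example numbers are my assumptions, not measured values):

```python
import math

def predicted_score(elo_a, elo_b, similarity, k1=0.25, k2=300):
    """Toy model of the claim: the matching score rises with the
    geometric mean of the two ratings and with engine similarity.
    k1, k2 and the linear form are illustrative assumptions only."""
    return k1 * math.sqrt(elo_a * elo_b) + k2 * similarity

# If the Elo term dominates, two strong unrelated engines can out-score
# two weak clones on the matching test:
strong_unrelated = predicted_score(3000, 3000, similarity=0.2)   # 810.0
weak_clones = predicted_score(1800, 1800, similarity=0.95)       # 735.0
```

Under this (hypothetical) parameterization the rating term dominates, which is the point being argued.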
User avatar
Spacious_Mind
Posts: 317
Joined: Mon Nov 02, 2009 12:05 am
Location: Alabama

Re: Clone detection test

Post by Spacious_Mind »

I am really sorry if I am asking questions whose answers may be far too obvious to all you learned gentlemen, but perhaps they will help me understand what you are trying to achieve and enable me to try some similar tests with dedicateds. If someone would kindly answer the question below, which I am asking based on your proposed test, I would appreciate it.
I have some interest in clone tests on dedicated computers, and here is a test I made a while back on a few of them, which is not too dissimilar to what you are now discussing, though obviously very basic.

http://spacious-mind.com/html/gk_2100_clones_test.html

Now, in my test I had several known clones, two computers that are known to be totally different but of similar playing strength, and one previously suspected clone.

As you can see, the real clones scored on average about 95%, playing back exactly the same moves. The computer related to the clones played back 78% of all the moves, and the two unrelated machines 65% and 60%.

Now here is my question with regard to your proposed test. If you assume that each move was a board set-up move (a game position), then for dedicateds the number 595 of your test would, in my test, suggest that all the computers are clones of each other. Ponder was off in my test, so each position was in reality a unique test position.

Therefore, what I am not sure about is the number 595, and how your test could set a good threshold, because in my dedicated test such a number would actually mean that every computer in the test was a clone.

Or did I understand your proposed test wrong?

Best regards

Nick
lkaufman
Posts: 6259
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Clone detection test

Post by lkaufman »

A score of 595 on our test would mean that the two programs in question played the same move 59.5% of the time. It does not mean that they each matched the move actually played in the game from which the position was derived any particular percentage of the time. Probably Carlsen plays the same move as Rybka something like 60% of the time (a guess), but if so that does not make Carlsen a clone of Rybka (or vice-versa)! In general scores below 500 indicate programs with little in common besides what all programs have in common, and as the score goes up it suggests that the two programs have more in common than just this. If you just take a program and make a few tiny changes to eval numbers the score would be very high, in the 900s.
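The per-mille scoring described above can be sketched in a few lines of Python (a hypothetical reconstruction for illustration, not the actual tool; the function name and move lists are invented):

```python
def similarity_score(moves_a, moves_b):
    """Per-mille agreement between two engines on a shared position set.

    moves_a, moves_b: one chosen move per test position (same order
    for both engines), e.g. as UCI move strings.
    """
    if len(moves_a) != len(moves_b) or not moves_a:
        raise ValueError("need two equal-length, non-empty move lists")
    matches = sum(a == b for a, b in zip(moves_a, moves_b))
    return round(1000 * matches / len(moves_a))

# Two engines agreeing on 595 of 1000 positions score 595, i.e. 59.5%.
a = ["e2e4"] * 595 + ["d2d4"] * 405
b = ["e2e4"] * 595 + ["g1f3"] * 405
print(similarity_score(a, b))  # → 595
```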
User avatar
Spacious_Mind
Posts: 317
Joined: Mon Nov 02, 2009 12:05 am
Location: Alabama

Re: Clone detection test

Post by Spacious_Mind »

lkaufman wrote:A score of 595 on our test would mean that the two programs in question played the same move 59.5% of the time. It does not mean that they each matched the move actually played in the game from which the position was derived any particular percentage of the time. Probably Carlsen plays the same move as Rybka something like 60% of the time (a guess), but if so that does not make Carlsen a clone of Rybka (or vice-versa)! In general scores below 500 indicate programs with little in common besides what all programs have in common, and as the score goes up it suggests that the two programs have more in common than just this. If you just take a program and make a few tiny changes to eval numbers the score would be very high, in the 900s.
Hi Larry

Thanks for your response, and I understand that. But if I stick to dedicateds, I could set up almost any position and would almost bet that I would get around 95% (950) for the clones in my example test (since they really are clones), 78% (780) with the Radio Shack machine, and 65% and 60% with the other two. Therefore I am still stuck understanding the 595 number with regard to dedicateds only, not your engines, since I know that my clones would follow the same-moves path 95% (950) of the time.

That is the part that mystifies me, not the test itself which I find highly interesting.

Best regards

Nick
MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: Clone detection test

Post by MattieShoes »

One way to make it less of a hot-button issue would simply be to call it a similarity index or something, without throwing the word "clone" around. Then accept that some modified clones will show up as different. People could draw their own conclusions, and it removes you from the role of policeman.
Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

Re: Clone detection test

Post by Edmund »

Spacious_Mind wrote:
lkaufman wrote:A score of 595 on our test would mean that the two programs in question played the same move 59.5% of the time. It does not mean that they each matched the move actually played in the game from which the position was derived any particular percentage of the time. Probably Carlsen plays the same move as Rybka something like 60% of the time (a guess), but if so that does not make Carlsen a clone of Rybka (or vice-versa)! In general scores below 500 indicate programs with little in common besides what all programs have in common, and as the score goes up it suggests that the two programs have more in common than just this. If you just take a program and make a few tiny changes to eval numbers the score would be very high, in the 900s.
Hi Larry

Thanks for your response, and I understand that. But if I stick to dedicateds, I could set up almost any position and would almost bet that I would get around 95% (950) for the clones in my example test (since they really are clones), 78% (780) with the Radio Shack machine, and 65% and 60% with the other two. Therefore I am still stuck understanding the 595 number with regard to dedicateds only, not your engines, since I know that my clones would follow the same-moves path 95% (950) of the time.

That is the part that mystifies me, not the test itself which I find highly interesting.

Best regards

Nick
If I understand your setup correctly, you replayed whole games, whereas Don Dailey tests only individual positions. The positions in this test are not very tactical in nature, as Don stated, but only reflect an engine's style or preference.

Looking at your games, at most moves all the engines play the same move. That is not surprising, as in most chess positions arising in a game one move is significantly superior to the others. Interesting are only those moves where several different good moves are possible, perhaps like move #61 in your first test game, where 9 engines played 3 different moves.

Furthermore, you only test whether or not a certain move was matched by all engines. Don, on the contrary, doesn't test whether a certain move was found, but whether it was matched by some other engine.
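That pairwise-agreement idea (rather than matching against one fixed reference move) could be sketched like this; the engine names and move lists are made up for illustration:

```python
from itertools import combinations

def pairwise_similarity(results):
    """results: dict mapping engine name -> list of chosen moves,
    one per test position, in the same position order for every engine.
    Returns the per-mille agreement for every pair of engines."""
    scores = {}
    for (name_a, moves_a), (name_b, moves_b) in combinations(results.items(), 2):
        matches = sum(a == b for a, b in zip(moves_a, moves_b))
        scores[(name_a, name_b)] = round(1000 * matches / len(moves_a))
    return scores

results = {
    "EngineA": ["e2e4", "d2d4", "g1f3", "c2c4"],
    "EngineB": ["e2e4", "d2d4", "b1c3", "c2c4"],  # agrees with A on 3 of 4
    "EngineC": ["b2b3", "g2g3", "b1c3", "f2f4"],  # mostly disagrees
}
print(pairwise_similarity(results))
# → {('EngineA', 'EngineB'): 750, ('EngineA', 'EngineC'): 0, ('EngineB', 'EngineC'): 250}
```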

regards,
Edmund
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Clone detection test

Post by mjlef »

lkaufman wrote:A score of 595 on our test would mean that the two programs in question played the same move 59.5% of the time. It does not mean that they each matched the move actually played in the game from which the position was derived any particular percentage of the time. Probably Carlsen plays the same move as Rybka something like 60% of the time (a guess), but if so that does not make Carlsen a clone of Rybka (or vice-versa)! In general scores below 500 indicate programs with little in common besides what all programs have in common, and as the score goes up it suggests that the two programs have more in common than just this. If you just take a program and make a few tiny changes to eval numbers the score would be very high, in the 900s.
Larry,

If you have data on games with human GMs, it would be interesting to see which programs play most like Anand, Kasparov, etc.

Mark
User avatar
Spacious_Mind
Posts: 317
Joined: Mon Nov 02, 2009 12:05 am
Location: Alabama

Re: Clone detection test

Post by Spacious_Mind »

Edmund wrote:
Spacious_Mind wrote:
lkaufman wrote:A score of 595 on our test would mean that the two programs in question played the same move 59.5% of the time. It does not mean that they each matched the move actually played in the game from which the position was derived any particular percentage of the time. Probably Carlsen plays the same move as Rybka something like 60% of the time (a guess), but if so that does not make Carlsen a clone of Rybka (or vice-versa)! In general scores below 500 indicate programs with little in common besides what all programs have in common, and as the score goes up it suggests that the two programs have more in common than just this. If you just take a program and make a few tiny changes to eval numbers the score would be very high, in the 900s.
Hi Larry

Thanks for your response, and I understand that. But if I stick to dedicateds, I could set up almost any position and would almost bet that I would get around 95% (950) for the clones in my example test (since they really are clones), 78% (780) with the Radio Shack machine, and 65% and 60% with the other two. Therefore I am still stuck understanding the 595 number with regard to dedicateds only, not your engines, since I know that my clones would follow the same-moves path 95% (950) of the time.

That is the part that mystifies me, not the test itself which I find highly interesting.

Best regards

Nick
If I understand your setup correctly, you replayed whole games, whereas Don Dailey tests only individual positions. The positions in this test are not very tactical in nature, as Don stated, but only reflect an engine's style or preference.

Looking at your games, at most moves all the engines play the same move. That is not surprising, as in most chess positions arising in a game one move is significantly superior to the others. Interesting are only those moves where several different good moves are possible, perhaps like move #61 in your first test game, where 9 engines played 3 different moves.

Furthermore, you only test whether or not a certain move was matched by all engines. Don, on the contrary, doesn't test whether a certain move was found, but whether it was matched by some other engine.

regards,
Edmund
Hi Edmund

I understand what Larry is trying to achieve, and I am not questioning the test. I am trying to see if I can use it for similar tests. But my logic still tells me that, given suitable test positions (since these are not 3000 ELO computers), I would still have the example dedicated-computer clones matching more than 59.5% of the moves. They have already proven to me that they repeat 95% of all moves.

The number 59.5 may well be perfect for engines in this test, but I am still trying to get to grips with what number would be suitable for the known clones in my collection that play exactly the same moves 95% of the time. The same applies to absolute non-clones playing regular chess at 60-65%.

The same logic really applies if I knew that two of my theoretically identical engines, which repeat 95% of all regular moves, were to take this test. Would they still score around 950 on your scale? If the answer is yes, then my questions are answered.

best regards :)

Nick