Similarity Detector Available


Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Similarity Detector Available

Post by Don »

perejaslav wrote:

Code: Select all

Microsoft Windows [Version 6.1.7600]


D:\Test\sim02>sim02w64.exe


usage:
   similar
   similar -test INVOKE {time_in_ms}
   similar -report N
   similar -config FILE


D:\Test\sim02>sim02w64.exe -t Rybka4x64.exe 200


D:\Test\sim02>
What am I doing wrong, Don? Nothing happens after I input those commands :evil:
It looks correct. I would check to make sure the executable is in your path and spelled correctly.

By the way, I see you have discovered that you can use the shortcuts -t, -r and -c.

Don
perejaslav
Posts: 240
Joined: Sat Mar 18, 2006 4:01 am
Location: Cold

Re: Similarity Detector Available

Post by perejaslav »

Don wrote:
It looks correct. I would check to make sure the executable is in your path and spelled correctly.

By the way, I see you have discovered that you can use the shortcuts -t, -r and -c.

Don
I used the first version of your utility without any problems. :oops:
Here you can see the list of files that I extracted from the archive. They are all untouched; I only added Rybka4x64.exe (and removed the quotes from the exe name). But I can't get it working...

[screenshot: directory listing of the extracted files]
Uri Blass
Posts: 10903
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: clone tester available

Post by Uri Blass »

Laskos wrote:
Don wrote:
The program is at the same site but is called a.exe and has 10,000 positions minus the ones that EVERY program played the same. So it will give substantially different scores.
Thanks Don, now it's much better: the standard deviation is about 0.5% and the span from "not similar" to "very similar" is more than 10% instead of 7%, so the separations are much clearer. I read that you put a config file there; that would be useful for me to go to 1 core, since not only does it flatten the strength, but the self-similarity is higher on 1 core (a more deterministic outcome).

Code: Select all


C:\similar>a -r 1
------ Fruit 2.1 (time: 300 ms) ------
 56.30  Rybka 1.0 Beta 32-bit (time: 100 ms)
 56.14  Strelka 1.8 UCI (time: 100 ms)
 53.54  Glaurung 2-epsilon/5 (time: 100 ms)
 51.31  Rybka 3  (time: 100 ms)
 50.29  RobboLito 0.09 x64 (time: 100 ms)
 50.00  Deep Rybka _4_x64 (time: 100 ms)
 48.92  IvanHoe_B49jAx64 (time: 100 ms)
 47.79  Deep Shredder 12 x64 (time: 100 ms)
 46.94  Houdini 1.51x64 (time: 100 ms)
 46.66  Houdini 1.5 x64 (time: 100 ms)
 46.61  Ruffian 1.0.1 (time: 100 ms)
 44.00  Deep Shredder 9 UCI (time: 100 ms)
 43.15  Chess Tiger 2007  (time: 100 ms)


C:\similar>a -r 2
------ Houdini 1.5 x64 (time: 100 ms) ------
 67.81  Houdini 1.51x64 (time: 100 ms)
 62.31  IvanHoe_B49jAx64 (time: 100 ms)
 61.22  RobboLito 0.09 x64 (time: 100 ms)
 55.79  Rybka 3  (time: 100 ms)
 54.61  Deep Rybka _4_x64 (time: 100 ms)
 47.29  Strelka 1.8 UCI (time: 100 ms)
 47.28  Rybka 1.0 Beta 32-bit (time: 100 ms)
 46.99  Deep Shredder 12 x64 (time: 100 ms)
 46.66  Fruit 2.1 (time: 300 ms)
 43.81  Chess Tiger 2007  (time: 100 ms)
 42.60  Glaurung 2-epsilon/5 (time: 100 ms)
 42.27  Deep Shredder 9 UCI (time: 100 ms)
 39.39  Ruffian 1.0.1 (time: 100 ms)


C:\similar>a -r 13
------ Rybka 3  (time: 100 ms) ------
 60.83  RobboLito 0.09 x64 (time: 100 ms)
 59.19  Deep Rybka _4_x64 (time: 100 ms)
 58.99  IvanHoe_B49jAx64 (time: 100 ms)
 55.79  Houdini 1.5 x64 (time: 100 ms)
 55.68  Houdini 1.51x64 (time: 100 ms)
 52.54  Strelka 1.8 UCI (time: 100 ms)
 52.25  Rybka 1.0 Beta 32-bit (time: 100 ms)
 51.31  Fruit 2.1 (time: 300 ms)
 49.07  Deep Shredder 12 x64 (time: 100 ms)
 47.26  Glaurung 2-epsilon/5 (time: 100 ms)
 45.04  Chess Tiger 2007  (time: 100 ms)
 44.71  Deep Shredder 9 UCI (time: 100 ms)
 43.86  Ruffian 1.0.1 (time: 100 ms)


C:\similar>a -r 10
------ Rybka 1.0 Beta 32-bit (time: 100 ms) ------
 73.56  Strelka 1.8 UCI (time: 100 ms)
 56.30  Fruit 2.1 (time: 300 ms)
 52.25  Rybka 3  (time: 100 ms)
 51.72  Deep Rybka _4_x64 (time: 100 ms)
 51.08  RobboLito 0.09 x64 (time: 100 ms)
 49.99  Glaurung 2-epsilon/5 (time: 100 ms)
 49.44  IvanHoe_B49jAx64 (time: 100 ms)
 47.34  Houdini 1.51x64 (time: 100 ms)
 47.28  Houdini 1.5 x64 (time: 100 ms)
 46.01  Deep Shredder 12 x64 (time: 100 ms)
 44.46  Ruffian 1.0.1 (time: 100 ms)
 42.09  Deep Shredder 9 UCI (time: 100 ms)
 41.94  Chess Tiger 2007  (time: 100 ms)
Now the numbers are a little different, and one has to get used to the new matrix. I put in many original programs of the Rybka epoch; Ruffian is also interesting, as it's original in the sense that it's probably a UCI original Fritz 6 :). Of course I also included some Ippos, Houdini and several Rybkas.

I have to get used to these numbers, but even without putting in many of the later fruitish engines, Rybka 1.0 Beta seems one of the most fruitish. The Ippos are the most Rybka 3-ish. Strelka 1.8 is probably extremely similar to Rybka 1. Houdini 1.5 seems similar to the Ippos.

Thanks,
Kai
I think that Fruit 2.1 (300 ms) is more similar in strength to Rybka 1.0 Beta (100 ms) than some of the other engines in the list are.

If you use unequal times to make the strengths similar, then you should use more than 300 ms for engines that are weaker than Fruit 2.1 and less than 100 ms for engines that are stronger than Rybka 1.0 Beta.
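
(As a rough back-of-the-envelope sketch only, using an assumed rule of thumb of 50-70 Elo per doubling of thinking time rather than anything measured here, the time ratio needed to equalize strength grows exponentially with the Elo gap:)

Code: Select all

# Rough sketch with assumed numbers: if one doubling of time is worth
# somewhere around 50-70 Elo, closing a larger Elo gap needs an
# exponentially larger time ratio.
for elo_gap in (100, 200, 400):
    fast = 2 ** (elo_gap / 70)   # if a doubling is worth ~70 Elo
    slow = 2 ** (elo_gap / 50)   # if a doubling is worth ~50 Elo
    print(f"{elo_gap} Elo gap -> roughly {fast:.0f}x to {slow:.0f}x more time")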
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: clone tester available

Post by Laskos »

Uri Blass wrote:
I think that Fruit 2.1 (300 ms) is more similar in strength to Rybka 1.0 Beta (100 ms) than some of the other engines in the list are.

If you use unequal times to make the strengths similar, then you should use more than 300 ms for engines that are weaker than Fruit 2.1 and less than 100 ms for engines that are stronger than Rybka 1.0 Beta.
Yes, I know, I did this on purpose, as I cannot assign ~5 ms to Houdini 1.5 to equalize the strength. Many programs are in different weight categories: if I set 100 ms for Houdini, I cannot assign ~10,000 ms to Fruit 2.1 or Ruffian, as the test would last forever. Nevertheless, there are programs of strength similar to Rybka 1.0 and Fruit 2.1 (3x), like Shredder 9, CT 2007 and Glaurung, and they all show a lesser similarity.

Kai
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Similarity Detector Available

Post by Don »

Don wrote:
perejaslav wrote:

Code: Select all

Microsoft Windows [Version 6.1.7600]


D:\Test\sim02>sim02w64.exe


usage:
   similar
   similar -test INVOKE {time_in_ms}
   similar -report N
   similar -config FILE


D:\Test\sim02>sim02w64.exe -t Rybka4x64.exe 200


D:\Test\sim02>
What am I doing wrong, Don? Nothing happens after I input those commands :evil:
It looks correct. I would check to make sure the executable is in your path and spelled correctly.

By the way, I see you have discovered that you can use the shortcuts -t, -r and -c.

Don
Ok, I think there is a bug. Evidently you MUST configure it with a configuration file. I will try to fix the bug in the next couple of days.

So make a file that looks something like this:

Code: Select all

exe = Rybka4x64.exe
scale = 1.0
Max CPUs = 1


Then do this:

sim02w64.exe -c rybka.config_file 200

where rybka.config_file is the name of the file you create to configure Rybka.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: clone tester available

Post by Don »

Laskos wrote:
Uri Blass wrote:
I think that Fruit 2.1 (300 ms) is more similar in strength to Rybka 1.0 Beta (100 ms) than some of the other engines in the list are.

If you use unequal times to make the strengths similar, then you should use more than 300 ms for engines that are weaker than Fruit 2.1 and less than 100 ms for engines that are stronger than Rybka 1.0 Beta.
Yes, I know, I did this on purpose, as I cannot assign ~5 ms to Houdini 1.5 to equalize the strength. Many programs are in different weight categories: if I set 100 ms for Houdini, I cannot assign ~10,000 ms to Fruit 2.1 or Ruffian, as the test would last forever. Nevertheless, there are programs of strength similar to Rybka 1.0 and Fruit 2.1 (3x), like Shredder 9, CT 2007 and Glaurung, and they all show a lesser similarity.

Kai
If a program requires over 100x more time to play as strongly, it's probably not so important to obsess over how similar it is.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Similarity Detector Available

Post by bob »

My only comment here is that this is likely going to run afoul of the "birthday paradox" frequently. Given enough programs, a new program will frequently choose the same moves as another program, "just because". The more samples, the greater the probability this will happen. Lots of false positives are not going to help a thing...
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Similarity Detector Available

Post by Don »

You were not doing anything wrong - the whole thing was buggy.

It is fixed now - sorry about any inconvenience.

Get it at: http://komodochess.com

Don

perejaslav wrote:

Code: Select all

Microsoft Windows [Version 6.1.7600]


D:\Test\sim02>sim02w64.exe


usage:
   similar
   similar -test INVOKE {time_in_ms}
   similar -report N
   similar -config FILE


D:\Test\sim02>sim02w64.exe -t Rybka4x64.exe 200


D:\Test\sim02>
What am I doing wrong, Don? Nothing happens after I input those commands :evil:
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Similarity Detector Available

Post by Don »

bob wrote:
My only comment here is that this is likely going to run afoul of the "birthday paradox" frequently. Given enough programs, a new program will frequently choose the same moves as another program, "just because". The more samples, the greater the probability this will happen. Lots of false positives are not going to help a thing...
In order to have a false positive you need context. All this utility does is count how many moves (out of approximately 8,000) two programs play in common and return the percentage. How can that be a false positive? It will be whatever it will be for any two programs.
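
(As a minimal sketch of that idea only, not the actual sim02 source and with hypothetical names, the whole computation amounts to:)

Code: Select all

# Minimal sketch of the match-rate idea (hypothetical names, not the
# actual sim02 source): one chosen move per engine per test position.
def match_percentage(moves_a, moves_b):
    assert len(moves_a) == len(moves_b)
    same = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * same / len(moves_a)

# e.g. print(match_percentage(houdini_moves, robbolito_moves))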

I continue to get comments over and over again from people who are assuming context, which betrays a fundamental misunderstanding of what this tool does and how it works.

If you view this utility as a "clone tester", and you assign some arbitrary percentage value to signify that a program is a "clone", then you can have false positives. But that is not what this utility does and it's not what it's for.

For example: When I tested Robbolito and Houdini, I got a ridiculously high match rate, higher than most other pairs of programs and in many cases much higher than the match rate between two versions of the SAME chess program!

So is that a false positive? No, it's just a fact. The two programs play a lot of the same moves. It does not mean Robbolito is a clone of Houdini or a derivative or anything else; it just means they play the same moves a lot more often than almost any other pair of programs.

My intent for the tool was as a diagnostic aid and a tool to examine the playing styles of programs. It returns some result and it's up to you to figure out what it means or doesn't mean and to use good sense and judgement, an increasingly rare commodity these days.

I actually got the idea for this from YOU and John Stanback. I was at a tournament where a version of Crafty was claimed to be heavily modified in the evaluation and was allowed in the tournament. However, this program was doing unusually well; Vincent suspected something, and you were contacted and consulted. From what I was told, you checked the moves of the game against Crafty and felt too many were the same.

John Stanback, in another tournament, noticed the same thing simply by watching the tournament games online and comparing the moves to his own program's.

I think every good chess player who gets really familiar with chess programs agrees that each program has its own individual personality. Of course that can only be revealed through the moves it makes.

I understand what you are saying about the birthday paradox and agree; I just think it's not relevant without assuming the context of "clone testing." However, if you tested 1000 unique programs by different authors who did not share ideas, etc., you would surely find two programs that played very similar chess. The fact that they might play very similarly is not a paradox or a lie; it's just how it is.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Similarity Detector Available

Post by bob »

Don wrote:
bob wrote:
My only comment here is that this is likely going to run afoul of the "birthday paradox" frequently. Given enough programs, a new program will frequently choose the same moves as another program, "just because". The more samples, the greater the probability this will happen. Lots of false positives are not going to help a thing...
In order to have a false positive you need context. All this utility does is count how many moves (out of approximately 8,000) two programs play in common and return the percentage. How can that be a false positive? It will be whatever it will be for any two programs.
Simple. Someone is going to choose a number, say 70%. If A matches B 70% of the time, it is likely a derivative (replace 70% with any reasonable number you want). If you take program A and compare it to B, you might get 40%. If you compare it to C, you might get 50%. If you compare it to enough programs, you will get at least one 70% or higher. From unrelated programs...
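
(A toy illustration of this multiple-comparisons point, with numbers invented purely for the sketch rather than measured by the tool: if unrelated pairs have underlying agreement rates scattered around some base value, the highest rate you observe grows with the number of pairs you look at.)

Code: Select all

# Toy sketch (all numbers invented for illustration): unrelated engine
# pairs are assumed to have "true" agreement rates scattered around 50%
# with a few percent of spread; the more pairs you compare, the higher
# the largest rate you will see purely by chance.
import random

def highest_rate(n_pairs, mean=0.50, spread=0.05):
    return max(random.gauss(mean, spread) for _ in range(n_pairs))

random.seed(1)
for n in (10, 100, 1000):
    print(f"{n:5d} pairs -> highest match rate about {100 * highest_rate(n):.1f}%")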

When you produce numbers, you have to expect _someone_ to use them to reach a conclusion. In this case, the conclusion might be right, wrong, or random.

I continue to get comments over and over again from people who are assuming context, which betrays a fundamental misunderstanding of what this tool does and how it works.

If you view this utility as a "clone tester", and you assign some arbitrary percentage value to signify that a program is a "clone", then you can have false positives. But that is not what this utility does and it's not what it's for.

For example: When I tested Robbolito and Houdini, I got a ridiculously high match rate, higher than most other pairs of programs and in many cases much higher than the match rate between two versions of the SAME chess program!

So is that a false positive? No, it's just a fact. The two programs play a lot of the same moves. It does not mean Robbolito is a clone of Houdini or a derivative or anything else; it just means they play the same moves a lot more often than almost any other pair of programs.
All well and good. But the moment you produce numbers, you have to expect someone to take them at face value. I wouldn't consider such comparisons myself. But many will. And they will draw the wrong conclusion.

Exactly what do you expect the numbers to show? What does it mean when two programs match 70% of the time? That they have the same search but different evals? Same evals but different search? A combination of both? It is pretty much meaningless. Perhaps a good way to compute some random numbers for a Zobrist hashing scheme... but there are less expensive ways to do that.


My intent for the tool was as a diagnostic aid and a tool to examine the playing styles of programs. It returns some result and it's up to you to figure out what it means or doesn't mean and to use good sense and judgement, an increasingly rare commodity these days.

I actually got the idea for this from YOU and John Stanback. I was at a tournament where a version of Crafty was claimed to be heavily modified in the evaluation and was allowed in the tournament. However, this program was doing unusually well; Vincent suspected something, and you were contacted and consulted. From what I was told, you checked the moves of the game against Crafty and felt too many were the same.

John Stanback, in another tournament, noticed the same thing simply by watching the tournament games online and comparing the moves to his own program's.
For an isolated data point, that is a good place to start. But to compare a suspected clone against a huge suite of others? Again, false positives. Too many samples.


I think every good chess player who gets really familiar with chess programs agrees that each program has its own individual personality. Of course that can only be revealed through the moves it makes.

I understand what you are saying about the birthday paradox and agree; I just think it's not relevant without assuming the context of "clone testing." However, if you tested 1000 unique programs by different authors who did not share ideas, etc., you would surely find two programs that played very similar chess. The fact that they might play very similarly is not a paradox or a lie; it's just how it is.
The danger is, as I said, that some will take these numbers to be something like a correlation coefficient, with some threshold beyond which a clone is proven...