Similarity Detector Available

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: clone tester available

Post by Laskos »

Don wrote:
Laskos wrote: Houdini 1.51 is a pure copy of Houdini 1.5; I included it to set the upper boundary for these fairly non-deterministic engines. My problem is mostly statistical: at least 10,000 or so positions are needed, and in no case could a claim of cloning be made more than circumstantially. It is more like a property of some engines.
I actually have a version with 10,000 positions. I recently removed almost 2,000 of them because EVERY engine I tested (18 of them) played the same move.

I agree with you about the circumstantial evidence. The only thing this test can show you is how much stylistic similarity there is between two programs.

In fact, I think the test tells you mostly about the evaluation functions of the two programs, and hardly anything about any other part of the program. However, it's my belief that this is the most important and difficult part of a chess program, and what separates the men from the boys.
Don, how do I get the 10,000-position version? Could you post a Win x64 executable of it? I browsed the threads and found only bigsimilat.kit, which indeed seems big compared to similar.kit, but I don't know what to do with it. With 2,000 positions the standard deviation of the results seems to be about 1.5%, which is far too large: the whole span from barely similar to very similar engines is only about 7%, so 3% error margins at 95% confidence may heavily distort the results.
The second issue: quite a few engines fail to perform the test correctly, either halting or returning outlandishly low results (like 12%).
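For the error margins above, a back-of-the-envelope check (a sketch that treats each position as an independent coin flip with roughly 50% agreement probability, which is a simplification; real positions are not identical trials) shows why going from 2,000 to 10,000 positions roughly halves the standard deviation:

```python
import math

def match_stddev(p, n):
    """Standard error (in percent) of the observed match rate when each of the
    n positions is an independent Bernoulli trial with agreement probability p.
    A simplification, but it gives the right order of magnitude."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n)

# The error is largest near 50% agreement:
print(f"2,000 positions:  +/- {match_stddev(0.5, 2000):.2f}%")   # ~1.12%
print(f"10,000 positions: +/- {match_stddev(0.5, 10000):.2f}%")  # ~0.50%
```

This matches the observed spread: about 1.5% with 2,000 positions and about 0.5% with 10,000.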

Kai
Allard Siemelink
Posts: 297
Joined: Fri Jun 30, 2006 9:30 pm
Location: Netherlands

Re: Similarity Detector Available

Post by Allard Siemelink »

El Gringo wrote: Spark doesn't like the similar utility. It hangs after 35%
I am not sure that Spark is at fault. On my 32-bit machine, it looks like it is similar.exe that cannot keep up with Spark's admittedly verbose output.
The test does finish, but from about 40% into the test, Spark gets very few CPU cycles (system CPU usage <10%) and returns moves from very shallow searches.
If I run the test multiple times, similar.exe does not even think that Spark is very similar to itself :shock: :

Code: Select all

------ spark-1.0 (time: 100 ms) ------
 73.20  spark-1.0 (time: 101 ms)
 41.40  spark-1.0 (time: 99 ms)
 36.70  bright-0.4a (time: 100 ms)
 35.95  spark-dev (time: 50 ms)
 28.15  bright-0.5c (time: 100 ms)
 11.75  Strelka 1.8 UCI (time: 100 ms)
 11.35  Rybka 1.0 Beta 32-bit (time: 100 ms)
 11.20  Komodo32 1.0 JA (time: 101 ms)
 11.05  Komodo32 1.0 JA (time: 100 ms)
 10.95  Stockfish 1.6 JA (time: 100 ms)
 10.55  Toga II 1.4.1SE (time: 100 ms)
I had a closer look, and it turns out that Spark is frequently blocked when it tries to communicate with similar.exe. Most probably, the buffer of the communication channel (pipe) from Spark to similar.exe is too small, forcing Spark to wait until similar.exe has processed some older data. When I disable UCI reporting of PVs and current moves, all goes well:
CPU usage is at 100% during the whole test, and similar.exe now recognizes that single-CPU spark-dev is at least very similar to itself:

Code: Select all

------ spark-dev (time: 100 ms) ------
 99.55  spark-dev (time: 101 ms)
 99.50  spark-dev (time: 99 ms)
 86.35  spark-dev (time: 50 ms)
 30.60  bright-0.5c (time: 100 ms)
 28.55  bright-0.4a (time: 100 ms)
 11.50  Strelka 1.8 UCI (time: 100 ms)
 11.25  Rybka 1.0 Beta 32-bit (time: 100 ms)
 11.05  Komodo32 1.0 JA (time: 100 ms)
 11.00  Toga II 1.4.1SE (time: 100 ms)
 11.00  Komodo32 1.0 JA (time: 101 ms)
 10.80  Stockfish 1.6 JA (time: 100 ms)
Sure, Spark does not appear to be similar at all to other engines, but this of course is intended behaviour :)

Don, if you're reading this, perhaps you could increase the size of the communication (pipe) buffer to e.g. 32 KB in the next release?
I believe this would yield more accurate results for verbose engines.
(And to further enhance accuracy, you might try running all engines single-core, avoiding MP randomness.)
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: clone tester available

Post by Don »

Laskos wrote:
Don wrote:
Laskos wrote: Houdini 1.51 is a pure copy of Houdini 1.5; I included it to set the upper boundary for these fairly non-deterministic engines. My problem is mostly statistical: at least 10,000 or so positions are needed, and in no case could a claim of cloning be made more than circumstantially. It is more like a property of some engines.
I actually have a version with 10,000 positions. I recently removed almost 2,000 of them because EVERY engine I tested (18 of them) played the same move.

I agree with you about the circumstantial evidence. The only thing this test can show you is how much stylistic similarity there is between two programs.

In fact, I think the test tells you mostly about the evaluation functions of the two programs, and hardly anything about any other part of the program. However, it's my belief that this is the most important and difficult part of a chess program, and what separates the men from the boys.
Don, how do I get the 10,000-position version? Could you post a Win x64 executable of it? I browsed the threads and found only bigsimilat.kit, which indeed seems big compared to similar.kit, but I don't know what to do with it. With 2,000 positions the standard deviation of the results seems to be about 1.5%, which is far too large: the whole span from barely similar to very similar engines is only about 7%, so 3% error margins at 95% confidence may heavily distort the results.
The second issue: quite a few engines fail to perform the test correctly, either halting or returning outlandishly low results (like 12%).

Kai
The program is at the same site but is called a.exe; it has the 10,000 positions minus the ones that EVERY program played the same, so it will give substantially different scores.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Similarity Detector Available

Post by Don »

Allard Siemelink wrote:
El Gringo wrote: Spark doesn't like the similar utility. It hangs after 35%
I am not sure that Spark is at fault. On my 32-bit machine, it looks like it is similar.exe that cannot keep up with Spark's admittedly verbose output.
The test does finish, but from about 40% into the test, Spark gets very few CPU cycles (system CPU usage <10%) and returns moves from very shallow searches.
If I run the test multiple times, similar.exe does not even think that Spark is very similar to itself :shock: :

Code: Select all

------ spark-1.0 (time: 100 ms) ------
 73.20  spark-1.0 (time: 101 ms)
 41.40  spark-1.0 (time: 99 ms)
 36.70  bright-0.4a (time: 100 ms)
 35.95  spark-dev (time: 50 ms)
 28.15  bright-0.5c (time: 100 ms)
 11.75  Strelka 1.8 UCI (time: 100 ms)
 11.35  Rybka 1.0 Beta 32-bit (time: 100 ms)
 11.20  Komodo32 1.0 JA (time: 101 ms)
 11.05  Komodo32 1.0 JA (time: 100 ms)
 10.95  Stockfish 1.6 JA (time: 100 ms)
 10.55  Toga II 1.4.1SE (time: 100 ms)
I had a closer look, and it turns out that Spark is frequently blocked when it tries to communicate with similar.exe. Most probably, the buffer of the communication channel (pipe) from Spark to similar.exe is too small, forcing Spark to wait until similar.exe has processed some older data. When I disable UCI reporting of PVs and current moves, all goes well:
CPU usage is at 100% during the whole test, and similar.exe now recognizes that single-CPU spark-dev is at least very similar to itself:

Code: Select all

------ spark-dev (time: 100 ms) ------
 99.55  spark-dev (time: 101 ms)
 99.50  spark-dev (time: 99 ms)
 86.35  spark-dev (time: 50 ms)
 30.60  bright-0.5c (time: 100 ms)
 28.55  bright-0.4a (time: 100 ms)
 11.50  Strelka 1.8 UCI (time: 100 ms)
 11.25  Rybka 1.0 Beta 32-bit (time: 100 ms)
 11.05  Komodo32 1.0 JA (time: 100 ms)
 11.00  Toga II 1.4.1SE (time: 100 ms)
 11.00  Komodo32 1.0 JA (time: 101 ms)
 10.80  Stockfish 1.6 JA (time: 100 ms)
Sure, Spark does not appear to be similar at all to other engines, but this of course is intended behaviour :)

Don, if you're reading this, perhaps you could increase the size of the communication (pipe) buffer to e.g. 32 KB in the next release?
I believe this would yield more accurate results for verbose engines.
(And to further enhance accuracy, you might try running all engines single-core, avoiding MP randomness.)
The utility is actually written in Tcl and there should be no such issues, but I will look into it. I downloaded Spark 1.0 recently and got the same problem as reported here.

The utility actually sends the go command and just waits 100 ms before sending the stop command, so I probably just need to read the output during that wait.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Similarity Detector Available

Post by Don »

A new version of the similarity tester is available on the komodo web site.

It has over 8,000 positions instead of 2,000, and the engines can be configured separately by means of a configuration file. This was a requested feature.

Two additional features:

You can give a chess engine your own, more descriptive name if you wish, such as "komodo 1.3 bigHash".

You can set a SCALING factor for each program. This feature lets users equalize Elo levels: for example, a weaker program can be given a scale factor to equalize its strength with a stronger one.

You can get it here: http://komodochess.com
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Similarity Detector & wb2uci

Post by Don »

Did you get this to work?
Will Singleton wrote: Anyone get SD to work with wb2uci? My first try:

C:\chess\similar>similar -t wb2uci
program: amateur295x4 (time: 100 ms)

0.0 percent .error writing "file2537fb0": broken pipe
while executing
"puts $fh "stop""
("for" body line 15)
invoked from within
"for { set n $start } { $n < $e } { incr n } {

if { ($n % 50) == 0 } {
puts ""
set perc [expr ($cc * 100.0) / 2000.0]
puts -nonewline [format "..."
(file "C:/chess/similar/similar.exe/lib/app-clone/clone.tcl" line 141)
invoked from within
"source C:/chess/similar/similar.exe/lib/app-clone/clone.tcl"
("package ifneeded app-clone 1.0" script)
invoked from within
"package require app-clone"
(file "C:/chess/similar/similar.exe/main.tcl" line 4)

C:\chess\similar>
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: clone tester available

Post by Don »

Hi George,

I'm sorry I forgot to answer your question; I'll try to answer it now.
George Tsavdaris wrote: On what theory is this tool based?
I mean, what kind of positions are these? I hope they are not best-move positions, right? Because then good programs would make similar selections.
The positions are random positions from top-level games (although I don't think that is important for this purpose), except that I have removed positions where EVERY chess program agrees on the best move.

I would like the positions to have many different possible good moves to play; let's say the best 5 moves should all be playable. Semi-opening positions are good for this.
If there are 5 candidate moves per position, then over 2,000 positions the engines can choose among 5^2000 different combinations, so the randomness is very good and the statistics at the end are strong and can be a good indicator.
I want to improve the set by determining which positions return the most useful information, similar to what you are describing.

However, this must be done with care to be valid. I would focus ONLY on culling positions in a way that strengthened the similarity scores between two different runs of the same program. As you say, we want positions where a variety of reasonable moves are possible.

I did an experiment where I removed positions on which the majority (but not all) of programs changed their minds from run to run, and this greatly strengthened the similarity score of the same program against itself in a separate run. But I'm not sure I like that approach, because it is very much dependent on which programs I am using and on the specific time setting.

I also discovered that some programs are much more stable than others: they tend to pick the SAME move from run to run much more often. This tool can measure that too (yet another use of the tool).
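As a toy illustration of the chance baseline implied by "5 playable moves per position" (a sketch only; real engines are not uniform random samplers, and the tool does not work this way):

```python
import random

def random_agreement(n_positions=8000, n_choices=5, seed=1):
    """Toy model: two unrelated engines each pick uniformly among n_choices
    playable moves per position. Their expected agreement is 1/n_choices,
    so real scores well above that baseline point to correlated evaluation
    rather than chance."""
    rng = random.Random(seed)
    same = sum(rng.randrange(n_choices) == rng.randrange(n_choices)
               for _ in range(n_positions))
    return 100.0 * same / n_positions

print(f"chance agreement: ~{random_agreement():.1f}%")  # hovers around 20%
```

Against a chance floor of about 20% for 5-choice positions, self-similarity scores near 99% and cross-engine scores of 50-70% are clearly not random noise.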

Don
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: clone tester available

Post by michiguel »

Don wrote:Hi George,

I'm sorry I forgot to answer your question; I'll try to answer it now.
George Tsavdaris wrote: On what theory is this tool based?
I mean, what kind of positions are these? I hope they are not best-move positions, right? Because then good programs would make similar selections.
The positions are random positions from top-level games (although I don't think that is important for this purpose), except that I have removed positions where EVERY chess program agrees on the best move.
It is important that they be random. I would not spend too much time removing positions that are easy; the only thing they do is consume time, and they do not hurt the test in any way.

I would like the positions to have many different possible good moves to play; let's say the best 5 moves should all be playable. Semi-opening positions are good for this.
If there are 5 candidate moves per position, then over 2,000 positions the engines can choose among 5^2000 different combinations, so the randomness is very good and the statistics at the end are strong and can be a good indicator.
I want to improve the set by determining which positions return the most useful information, similar to what you are describing.

However, this must be done with care to be valid. I would focus ONLY on culling positions in a way that strengthened the similarity scores between two different runs of the same program. As you say, we want positions where a variety of reasonable moves are possible.

I did an experiment where I removed positions on which the majority (but not all) of programs changed their minds from run to run, and this greatly strengthened the similarity score of the same program against itself in a separate run. But I'm not sure I like that approach, because it is very much dependent on which programs I am using and on the specific time setting.

I also discovered that some programs are much more stable than others: they tend to pick the SAME move from run to run much more often. This tool can measure that too (yet another use of the tool).

Don
A good feature for your tool would be to let the user choose a set of positions from an EPD file.

Miguel
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: clone tester available

Post by Laskos »

Don wrote:
The program is at the same site but is called a.exe; it has the 10,000 positions minus the ones that EVERY program played the same, so it will give substantially different scores.
Thanks Don, it's much better now: the standard deviation is about 0.5%, and the span from "not similar" to "very similar" is more than 10% instead of 7%, so separations are much clearer. I read that you added a config file; that would be useful for me to run everything on 1 core. Not only does it flatten out the strength differences, the self-similarity is also higher on 1 core (more deterministic output).

Code: Select all


C:\similar>a -r 1
------ Fruit 2.1 (time: 300 ms) ------
 56.30  Rybka 1.0 Beta 32-bit (time: 100 ms)
 56.14  Strelka 1.8 UCI (time: 100 ms)
 53.54  Glaurung 2-epsilon/5 (time: 100 ms)
 51.31  Rybka 3  (time: 100 ms)
 50.29  RobboLito 0.09 x64 (time: 100 ms)
 50.00  Deep Rybka _4_x64 (time: 100 ms)
 48.92  IvanHoe_B49jAx64 (time: 100 ms)
 47.79  Deep Shredder 12 x64 (time: 100 ms)
 46.94  Houdini 1.51x64 (time: 100 ms)
 46.66  Houdini 1.5 x64 (time: 100 ms)
 46.61  Ruffian 1.0.1 (time: 100 ms)
 44.00  Deep Shredder 9 UCI (time: 100 ms)
 43.15  Chess Tiger 2007  (time: 100 ms)


C:\similar>a -r 2
------ Houdini 1.5 x64 (time: 100 ms) ------
 67.81  Houdini 1.51x64 (time: 100 ms)
 62.31  IvanHoe_B49jAx64 (time: 100 ms)
 61.22  RobboLito 0.09 x64 (time: 100 ms)
 55.79  Rybka 3  (time: 100 ms)
 54.61  Deep Rybka _4_x64 (time: 100 ms)
 47.29  Strelka 1.8 UCI (time: 100 ms)
 47.28  Rybka 1.0 Beta 32-bit (time: 100 ms)
 46.99  Deep Shredder 12 x64 (time: 100 ms)
 46.66  Fruit 2.1 (time: 300 ms)
 43.81  Chess Tiger 2007  (time: 100 ms)
 42.60  Glaurung 2-epsilon/5 (time: 100 ms)
 42.27  Deep Shredder 9 UCI (time: 100 ms)
 39.39  Ruffian 1.0.1 (time: 100 ms)


C:\similar>a -r 13
------ Rybka 3  (time: 100 ms) ------
 60.83  RobboLito 0.09 x64 (time: 100 ms)
 59.19  Deep Rybka _4_x64 (time: 100 ms)
 58.99  IvanHoe_B49jAx64 (time: 100 ms)
 55.79  Houdini 1.5 x64 (time: 100 ms)
 55.68  Houdini 1.51x64 (time: 100 ms)
 52.54  Strelka 1.8 UCI (time: 100 ms)
 52.25  Rybka 1.0 Beta 32-bit (time: 100 ms)
 51.31  Fruit 2.1 (time: 300 ms)
 49.07  Deep Shredder 12 x64 (time: 100 ms)
 47.26  Glaurung 2-epsilon/5 (time: 100 ms)
 45.04  Chess Tiger 2007  (time: 100 ms)
 44.71  Deep Shredder 9 UCI (time: 100 ms)
 43.86  Ruffian 1.0.1 (time: 100 ms)


C:\similar>a -r 10
------ Rybka 1.0 Beta 32-bit (time: 100 ms) ------
 73.56  Strelka 1.8 UCI (time: 100 ms)
 56.30  Fruit 2.1 (time: 300 ms)
 52.25  Rybka 3  (time: 100 ms)
 51.72  Deep Rybka _4_x64 (time: 100 ms)
 51.08  RobboLito 0.09 x64 (time: 100 ms)
 49.99  Glaurung 2-epsilon/5 (time: 100 ms)
 49.44  IvanHoe_B49jAx64 (time: 100 ms)
 47.34  Houdini 1.51x64 (time: 100 ms)
 47.28  Houdini 1.5 x64 (time: 100 ms)
 46.01  Deep Shredder 12 x64 (time: 100 ms)
 44.46  Ruffian 1.0.1 (time: 100 ms)
 42.09  Deep Shredder 9 UCI (time: 100 ms)
 41.94  Chess Tiger 2007  (time: 100 ms)
Now the numbers are a little different; one has to get used to the new matrix. I put in many original programs of the Rybka epoch, plus of course some Ippos, Houdini, and several Rybkas. Ruffian is also interesting: it's "original" in the sense that it's probably a UCI original Fritz 6 :).

I have to get used to these numbers, but even without many later Fruit-derived engines in the pool, Rybka 1.0 Beta seems one of the most Fruit-like, and the Ippos the most similar to Rybka 3. Strelka 1.8 is probably extremely similar to Rybka 1.0, and Houdini 1.5 seems similar to the Ippos.
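One way to read a matrix like this is to group engines whose pairwise similarity exceeds some cutoff. Here is a sketch using a handful of numbers from the runs above (engine names shortened); the 55% threshold is an arbitrary illustrative choice, not something the tool defines:

```python
def cluster_engines(pairs, threshold=55.0):
    """Single-linkage grouping over pairwise similarity percentages,
    implemented with a small union-find. pairs is a list of
    (engine_a, engine_b, similarity_percent) tuples."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, s in pairs:
        find(a); find(b)                   # register both engines
        if s >= threshold:
            parent[find(a)] = find(b)      # merge their groups
    groups = {}
    for e in list(parent):
        groups.setdefault(find(e), []).append(e)
    return sorted(sorted(g) for g in groups.values())

# A few entries taken from the similarity runs above:
pairs = [
    ("Rybka 1.0 Beta", "Strelka 1.8", 73.56),
    ("Houdini 1.5", "Houdini 1.51", 67.81),
    ("Houdini 1.5", "IvanHoe", 62.31),
    ("Houdini 1.5", "RobboLito", 61.22),
    ("Rybka 3", "RobboLito", 60.83),
    ("Rybka 3", "Deep Rybka 4", 59.19),
    ("Fruit 2.1", "Rybka 1.0 Beta", 56.30),
    ("Fruit 2.1", "Ruffian 1.0.1", 46.61),
]
for group in cluster_engines(pairs):
    print(group)
```

With this cutoff, the Fruit/Rybka-1/Strelka family, the Rybka-3/Ippo/Houdini family, and Ruffian fall out as separate groups, which matches the reading above.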

Thanks,
Kai
Last edited by Laskos on Thu Dec 30, 2010 7:58 pm, edited 1 time in total.
perejaslav
Posts: 240
Joined: Sat Mar 18, 2006 4:01 am
Location: Cold

Re: Similarity Detector Available

Post by perejaslav »

Code: Select all

Microsoft Windows [Version 6.1.7600]


D:\Test\sim02>sim02w64.exe


usage:
   similar
   similar -test INVOKE {time_in_ms}
   similar -report N
   similar -config FILE


D:\Test\sim02>sim02w64.exe -t Rybka4x64.exe 200


D:\Test\sim02>
What am I doing wrong, Don? Nothing happens after I input those commands :evil: