Similarity Detector Available

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: clone tester available

Post by Laskos »

Don wrote:
Laskos wrote: Houdini 1.51 is a pure copy of Houdini 1.5; I included it to set the upper boundary for these fairly non-deterministic engines. My problem is mostly statistical: at least 10,000 or so positions are needed, and in no case could a claim of cloning be made more than circumstantially. It is more like a property of some engines.
I actually have a version with 10,000 positions. I recently removed almost 2,000 of them because EVERY engine I tested (18 of them) played the same move.

I agree with you about the circumstantial evidence. The only thing this test can show you is how much stylistic similarity there is between two programs.

In fact, I think the test tells you mostly about the evaluation functions of the two programs, and hardly anything about any other part of the program. However, it's my belief that this is the most important and difficult part of a chess program, and what separates the men from the boys.
Don, how do I get the 10,000-position version? Could you post a Win x64 executable of it? I browsed the threads and found only bigsimilat.kit, which indeed seems big compared to similar.kit, but I don't know what to do with it. With 2,000 positions the standard deviation of the results seems to be about 1.5%, which is far too large: the whole span from barely similar to very similar engines is only about 7%, so 3% error margins at 95% confidence may heavily distort the results.
The second issue: quite a few engines fail to perform the test correctly, either halting or returning outlandishly low results (like 12%).
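For the error margins above, a back-of-the-envelope check (a sketch that treats each position as an independent coin flip with roughly 50% agreement probability, which is a simplification; real positions are not identical trials) shows why going from 2,000 to 10,000 positions roughly halves the standard deviation:

```python
import math

def match_stddev(p, n):
    """Standard error (in percent) of the observed match rate when each of the
    n positions is an independent Bernoulli trial with agreement probability p.
    A simplification, but it gives the right order of magnitude."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n)

# The error is largest near 50% agreement:
print(f"2,000 positions:  +/- {match_stddev(0.5, 2000):.2f}%")   # ~1.12%
print(f"10,000 positions: +/- {match_stddev(0.5, 10000):.2f}%")  # ~0.50%
```

This matches the observed spread: about 1.5% with 2,000 positions and about 0.5% with 10,000.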

Kai
Allard Siemelink
Posts: 297
Joined: Fri Jun 30, 2006 9:30 pm
Location: Netherlands

Re: Similarity Detector Available

Post by Allard Siemelink »

El Gringo wrote: Spark doesn't like the similar utility. It hangs after 35%
I am not sure that Spark is at fault. On my 32-bit machine, it looks like it is similar.exe that cannot keep up with Spark's admittedly verbose output.
The test does finish, but from about 40% into the test, Spark gets very few CPU cycles (system CPU usage <10%) and returns moves from very shallow searches.
If I run the test multiple times, similar.exe does not even think that Spark is very similar to itself :shock: :

Code: Select all

------ spark-1.0 (time: 100 ms) ------
 73.20  spark-1.0 (time: 101 ms)
 41.40  spark-1.0 (time: 99 ms)
 36.70  bright-0.4a (time: 100 ms)
 35.95  spark-dev (time: 50 ms)
 28.15  bright-0.5c (time: 100 ms)
 11.75  Strelka 1.8 UCI (time: 100 ms)
 11.35  Rybka 1.0 Beta 32-bit (time: 100 ms)
 11.20  Komodo32 1.0 JA (time: 101 ms)
 11.05  Komodo32 1.0 JA (time: 100 ms)
 10.95  Stockfish 1.6 JA (time: 100 ms)
 10.55  Toga II 1.4.1SE (time: 100 ms)
I had a closer look, and it turns out that Spark is frequently blocked when it tries to communicate with similar.exe. Most probably, the buffer of the communication channel (pipe) from Spark to similar.exe is too small, forcing Spark to wait until similar.exe has processed some older data. When I disable UCI reporting of PVs and current moves, all goes well:
CPU usage is at 100% during the whole test, and similar.exe now recognizes that single-CPU spark-dev is at least very similar to itself:

Code: Select all

------ spark-dev (time: 100 ms) ------
 99.55  spark-dev (time: 101 ms)
 99.50  spark-dev (time: 99 ms)
 86.35  spark-dev (time: 50 ms)
 30.60  bright-0.5c (time: 100 ms)
 28.55  bright-0.4a (time: 100 ms)
 11.50  Strelka 1.8 UCI (time: 100 ms)
 11.25  Rybka 1.0 Beta 32-bit (time: 100 ms)
 11.05  Komodo32 1.0 JA (time: 100 ms)
 11.00  Toga II 1.4.1SE (time: 100 ms)
 11.00  Komodo32 1.0 JA (time: 101 ms)
 10.80  Stockfish 1.6 JA (time: 100 ms)
Sure, Spark does not appear to be similar at all to other engines, but this of course is intended behaviour :)

Don, if you're reading this, perhaps you could increase the size of the communication (pipe) buffer to e.g. 32 KB in the next release?
I believe this would yield more accurate results for verbose engines.
(And to further enhance accuracy, you might try running all engines single-core, avoiding MP randomness.)
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: clone tester available

Post by Don »

Laskos wrote:
Don wrote:
Laskos wrote: Houdini 1.51 is a pure copy of Houdini 1.5; I included it to set the upper boundary for these fairly non-deterministic engines. My problem is mostly statistical: at least 10,000 or so positions are needed, and in no case could a claim of cloning be made more than circumstantially. It is more like a property of some engines.
I actually have a version with 10,000 positions. I recently removed almost 2,000 of them because EVERY engine I tested (18 of them) played the same move.

I agree with you about the circumstantial evidence. The only thing this test can show you is how much stylistic similarity there is between two programs.

In fact, I think the test tells you mostly about the evaluation functions of the two programs, and hardly anything about any other part of the program. However, it's my belief that this is the most important and difficult part of a chess program, and what separates the men from the boys.
Don, how do I get the 10,000-position version? Could you post a Win x64 executable of it? I browsed the threads and found only bigsimilat.kit, which indeed seems big compared to similar.kit, but I don't know what to do with it. With 2,000 positions the standard deviation of the results seems to be about 1.5%, which is far too large: the whole span from barely similar to very similar engines is only about 7%, so 3% error margins at 95% confidence may heavily distort the results.
The second issue: quite a few engines fail to perform the test correctly, either halting or returning outlandishly low results (like 12%).

Kai
The program is at the same site but is called a.exe; it has the 10,000 positions minus the ones that EVERY program played the same, so it will give substantially different scores.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Similarity Detector Available

Post by Don »

Allard Siemelink wrote:
El Gringo wrote: Spark doesn't like the similar utility. It hangs after 35%
I am not sure that Spark is at fault. On my 32-bit machine, it looks like it is similar.exe that cannot keep up with Spark's admittedly verbose output.
The test does finish, but from about 40% into the test, Spark gets very few CPU cycles (system CPU usage <10%) and returns moves from very shallow searches.
If I run the test multiple times, similar.exe does not even think that Spark is very similar to itself :shock: :

Code: Select all

------ spark-1.0 (time: 100 ms) ------
 73.20  spark-1.0 (time: 101 ms)
 41.40  spark-1.0 (time: 99 ms)
 36.70  bright-0.4a (time: 100 ms)
 35.95  spark-dev (time: 50 ms)
 28.15  bright-0.5c (time: 100 ms)
 11.75  Strelka 1.8 UCI (time: 100 ms)
 11.35  Rybka 1.0 Beta 32-bit (time: 100 ms)
 11.20  Komodo32 1.0 JA (time: 101 ms)
 11.05  Komodo32 1.0 JA (time: 100 ms)
 10.95  Stockfish 1.6 JA (time: 100 ms)
 10.55  Toga II 1.4.1SE (time: 100 ms)
I had a closer look, and it turns out that Spark is frequently blocked when it tries to communicate with similar.exe. Most probably, the buffer of the communication channel (pipe) from Spark to similar.exe is too small, forcing Spark to wait until similar.exe has processed some older data. When I disable UCI reporting of PVs and current moves, all goes well:
CPU usage is at 100% during the whole test, and similar.exe now recognizes that single-CPU spark-dev is at least very similar to itself:

Code: Select all

------ spark-dev (time: 100 ms) ------
 99.55  spark-dev (time: 101 ms)
 99.50  spark-dev (time: 99 ms)
 86.35  spark-dev (time: 50 ms)
 30.60  bright-0.5c (time: 100 ms)
 28.55  bright-0.4a (time: 100 ms)
 11.50  Strelka 1.8 UCI (time: 100 ms)
 11.25  Rybka 1.0 Beta 32-bit (time: 100 ms)
 11.05  Komodo32 1.0 JA (time: 100 ms)
 11.00  Toga II 1.4.1SE (time: 100 ms)
 11.00  Komodo32 1.0 JA (time: 101 ms)
 10.80  Stockfish 1.6 JA (time: 100 ms)
Sure, Spark does not appear to be similar at all to other engines, but this of course is intended behaviour :)

Don, if you're reading this, perhaps you could increase the size of the communication (pipe) buffer to e.g. 32 KB in the next release?
I believe this would yield more accurate results for verbose engines.
(And to further enhance accuracy, you might try running all engines single-core, avoiding MP randomness.)
The utility is actually written in Tcl and there should be no such issues, but I will look into it. I downloaded Spark 1.0 recently and got the same problem as reported here.

The utility actually sends the go command and just waits 100 ms before sending the stop command, so I probably just need to read the output during that wait.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Similarity Detector Available

Post by Don »

A new version of the similarity tester is available on the komodo web site.

It has over 8,000 positions instead of 2,000, and the engines can be configured separately by means of a configuration file. This was a requested feature.

Two additional features:

You can give a chess engine your own, more descriptive name if you wish, such as "komodo 1.3 bigHash".

You can set a SCALING factor for each program. This feature lets users equalize Elo levels: for example, a weaker program can be given a scale factor to equalize its strength with a stronger one.

You can get it here: http://komodochess.com
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Similarity Detector & wb2uci

Post by Don »

Did you get this to work?
Will Singleton wrote: Anyone get SD to work with wb2uci? My first try:

C:\chess\similar>similar -t wb2uci
program: amateur295x4 (time: 100 ms)

0.0 percent .error writing "file2537fb0": broken pipe
while executing
"puts $fh "stop""
("for" body line 15)
invoked from within
"for { set n $start } { $n < $e } { incr n } {

if { ($n % 50) == 0 } {
puts ""
set perc [expr ($cc * 100.0) / 2000.0]
puts -nonewline [format "..."
(file "C:/chess/similar/similar.exe/lib/app-clone/clone.tcl" line 141)
invoked from within
"source C:/chess/similar/similar.exe/lib/app-clone/clone.tcl"
("package ifneeded app-clone 1.0" script)
invoked from within
"package require app-clone"
(file "C:/chess/similar/similar.exe/main.tcl" line 4)

C:\chess\similar>
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: clone tester available

Post by Don »

Hi George,

I'm sorry I forgot to answer your question; I'll try to answer it now.
George Tsavdaris wrote: On what theory is this tool based?
I mean, what kind of positions are these? I hope they are not best-move positions, right? Because then good programs would make similar selections.
The positions are random positions from top-level games (although I don't think that is important for this purpose), except that I have removed positions where EVERY chess program agrees on the best move.

I would like the positions to have many different possible good moves to play; let's say the best 5 moves should all be playable. Semi-opening positions are good for this.
If there are 5 candidate moves per position, then over 2,000 positions the engines can choose among 5^2000 different combinations, so the randomness is very good and the statistics at the end are strong and can be a good indicator.
I want to improve the set by determining which positions return the most useful information, similar to what you are describing.

However, this must be done with care to be valid. I would focus ONLY on culling positions in a way that strengthened the similarity scores between two different runs of the same program. As you say, we want positions where a variety of reasonable moves are possible.

I did an experiment where I removed positions on which the majority (but not all) of programs changed their minds from run to run, and this greatly strengthened the similarity score of the same program against itself in a separate run. But I'm not sure I like that approach, because it is very much dependent on which programs I am using and on the specific time setting.

I also discovered that some programs are much more stable than others: they tend to pick the SAME move from run to run much more often. This tool can measure that too (yet another use of the tool).
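As a toy illustration of the chance baseline implied by "5 playable moves per position" (a sketch only; real engines are not uniform random samplers, and the tool does not work this way):

```python
import random

def random_agreement(n_positions=8000, n_choices=5, seed=1):
    """Toy model: two unrelated engines each pick uniformly among n_choices
    playable moves per position. Their expected agreement is 1/n_choices,
    so real scores well above that baseline point to correlated evaluation
    rather than chance."""
    rng = random.Random(seed)
    same = sum(rng.randrange(n_choices) == rng.randrange(n_choices)
               for _ in range(n_positions))
    return 100.0 * same / n_positions

print(f"chance agreement: ~{random_agreement():.1f}%")  # hovers around 20%
```

Against a chance floor of about 20% for 5-choice positions, self-similarity scores near 99% and cross-engine scores of 50-70% are clearly not random noise.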

Don
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: clone tester available

Post by michiguel »

Don wrote:Hi George,

I'm sorry I forgot to answer your question; I'll try to answer it now.
George Tsavdaris wrote: On what theory is this tool based?
I mean, what kind of positions are these? I hope they are not best-move positions, right? Because then good programs would make similar selections.
The positions are random positions from top-level games (although I don't think that is important for this purpose), except that I have removed positions where EVERY chess program agrees on the best move.
It is important that they be random. I would not spend too much time removing positions that are easy; the only thing they do is consume time, and they do not hurt the test in any way.

I would like the positions to have many different possible good moves to play; let's say the best 5 moves should all be playable. Semi-opening positions are good for this.
If there are 5 candidate moves per position, then over 2,000 positions the engines can choose among 5^2000 different combinations, so the randomness is very good and the statistics at the end are strong and can be a good indicator.
I want to improve the set by determining which positions return the most useful information, similar to what you are describing.

However, this must be done with care to be valid. I would focus ONLY on culling positions in a way that strengthened the similarity scores between two different runs of the same program. As you say, we want positions where a variety of reasonable moves are possible.

I did an experiment where I removed positions on which the majority (but not all) of programs changed their minds from run to run, and this greatly strengthened the similarity score of the same program against itself in a separate run. But I'm not sure I like that approach, because it is very much dependent on which programs I am using and on the specific time setting.

I also discovered that some programs are much more stable than others: they tend to pick the SAME move from run to run much more often. This tool can measure that too (yet another use of the tool).

Don
A good feature for your tool would be to let the user choose a set of positions from an EPD file.

Miguel
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: clone tester available

Post by Laskos »

Don wrote:
The program is at the same site but is called a.exe; it has the 10,000 positions minus the ones that EVERY program played the same, so it will give substantially different scores.
Thanks Don, it's much better now: the standard deviation is about 0.5%, and the span from "not similar" to "very similar" is more than 10% instead of 7%, so separations are much clearer. I read that you added a config file; that would be useful for me to run everything on 1 core. Not only does it flatten out the strength differences, the self-similarity is also higher on 1 core (more deterministic output).

Code: Select all


C:\similar>a -r 1
------ Fruit 2.1 (time: 300 ms) ------
 56.30  Rybka 1.0 Beta 32-bit (time: 100 ms)
 56.14  Strelka 1.8 UCI (time: 100 ms)
 53.54  Glaurung 2-epsilon/5 (time: 100 ms)
 51.31  Rybka 3  (time: 100 ms)
 50.29  RobboLito 0.09 x64 (time: 100 ms)
 50.00  Deep Rybka _4_x64 (time: 100 ms)
 48.92  IvanHoe_B49jAx64 (time: 100 ms)
 47.79  Deep Shredder 12 x64 (time: 100 ms)
 46.94  Houdini 1.51x64 (time: 100 ms)
 46.66  Houdini 1.5 x64 (time: 100 ms)
 46.61  Ruffian 1.0.1 (time: 100 ms)
 44.00  Deep Shredder 9 UCI (time: 100 ms)
 43.15  Chess Tiger 2007  (time: 100 ms)


C:\similar>a -r 2
------ Houdini 1.5 x64 (time: 100 ms) ------
 67.81  Houdini 1.51x64 (time: 100 ms)
 62.31  IvanHoe_B49jAx64 (time: 100 ms)
 61.22  RobboLito 0.09 x64 (time: 100 ms)
 55.79  Rybka 3  (time: 100 ms)
 54.61  Deep Rybka _4_x64 (time: 100 ms)
 47.29  Strelka 1.8 UCI (time: 100 ms)
 47.28  Rybka 1.0 Beta 32-bit (time: 100 ms)
 46.99  Deep Shredder 12 x64 (time: 100 ms)
 46.66  Fruit 2.1 (time: 300 ms)
 43.81  Chess Tiger 2007  (time: 100 ms)
 42.60  Glaurung 2-epsilon/5 (time: 100 ms)
 42.27  Deep Shredder 9 UCI (time: 100 ms)
 39.39  Ruffian 1.0.1 (time: 100 ms)


C:\similar>a -r 13
------ Rybka 3  (time: 100 ms) ------
 60.83  RobboLito 0.09 x64 (time: 100 ms)
 59.19  Deep Rybka _4_x64 (time: 100 ms)
 58.99  IvanHoe_B49jAx64 (time: 100 ms)
 55.79  Houdini 1.5 x64 (time: 100 ms)
 55.68  Houdini 1.51x64 (time: 100 ms)
 52.54  Strelka 1.8 UCI (time: 100 ms)
 52.25  Rybka 1.0 Beta 32-bit (time: 100 ms)
 51.31  Fruit 2.1 (time: 300 ms)
 49.07  Deep Shredder 12 x64 (time: 100 ms)
 47.26  Glaurung 2-epsilon/5 (time: 100 ms)
 45.04  Chess Tiger 2007  (time: 100 ms)
 44.71  Deep Shredder 9 UCI (time: 100 ms)
 43.86  Ruffian 1.0.1 (time: 100 ms)


C:\similar>a -r 10
------ Rybka 1.0 Beta 32-bit (time: 100 ms) ------
 73.56  Strelka 1.8 UCI (time: 100 ms)
 56.30  Fruit 2.1 (time: 300 ms)
 52.25  Rybka 3  (time: 100 ms)
 51.72  Deep Rybka _4_x64 (time: 100 ms)
 51.08  RobboLito 0.09 x64 (time: 100 ms)
 49.99  Glaurung 2-epsilon/5 (time: 100 ms)
 49.44  IvanHoe_B49jAx64 (time: 100 ms)
 47.34  Houdini 1.51x64 (time: 100 ms)
 47.28  Houdini 1.5 x64 (time: 100 ms)
 46.01  Deep Shredder 12 x64 (time: 100 ms)
 44.46  Ruffian 1.0.1 (time: 100 ms)
 42.09  Deep Shredder 9 UCI (time: 100 ms)
 41.94  Chess Tiger 2007  (time: 100 ms)
Now the numbers are a little different; one has to get used to the new matrix. I put in many original programs of the Rybka epoch, plus of course some Ippos, Houdini, and several Rybkas. Ruffian is also interesting: it's "original" in the sense that it's probably a UCI original Fritz 6 :).

I have to get used to these numbers, but even without many later Fruit-derived engines in the pool, Rybka 1.0 Beta seems one of the most Fruit-like, and the Ippos the most similar to Rybka 3. Strelka 1.8 is probably extremely similar to Rybka 1.0, and Houdini 1.5 seems similar to the Ippos.
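One way to read a matrix like this is to group engines whose pairwise similarity exceeds some cutoff. Here is a sketch using a handful of numbers from the runs above (engine names shortened); the 55% threshold is an arbitrary illustrative choice, not something the tool defines:

```python
def cluster_engines(pairs, threshold=55.0):
    """Single-linkage grouping over pairwise similarity percentages,
    implemented with a small union-find. pairs is a list of
    (engine_a, engine_b, similarity_percent) tuples."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, s in pairs:
        find(a); find(b)                   # register both engines
        if s >= threshold:
            parent[find(a)] = find(b)      # merge their groups
    groups = {}
    for e in list(parent):
        groups.setdefault(find(e), []).append(e)
    return sorted(sorted(g) for g in groups.values())

# A few entries taken from the similarity runs above:
pairs = [
    ("Rybka 1.0 Beta", "Strelka 1.8", 73.56),
    ("Houdini 1.5", "Houdini 1.51", 67.81),
    ("Houdini 1.5", "IvanHoe", 62.31),
    ("Houdini 1.5", "RobboLito", 61.22),
    ("Rybka 3", "RobboLito", 60.83),
    ("Rybka 3", "Deep Rybka 4", 59.19),
    ("Fruit 2.1", "Rybka 1.0 Beta", 56.30),
    ("Fruit 2.1", "Ruffian 1.0.1", 46.61),
]
for group in cluster_engines(pairs):
    print(group)
```

With this cutoff, the Fruit/Rybka-1/Strelka family, the Rybka-3/Ippo/Houdini family, and Ruffian fall out as separate groups, which matches the reading above.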

Thanks,
Kai
Last edited by Laskos on Thu Dec 30, 2010 7:58 pm, edited 1 time in total.
perejaslav
Posts: 240
Joined: Sat Mar 18, 2006 4:01 am
Location: Cold

Re: Similarity Detector Available

Post by perejaslav »

Code: Select all

Microsoft Windows [Version 6.1.7600]


D:\Test\sim02>sim02w64.exe


usage:
   similar
   similar -test INVOKE {time_in_ms}
   similar -report N
   similar -config FILE


D:\Test\sim02>sim02w64.exe -t Rybka4x64.exe 200


D:\Test\sim02>
What am I doing wrong, Don? Nothing happens after I input those commands :evil: