IPON results for Houdini 4

mar · Post by **mar** » Wed Dec 04, 2013 1:19 pm

Laskos wrote:Houdini 4 diverged quite a bit from Robbolito, if ICGA will ever set the 60% rule on Sim tester, Houdini will pass it. The only obstacles Houdini has are Strelka and Critter, but the pre-eminence is clear looking at their release dates. Can you guys stop being such arduous justiciaries? Rybka 4 was already banned for a sin with Rybka 1, now you will cry "fault" on Houdini 7 based on Houdini 1.00?

I'm glad we agree that Houdini "evolved" from clone to derivative. The only one who cries is actually Houdart. What's his problem with fellow copycats?

At least they didn't lie about the origin of their clones.

pohl4711 · Post by **pohl4711** » Wed Dec 04, 2013 2:01 pm

Laskos wrote:, if ICGA will ever set the 60% rule on Sim tester, Houdini will pass it.

The problem is that the Sim-tester is not able to find clones. I tried 2 runs with one engine (2 files, one renamed) and the similarity was only 73%. I tried Stockfish with an aggressive parameter setting (so the code was nearly 100% identical, except for the settings of aggressiveness, cowardice and mobility) and the Sim-tester said: 53% similarity to default Stockfish.
So forget this tool. It is an interesting idea, but it doesn't work!
And because of this problem (nobody can say how much similarity the code of an engine has), I decided to test everything that is not a 100% clone (like Samsung is 100% Robodini). For serious testing, I see no other way.

Stefan

Laskos · Post by **Laskos** » Wed Dec 04, 2013 3:13 pm

pohl4711 wrote:
Laskos wrote:, if ICGA will ever set the 60% rule on Sim tester, Houdini will pass it.
The problem is that the Sim-tester is not able to find clones. I tried 2 runs with one engine (2 files, one renamed) and the similarity was only 73%.

Yes, self-similarity is often at this level, which is very high, well above the 60% threshold, so a complete copy wouldn't make it.

I tried Stockfish with an aggressive parameter setting (so the code was nearly 100% identical, except for the settings of aggressiveness, cowardice and mobility) and the Sim-tester said: 53% similarity to default Stockfish.
So forget this tool. It is an interesting idea, but it doesn't work!

And you probably weakened the engine by 200-500 Elo points. No wonder you get a low similarity, and that trick cannot be used to present a clone of similar strength to the original engine. The similarity test works as a first indication of a clone, in need of further investigation.

And because of this problem (nobody can say how much similarity the code of an engine has), I decided to test everything that is not a 100% clone (like Samsung is 100% Robodini). For serious testing, I see no other way.

Stefan

Uri Blass · Post by **Uri Blass** » Wed Dec 04, 2013 3:27 pm

pohl4711 wrote:
Laskos wrote:, if ICGA will ever set the 60% rule on Sim tester, Houdini will pass it.
The problem is that the Sim-tester is not able to find clones. I tried 2 runs with one engine (2 files, one renamed) and the similarity was only 73%. I tried Stockfish with an aggressive parameter setting (so the code was nearly 100% identical, except for the settings of aggressiveness, cowardice and mobility) and the Sim-tester said: 53% similarity to default Stockfish.
So forget this tool. It is an interesting idea, but it doesn't work!
And because of this problem (nobody can say how much similarity the code of an engine has), I decided to test everything that is not a 100% clone (like Samsung is 100% Robodini). For serious testing, I see no other way.

Stefan

The sim tester is not supposed to find clones but to find similarities.

My opinion is that the 60% rule is a good rule for deciding whether to accept a new engine into the competition when the target is to have different playing styles, rather than to exclude derivatives. I consider a no-derivatives rule to be a bad rule, because people often disagree about whether a program is a derivative (see the Rybka case), and I think that rules should be clear.

I suggest the following steps for a competition:

Step 1: Allow every programmer to decide which engines he wants to send to the competition, and also include all the public versions of chess programs (in the case of Stockfish this means including all the public development versions).

Order the engines by release date (the release date is the date on which people could first download the program, or, for private engines, the date on which the programmer sent the program to the competition).
Accept the first engine (that is, the oldest engine by release date).

Step 2: Take the next released engine (by release date). Accept it if it has no 60% similarity to any previously accepted engine, and reject it if it has 60% similarity to 2 accepted engines.

If it has 60% similarity to only one previously accepted engine, then there are 2 possibilities:

1) The previously accepted engine is by the same author. In this case the author decides which version he prefers, and only one of them is accepted.
2) The previously accepted engine is by a different author. In this case the new engine is rejected.

Repeat step 2 until every engine has been processed.
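
The procedure above can be sketched in Python. This is only a rough illustration under stated assumptions, not a reference implementation: the `Engine` record, the `similarity` callback (standing in for whatever percentage the Sim tester reports for a pair of engines) and the `author_prefers_newer` callback are hypothetical names introduced here. The sketch also assumes that "60% similarity" means a score of at least 60, that similarity to two or more accepted engines leads to rejection, and that when an author prefers the newer version it simply takes the older version's place in the accepted list.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List

THRESHOLD = 60.0  # percent; the "60% rule" (">=" is an assumption)


@dataclass(frozen=True)
class Engine:
    """Hypothetical record for one competition entry."""
    name: str
    author: str
    released: date  # first public download, or submission date if private


def select_engines(
    engines: Iterable[Engine],
    similarity: Callable[[Engine, Engine], float],
    author_prefers_newer: Callable[[Engine, Engine], bool] = lambda old, new: True,
) -> List[Engine]:
    """Return the accepted engines, processed oldest first."""
    accepted: List[Engine] = []
    for candidate in sorted(engines, key=lambda e: e.released):
        # Previously accepted engines that are "too similar" to the candidate.
        similar = [e for e in accepted if similarity(e, candidate) >= THRESHOLD]
        if not similar:
            # No similar engine (this also covers the oldest engine): accept.
            accepted.append(candidate)
        elif len(similar) == 1 and similar[0].author == candidate.author:
            # Same author: the author chooses which single version stays.
            if author_prefers_newer(similar[0], candidate):
                accepted[accepted.index(similar[0])] = candidate
        # Otherwise: similar to another author's engine, or to two or more
        # accepted engines (assumption), so the candidate is rejected.
    return accepted
```

Because each candidate is only compared against engines accepted before it, the outcome depends on the release order, which is why the ordering in step 1 matters.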

Dr.Wael Deeb · Post by **Dr.Wael Deeb** » Wed Dec 04, 2013 4:31 pm

pohl4711 wrote:
Laskos wrote:, if ICGA will ever set the 60% rule on Sim tester, Houdini will pass it.
The problem is that the Sim-tester is not able to find clones. I tried 2 runs with one engine (2 files, one renamed) and the similarity was only 73%. I tried Stockfish with an aggressive parameter setting (so the code was nearly 100% identical, except for the settings of aggressiveness, cowardice and mobility) and the Sim-tester said: 53% similarity to default Stockfish.
So forget this tool. It is an interesting idea, but it doesn't work!
And because of this problem (nobody can say how much similarity the code of an engine has), I decided to test everything that is not a 100% clone (like Samsung is 100% Robodini). For serious testing, I see no other way.

Stefan

The most accurate method is to put the source code of the 2 programs on the table under a big loupe. Period.

So basically, I do agree with you, my friend. Regards,
Dr.D

Dr.Wael Deeb · Post by **Dr.Wael Deeb** » Wed Dec 04, 2013 4:34 pm

IGarcia wrote:
Houdini wrote:
pohl4711 wrote:Perhaps take a look at the LS-ratinglist, where each engine plays only against 10 real strong opponents and the score of Houdini 4 was "only" 67%...And each individual match contains 1000 games and can be seen in the crosstable of the LS-top10-tournament.

Stefan

http://ls-ratinglist.beepworld.de
The LS-Top10 rating list has a different issue: there is too little filtering of participants. The "Top 10" list includes Strelka 5 (=Houdini 1.5), Amitis (=Stockfish) and Bouquet/Pan Chess/Mars/Robbolito (=Ivanhoe).

You have "solved" the issue of average strength of opponents by letting the same programs play multiple times.
Besides, you are correct in pointing out the mistake of allowing very "similar" programs, but the way you do it is funny.

The fact that you are selling H2, H3, H4 and nobody is suing you does not make your engine original. It has been marked as controversial (along with Rybka and other engines) in CCRL and by the community in general.

It's funny to see your writings against "clones" when several people think H4 is an abomination, the same thing you called Robbodini. Probably the strongest abomination today.

Regards.

Houdini is an abomination

No way

Modern Times · Post by **Modern Times** » Wed Dec 04, 2013 7:41 pm

IGarcia wrote: The fact that you are selling H2, H3, H4 and nobody is suing you does not make your engine original. It has been marked as controversial (along with Rybka and other engines) in CCRL and by the community in general.

CCRL got rid of the controversial designation a long time ago. It is too subjective, and it isn't our job to be making judgements like that. I agree with what Stefan has to say about this issue.

Modern Times · Post by **Modern Times** » Wed Dec 04, 2013 7:43 pm

lkaufman wrote:If you believed that time limit doesn't matter (other than the spread of the ratings), why would CEGT and CCRL bother to test at short and long time controls?

In CCRL it is simply because some testers prefer blitz, and some prefer long time control. Nothing more than that.

pohl4711 · Post by **pohl4711** » Thu Dec 05, 2013 9:30 am

Laskos wrote: And you probably weakened the engine by 200-500 Elo points. No wonder you get a low similarity...

Definitely not! It is an aggressive-playing setting, which I used some time ago on playchess. It is perhaps 30 Elo weaker than Stockfish default, definitely not 200-500...

Stefan

Laskos · Post by **Laskos** » Thu Dec 05, 2013 10:05 am

pohl4711 wrote:
Laskos wrote: And you probably weakened the engine by 200-500 Elo points. No wonder you get a low similarity...
Definitely not! It is an aggressive-playing setting, which I used some time ago on playchess. It is perhaps 30 Elo weaker than Stockfish default, definitely not 200-500...

Stefan

Maybe you can post here the settings, a self-play match (not necessarily very long, just to have the order of magnitude of weakening), and the results from Sim.
