IPON results for Houdini 4


mar
Posts: 2559
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: IPON results for Houdini 4

Post by mar »

Laskos wrote: Houdini 4 diverged quite a bit from Robbolito; if the ICGA ever sets the 60% rule on the Sim tester, Houdini will pass it. The only obstacles Houdini has are Strelka and Critter, but the precedence is clear from their release dates. Can you guys stop being such arduous justiciaries? Rybka 4 was already banned for a sin of Rybka 1; will you now cry "fault" on Houdini 7 based on Houdini 1.00?
I'm glad we agree that Houdini "evolved" from clone to derivative. The only one who cries is actually Houdart; what's his problem with fellow copycats? :roll: At least they didn't lie about the origin of their clones.
pohl4711
Posts: 2438
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: IPON results for Houdini 4

Post by pohl4711 »

Laskos wrote:, if ICGA will ever set the 60% rule on Sim tester, Houdini will pass it.
The problem is that the Sim tester is not able to find clones. I tried two runs with one engine (two files, one renamed) and the similarity was only 73%. I tried Stockfish with an aggressive parameter setting (so the code was nearly 100% identical, except for the aggressiveness, cowardice and mobility settings) and the Sim tester reported 53% similarity to default Stockfish.
So forget this tool. It is an interesting idea, but it doesn't work!
And because of this problem (nobody can say how much similarity the code of an engine has), I decided to test everything that is not a 100% clone (the way Samsung is 100% Robodini). For serious testing, I see no other way.

Stefan
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: IPON results for Houdini 4

Post by Laskos »

pohl4711 wrote:
Laskos wrote:, if ICGA will ever set the 60% rule on Sim tester, Houdini will pass it.
The problem is that the Sim tester is not able to find clones. I tried two runs with one engine (two files, one renamed) and the similarity was only 73%.
Yes, self-similarity is often at this level, which is very high, well above the 60% threshold, so a complete copy wouldn't pass.

I tried Stockfish with an aggressive parameter setting (so the code was nearly 100% identical, except for the aggressiveness, cowardice and mobility settings) and the Sim tester reported 53% similarity to default Stockfish.
So forget this tool. It is an interesting idea, but it doesn't work!
And you probably weakened the engine by 200-500 Elo points. No wonder you get a low similarity; that trick cannot be used to pass off a clone of similar strength to the original engine. The similarity test works as a first indication of a clone, which then needs further investigation.


And because of this problem (nobody can say how much similarity the code of an engine has), I decided to test everything that is not a 100% clone (the way Samsung is 100% Robodini). For serious testing, I see no other way.

Stefan
Uri Blass
Posts: 10296
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: IPON results for Houdini 4

Post by Uri Blass »

pohl4711 wrote:
Laskos wrote:, if ICGA will ever set the 60% rule on Sim tester, Houdini will pass it.
The problem is that the Sim tester is not able to find clones. I tried two runs with one engine (two files, one renamed) and the similarity was only 73%. I tried Stockfish with an aggressive parameter setting (so the code was nearly 100% identical, except for the aggressiveness, cowardice and mobility settings) and the Sim tester reported 53% similarity to default Stockfish.
So forget this tool. It is an interesting idea, but it doesn't work!
And because of this problem (nobody can say how much similarity the code of an engine has), I decided to test everything that is not a 100% clone (the way Samsung is 100% Robodini). For serious testing, I see no other way.

Stefan
The Sim tester is not supposed to find clones but to find similarities.
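
To make "similarity" concrete: as far as I understand it, a tool of this kind lets two engines pick a move in each of a fixed set of positions and reports how often the picks coincide. A minimal sketch of that idea follows, assuming the moves have already been collected; `similarity_percent` and the move lists are hypothetical and are not the Sim tester's actual code or data.

```python
from typing import Sequence


def similarity_percent(moves_a: Sequence[str], moves_b: Sequence[str]) -> float:
    """Percentage of positions in which both engines chose the same move."""
    if len(moves_a) != len(moves_b):
        raise ValueError("both engines must be tested on the same positions")
    if not moves_a:
        raise ValueError("need at least one position")
    same = sum(a == b for a, b in zip(moves_a, moves_b))
    return 100.0 * same / len(moves_a)


# Made-up moves for five positions, purely for illustration.
engine_a = ["e2e4", "g1f3", "d2d4", "c2c4", "e1g1"]
engine_b = ["e2e4", "g1f3", "d2d4", "b1c3", "f1e1"]
print(similarity_percent(engine_a, engine_b))  # 60.0
```

A number like this says nothing about shared source code by itself. It only measures how often two programs agree on a move, and two runs of the same engine need not agree in every position.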

In my opinion, the 60% rule is a good way to decide whether to accept a new engine into a competition when the goal is to have different playing styles rather than to exclude derivatives. I consider a no-derivatives rule a bad one, because people often disagree about whether a program is a derivative (see the Rybka case), and I think rules should be clear.

I suggest the following steps for a competition:
Step 1: Allow every programmer to decide which engines he wants to send to the competition, and also include all public versions of chess programs (in the case of Stockfish, this means including all the public development versions).

Order the engines by release date (the release date is the date on which people could first download the program or, for private engines, the date on which the programmer sent the program to the competition).
Accept the first engine (the oldest one by release date).

Repeat step 2 until it has been applied to all the remaining engines.

Step 2: Take the next engine (by release date). Accept it if it has no 60% similarity to any previously accepted engine, and reject it if it has 60% similarity to two or more accepted engines.

If it has 60% similarity to only one previously accepted engine, there are two possibilities:

1) The previously accepted engine is by the same author. In this case the author decides which version he prefers, and only one of them is accepted.
2) The previously accepted engine is by a different author. In this case the new engine is rejected.
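
The steps above could be sketched roughly as follows. This is only an illustration, not part of the proposal itself: `Engine`, `select_engines`, the `similarity` callback (standing in for a Sim tester result in percent) and the `prefers_newer` callback (standing in for the author's choice) are all hypothetical names. Reading "60% similarity" as "60% or more", and letting a preferred newer version replace the older accepted one, are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List

THRESHOLD = 60.0  # assumed reading of the "60% rule": 60% or more counts as similar


@dataclass(frozen=True)
class Engine:
    name: str
    author: str
    released: date  # public download date, or submission date for private engines


def select_engines(
    engines: Iterable[Engine],
    similarity: Callable[[Engine, Engine], float],
    prefers_newer: Callable[[Engine, Engine], bool],
) -> List[Engine]:
    """Return the engines accepted by the procedure."""
    accepted: List[Engine] = []
    # Step 1: process engines from oldest to newest release.
    for candidate in sorted(engines, key=lambda e: e.released):
        # Step 2: compare against every engine accepted so far.
        similar = [a for a in accepted if similarity(candidate, a) >= THRESHOLD]
        if not similar:
            # No accepted engine is similar (this also covers the oldest engine).
            accepted.append(candidate)
        elif len(similar) == 1 and similar[0].author == candidate.author:
            # Same author: the author keeps exactly one of the two versions.
            if prefers_newer(similar[0], candidate):
                accepted[accepted.index(similar[0])] = candidate
        # Otherwise the candidate is rejected: it is similar to another
        # author's engine, or to two or more accepted engines.
    return accepted


# Made-up engines and similarity values, purely for illustration.
a1 = Engine("A1", "X", date(2010, 1, 1))
b = Engine("B", "Y", date(2011, 1, 1))
a2 = Engine("A2", "X", date(2012, 1, 1))
d = Engine("D", "Z", date(2013, 1, 1))
table = {frozenset({"A1", "B"}): 70.0, frozenset({"A1", "A2"}): 65.0}


def lookup(x: Engine, y: Engine) -> float:
    return table.get(frozenset({x.name, y.name}), 0.0)


result = select_engines([d, a2, b, a1], lookup, lambda old, new: True)
print([e.name for e in result])  # ['A2', 'D']
```

In this made-up run, B is rejected for being similar to the older A1 by a different author, A2 replaces A1 because the author prefers the newer version, and D is accepted because it is similar to nothing. Note that the outcome depends on the processing order, which is why the release dates matter.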
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: IPON results for Houdini 4

Post by Dr.Wael Deeb »

pohl4711 wrote:
Laskos wrote:, if ICGA will ever set the 60% rule on Sim tester, Houdini will pass it.
The problem is that the Sim tester is not able to find clones. I tried two runs with one engine (two files, one renamed) and the similarity was only 73%. I tried Stockfish with an aggressive parameter setting (so the code was nearly 100% identical, except for the aggressiveness, cowardice and mobility settings) and the Sim tester reported 53% similarity to default Stockfish.
So forget this tool. It is an interesting idea, but it doesn't work!
And because of this problem (nobody can say how much similarity the code of an engine has), I decided to test everything that is not a 100% clone (the way Samsung is 100% Robodini). For serious testing, I see no other way.

Stefan
The most accurate method is to put the source code of the two programs on the table under a big loupe. Period.

So basically, I do agree with you, my friend. Regards,
Dr.D
No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward….
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: IPON results for Houdini 4

Post by Dr.Wael Deeb »

IGarcia wrote:
Houdini wrote:
pohl4711 wrote: Perhaps take a look at the LS-ratinglist, where each engine plays only against 10 really strong opponents and the score of Houdini 4 was "only" 67%... And each individual match consists of 1000 games and can be seen in the crosstable of the LS-top10 tournament.

Stefan

http://ls-ratinglist.beepworld.de
The LS-Top10 rating list has a different issue: there is too little filtering of participants. The "Top 10" list includes Strelka 5 (=Houdini 1.5), Amitis (=Stockfish) and Bouquet/Pan Chess/Mars/Robbolito (=Ivanhoe).

You have "solved" the issue of average strength of opponents by letting the same programs play multiple times.
Although you are correct to point out the mistake of allowing very "similar" programs, it's funny the way you do it.

The fact that you are selling H2, H3 and H4 and nobody is suing you does not make your engine original. It has been marked as controversial (along with Rybka and other engines) in CCRL and by the community in general.

It's funny to see you write against "clones" when several people think H4 is an abomination, the same thing you called robbodini. Probably the strongest abomination today.

Regards.
Houdini is an abomination :!: :?:

No way :shock:
Modern Times
Posts: 3550
Joined: Thu Jun 07, 2012 11:02 pm

Re: IPON results for Houdini 4

Post by Modern Times »

IGarcia wrote: The fact that you are selling H2, H3 and H4 and nobody is suing you does not make your engine original. It has been marked as controversial (along with Rybka and other engines) in CCRL and by the community in general.
CCRL got rid of the controversial designation a long time ago. It is too subjective, and it isn't our job to be making judgements like that. I agree with what Stefan has to say about this issue.
Modern Times
Posts: 3550
Joined: Thu Jun 07, 2012 11:02 pm

Re: IPON results for Houdini 4

Post by Modern Times »

lkaufman wrote:If you believed that time limit doesn't matter (other than the spread of the ratings), why would CEGT and CCRL bother to test at short and long time controls?
In CCRL it is simply because some testers prefer blitz, and some prefer long time control. Nothing more than that.
pohl4711
Posts: 2438
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: IPON results for Houdini 4

Post by pohl4711 »

Laskos wrote: And you probably weakened the engine by 200-500 Elo points. No wonder you get a low similarity...
Definitely not! It is an aggressive-playing setting, which I used some time ago on Playchess. It is perhaps 30 Elo weaker than default Stockfish, definitely not 200-500...

Stefan
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: IPON results for Houdini 4

Post by Laskos »

pohl4711 wrote:
Laskos wrote: And you probably weakened the engine by 200-500 Elo points. No wonder you get a low similarity...
Definitely not! It is an aggressive-playing setting, which I used some time ago on Playchess. It is perhaps 30 Elo weaker than default Stockfish, definitely not 200-500...

Stefan
Maybe you could post the settings here, along with a self-play match (not necessarily very long, just enough to get the order of magnitude of the weakening) and the results from Sim.
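
For reference, a rough way to turn a self-play result into an Elo difference, assuming the usual logistic Elo model in which a 400-point gap corresponds to 10:1 odds. The function names and the game counts below are made up for illustration and are not results from any actual match.

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a score fraction, with 0 < score < 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def expected_score(elo_diff: float) -> float:
    """Expected score fraction for a player who is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def match_elo(wins: int, draws: int, losses: int) -> float:
    """Elo difference implied by a match result (a draw counts as half a point)."""
    games = wins + draws + losses
    return elo_diff_from_score((wins + 0.5 * draws) / games)


# Hypothetical 100-game self-play match: modified settings vs. default.
print(round(match_elo(30, 32, 38)))  # about -28, i.e. slightly weaker

# Expected scores if the modified settings were 30 or 200 Elo weaker.
print(round(expected_score(-30), 3))   # 0.457
print(round(expected_score(-200), 3))  # 0.24
```

Under this model, a 30 Elo weakening and a 200 Elo weakening predict very different match scores (roughly 46% versus 24%), so even a short match should show which order of magnitude applies.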