When will we see HOUDINI in official tournaments?

Uri Blass · Post by **Uri Blass** » Sat May 12, 2012 8:14 pm

LudiBuda wrote:What are you talking about? Did you read my post at all?

What I am suggesting is to take Ivanhoe, run the test games against 10 opponents, then put a random weight between, let's say, 0 and 20 on each eval term and run the test again.

Your experiment is bogus. Try doing the same test for the search. Have a brute force search with the state of the art eval and see what you get.

I do not think that it is the same test.

Giving random weights to the evaluation terms still leaves the relevant evaluation terms in the program.
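A minimal Python sketch of that distinction, assuming a purely linear evaluation. The feature names, their values, and the helper names `evaluate` and `random_weights` are hypothetical and not taken from Ivanhoe or any other engine. Randomizing the weights changes how much each term counts, but every term is still computed and still feeds the score:

```python
import random

# Hypothetical evaluation features for one position (illustrative values only).
features = {"material": 3.0, "mobility": 5.0, "king_safety": -2.0, "passed_pawns": 1.0}

def evaluate(features, weights):
    """Linear evaluation: weighted sum of the feature values."""
    return sum(weights[name] * value for name, value in features.items())

def random_weights(names, low=0.0, high=20.0, seed=None):
    """Draw one random weight per evaluation term from [low, high]."""
    rng = random.Random(seed)
    return {name: rng.uniform(low, high) for name in names}

# The terms are all still present; only their relative importance is scrambled.
weights = random_weights(features, seed=1)
score = evaluate(features, weights)
```

Removing the evaluation knowledge altogether would mean dropping entries from `features`, which is a different experiment from redrawing `weights`.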

If your point is that changes in the search have helped chess programs more than changes in the evaluation in recent years, then I agree, but I do not think that the test you suggest proves it.

Note that I disagree with the following claim that you made:
"Evaluation of the engine is of almost no importance for the ELO strength."

I even disagree that changes in the evaluation are of almost no importance for the ELO strength.

My guess is something like the following:
"100 of the last 300 Elo that the program gained came from changes in the evaluation, and if Komodo had kept the evaluation it had when it was 300 Elo weaker, then it would be 200 Elo stronger instead of 300 Elo stronger."

Of course these are not exact numbers; they only illustrate the idea.

michiguel · Post by **michiguel** » Sat May 12, 2012 8:33 pm

Uri Blass wrote:
Don wrote:
Uri Blass wrote:I believe that random changes hurt Komodo's playing strength, and the same goes for other top programs, but the question is how much, and whether people can start with something that is 50 or 100 Elo weaker than Ivanhoe through nothing more than random modifications to Ivanhoe's evaluation and still escape the similarity test.
If they have to give up 50 - 100 ELO it is a disincentive to cheat, but a cheater by his very nature doesn't want to give up any ELO. I should qualify that. Some cheat just because they cannot program and will be happy with a pretty weak program, but the more typical case I call the "Rosie Ruiz" style cheater.

But I don't think this weakening can easily fool the test because Doch through Komodo had well over 50 ELO due just to evaluation improvements which did not fool the test. This was a lot of weight changes and added terms.

It's surprising to me how resilient that test is to trickery - but I don't think we yet understand its limits. It seems likely to me that it will have some weaknesses.
I think that some of the tricks to fool the test may be to change the move generator.

No chance that this could have an influence. The results are the same if the positions are re-shuffled and taken randomly, if different positions are used, or even if random positions are used from games. With all that, the influence of a move generator is completely discarded.

Miguel

Imagine that Be2 and Bc2 have exactly the same score at every depth and are the two best moves.

The choice of whether to play Be2 or Bc2 may depend on the move generator.

If the move generator generates Be2 first then Be2 may become the pv, and if the move generator generates Bc2 first then Bc2 may become the pv.

Programmers who do not want to change the move generator could instead make the program change its mind at depth 1 when a later move has at least the same score, rather than only when it has a better score, while keeping the search at bigger depths the same.
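A small Python sketch of the Be2/Bc2 case, under the simplifying assumption of a single-ply selection loop rather than a real alpha-beta search. The scores and the `replace_on_tie` flag are made up for illustration:

```python
def pick_move(moves, score, replace_on_tie=False):
    """Return the best move, scanning `moves` in generation order.

    With replace_on_tie=False a later move must score strictly higher to
    replace the current best; with True an equal score is enough.
    """
    best_move, best_score = None, None
    for move in moves:
        s = score(move)
        if best_score is None or s > best_score or (replace_on_tie and s == best_score):
            best_move, best_score = move, s
    return best_move

# Hypothetical scores: Be2 and Bc2 tie for best.
scores = {"Be2": 30, "Bc2": 30, "a3": 10}

first = pick_move(["Be2", "Bc2", "a3"], scores.get)           # generator emits Be2 first
swapped = pick_move(["Bc2", "Be2", "a3"], scores.get)         # generator emits Bc2 first
tie_rule = pick_move(["Be2", "Bc2", "a3"], scores.get, True)  # same order, ties replace
```

Here `first` is Be2 while `swapped` and `tie_rule` are both Bc2, so changing the comparison on ties has the same visible effect as reordering the generator.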

I wonder if you tested changes in the move generator to see if they help Komodo or not.

Dan Honeycutt · Post by **Dan Honeycutt** » Sat May 12, 2012 8:53 pm

michiguel wrote:
Uri Blass wrote:I think that some of the tricks to fool the test may be to change the move generator.

No chance that this could have an influence. The results are the same if the positions are re-shuffled and taken randomly, if different positions are used, or even if random positions are used from games. With all that, the influence of a move generator is completely discarded.

Miguel

Changes to the move generator can result in a different move selection at times though I would think the effect on similarity of this would be very tiny.

Best
Dan H.

Uri Blass · Post by **Uri Blass** » Sun May 13, 2012 12:18 am

Dan Honeycutt wrote:
michiguel wrote:
Uri Blass wrote:I think that some of the tricks to fool the test may be to change the move generator.

No chance that this could have an influence. The results are the same if the positions are re-shuffled and taken randomly, if different positions are used, or even if random positions are used from games. With all that, the influence of a move generator is completely discarded.

Miguel
Changes to the move generator can result in a different move selection at times though I would think the effect on similarity of this would be very tiny.

Best
Dan H.

I think that it is dependent on the evaluation.

For example, with a material-only evaluation I expect the effect of a change to the move generator to be big.

Even without a material-only evaluation,
if the units of evaluation are bigger then the effect is going to be bigger.

Dan Honeycutt · Post by **Dan Honeycutt** » Sun May 13, 2012 2:39 am

Uri Blass wrote:
Dan Honeycutt wrote:
michiguel wrote:
Uri Blass wrote:I think that some of the tricks to fool the test may be to change the move generator.

No chance that this could have an influence. The results are the same if the positions are re-shuffled and taken randomly, if different positions are used, or even if random positions are used from games. With all that, the influence of a move generator is completely discarded.

Miguel
Changes to the move generator can result in a different move selection at times though I would think the effect on similarity of this would be very tiny.

Best
Dan H.
I think that it is dependent on the evaluation.

For example, with a material-only evaluation I expect the effect of a change to the move generator to be big.

Even without a material-only evaluation,
if the units of evaluation are bigger then the effect is going to be bigger.

Yeah, with a coarser evaluation the effect of the move generator is greater. You could even have no evaluation and just play the first thing that comes out of the move generator, in which case the move generator makes all the difference in the world. However, as a practical means of passing the similarity test, I don't think changing the move generator buys you much.
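A rough Python sketch of the granularity effect. It is a toy model and not the actual similarity tester: the positions are synthetic, the move scores are uniform random integers, and `grain` is a hypothetical evaluation unit. It counts how often two opposite generation orders end up choosing the same move:

```python
import random

def choose(moves, scores, grain):
    """Pick the first move with the highest score after rounding down to `grain`."""
    best = None
    for m in moves:
        q = scores[m] // grain  # coarser grain produces more tied scores
        if best is None or q > best[1]:
            best = (m, q)
    return best[0]

def agreement(grain, positions=1000, n_moves=30, seed=0):
    """Fraction of synthetic positions where both generation orders pick the same move."""
    rng = random.Random(seed)
    same = 0
    for _ in range(positions):
        scores = {m: rng.randint(-200, 200) for m in range(n_moves)}
        order_a = list(range(n_moves))
        order_b = order_a[::-1]  # same moves, reversed generation order
        same += choose(order_a, scores, grain) == choose(order_b, scores, grain)
    return same / positions

fine = agreement(grain=1)      # fine-grained units: few ties at the top
coarse = agreement(grain=100)  # coarse units: many ties, order matters more
```

In this toy model `fine` comes out higher than `coarse`, which matches the intuition that generation order only matters when the best score is shared by more than one move.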

Best
Dan H.
