When will we see HOUDINI in official tournaments?

Don · Post by **Don** » Fri May 11, 2012 8:46 pm

Graham Banks wrote:
Don wrote:
LudiBuda wrote: I couldn't agree less with you on this.
Evaluation of the engine is of almost no importance for ELO strength. Just try modifying Ivanhoe by putting random weights on the evaluation terms. You will still have a super strong engine.
Evaluation has a great influence on the playing style, but let's not kid ourselves. Most people care about ELO and ELO only.
I could hardly fail to disagree with you less.
I wonder which engine Alex is the author of.

He should team up with Fernando to help improve the Moron engine by adding random weights to all zero features.

Uri Blass · Post by **Uri Blass** » Fri May 11, 2012 9:39 pm

LudiBuda wrote: I couldn't agree less with you on this.
Evaluation of the engine is of almost no importance for ELO strength. Just try modifying Ivanhoe by putting random weights on the evaluation terms. You will still have a super strong engine.
Evaluation has a great influence on the playing style, but let's not kid ourselves. Most people care about ELO and ELO only.

The question is how you define a super strong engine.

Suppose you get an engine that is X Elo weaker than Ivanhoe through evaluation modifications alone, for some X (X may be 20, 40, 100 or 200).

Would you call it a super strong engine in some of these cases, and if so, for which values of X?

I also wonder what the correlation is between gambitTiger and Tiger14.0, and whether their similarity exceeds 60%.
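For readers unfamiliar with the similarity test: as far as I understand it, it runs two engines on the same large set of positions and reports the percentage of positions where both choose the same move. A minimal sketch of that percentage, assuming each engine's choices have already been collected as one move per position in the same order (the move lists below are made up for illustration, not real engine output):

```python
def similarity_percent(moves_a, moves_b):
    """Percentage of positions where two engines chose the same move.

    Assumes both lists hold one chosen move per position, in the same
    position order (a simplification of how a similarity tester works).
    """
    if len(moves_a) != len(moves_b):
        raise ValueError("both engines must be run on the same positions")
    if not moves_a:
        raise ValueError("need at least one position")
    matches = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * matches / len(moves_a)


# Hypothetical move choices of two engines over five test positions.
engine_x = ["e4", "Nf3", "Bc4", "O-O", "d4"]
engine_y = ["e4", "Nf3", "Bb5", "O-O", "c4"]
print(similarity_percent(engine_x, engine_y))  # 60.0
```

In these terms, the question above is whether this number, measured for gambitTiger against Tiger14.0 over a large position set, would come out above 60.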

I remember the programmer saying he was surprised that gambitTiger was so strong; he learned that it is possible to make significant changes to the evaluation that change the playing style without reducing the playing strength much.

Uri

Don · Post by **Don** » Fri May 11, 2012 9:44 pm

Uri Blass wrote:
LudiBuda wrote: I couldn't agree less with you on this.
Evaluation of the engine is of almost no importance for ELO strength. Just try modifying Ivanhoe by putting random weights on the evaluation terms. You will still have a super strong engine.
Evaluation has a great influence on the playing style, but let's not kid ourselves. Most people care about ELO and ELO only.
The question is how you define a super strong engine.

Suppose you get an engine that is X Elo weaker than Ivanhoe through evaluation modifications alone, for some X (X may be 20, 40, 100 or 200).

Would you call it a super strong engine in some of these cases, and if so, for which values of X?

I also wonder what the correlation is between gambitTiger and Tiger14.0, and whether their similarity exceeds 60%.

I remember the programmer saying he was surprised that gambitTiger was so strong; he learned that it is possible to make significant changes to the evaluation that change the playing style without reducing the playing strength much.

Uri

None of the anecdotes resonate much with me, because random changes hurt Komodo's playing strength. In the past we have changed the style somewhat with carefully engineered changes to many terms, but nothing random.

Very few people really understand how to measure program strength anyway; this forum is littered with posts where people have made some change and run 50-game matches to "prove" it.
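To put the 50-game point in numbers, here is a rough sketch of the 95% error bar on a match result. It assumes the standard logistic Elo model and a normal approximation to the per-game score (1, 0.5 or 0); it is an illustration, not a method anyone in this thread proposed:

```python
import math


def score_to_elo(score):
    """Elo difference implied by an expected score (logistic Elo model)."""
    # Clamp so that 0% or 100% scores do not produce infinities.
    score = min(max(score, 1e-9), 1 - 1e-9)
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins, draws, losses, z=1.96):
    """Approximate 95% confidence interval for the Elo difference of a match.

    Uses a normal approximation on the per-game score (1, 0.5 or 0).
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    mean_sq = (wins + 0.25 * draws) / n  # mean of the squared per-game score
    variance = mean_sq - mean * mean
    margin = z * math.sqrt(variance / n)
    return score_to_elo(mean - margin), score_to_elo(mean + margin)


low, high = elo_interval(wins=25, draws=0, losses=25)
print(f"{low:.0f} to {high:.0f} Elo")  # -99 to 99 Elo
```

Even a dead-even 25-25 result leaves an uncertainty of roughly ±99 Elo, which is larger than the effect most individual engine changes are likely to have. Draws shrink the variance somewhat, but not nearly enough to rescue a 50-game match.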

Don

Uri Blass · Post by **Uri Blass** » Fri May 11, 2012 9:52 pm

I believe that random changes hurt Komodo's playing strength, and the same goes for other top programs, but the question is how much, and whether people can start from something that is 50 or 100 Elo weaker than IvanHoe, obtained only by random modifications of IvanHoe's evaluation, and escape the similarity test.

Don · Post by **Don** » Fri May 11, 2012 10:33 pm

Uri Blass wrote: I believe that random changes hurt Komodo's playing strength, and the same goes for other top programs, but the question is how much, and whether people can start from something that is 50 or 100 Elo weaker than IvanHoe, obtained only by random modifications of IvanHoe's evaluation, and escape the similarity test.

If they have to give up 50-100 ELO, that is a disincentive to cheat, but a cheater by his very nature doesn't want to give up any ELO. I should qualify that. Some cheat just because they cannot program and will be happy with a pretty weak program, but the more typical case is what I call the "Rosie Ruiz" style cheater.

But I don't think this weakening can easily fool the test, because going from Doch to Komodo gained well over 50 ELO from evaluation improvements alone, and that did not fool the test. It involved a lot of weight changes and added terms.

It's surprising to me how resilient that test is to trickery, but I don't think we understand its limits yet. It seems likely to me that it has some weaknesses.

LudiBuda · Post by **LudiBuda** » Fri May 11, 2012 11:01 pm

Hope you are not making an excuse to use Ivanhoe search 'ideas', because 'evaluation is what matters'.

Don · Post by **Don** » Sat May 12, 2012 12:25 am

LudiBuda wrote:Hope you are not making an excuse to use Ivanhoe search 'ideas', because 'evaluation is what matters'.

Please do not start this worn out rant again.

Uri Blass · Post by **Uri Blass** » Sat May 12, 2012 3:41 am

Don wrote:
Uri Blass wrote: I believe that random changes hurt Komodo's playing strength, and the same goes for other top programs, but the question is how much, and whether people can start from something that is 50 or 100 Elo weaker than IvanHoe, obtained only by random modifications of IvanHoe's evaluation, and escape the similarity test.
If they have to give up 50-100 ELO, that is a disincentive to cheat, but a cheater by his very nature doesn't want to give up any ELO. I should qualify that. Some cheat just because they cannot program and will be happy with a pretty weak program, but the more typical case is what I call the "Rosie Ruiz" style cheater.

But I don't think this weakening can easily fool the test, because going from Doch to Komodo gained well over 50 ELO from evaluation improvements alone, and that did not fool the test. It involved a lot of weight changes and added terms.

It's surprising to me how resilient that test is to trickery, but I don't think we understand its limits yet. It seems likely to me that it has some weaknesses.

I think one trick to fool the test may be to change the move generator.

Imagine that Be2 and Bc2 have exactly the same score at every depth and are the two best moves.

The choice between Be2 and Bc2 may depend on the move generator.

If the move generator generates Be2 first, then Be2 may become the PV, and if it generates Bc2 first, then Bc2 may become the PV.

Programmers who do not want to change the move generator may instead change the program so that at depth 1 it switches to another move when that move has at least the same score, not only when it has a better score, while the search at greater depths stays the same.
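A small sketch of the tie-breaking effect described above, assuming a root loop that keeps the best-scoring move seen so far and only replaces it on a strictly better score unless told to replace on equal scores. The function name and the scores are illustrative, not taken from any engine:

```python
def pick_root_move(scored_moves, replace_on_equal=False):
    """Return the best move from (move, score) pairs in generation order.

    With replace_on_equal=False a later move must score strictly better to
    replace the current best, so the first generated move wins ties.
    With replace_on_equal=True an equal score is enough, so the last
    generated move among equally scored ones wins.
    """
    best_move, best_score = None, float("-inf")
    for move, score in scored_moves:
        if score > best_score or (replace_on_equal and score == best_score):
            best_move, best_score = move, score
    return best_move


be2_first = [("Be2", 25), ("Bc2", 25), ("a3", 10)]
bc2_first = [("Bc2", 25), ("Be2", 25), ("a3", 10)]

print(pick_root_move(be2_first))                         # Be2
print(pick_root_move(bc2_first))                         # Bc2
print(pick_root_move(be2_first, replace_on_equal=True))  # Bc2
```

In both variants the scores are identical; only which of two equally scored moves ends up as the PV changes, which is the kind of difference that could shift a similarity measurement without affecting strength.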

I wonder if you have tested changes to the move generator to see whether they help Komodo.

CRoberson · Post by **CRoberson** » Sat May 12, 2012 4:02 am

LudiBuda wrote: I couldn't agree less with you on this.
Evaluation of the engine is of almost no importance for ELO strength. Just try modifying Ivanhoe by putting random weights on the evaluation terms. You will still have a super strong engine.
Evaluation has a great influence on the playing style, but let's not kid ourselves. Most people care about ELO and ELO only.

Wow, talk about not understanding your experiment.

About 3 or 4 years ago, I devised a test to reveal the importance of modern eval vs search. The results were drastic and I reported them in the tech forum here. Bob Hyatt independently verified the results with a greater number of games for accuracy and the results were very similar.

In the end, an excellent eval vs. piece counting only is worth 700 to 1000 Elo or more. The piece-counting program (all else identical) scored only 1% against the full version of the program. The reduced-eval program also searched 3 ply deeper, which by itself gains Elo. Add that gain to the 700 Elo performance difference and you see that eval is paramount.
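The 1% figure can be converted to a rating difference with the standard logistic Elo formula, D = -400 log10(1/s - 1), where s is the expected score. A quick sketch of that conversion (a check on the arithmetic, not part of the original test setup):

```python
import math


def score_to_elo(score):
    """Rating difference implied by an expected score under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


print(round(score_to_elo(0.01)))  # -798
print(round(score_to_elo(0.50)))  # 0
```

A 1% score therefore corresponds to roughly 800 Elo, consistent with the 700 to 1000 Elo range quoted above.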

LudiBuda · Post by **LudiBuda** » Sat May 12, 2012 7:25 pm

What are you talking about? Did you read my post at all?

What I am suggesting is to take Ivanhoe, run test games against 10 opponents, then put a random weight between, let's say, 0 and 20 on each eval term and run the test again.
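The weight-randomizing step of this proposal can be sketched as follows, assuming the evaluation is a weighted sum of feature values. That is a common design but an assumption here, and the term names and numbers are hypothetical, not Ivanhoe's actual evaluation terms:

```python
import random


def randomize_weights(weights, low=0, high=20, seed=None):
    """Return a copy of an evaluation weight table with every weight
    replaced by a random integer in [low, high]."""
    rng = random.Random(seed)
    return {term: rng.randint(low, high) for term in weights}


def evaluate(features, weights):
    """Linear evaluation: sum of each feature value times its weight."""
    return sum(weights[term] * value for term, value in features.items())


# Hypothetical evaluation terms and tuned weights (not taken from Ivanhoe).
tuned = {"mobility": 4, "passed_pawn": 12, "king_safety": 9, "doubled_pawn": -6}
features = {"mobility": 5, "passed_pawn": 1, "king_safety": -2, "doubled_pawn": 1}

random_weights = randomize_weights(tuned, seed=1)
print(evaluate(features, tuned))  # 8
print(evaluate(features, random_weights))
```

Note that drawing every weight from 0 to 20 also turns penalty terms, such as the hypothetical doubled_pawn above, into bonuses.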

Your experiment is bogus. Try doing the same test for the search. Have a brute force search with the state of the art eval and see what you get.
