CLOP for Noisy Black-Box Parameter Optimization

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Rein Halbersma »

Rémi Coulom wrote: I don't really understand your question.

The objective function can be the expected value of any random variable that depends on the parameters. So if, instead of playing one game and observing the result, you play a tournament and observe the LOS against a specific opponent, you can use CLOP to optimize that. But it would be a strange way to use CLOP. If you wish to optimize a program against a set of opponents instead of just one, you can use the "Replications" option of CLOP to play a game against each opponent; CLOP will then maximize the average winning rate against all of them.
I don't mean to maximize the number of points against all these opponents. I mean to maximize the chance that I end up first in a tournament. For example, playing thousands of games against opposition rated 2950 +/- 1 Elo, I would prefer my program to be rated 2940 +/- 20 rather than 2945 +/- 1. The former should have a much larger chance of finishing first in a tournament than the latter, even though it would score fewer points on average.

Is there any way to do such optimization? Can you simply make LOS the objective function?
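A toy Monte Carlo makes the point concrete. It treats each program's tournament performance as a single normal draw whose spread matches the +/- figures above (a deliberate oversimplification, since the +/- from bayeselo is rating uncertainty rather than performance spread) and estimates the chance of topping a hypothetical field of ten such opponents:

Code:

import numpy as np

rng = np.random.default_rng(0)
n_opponents, trials = 10, 100_000

# one normal performance draw per program per simulated tournament
opponents = rng.normal(2950, 1, size=(trials, n_opponents))
safe = rng.normal(2945, 1, size=trials)    # stronger on average, consistent
risky = rng.normal(2940, 20, size=trials)  # weaker on average, volatile

print((safe > opponents.max(axis=1)).mean())   # essentially zero
print((risky > opponents.max(axis=1)).mean())  # roughly 0.3

With these numbers the volatile program wins a meaningful fraction of tournaments, while the slightly stronger but consistent one almost never finishes first.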
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Rémi Coulom »

Rein Halbersma wrote: I don't mean to maximize the number of points against all these opponents. I mean to maximize the chance that I end up first in a tournament. For example, playing thousands of games against opposition rated 2950 +/- 1 Elo, I would prefer my program to be rated 2940 +/- 20 rather than 2945 +/- 1. The former should have a much larger chance of finishing first in a tournament than the latter, even though it would score fewer points on average.

Is there any way to do such optimization? Can you simply make LOS the objective function?
Well, the +/- interval in bayeselo's evaluation does not really depend on your program's style. It depends mostly on the number of games you play. If you get the same win rate against the same opponents with the same number of games, you'll have the same confidence interval.

Maybe the interval will be a little narrower if you have more draws. But that's an effect of the specific model bayeselo is using for draws. I don't believe it is meaningful.

If you wish to make your program more aggressive to increase your chances of winning a tournament against stronger opponents, you could optimize its winning rate with CLOP while counting a draw as a loss, as I suggested in my previous message.
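One way to wire that up, assuming a connection script that prints a single W, L or D character like the xboard script later in this thread (play_game.py is a hypothetical stand-in for whatever script you actually use):

Code:

#!/usr/bin/env python
# Wrapper so CLOP optimizes pure winning rate: draws are reported
# as losses. CLOP's arguments (seed, parameter values) are passed
# through to the real connection script, which is assumed to print
# W, L or D.
import subprocess
import sys

out = subprocess.check_output(["python", "play_game.py"] + sys.argv[1:])
result = out.decode().strip()
print("W" if result == "W" else "L")  # fold D (and anything else) into L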

Rémi
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by petero2 »

This seems very useful. I have one question, though. What do the Max-1, Max-2, ... columns in the "Max" tab of the GUI mean?
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Rémi Coulom »

petero2 wrote: This seems very useful. I have one question, though. What do the Max-1, Max-2, ... columns in the "Max" tab of the GUI mean?
This is a feature I started to implement but was too lazy to finish. The "Mean" column is the weighted mean of all samples. Max-1, Max-2, ... were supposed to be the maximization of the quadratic function, starting from Mean, along each eigenvector one by one, starting from the most negative eigenvalue. In fact, only Max-1 is computed, when the regression is negative definite, and it is the maximum of the regression. Max-2, ... will always remain empty. So Mean is the weighted average of the samples, and Max-1 is the maximum of the regression, when it exists.
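For the curious, the linear algebra behind Max-1 fits in a few lines. This is only a sketch of the computation described above, not CLOP's actual code: with a fitted quadratic f(x) = c + g.x + 0.5 x'Hx, the maximum x* = -H^-1 g exists exactly when the Hessian H is negative definite.

Code:

import numpy as np

def quadratic_max(g, H):
    # Maximum of f(x) = c + g.x + 0.5 x'Hx, or None when H is not
    # negative definite (the case where Max-1 stays empty).
    if np.linalg.eigvalsh(H).max() >= 0:
        return None
    return np.linalg.solve(H, -g)  # solves the stationarity condition g + Hx = 0

# toy regression over two parameters
g = np.array([0.2, -0.1])
H = np.array([[-1.0, 0.2],
              [0.2, -0.5]])
print(quadratic_max(g, H))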

I'll clear that up in the next version.

Rémi
ernest
Posts: 2041
Joined: Wed Mar 08, 2006 8:30 pm

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by ernest »

Hi Rémi,

Since I see you are active here now, would you care to give your opinion in the discussion I initiated on Elo rating at
http://www.talkchess.com/forum/viewtopi ... 196#422196

Merci,

Ernest
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Don »

Rémi Coulom wrote: Comments and questions are welcome.

Rémi
Larry and I are playing around with CLOP using clop-gui.

In the Max tab there appear the columns Mean, Max-1, Max-2, and apparently as many Max-n columns as there are parameters to be tuned.

However, nothing ever appears in those columns after a day of running. What do those columns mean (no pun intended)?

In the paper you mention that this is supposed to converge, so given enough games and time, will this eventually stop running? I told Larry that even with a very efficient algorithm, it is likely to take millions of games to see values that improve on already well-selected values.

And which values are taken to be the final values to use? Are the Max-n columns supposed to represent superior parameters?

Addendum: we tried tuning only a couple of parameters, and values started to appear in the Max-1 column after a few minutes. But we don't really know what they mean.

Thanks.
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Rémi Coulom »

Don wrote:
Rémi Coulom wrote: Comments and questions are welcome.

Rémi
Larry and I are playing around with CLOP using clop-gui.

In the Max tab there appear the columns Mean, Max-1, Max-2, and apparently as many Max-n columns as there are parameters to be tuned.

However, nothing ever appears in those columns after a day of running. What do those columns mean (no pun intended)?
I explained earlier in this thread: it is a feature I did not implement. Only Max-1 has a meaning: it is the maximum of the quadratic regression, if it exists; it sometimes does not. Mean is the weighted mean of all samples. I recommend using the values in the "Mean" column for your parameters; they usually perform better according to my artificial experiments.
In the paper you mention that this is supposed to converge, so given enough games and time, will this eventually stop running? I told Larry that even with a very efficient algorithm, it is likely to take millions of games to see values that improve on already well-selected values.
It does not converge in a finite number of games. It converges in the sense that the expected simple regret decreases over time and can be made arbitrarily small with a large enough number of games. Note that I do not have a mathematical proof of this convergence, but I am convinced it does converge if the function to be optimized is continuous.
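For reference, "simple regret" here is the standard notion: the gap between the best achievable win rate and the win rate at the recommended parameters. In LaTeX, with f the win-rate function, x^* its maximizer, and \hat{x}_n the parameters recommended after n games:

Code:

r_n = f(x^*) - f(\hat{x}_n), \qquad \mathbb{E}[r_n] \to 0 \text{ as } n \to \infty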

In the paper I ran experiments on artificial problems. They give an idea of how close to optimal parameters you get with a given number of games. Around a 50% winning rate, one Elo point corresponds to a win-rate difference of about 0.0014. For the Log problem in one dimension, it takes about 10k games to reach an expected simple regret of 0.001 (so, less than one Elo point). For the 5-dimensional problem, it takes a few hundred thousand games to reach 1-Elo expected simple regret (that's 0.2 Elo per parameter).
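The 0.0014 figure is easy to verify from the standard logistic Elo model (a quick sanity check, not a computation from the paper):

Code:

import math

def win_prob(elo_diff):
    # standard logistic Elo model
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

print(win_prob(1) - win_prob(0))  # ~0.00144 win rate per Elo point near 50%
print(math.log(10) / 1600)        # analytic slope at equal strength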

The magic of CLOP is that it approaches optimal performance faster than it can be measured. Simple regret is smaller than the confidence interval around the estimated win rate.
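A back-of-the-envelope check of that last claim, using a plain binomial model of the measured win rate and the 1-D Log problem figures above (my numbers, not from the paper):

Code:

import math

n = 10_000                               # games played on the 1-D Log problem
half_width = 1.96 * math.sqrt(0.25 / n)  # 95% CI half-width of a win rate near 50%
print(half_width)                        # ~0.0098, i.e. about 7 Elo
# The ~0.001 simple regret reached at this point is well inside that interval.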

Rémi
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Don »

Rémi Coulom wrote: [full explanation of Mean, Max-1, and convergence quoted above]
Thanks for explaining that, Rémi. I have a script for chess if anyone wants to use it. For simplicity the script does not have a legal move generator, so draw and win detection is a bit of a kludge, but it works well.
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Rémi Coulom »

Don wrote: Thanks for explaining that, Rémi. I have a script for chess if anyone wants to use it. For simplicity the script does not have a legal move generator, so draw and win detection is a bit of a kludge, but it works well.
I can put a link on the CLOP web site, or host it there.

Rémi
F. Bluemers
Posts: 868
Joined: Thu Mar 09, 2006 11:21 pm
Location: Nederland

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by F. Bluemers »

Don wrote:
Thanks for explaining that, Rémi. I have a script for chess if anyone wants to use it. For simplicity the script does not have a legal move generator, so draw and win detection is a bit of a kludge, but it works well.
I wrote one that calls xboard to run the game. I redirect the output to a file and parse it like this to get the outcome:

Code:

for line in open("herr"):
    parts = line.split(' ')
    x = 0
    while x + 1 < len(parts):
        if parts[x] == 'score':
            # the token after 'score' is a wins-losses-draws triple
            parts[x + 1] = parts[x + 1].strip()
            if parts[x + 1] == '0-0-1':
                print('D')
            if parts[x + 1] == '0-1-0':
                print('L')
            if parts[x + 1] == '1-0-0':
                print('W')
        x = x + 1
My first hack in Python, so I bet it can be done a lot better. I parse all lines; if xboard is invoked with -debug, the output contains more lines (the last one holding the result, but I wanted to keep the code simple).
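For what it's worth, the same parse can be written more compactly. This variant keeps the "herr" file name from the post above and assumes the same "score w-l-d" token format:

Code:

result_codes = {'1-0-0': 'W', '0-1-0': 'L', '0-0-1': 'D'}
with open("herr") as f:
    for line in f:
        parts = line.split()  # also strips the trailing newline
        for a, b in zip(parts, parts[1:]):
            if a == 'score' and b in result_codes:
                print(result_codes[b])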