Bimodal CLOP Results

Tom Likens
- Posts: 303
- Joined: Sat Apr 28, 2012 6:18 pm
- Location: Austin, TX

I've been using CLOP for a while now and I've run across an interesting phenomenon. In most instances, if I allow enough games, the parameter I'm tuning will converge to a single value. But occasionally the plot will split and converge to two separate and distinct values. Usually these values are fairly far apart. CLOP gives the mean of these two values as best, but in all honesty I don't think you really want to use that result, since it doesn't seem to reflect the true optimal value.
Thinking about this, I wonder whether the reason is that the parameter being tuned isn't really represented correctly. As a simple example, if we only look at the static features of the pawn structure, then we could easily set the penalty for a backward pawn too high or too low. A more accurate assessment would take into account the dynamic characteristics of the position: attacks by enemy pieces, whether the pawn is blocked, whether the queens are still on the board, and so on.
If these other factors aren't taken into account, then CLOP may find that in some instances setting the parameter to a high value is advantageous, while in other cases setting it low is best. Under these circumstances CLOP cannot converge to a single value. Has anyone else seen this phenomenon and/or come up with a better (i.e., more cogent) explanation?
regards,
--tom
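To make the contrast concrete, here is a minimal sketch of the two kinds of terms being described. Everything in it is hypothetical; the names, values, and conditions are invented for illustration and are not from any real engine:

```python
# Hypothetical sketch of a purely static backward-pawn penalty versus one
# conditioned on dynamic features. All names and values are illustrative.

BACKWARD_PAWN_PENALTY = 12  # the single value CLOP would tune

def backward_pawn_static():
    # Static version: every backward pawn costs the same, regardless of
    # whether it can actually be exploited.
    return -BACKWARD_PAWN_PENALTY

# Split version: separate tunable penalties for the contexts in which a
# backward pawn is a genuine weakness versus largely harmless.
PENALTY_EXPLOITABLE = 20  # attacked, blocked, heavy pieces still on
PENALTY_NOMINAL = 4       # backward in name only

def backward_pawn_dynamic(attacked_by_enemy, blocked, queens_on_board):
    # Dynamic version: the penalty depends on whether the pawn is a real
    # target. CLOP now tunes two values, each with a consistent meaning.
    if attacked_by_enemy and (blocked or queens_on_board):
        return -PENALTY_EXPLOITABLE
    return -PENALTY_NOMINAL
```

With the static version, games where the pawn is harmless reward a low penalty and games where it is a real target reward a high one, which is exactly the conflicting signal that could produce two peaks.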
-
Adam Hair
- Posts: 3226
- Joined: Wed May 06, 2009 10:31 pm
- Location: Fuquay-Varina, North Carolina
Re: Bimodal CLOP Results
Tom Likens wrote:
> I've been using CLOP for a while now and I've run across an interesting phenomenon. In most instances, if I allow enough games, the parameter I'm tuning will converge to a single value. But occasionally the plot will split and converge to two separate and distinct values. Usually these values are fairly far apart. CLOP gives the mean of these two values as best, but in all honesty I don't think you really want to use that result, since it doesn't seem to reflect the true optimal value.
>
> Thinking about this, I wonder whether the reason is that the parameter being tuned isn't really represented correctly. As a simple example, if we only look at the static features of the pawn structure, then we could easily set the penalty for a backward pawn too high or too low. A more accurate assessment would take into account the dynamic characteristics of the position: attacks by enemy pieces, whether the pawn is blocked, whether the queens are still on the board, and so on.
>
> If these other factors aren't taken into account, then CLOP may find that in some instances setting the parameter to a high value is advantageous, while in other cases setting it low is best. Under these circumstances CLOP cannot converge to a single value. Has anyone else seen this phenomenon and/or come up with a better (i.e., more cogent) explanation?
>
> regards,
> --tom

I have seen the same phenomenon as you have, and have had similar thoughts. Unfortunately, I have not formulated a better explanation than the one you have presented.
-
Tom Likens
- Posts: 303
- Joined: Sat Apr 28, 2012 6:18 pm
- Location: Austin, TX
Re: Bimodal CLOP Results
Adam Hair wrote:
> I have seen the same phenomenon as you have, and have had similar thoughts. Unfortunately, I have not formulated a better explanation than the one you have presented.

Thanks, Adam.

I'm considering testing my theory by resurrecting one of the bimodal distributions I previously encountered and altering the evaluation to increase the accuracy of the CLOP-adjusted variable (i.e., adding in the additional dynamic information I mentioned previously). The goal would be to encompass enough knowledge that a low or high score would actually be meaningful. To put it another way, a low or high score should be consistently detrimental or consistently advantageous. That consistency would give CLOP a real chance at finding a single convergence point. If this actually works, the new CLOP run should converge to a single value. As an extra bonus, this technique could be used to flag ambiguous knowledge in the evaluation, which in itself is a useful indicator of a potential problem.

regards,
--tom
-
Evert
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: Bimodal CLOP Results
There can be several local optima, and it may take a while for it to settle on the right one. It's certainly true that taking the average value between two optima is not useful.
That said, I haven't been very successful using CLOP... whenever I tried to tune something, it settled on a value that actually tested worse than what I started with in a verification match...
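Evert's point that the average of two optima is not useful is easy to see with a toy example: if the win probability as a function of the parameter has two bumps, the midpoint between them can sit in the valley. A minimal sketch with made-up numbers:

```python
import math

def win_prob(x):
    # Toy bimodal curve: two Gaussian bumps at x = 20 and x = 80 on top
    # of a 50% base score. Purely illustrative numbers.
    bump = lambda c: 0.08 * math.exp(-((x - c) ** 2) / (2 * 8.0 ** 2))
    return 0.50 + bump(20) + bump(80)

for x in (20, 50, 80):
    print(f"parameter = {x:3d}: win prob = {win_prob(x):.3f}")
# parameter =  20: win prob = 0.580
# parameter =  50: win prob = 0.500   <- the mean of the two optima
# parameter =  80: win prob = 0.580
```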
-
Rémi Coulom
- Posts: 438
- Joined: Mon Apr 24, 2006 8:06 pm
Re: Bimodal CLOP Results
It is difficult to give advice without looking at the details. What I can recommend is using a wide interval for every parameter. CLOP will work better if it can figure out by itself that extreme values are bad.
If your performance actually has two local optima, then it will be difficult for CLOP. Maybe I'll add the option to use UCT-like algorithms in CLOP, because they can deal with local optima.
Also I know that in dimension > 1 CLOP may have a lot of difficulty converging to the maximum of a sinuous ridge, even if there are no local optima.
In dimension 1, I never observed any convergence problem.
Rémi
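For readers unfamiliar with the "sinuous ridge" problem Rémi mentions: picture a surface whose near-optimal values lie along a curved band. The classic Rosenbrock function (used here purely as a stand-in, not anything from CLOP itself) is the usual example; a local quadratic model of the kind CLOP fits cannot follow the curved band:

```python
def ridge(x, y):
    # Negated Rosenbrock function: maximum of 0 at (1, 1), with the
    # near-optimal region lying along the curved band y ~ x^2. A local
    # quadratic fit cannot track this band, which is the difficulty
    # described above for dimension > 1.
    return -(100.0 * (y - x * x) ** 2 + (1.0 - x) ** 2)

print(ridge(1.0, 1.0))    #   0.0  (the optimum)
print(ridge(-0.5, 0.25))  #  -2.25 (far away, but on the ridge: still decent)
print(ridge(1.0, 0.5))    # -25.0  (a small step off the ridge is costly)
```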
-
Tom Likens
- Posts: 303
- Joined: Sat Apr 28, 2012 6:18 pm
- Location: Austin, TX
Re: Bimodal CLOP Results
Rémi Coulom wrote:
> It is difficult to give advice without looking at the details. What I can recommend is using a wide interval for every parameter. CLOP will work better if it can figure out by itself that extreme values are bad.
> If your performance actually has two local optima, then it will be difficult for CLOP. Maybe I'll add the option to use UCT-like algorithms in CLOP, because they can deal with local optima.
> Also I know that in dimension > 1 CLOP may have a lot of difficulty converging to the maximum of a sinuous ridge, even if there are no local optima.
> In dimension 1, I never observed any convergence problem.
>
> Rémi

Hello Rémi,
First the preliminaries: thank you for both "bayeselo" and CLOP. They're both professional-level programs, and your generosity in releasing them for free is appreciated.
OK, yes, I've put you at a slight disadvantage because I haven't presented any data from the previous runs. Unfortunately, I usually don't save old runs, since they can take up a large amount of space. I promise that when it happens again I'll save the data and send you the relevant information.
One clarification: I'm assuming the dimensionality of the problem equals the number of parameters being tuned, correct? If so, another experiment I could try would be to rerun the test while tuning only that single parameter. I do remember that the particular time I saw this, I was tuning two parameters simultaneously.
Also, I'm curious: if the parameter being tuned does have two local optima, what will CLOP do? Will it eventually converge to one of them, or will it oscillate between the two (assuming a dimensionality of 1)?
regards,
--tom
-
Tom Likens
- Posts: 303
- Joined: Sat Apr 28, 2012 6:18 pm
- Location: Austin, TX
Re: Bimodal CLOP Results
Evert wrote:
> There can be several local optima, and it may take a while for it to settle on the right one. It's certainly true that taking the average value between two optima is not useful.
> That said, I haven't been very successful using CLOP... whenever I tried to tune something, it settled on a value that actually tested worse than what I started with in a verification match...

Hello Evert,

Have you ever observed a bimodal pattern, or has CLOP always converged to a single value? Or perhaps a better question: in one dimension, have you ever initially observed a bimodal pattern that eventually (given enough games) settled down to a single value?
thanks,
--tom
-
Evert
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: Bimodal CLOP Results
I've seen two peaks (so, bimodality), but one of them faded away after enough games. The reason is that while both may be local optima, one is better than the other. At first it's not clear which, but with enough games the distinction eventually emerges.
I never tried to do a 1D test.
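The "fading peak" behaviour is largely just sampling noise: if the two optima score, say, 58% and 55%, a few hundred games cannot tell them apart, but tens of thousands can. A quick simulation, with win rates invented for illustration:

```python
import random

random.seed(1)
P_A, P_B = 0.58, 0.55  # assumed true scores of the two local optima

def observed_score(p, games):
    # Fraction of games won at true win probability p (draws ignored
    # for simplicity).
    return sum(random.random() < p for _ in range(games)) / games

for n in (100, 1000, 10000):
    print(f"{n:5d} games: peak A = {observed_score(P_A, n):.3f}, "
          f"peak B = {observed_score(P_B, n):.3f}")
# At 100 games the two estimates routinely overlap; by 10000 games
# peak A reliably looks better, and the weaker peak fades away.
```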
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Bimodal CLOP Results
Tom Likens wrote:
> I've been using CLOP for a while now and I've run across an interesting phenomenon. In most instances, if I allow enough games, the parameter I'm tuning will converge to a single value. But occasionally the plot will split and converge to two separate and distinct values. Usually these values are fairly far apart. CLOP gives the mean of these two values as best, but in all honesty I don't think you really want to use that result, since it doesn't seem to reflect the true optimal value.
>
> Thinking about this, I wonder whether the reason is that the parameter being tuned isn't really represented correctly. As a simple example, if we only look at the static features of the pawn structure, then we could easily set the penalty for a backward pawn too high or too low. A more accurate assessment would take into account the dynamic characteristics of the position: attacks by enemy pieces, whether the pawn is blocked, whether the queens are still on the board, and so on.
>
> If these other factors aren't taken into account, then CLOP may find that in some instances setting the parameter to a high value is advantageous, while in other cases setting it low is best. Under these circumstances CLOP cannot converge to a single value. Has anyone else seen this phenomenon and/or come up with a better (i.e., more cogent) explanation?
>
> regards,
> --tom

You are into what I call a "red flag" area of tuning. It should raise a red flag anytime you have a term that doesn't converge to one value. That can mean lots of things, from an outright bug, where the term is really meaningless, to a term that is not written very well and produces the same sort of problem.
What you are seeing is EXACTLY what I saw when trying to tune Fruit's "history pruning threshold". I couldn't make heads or tails of it in Crafty; no matter what type of threshold I used in my code, it produced essentially random Elo changes. I tested Fruit and found the same thing, which convinced me it was a bogus way of triggering a reduction.
I more commonly see three types of results. When I test/tune, I always try to cover both sides of the optimal value. The good case is a neat curve with a clear peak at some optimal value V that drops off on either side. But I also see terms that drop off on the small-value side, rise to some point, and then hold the same Elo no matter how much further the value is increased. Ditto for the other direction, where a term produces the same Elo from zero to N, then starts to drop off beyond N. Those plateau cases are a bit harder to choose a value for.
But whenever there is no clear peak, it deserves a LOT of attention to figure out why.
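As a rough picture, the three shapes bob describes can be mocked up as simple Elo-versus-value curves. These are toy functions, not measurements from any engine:

```python
def clear_peak(v):
    # Case 1: a distinct optimum with drop-off on both sides.
    return -0.05 * (v - 50) ** 2

def rise_then_flat(v):
    # Case 2: Elo climbs to a knee, then stays flat however far you go.
    return min(v, 30) - 30

def flat_then_drop(v):
    # Case 3: same Elo from 0 to N (here N = 70), then it falls away.
    return -max(v - 70, 0)

for v in (0, 30, 50, 70, 100):
    print(f"v={v:3d}  peak={clear_peak(v):7.1f}  "
          f"rise={rise_then_flat(v):5.1f}  flat={flat_then_drop(v):5.1f}")
```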
-
Tom Likens
- Posts: 303
- Joined: Sat Apr 28, 2012 6:18 pm
- Location: Austin, TX
Re: Bimodal CLOP Results
bob wrote:
> You are into what I call a "red flag" area of tuning. It should raise a red flag anytime you have a term that doesn't converge to one value. That can mean lots of things, from an outright bug, where the term is really meaningless, to a term that is not written very well and produces the same sort of problem.
> What you are seeing is EXACTLY what I saw when trying to tune Fruit's "history pruning threshold". I couldn't make heads or tails of it in Crafty; no matter what type of threshold I used in my code, it produced essentially random Elo changes. I tested Fruit and found the same thing, which convinced me it was a bogus way of triggering a reduction.
> I more commonly see three types of results. When I test/tune, I always try to cover both sides of the optimal value. The good case is a neat curve with a clear peak at some optimal value V that drops off on either side. But I also see terms that drop off on the small-value side, rise to some point, and then hold the same Elo no matter how much further the value is increased. Ditto for the other direction, where a term produces the same Elo from zero to N, then starts to drop off beyond N. Those plateau cases are a bit harder to choose a value for.
> But whenever there is no clear peak, it deserves a LOT of attention to figure out why.

Yes, I agree. Even if the value eventually converges, as Evert saw, it still seems suspicious. At the very least it's a term that is producing conflicting game outcomes over a wide, disparate range of values, which means it's likely not defined well enough to be correct (or it could be an outright bug, as you mentioned). I think that information is useful as an indication that something "stinks" in the engine.
Just a quick question about what you wrote above: when you test a new parameter, is each point on the curves you describe a single 30k+ game run on the cluster? I'm assuming "yes", but that's a *lot* of games. Also, what makes the last case so difficult (the zero-to-N case)? Is it because the term could be completely unnecessary (i.e., a value of 0 effectively removes it from the equation)?
regards,
--tom
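On the "that's a *lot* of games" point: a back-of-the-envelope error-bar calculation shows why so many games are needed to resolve small Elo differences. The draw rate below is an assumption, and the numbers are only rough:

```python
import math

def elo_error(games, draw_rate=0.4):
    # Standard error of the match score: the variance of one game's
    # result (1 / 0.5 / 0) at a 50% score is 0.25 * (1 - draw_rate).
    se_score = math.sqrt(0.25 * (1 - draw_rate) / games)
    # Slope of the Elo curve at a 50% score is 400 / (ln 10 * 0.25),
    # about 695 Elo per unit of score.
    return 695.0 * se_score

for n in (1000, 10000, 30000):
    print(f"{n:6d} games: about +/- {2 * elo_error(n):.1f} Elo at 95%")
# Roughly +/-17, +/-5.4, and +/-3.1 Elo: resolving differences of a few
# Elo really does take on the order of 30k games per point.
```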