Tuning piece values with CLOP

Evert · Post by **Evert** » Mon Oct 15, 2012 10:55 am

Yesterday I started a run to tune the piece values in Jazz, primarily as a test to see how it works. I would appreciate some feedback and comments on how I've set things up and what I've found so far, since even after going through the various threads on the topic here it's still something of a black art to me.

The full set of parameters I set up for tuning includes the (opening/middlegame) values for all pieces, which were initially set at 100, 325, 325, 550 and 975; the bishop pair bonus (50); bad trade penalties (rook vs minor and two minors vs rook, each set at 50); and razoring and futility margins. The reason for including the latter is that they're supposedly related to the piece values, and it seemed wrong to try to tune the piece values while leaving the razoring and futility margins fixed.
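For readers who want to see how these terms fit together, here is a minimal sketch of a material evaluation using the parameters listed above. It assumes the bad-trade terms fire only for a clean one-rook-versus-one-minor or one-rook-versus-two-minors imbalance and penalise the side that made the trade. The post does not say how Jazz detects these imbalances, so `material_score` and `MaterialParams` are hypothetical names, not Jazz's code.

```python
from dataclasses import dataclass

PIECES = ("P", "N", "B", "R", "Q")


@dataclass
class MaterialParams:
    """Tunable material terms in centipawns (starting values from the post)."""
    pawn: int = 100
    knight: int = 325
    bishop: int = 325
    rook: int = 550
    queen: int = 975
    bishop_pair: int = 50
    rook_vs_minor: int = 50    # hypothetical: penalty for giving a rook for one minor
    minors_vs_rook: int = 50   # hypothetical: penalty for giving two minors for a rook


def material_score(white, black, p):
    """Material balance from White's point of view.

    `white` and `black` map piece letters to counts; missing letters count as 0.
    """
    values = {"P": p.pawn, "N": p.knight, "B": p.bishop, "R": p.rook, "Q": p.queen}
    score = sum(values[k] * (white.get(k, 0) - black.get(k, 0)) for k in PIECES)

    # Bishop pair bonus for each side holding two or more bishops.
    if white.get("B", 0) >= 2:
        score += p.bishop_pair
    if black.get("B", 0) >= 2:
        score -= p.bishop_pair

    # "Bad trade" terms (assumed semantics): penalise the side that made the exchange.
    d_minors = (white.get("N", 0) + white.get("B", 0)) - (black.get("N", 0) + black.get("B", 0))
    d_rooks = white.get("R", 0) - black.get("R", 0)
    if d_rooks == 1 and d_minors == -1:      # Black gave a rook for a minor
        score += p.rook_vs_minor
    elif d_rooks == -1 and d_minors == 1:    # White gave a rook for a minor
        score -= p.rook_vs_minor
    elif d_rooks == 1 and d_minors == -2:    # White gave two minors for a rook
        score -= p.minors_vs_rook
    elif d_rooks == -1 and d_minors == 2:    # Black gave two minors for a rook
        score += p.minors_vs_rook
    return score
```

Every number in `MaterialParams` is one of the twelve-ish knobs CLOP would be turning, which makes it easy to see which terms can offset each other.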

The pool of opponents is currently quite small, consisting of the untuned version of Jazz as well as OliThink. I should expand that, but this should do for now.

Despite what I said above, I found the number of variables to tune a bit large to begin with, so I started the run with just the piece values. After some 15000 games, this seemed to converge to about 100, 365, 365, 566 and 1122. I interrupted the run and added the other variables to the mix, setting them to their fixed values in the log file.
I also switched from using cutechess-cli to using my own referee program at about this point, but that should make no difference.

The run is currently at 40000 games, and by now the bishop and knight values have dropped back to ~325 and the rook is at ~525. The queen remains at ~1100, but there is still a large scatter in the plots and it "seems" obvious that the run has not yet converged. More interestingly, the rook-vs-minor bad trade penalty seems to be bimodal, with points accumulating near 0 and near 100 (the limits of the domain).

So, questions I have now:

Was it a bad idea to start the run with a small number of variables and then later add the others in by hand by faking their input? It seemed reasonable, but the convergence that seemed to be emerging from the figures disappeared when I added them and only reappeared much later, converging on different values.
Any thoughts on why the queen might be getting a large adjustment in value while the other pieces seem to settle where they already were? Missing evaluation terms? Poor handling of the queen later in the game?
Does anyone have experience with variables that show bimodal behaviour? Do I need to widen the range if points cluster near the edge of the domain?
Should I expect to see some measure of convergence after 40000 games?
What is a good estimate for the error bar on the values derived by CLOP? Should I just look at the width of the point cloud?
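One rough way to put numbers on the "width of the point cloud" and on clustering at the domain edges is to summarise the sampled values of each parameter. A minimal sketch, assuming you already have the sampled values of one parameter as a list (how you extract them from CLOP is left open). The spread of samples is only a rough proxy for uncertainty, not a rigorous confidence interval.

```python
import statistics


def cloud_summary(samples, lo, hi, edge_frac=0.05):
    """Summarise the sampled values of one parameter over its range [lo, hi].

    Returns the mean, the sample standard deviation, and the fraction of samples
    lying within edge_frac * (hi - lo) of each bound.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    margin = edge_frac * (hi - lo)
    n = len(samples)
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
        "near_lo": sum(1 for x in samples if x <= lo + margin) / n,
        "near_hi": sum(1 for x in samples if x >= hi - margin) / n,
    }
```

Passing only the most recent samples (for example `samples[-5000:]`) shows how the cloud evolves. A large `near_lo` and `near_hi` at the same time is the bimodal, edge-hugging pattern described above for the rook-vs-minor penalty.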

I think that's it for now. I may have more later.

diep · Post by **diep** » Mon Oct 15, 2012 6:27 pm

Pathetic performance by CLOP. Thanks for sharing this!

Vincent

F. Bluemers · Post by **F. Bluemers** » Mon Oct 15, 2012 7:17 pm

I would have clopped the material values first and would go for
at least 150000 games.
Material converged for Dirty, but there was still a range of ±10 points (pawns),
even after a lot of games.
Values for rooks and queens were noisier for Dirty, as far as I remember.
You might have a look at the CLOP results from Sjeng; they are on Remi's site.
My results with CLOP on razor/pruning values did not look convincing to me. The output stayed very noisy and did not seem to converge.

Does anyone have experience with variables that show bimodal behaviour? Do I need to widen the range if points cluster near the edge of the domain?
I saw this behaviour; I always blamed it on incomplete knowledge in the engine.

Should I expect to see some measure of convergence after 40000 games?
On parameters like material, yes; not on smaller stuff like "rook on open file".

What is a good estimate for the error bar on the values derived by CLOP? Should I just look at the width of the point cloud?
The width, but you can (and have to) test the CLOP results in a real engine match.
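Testing the CLOP result in a real match, as suggested above, gives an Elo difference with an error bar. A minimal sketch using the usual logistic Elo model and a normal approximation on the per-game score; it assumes independent games and is a generic formula, not something taken from CLOP.

```python
import math


def elo_with_error(wins, draws, losses, z=1.96):
    """Elo difference and approximate confidence bounds from W/D/L counts."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1 - score) ** 2 + draws * (0.5 - score) ** 2 + losses * score ** 2) / n
    se = math.sqrt(var / n)

    def to_elo(s):
        return -400.0 * math.log10(1.0 / s - 1.0)

    lo_s = max(score - z * se, 1e-9)
    hi_s = min(score + z * se, 1 - 1e-9)
    return to_elo(score), to_elo(lo_s), to_elo(hi_s)
```

For example, `elo_with_error(60, 20, 20)` gives a point estimate of about +147 Elo. With only a few hundred games the bounds are wide, which is one reason the thread suggests very long runs.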

lkaufman · Post by **lkaufman** » Mon Oct 15, 2012 7:23 pm

Why do you have two terms for rook/minor and two minors/rook "bad trades"? If you believe that simply tuning the values of rooks and minors isn't good enough, you should just let that tuning take care of the rook vs. minor case and only keep the two minors/rook term. Having worthless terms (i.e. redundant ones) really messes up your results with CLOP, and clearly a rook vs minor term is almost 100% redundant with getting the relative values of those pieces right, given that you separately cover two minors vs rook. We've never found any benefit to having even that term, but I'm interested in whether you find some benefit to it if you do CLOP without the rook vs minor term.
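The redundancy argument can be made concrete with a toy model. In the sketch below (hypothetical, with only rooks, minors and pawns differing), moving the rook value up by 25 and the rook-vs-minor term down by 25 leaves every rook-vs-one-minor position scored identically. Only other imbalances, such as rook versus pawns, can tell the two parameter sets apart, so the tuner gets little signal along that direction.

```python
def imbalance_score(d_rooks, d_minors, d_pawns, rook, minor, pawn, rook_vs_minor):
    """White-minus-Black material for a pure rook/minor/pawn imbalance.

    Hypothetical toy model: only the rook-vs-minor bad-trade term is included.
    """
    score = d_rooks * rook + d_minors * minor + d_pawns * pawn
    if d_rooks == 1 and d_minors == -1:
        score += rook_vs_minor
    elif d_rooks == -1 and d_minors == 1:
        score -= rook_vs_minor
    return score


# Two parameter sets differing only along (rook + 25, rook_vs_minor - 25).
SET_A = dict(rook=550, minor=325, pawn=100, rook_vs_minor=50)
SET_B = dict(rook=575, minor=325, pawn=100, rook_vs_minor=25)

if __name__ == "__main__":
    for imbalance in [(1, -1, 0), (1, -1, -1), (-1, 1, 2), (1, 0, -2)]:
        print(imbalance, imbalance_score(*imbalance, **SET_A), imbalance_score(*imbalance, **SET_B))
```

The first three imbalances print the same score for both sets; only the rook-versus-two-pawns case differs.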

diep · Post by **diep** » Mon Oct 15, 2012 7:31 pm

lkaufman wrote:Why do you have two terms for rook/minor and two minors/rook "bad trades"?

Larry, it's a very good idea for him to have some patterns that are a very bad idea.

There is an article from Remi giving a mathematical 'proof' that this tuner works.

If a few fata morgana patterns already confuse CLOP, then everything Remi Coulom wrote about CLOP is total BS, as it should be able to deal with them.

So the only question then to Evert is how many parameters he gave to CLOP to tune.

Evert · Post by **Evert** » Mon Oct 15, 2012 8:07 pm

lkaufman wrote:Why do you have two terms for rook/minor and two minors/rook "bad trades"? If you believe that simply tuning the values of rooks and minors isn't good enough, you should just let that tuning take care of the rook vs. minor case and only keep the two minors/rook term. Having worthless terms (i.e. redundant ones) really messes up your results with CLOP, and clearly a rook vs minor term is almost 100% redundant with getting the relative values of those pieces right, given that you separately cover two minors vs rook. We've never found any benefit to having even that term, but I'm interested in whether you find some benefit to it if you do CLOP without the rook vs minor term.

I borrowed the "bad trade" idea from Crafty. You're right that to an extent the term can be absorbed in the piece value; it's just a term for dealing with material imbalances.

Given that, the two trades are different and it's not obvious that they should get the same penalty, so I figured that I'd tune them independently. If they should be equal, then that's what should come out, right? If one of them should be 0, then shouldn't it just get tuned to 0?

Having said that, if two terms are in fact strongly correlated (or just get added or subtracted so that you could just tune the sum instead) then I can see that trying to tune both of them would be problematic in the sense that the problem is ill-conditioned.

I'll run some tests and let you know.

Evert · Post by **Evert** » Mon Oct 15, 2012 8:10 pm

diep wrote: So the only question then to Evert is how many parameters he gave to CLOP to tune.

Twelve in total.
Five piece values, two bad exchange modifiers, a bishop pair bonus and four pruning margins (nominally supposed to correspond to "minor", "minor", "rook" and "queen").
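To illustrate why pruning margins interact with piece values, here is a minimal sketch of a standard futility test. It assumes one margin per remaining depth (1 to 4), seeded from the piece values in the order listed above; the post does not say how Jazz actually indexes its four margins or how razoring uses them, so `DEFAULT_MARGINS` and `futility_prune` are hypothetical.

```python
# Hypothetical mapping: remaining depth -> margin, seeded from the piece values
# ("minor", "minor", "rook", "queen") in the order listed in the post.
DEFAULT_MARGINS = {1: 325, 2: 325, 3: 550, 4: 975}


def futility_prune(depth, static_eval, alpha, in_check, is_quiet, margins=DEFAULT_MARGINS):
    """Return True if a quiet move at this node can be skipped.

    Futility test: if even a gain of `margin` cannot lift the static
    evaluation above alpha, the quiet move is unlikely to matter.
    """
    if in_check or not is_quiet or depth not in margins:
        return False
    return static_eval + margins[depth] <= alpha
```

If CLOP raises the queen value while the depth-4 margin stays fixed, the pruning decisions at that depth no longer match the evaluation scale, which is the coupling the original post was worried about.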

Evert · Post by **Evert** » Mon Oct 15, 2012 8:15 pm

F. Bluemers wrote:I would have clopped the material values first and would go for
at least 150000 games.
Material converged for Dirty but there was still a range of +- 10 points(pawns),
even after a lot of games.
Values for rooks and queens were noisier for Dirty as far as I remember.
You might have a look at the clop results from sjeng,they are on remi's site.
My results with clop on razor/pruning values did not look convincing to me.The output stayed very noisy,it did not seem to converge.

I had a look at the piece value tuning on the website; I find it hard to interpret the results, though, since there is no real indication of the error bar.

Interesting that you also found rooks and queens to be quite noisy. I wonder if that's something more people find? Any thoughts on the effect of missing/incorrectly tuned evaluation terms on this?

Odd that the razor and pruning values remained noisy though... doesn't that suggest that there is something wrong there? There should be a clear optimal value for those, right?

bob · Post by **bob** » Mon Oct 15, 2012 8:15 pm

Evert wrote:
diep wrote: So the only question then to Evert is how many parameters he gave to CLOP to tune.
Twelve in total.
Five piece values, two bad exchange modifiers, a bishop pair bonus and four pruning margins (nominally supposed to correspond to "minor", "minor", "rook" and "queen").

I think there is danger down that path. You do not want to test/modify overlapping terms at the same time. Convergence REALLY stretches out and you might never get there. I'd do piece values by themselves. Then move on to the next set of terms that might include the bishop pair and such. If you do both at the same time, you might see inflated bishop piece values with a strange bishop pair bonus that brings the inflated values back down to the normal range.

I've tried quite a bit of testing over multiple terms at the same time, and quit. The noise gets too large. Cozzie's annealer produced some REALLY odd parameter values that, taken together, were no better than the original Crafty values, but if I took any group by itself, the results were always worse.
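The inflated-bishop effect described above is easy to see with numbers. In this hypothetical sketch, a bishop value of 360 combined with a negative pair bonus of -20 scores the bishop pair exactly like 325 plus a +50 bonus; only positions with a single bishop distinguish the two sets, so a joint tuning run can wander along that trade-off.

```python
def bishop_material(n_bishops, bishop, pair):
    """Bishop material for one side: per-bishop value plus pair bonus."""
    return n_bishops * bishop + (pair if n_bishops >= 2 else 0)


NORMAL = (325, 50)     # bishop value, pair bonus (starting values from the thread)
INFLATED = (360, -20)  # hypothetical "inflated bishop, strange pair bonus" set

if __name__ == "__main__":
    for n in (0, 1, 2):
        print(n, bishop_material(n, *NORMAL), bishop_material(n, *INFLATED))
```

This prints `0 0 0`, `1 325 360` and `2 700 700`.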

jdart · Post by **jdart** » Mon Oct 15, 2012 8:24 pm

lkaufman wrote:Why do you have two terms for rook/minor and two minors/rook "bad trades"? If you believe that simply tuning the values of rooks and minors isn't good enough, you should just let that tuning take care of the rook vs. minor case

Interesting you should say that, because I thought your own research showed that the value of the exchange varies depending on how many other pieces there are.

In my case the values of rook and minor are 500 and 325, so it is a 1.75 pawn bonus for rook vs. minor, which is not too far off, but I do adjust it: down when fewer pieces, up when more pieces.

--Jon
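A material-dependent exchange value like the one Jon describes could be sketched as a linear adjustment around the 500 - 325 = 175 baseline. The counting rule, `per_piece` and `reference` below are illustrative assumptions, not Jon's actual numbers.

```python
def exchange_value(other_pieces, rook=500, minor=325, per_piece=10, reference=6):
    """Hypothetical rook-minus-minor value that grows with the material left.

    other_pieces: number of non-pawn, non-king pieces on the board besides the
    rook and minor being compared (an assumption; the post doesn't say what is counted).
    per_piece and reference are illustrative numbers only.
    """
    return (rook - minor) + per_piece * (other_pieces - reference)
```

With these numbers the exchange is worth 175 with six other pieces, 135 with two and 215 with ten. That is the "down when fewer pieces, up when more pieces" behaviour, and it is the kind of dependence a single fixed rook-vs-minor term cannot capture.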
