CLOP for Noisy Black-Box Parameter Optimization

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Rémi Coulom
Posts: 404
Joined: Mon Apr 24, 2006 6:06 pm
Contact:

Re: CLOP - a trial for xiangqi engine

Post by Rémi Coulom » Fri Oct 07, 2011 4:30 pm

edwardyu wrote:It is the limitation of UCCILEAG which only supports roundrobin with alterate colors. I don't think "Replications" parameter will help in this case. It will be more useful if there is an option which accepts 2 output e.g. PRINT("W") PRINT("D")
for each invocation of the script.
It is in the TODO list already. But I am too busy to do it now.

You could write the output of uccileag to a file.
Even seed: run uccileag, write file, read first result, return first result
Odd seed: read file, return second result.

Rémi

edwardyu
Posts: 34
Joined: Mon Nov 17, 2008 5:58 am

Re: CLOP - a trial for xiangqi engine

Post by edwardyu » Sat Oct 08, 2011 3:59 am

Rémi Coulom wrote:
edwardyu wrote:It is the limitation of UCCILEAG which only supports roundrobin with alterate colors. I don't think "Replications" parameter will help in this case. It will be more useful if there is an option which accepts 2 output e.g. PRINT("W") PRINT("D")
for each invocation of the script.
It is in the TODO list already. But I am too busy to do it now.

You could write the output of uccileag to a file.
Even seed: run uccileag, write file, read first result, return first result
Odd seed: read file, return second result.

Rémi
Thanks Remi for the solution. Here is the modified script. Also the CLOP file has to be changed to use Replications 2.

Code: Select all

#!/usr/bin/env python
#
# pawnval.py - example script for xiangqi engine parameter optimization (Edward Yu)
# 
# 2011-10-08 Even seed: run uccileag, write file, read first result, return first result
#            Odd seed: read file, return second result.  
#
#############################################################################
"""
 DummyScript.py

 This is an example script for use with parallel optimization. In order to
 apply clop algorithms to your own problem, you should write a script that
 behaves like this one.

 Arguments are:
  #1: processor id (symbolic name, typically a machine name to ssh to)
  #2: seed (integer)
  #3: parameter id of first parameter (symbolic name)
  #4: value of first parameter (float)
  #5: parameter id of second parameter (optional)
  #6: value of second parameter (optional)
  ...

 This script should write the game outcome to its output:
  W = win
  L = loss
  D = draw

 For instance:
  $ ./DummyScript.py node-01 4 param 0.2
  W
"""
#############################################################################
import sys
import math
import random
import time
import os
#
# Log script invocations
#
f = open('pawnval.log', 'a')
print(sys.argv, file=f)

#
# Print doc if not enough parameters
#
if len&#40;sys.argv&#41; < 5&#58;
     print&#40;__doc__)
     sys.exit&#40;)

#
# Sleep for a random amount of time
#
# random.seed&#40;int&#40;sys.argv&#91;2&#93;))
# time.sleep&#40;random.random&#40;) * 2&#41;

#
# Parse parameter values
#
i = 4
params = &#91;&#93;
while i < len&#40;sys.argv&#41;&#58;
    # params.append&#40;float&#40;sys.argv&#91;i&#93;))
    params.append&#40;int&#40;sys.argv&#91;i&#93;))
    i += 2

#
# Compute winning probability
#
# d2 = 0
# for i in range&#40;len&#40;params&#41;)&#58;
#    delta = params&#91;i&#93; - 0.3456789
#    d2 += delta * delta * 10

# draw_elo = 100
# draw_rating = draw_elo * math.log&#40;10.0&#41; / 400.0
# win_p = 1.0 / &#40;1.0 + math.exp&#40;d2 + draw_rating&#41;)
# loss_p = 1.0 / &#40;1.0 + math.exp&#40;-d2 + draw_rating&#41;)

#
# Draw a random game according to this probability
#
# r = random.random&#40;)
# if r < loss_p&#58;
#    print&#40;"L")
# elif r > 1.0 - win_p&#58;
#    print&#40;"W")
# else&#58;
#    print&#40;"D")

seedi = int&#40;sys.argv&#91;2&#93;)
# even seed
if &#40;seedi & 1&#41; == 0 &#58; 
   # Run a game with param1=params&#91;0&#93; etc. and print round1 result
   fo = open&#40;"eychessu.ini","w")
   fo.write&#40;"&#91;param&#93;\n")
   fo.write&#40;"param1=" + str&#40;params&#91;0&#93;) + "\n")
   fo.write&#40;"param2=" + str&#40;params&#91;1&#93;) + "\n")
   fo.write&#40;"param3=" + str&#40;params&#91;2&#93;) + "\n")
   fo.write&#40;"param4=" + str&#40;params&#91;3&#93;) + "\n")
   fo.write&#40;"param5=" + str&#40;params&#91;4&#93;) + "\n")
   fo.close&#40;)

   if os.path.isfile&#40;"92h-3dc0.CHK") &#58;
      os.remove&#40;"92h-3dc0.CHK")
   if os.path.isfile&#40;"3dc-92h0.CHK") &#58;   
      os.remove&#40;"3dc-92h0.CHK")
   os.system&#40;"uccileag.exe < ucci3dc.in > pawnval.out");
   
   roundnum = 0
   fout = open&#40;"pawnval.out","r") 
   for line in fout &#58;
      parts = line.split&#40;' ')
      # print len&#40;parts&#41;   
      x = 0
      while x+1<len&#40;parts&#41; &#58;       
       parts&#91;x&#93;=parts&#91;x&#93;.strip&#40;)
       # print&#40;parts&#91;x&#93;)
       if roundnum == 0 &#58;
          if parts&#91;x&#93; == '1/2-1/2' &#58;
             print&#40;"D")
             roundnum = 1             
          elif parts&#91;x&#93; == '0-1' &#58;
             print&#40;"L")
             roundnum = 1 
          elif parts&#91;x&#93; == '1-0' &#58;
             print&#40;"W")
             roundnum = 1        
       x=x+1 
   fout.close&#40;)
else &#58;
   # Read game output fiile to print round2 result   
   roundnum = 0
   fout = open&#40;"pawnval.out","r") 
   for line in fout &#58;
      parts = line.split&#40;' ')
      # print len&#40;parts&#41;   
      x = 0
      while x+1<len&#40;parts&#41; &#58;       
       parts&#91;x&#93;=parts&#91;x&#93;.strip&#40;)
       # print&#40;parts&#91;x&#93;)
       if roundnum == 0 &#58;
          if parts&#91;x&#93; == '1/2-1/2' &#58;             
             roundnum = 1             
          elif parts&#91;x&#93; == '0-1' &#58;             
             roundnum = 1              
          elif parts&#91;x&#93; == '1-0' &#58;             
             roundnum = 1             
           
       elif roundnum == 1 &#58;           
          if parts&#91;x&#93; == '1/2-1/2' &#58;
             print&#40;"D")
             roundnum = 2
          elif parts&#91;x&#93; == '0-1' &#58;
             print&#40;"W")
             roundnum = 2
          elif parts&#91;x&#93; == '1-0' &#58;
             print&#40;"L")
             roundnum = 2 
                      
       x=x+1 
   fout.close&#40;)    
   os.rename&#40;"pawnval.out", "pawnval.out"+sys.argv&#91;2&#93;)
A simple question: What statistics should I look for in order to decide that the experiment is converging and I can stop the experiment? Hessian? Eigenvectors? Max? or Win Rate?

Michel
Posts: 1961
Joined: Sun Sep 28, 2008 11:50 pm

Re: CLOP

Post by Michel » Sat Oct 08, 2011 6:00 am

I pretty much understand the mathematics of your paper now.
It is very pretty!

I have a few more questions. I know I could print out the source code
and find the answers myself...

In your paper there is the following phraze
Both QuadraticLogisticRegression and
LogisticMean compute the maximum a posteriori with a Gaussian prior of
variance 100.
Do you mean the prior for the coefficients of the quadratic form and for the
"mean"?

Another question: I assume QuadraticLogisticRegression and LogisticMean
perform a maximum likelihood computation where the terms of the usual
logarithmic likelihood function are weighted by "w". Is this correct?

Rémi Coulom
Posts: 404
Joined: Mon Apr 24, 2006 6:06 pm
Contact:

Re: CLOP - a trial for xiangqi engine

Post by Rémi Coulom » Sat Oct 08, 2011 7:49 am

edwardyu wrote:Thanks Remi for the solution. Here is the modified script. Also the CLOP file has to be changed to use Replications 2.
Note that this approach won't work if you have more than one processor. In order to work with more than one processor, your script should use a different file name for each different seed, and the odd seed should wait for the even seed to complete before returning its result. But that is a bit complicated.
A simple question: What statistics should I look for in order to decide that the experiment is converging and I can stop the experiment? Hessian? Eigenvectors? Max? or Win Rate?
There is no optimal way to stop: the more games you play, the more accurate the estimated optimal parameters. Confidence intervals on the win rate can give you a very vague idea of how close to optimal you may be, but it is still very difficult to tell. The plots in my paper can give you an idea of how close to optimal you may be, given a number of games played.

Rémi Coulom
Posts: 404
Joined: Mon Apr 24, 2006 6:06 pm
Contact:

Re: CLOP

Post by Rémi Coulom » Sat Oct 08, 2011 7:53 am

Michel wrote:I pretty much understand the mathematics of your paper now.
It is very pretty!
Thanks
I have a few more questions. I know I could print out the source code
and find the answers myself...

In your paper there is the following phraze
Both QuadraticLogisticRegression and
LogisticMean compute the maximum a posteriori with a Gaussian prior of
variance 100.
Do you mean the prior for the coefficients of the quadratic form and for the
"mean"?

Another question: I assume QuadraticLogisticRegression and LogisticMean
perform a maximum likelihood computation where the terms of the usual
logarithmic likelihood function are weighted by "w". Is this correct?
Yes. To be more precise: in dimension one q(x)=a*x*x + b*x +c. I maximize a function L(x)=-prior * (a * a + b * b + c * c) + \Sum_i w(x_i)log(p_i). Logistic mean is the same, with a and b removed. p_i is the probability of result i.

The variance of the posterior for logistic mean is estimated by assuming it is Gaussian and calculating its second-order derivative at the maximum.

Rémi

Rémi Coulom
Posts: 404
Joined: Mon Apr 24, 2006 6:06 pm
Contact:

Re: CLOP

Post by Rémi Coulom » Sat Oct 08, 2011 8:27 am

Rémi Coulom wrote:Yes. To be more precise: in dimension one q(x)=a*x*x + b*x +c. I maximize a function L(x)=-prior * (a * a + b * b + c * c) + \Sum_i w(x_i)log(p_i).
I meant L(a,b,c), not L(x)

Rémi

Ferdy
Posts: 3607
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Ferdy » Tue Oct 11, 2011 8:22 am

Rémi Coulom wrote:Hi,

This is my paper for the Tilburg conference:

Title: CLOP: Confident Local Optimization for Noisy Black-Box Parameter Tuning

Abstract: Artificial intelligence in games often leads to the problem of parameter tuning. Some heuristics may have coefficients, and they should be tuned to maximize the win rate of the program. A possible approach consists in building local quadratic models of the win rate as a function of program parameters. Many local regression algorithms have already been proposed for this task, but they are usually not robust enough to deal automatically and efficiently with very noisy outputs and non-negative Hessians. The CLOP principle, which stands for Confident Local OPtimization, is a new approach to local regression that overcomes all these problems in a simple and efficient way. It consists in discarding samples whose estimated value is confidently inferior to the mean of all samples. Experiments demonstrate that, when the function to be optimized is smooth, this method outperforms all other tested algorithms.

pdf and source code:
http://remi.coulom.free.fr/CLOP/

It makes no miracle: you'll have to play a lot of games to get really good parameters. But it is certainly much more efficient than any manual method you could use with bayeselo. It is also more efficient than any other algorithm I am aware of.

Compared to the old version of QLR, I solved all the unstability problems. I do not have a mathematical proof of convergence, but I am convinced it always work well, unless the maximum is at a discontinuity, which never happens in practice.

Comments and questions are welcome.

Rémi
Hi Remi, from the read me below, I can not find the dummy.exe file from the CLOP-0.08.tar.bz2? Is there any CLOP compilaton for windows? Thanks.
Windows & Mac
-------------
Get the Qt SDK. Project files in programs/clop/compqt/* can be opened in Qt
Creator and compiled.


Optimizing a program with CLOP
==============================

First step: connection script
-----------------------------
The first step to connect a program to CLOP consists in writing a connection
script. A connection script takes parameter values as input, plays one game,
and returns the game outcome as output. For more information, run "Dummy.exe"
in the Windows distribution, or programs/clop/script/real/DummyScript.py in
Linux.

Rémi Coulom
Posts: 404
Joined: Mon Apr 24, 2006 6:06 pm
Contact:

Re: CLOP for Noisy Black-Box Parameter Optimization

Post by Rémi Coulom » Tue Oct 11, 2011 1:38 pm

Ferdy wrote: Hi Remi, from the read me below, I can not find the dummy.exe file from the CLOP-0.08.tar.bz2? Is there any CLOP compilaton for windows? Thanks.
I don't have easy access to a Windows machine, so I did not release the Windows binary, sorry. You'll have to compile everything from source.

Dummy.exe is produced by compiling that file:
CLOP-0.0.8/programs/clop/src/real/Dummy.c

Rémi

edwardyu
Posts: 34
Joined: Mon Nov 17, 2008 5:58 am

Re: CLOP - a trial for xiangqi engine

Post by edwardyu » Tue Oct 11, 2011 4:04 pm

I have tried to optimize the piece values of my xiangqi engines. A preliminary result is as follows :-

Code: Select all

MAX       All                  Weighted
vp 23     win 171
vpr 96    draws 62
vb 52    losses 183
ve 51     games 416
vn 197    95% UCB 0.520105
vc 203    win rate 0.488152      0.869043 
vr 462    95% LCB 0.427812

win rate 0.000388618  

The win rate 0.488152 is under the All column, and we should look for it to improving and converge, is this correct?. There is also a win rate 0.869043 under the weighted column. What does this weighted win rate mean?

Also there is a win rate on the MAX tab : 0.000388618 which is very small. Why?

Rémi Coulom
Posts: 404
Joined: Mon Apr 24, 2006 6:06 pm
Contact:

Re: CLOP - a trial for xiangqi engine

Post by Rémi Coulom » Tue Oct 11, 2011 4:22 pm

edwardyu wrote:I have tried to optimize the piece values of my xiangqi engines. A preliminary result is as follows :-

Code: Select all

MAX       All                  Weighted
vp 23     win 171
vpr 96    draws 62
vb 52    losses 183
ve 51     games 416
vn 197    95% UCB 0.520105
vc 203    win rate 0.488152      0.869043 
vr 462    95% LCB 0.427812

win rate 0.000388618  

The win rate 0.488152 is under the All column, and we should look for it to improving and converge, is this correct?. There is also a win rate 0.869043 under the weighted column. What does this weighted win rate mean?

Also there is a win rate on the MAX tab : 0.000388618 which is very small. Why?
416 games for tuning 7 parameters are not enough.

"All" column is the win rate over all games so far.
Weighted column is the win rate of samples multiplied by their weights.

Probably the regression is overfitting. I know the current version of CLOP has this problem, sometimes. I will improve it for next version.

If you wish to adjust the prior, on line 39 of src/real/CExperimentFromSettings.cpp, you can change:
pfq.SetPriorStrength(1e-3);
to
pfq.SetPriorStrength(1e-2);
or even
pfq.SetPriorStrength(1e-1);
This should solve the overfitting problem.

I will work on this problem for the next version.

Playing more games will help, too.

Rémi

Post Reply