edwardyu wrote:It is a limitation of UCCILEAG, which only supports round-robin with alternating colors. I don't think the "Replications" parameter will help in this case. It would be more useful if there were an option that accepts two outputs, e.g. PRINT("W") PRINT("D"), for each invocation of the script.
It is on the TODO list already. But I am too busy to do it now.
You could write the output of uccileag to a file.
Even seed: run uccileag, write file, read first result, return first result
Odd seed: read file, return second result.
Rémi
Thanks Remi for the solution. Here is the modified script. Also the CLOP file has to be changed to use Replications 2.
#!/usr/bin/env python
#
# pawnval.py - example script for xiangqi engine parameter optimization (Edward Yu)
#
# 2011-10-08 Even seed: run uccileag, write file, read first result, return first result
# Odd seed: read file, return second result.
#
#############################################################################
"""
DummyScript.py
This is an example script for use with parallel optimization. In order to
apply clop algorithms to your own problem, you should write a script that
behaves like this one.
Arguments are:
#1: processor id (symbolic name, typically a machine name to ssh to)
#2: seed (integer)
#3: parameter id of first parameter (symbolic name)
#4: value of first parameter (float)
#5: parameter id of second parameter (optional)
#6: value of second parameter (optional)
...
This script should write the game outcome to its output:
W = win
L = loss
D = draw
For instance:
$ ./DummyScript.py node-01 4 param 0.2
W
"""
#############################################################################
import sys
import math
import random
import time
import os
#
# Log script invocations
#
f = open('pawnval.log', 'a')
print(sys.argv, file=f)
#
# Print doc if not enough parameters
#
if len(sys.argv) < 5:
    print(__doc__)
    sys.exit()
#
# Sleep for a random amount of time
#
# random.seed(int(sys.argv[2]))
# time.sleep(random.random() * 2)
#
# Parse parameter values
#
i = 4
params = []
while i < len(sys.argv):
    # params.append(float(sys.argv[i]))
    params.append(int(sys.argv[i]))
    i += 2
#
# Compute winning probability
#
# d2 = 0
# for i in range(len(params)):
#     delta = params[i] - 0.3456789
#     d2 += delta * delta * 10
# draw_elo = 100
# draw_rating = draw_elo * math.log(10.0) / 400.0
# win_p = 1.0 / (1.0 + math.exp(d2 + draw_rating))
# loss_p = 1.0 / (1.0 + math.exp(-d2 + draw_rating))
#
# Draw a random game according to this probability
#
# r = random.random()
# if r < loss_p:
#     print("L")
# elif r > 1.0 - win_p:
#     print("W")
# else:
#     print("D")
seedi = int(sys.argv[2])
# even seed
if (seedi & 1) == 0:
    # Run a game with param1=params[0] etc. and print the round-1 result
    fo = open("eychessu.ini", "w")
    fo.write("[param]\n")
    fo.write("param1=" + str(params[0]) + "\n")
    fo.write("param2=" + str(params[1]) + "\n")
    fo.write("param3=" + str(params[2]) + "\n")
    fo.write("param4=" + str(params[3]) + "\n")
    fo.write("param5=" + str(params[4]) + "\n")
    fo.close()
    if os.path.isfile("92h-3dc0.CHK"):
        os.remove("92h-3dc0.CHK")
    if os.path.isfile("3dc-92h0.CHK"):
        os.remove("3dc-92h0.CHK")
    os.system("uccileag.exe < ucci3dc.in > pawnval.out")
    roundnum = 0
    fout = open("pawnval.out", "r")
    for line in fout:
        parts = line.split(' ')
        # print(len(parts))
        x = 0
        while x + 1 < len(parts):
            parts[x] = parts[x].strip()
            # print(parts[x])
            if roundnum == 0:
                if parts[x] == '1/2-1/2':
                    print("D")
                    roundnum = 1
                elif parts[x] == '0-1':
                    print("L")
                    roundnum = 1
                elif parts[x] == '1-0':
                    print("W")
                    roundnum = 1
            x = x + 1
    fout.close()
else:
    # Read the game output file to print the round-2 result
    roundnum = 0
    fout = open("pawnval.out", "r")
    for line in fout:
        parts = line.split(' ')
        # print(len(parts))
        x = 0
        while x + 1 < len(parts):
            parts[x] = parts[x].strip()
            # print(parts[x])
            if roundnum == 0:
                if parts[x] == '1/2-1/2':
                    roundnum = 1
                elif parts[x] == '0-1':
                    roundnum = 1
                elif parts[x] == '1-0':
                    roundnum = 1
            elif roundnum == 1:
                if parts[x] == '1/2-1/2':
                    print("D")
                    roundnum = 2
                elif parts[x] == '0-1':
                    print("W")
                    roundnum = 2
                elif parts[x] == '1-0':
                    print("L")
                    roundnum = 2
            x = x + 1
    fout.close()
    os.rename("pawnval.out", "pawnval.out" + sys.argv[2])
A simple question: What statistics should I look for in order to decide that the experiment is converging and I can stop the experiment? Hessian? Eigenvectors? Max? or Win Rate?
I pretty much understand the mathematics of your paper now.
It is very pretty!
I have a few more questions. I know I could print out the source code
and find the answers myself...
In your paper there is the following phrase:
Both QuadraticLogisticRegression and
LogisticMean compute the maximum a posteriori with a Gaussian prior of
variance 100.
Do you mean the prior for the coefficients of the quadratic form and for the
"mean"?
Another question: I assume QuadraticLogisticRegression and LogisticMean
perform a maximum likelihood computation where the terms of the usual
logarithmic likelihood function are weighted by "w". Is this correct?
edwardyu wrote:Thanks Remi for the solution. Here is the modified script. Also the CLOP file has to be changed to use Replications 2.
Note that this approach won't work if you have more than one processor. In order to work with more than one processor, your script should use a different file name for each different seed, and the odd seed should wait for the even seed to complete before returning its result. But that is a bit complicated.
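A minimal sketch of that pairing scheme (the file names and the polling wait are illustrative, not part of CLOP):

```python
import os
import time

def result_file_for(seed):
    # Both seeds of a pair (2k, 2k+1) map to the same file name,
    # so concurrent pairs never collide.  The name is illustrative.
    return "pawnval.out.%d" % (seed // 2)

def wait_for_even_seed(path, timeout=3600, poll=1.0):
    # Odd seed: block until the even seed of the pair has written
    # its output file, then return the contents.
    deadline = time.time() + timeout
    while not os.path.isfile(path):
        if time.time() > deadline:
            raise RuntimeError("timed out waiting for " + path)
        time.sleep(poll)
    with open(path) as f:
        return f.read()
```

To avoid the odd seed reading a half-written file, the even seed should write to a temporary name and `os.rename` it into place, which makes the file's appearance atomic.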
A simple question: What statistics should I look for in order to decide that the experiment is converging and I can stop the experiment? Hessian? Eigenvectors? Max? or Win Rate?
There is no optimal way to stop: the more games you play, the more accurate the estimated optimal parameters. Confidence intervals on the win rate can give you a very vague idea of how close to optimal you may be, but it is still very difficult to tell. The plots in my paper can give you an idea of how close to optimal you may be, given a number of games played.
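As a rough illustration, a plain normal-approximation interval on the score rate (draws counted as half a point) can be computed from the W/D/L counts; CLOP's own UCB/LCB figures use its internal sample weighting, so the numbers will not match exactly:

```python
import math

def win_rate_ci(wins, draws, losses, z=1.96):
    # 95% normal-approximation interval on the mean score per game,
    # counting a draw as half a point.
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    # empirical variance of the per-game score
    var = (wins * (1.0 - p) ** 2
           + draws * (0.5 - p) ** 2
           + losses * (0.0 - p) ** 2) / n
    half = z * math.sqrt(var / n)
    return p - half, p + half
```

For example, `win_rate_ci(171, 62, 183)` gives an interval of roughly (0.44, 0.53): after 416 games the score rate is still only known to within several percentage points.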
Michel wrote:I pretty much understand the mathematics of your paper now.
It is very pretty!
Thanks
I have a few more questions. I know I could print out the source code
and find the answers myself...
In your paper there is the following phrase:
Both QuadraticLogisticRegression and
LogisticMean compute the maximum a posteriori with a Gaussian prior of
variance 100.
Do you mean the prior for the coefficients of the quadratic form and for the
"mean"?
Another question: I assume QuadraticLogisticRegression and LogisticMean
perform a maximum likelihood computation where the terms of the usual
logarithmic likelihood function are weighted by "w". Is this correct?
Yes. To be more precise: in dimension one, q(x) = a*x*x + b*x + c. I maximize the function L(a, b, c) = -prior * (a*a + b*b + c*c) + \sum_i w(x_i) log(p_i). LogisticMean is the same, with a and b removed. p_i is the probability of result i.
The variance of the posterior for LogisticMean is estimated by assuming it is Gaussian and calculating its second-order derivative at the maximum.
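A sketch of that objective in one dimension, for win/loss results only (the penalty constant and the omission of draws are simplifications; this is not CLOP's actual code):

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def penalized_loglik(a, b, c, samples, weights, penalty=1.0 / 200.0):
    # Weighted log-likelihood of q(x) = a*x*x + b*x + c with a Gaussian
    # prior on (a, b, c); 1/200 corresponds to variance 100 under one
    # common convention, but the exact constant here is an assumption.
    # samples: list of (x, r) with r = +1 for a win, -1 for a loss.
    L = -penalty * (a * a + b * b + c * c)
    for (x, r), w in zip(samples, weights):
        q = a * x * x + b * x + c
        L += w * math.log(logistic(r * q))  # log P(observed result)
    return L
```

LogisticMean corresponds to the special case a = b = 0; the MAP estimate maximizes this function over the remaining coefficient.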
Rémi Coulom wrote:Yes. To be more precise: in dimension one q(x)=a*x*x + b*x +c. I maximize a function L(x)=-prior * (a * a + b * b + c * c) + \Sum_i w(x_i)log(p_i).
Title: CLOP: Confident Local Optimization for Noisy Black-Box Parameter Tuning
Abstract: Artificial intelligence in games often leads to the problem of parameter tuning. Some heuristics may have coefficients, and they should be tuned to maximize the win rate of the program. A possible approach consists in building local quadratic models of the win rate as a function of program parameters. Many local regression algorithms have already been proposed for this task, but they are usually not robust enough to deal automatically and efficiently with very noisy outputs and non-negative Hessians. The CLOP principle, which stands for Confident Local OPtimization, is a new approach to local regression that overcomes all these problems in a simple and efficient way. It consists in discarding samples whose estimated value is confidently inferior to the mean of all samples. Experiments demonstrate that, when the function to be optimized is smooth, this method outperforms all other tested algorithms.
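A toy illustration of the discarding principle described in the abstract (binary keep/drop weights; CLOP's real weighting is more refined, and the 95% bound here is an arbitrary choice):

```python
def confident_keep_weights(estimates, stderrs, overall_mean, z=1.96):
    # Drop (weight 0) every sample whose estimated value is confidently
    # below the mean of all samples: even its upper confidence bound
    # falls short of the mean.  Keep (weight 1) everything else.
    return [0.0 if est + z * se < overall_mean else 1.0
            for est, se in zip(estimates, stderrs)]
```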
It works no miracles: you'll have to play a lot of games to get really good parameters. But it is certainly much more efficient than any manual method you could use with bayeselo. It is also more efficient than any other algorithm I am aware of.
Compared to the old version of QLR, I solved all the instability problems. I do not have a mathematical proof of convergence, but I am convinced it always works well, unless the maximum is at a discontinuity, which never happens in practice.
Comments and questions are welcome.
Rémi
Hi Remi, from the readme below, I cannot find the Dummy.exe file in CLOP-0.08.tar.bz2. Is there any CLOP compilation for Windows? Thanks.
Windows & Mac
-------------
Get the Qt SDK. Project files in programs/clop/compqt/* can be opened in Qt
Creator and compiled.
Optimizing a program with CLOP
==============================
First step: connection script
-----------------------------
The first step to connect a program to CLOP consists in writing a connection
script. A connection script takes parameter values as input, plays one game,
and returns the game outcome as output. For more information, run "Dummy.exe"
in the Windows distribution, or programs/clop/script/real/DummyScript.py in
Linux.
Ferdy wrote:
Hi Remi, from the readme below, I cannot find the Dummy.exe file in CLOP-0.08.tar.bz2. Is there any CLOP compilation for Windows? Thanks.
I don't have easy access to a Windows machine, so I did not release the Windows binary, sorry. You'll have to compile everything from source.
Dummy.exe is produced by compiling this file:
CLOP-0.0.8/programs/clop/src/real/Dummy.c
      MAX                All        Weighted
vp     23    win         171
vpr    96    draws        62
vb     52    losses      183
ve     51    games       416
vn    197    95% UCB     0.520105
vc    203    win rate    0.488152   0.869043
vr    462    95% LCB     0.427812
win rate  0.000388618
The win rate 0.488152 is under the All column; we should expect it to improve and converge, is this correct? There is also a win rate of 0.869043 under the Weighted column. What does this weighted win rate mean?
Also there is a win rate on the MAX tab of 0.000388618, which is very small. Why?
416 games for tuning 7 parameters are not enough.
The "All" column is the win rate over all games played so far.
The "Weighted" column is the win rate with each sample multiplied by its weight.
Probably the regression is overfitting. I know the current version of CLOP sometimes has this problem. I will improve it in the next version.
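A small sketch of how such a weighted win rate could be computed (scores 1/0.5/0 for win/draw/loss; the weights would be CLOP's per-sample weights, which are internal to the program):

```python
def weighted_win_rate(scores, weights):
    # scores: per-game results, 1.0 win, 0.5 draw, 0.0 loss
    # weights: per-game weights, e.g. from the discarding rule
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```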
If you wish to adjust the prior, on line 39 of src/real/CExperimentFromSettings.cpp, you can change:
pfq.SetPriorStrength(1e-3);
to
pfq.SetPriorStrength(1e-2);
or even
pfq.SetPriorStrength(1e-1);
This should solve the overfitting problem.