Michel wrote:
> That would be very interesting (although it is not very clear to me how to do this).

I will try the idea to automatically estimate the x_i and the sigma_i during runtime instead. OK, here is what I did:
I have a hash table with 4 million entries. Given a chess position and a remaining depth, the corresponding hash table entry is located using the zobrist key and the depth value.
In each entry, I store the information needed to maintain a running estimate of the mean and the standard deviation of the perft value for the subtree corresponding to the position and depth. See http://en.wikipedia.org/wiki/Algorithms ... _algorithm for details on how this is implemented. (I also store the hash key in each entry for collision detection.)
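As a minimal sketch of that running mean/variance recurrence (the online algorithm from the linked Wikipedia article; the class name here is illustrative, and the MeanVarEst class later in this post uses the same update):

```java
// Online (single-pass) mean and variance, often called Welford's algorithm.
class RunningStats {
    private long n;       // number of samples seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;         // incremental mean update
        m2 += delta * (x - mean);  // incremental update of squared deviations
    }

    double mean() { return mean; }

    /** Unbiased sample variance; 0 until at least two samples are seen. */
    double variance() { return (n > 1) ? m2 / (n - 1) : 0; }
}
```

Unlike the naive sum/sum-of-squares approach, this stays numerically stable when the mean is large compared to the spread of the samples, which matters for huge perft counts.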
To compute sampling probabilities in a given position, I first generate all legal moves, then probe the hash table for all child positions.
1) For child positions that are in the hash table and contain at least 10 samples, I use the probability weight sqrt(x_i^2 + sigma_i^2), where x_i is the stored mean and sigma_i the stored standard deviation.
2) For child positions that are not in the hash table, or that contain fewer than 10 samples, I use the average of the probability weights from 1). If there are no weights from 1), all probability weights are set to 1.
I then compute actual probabilities by scaling the probability weights so that their sum is 1, and select a random move according to these probabilities.
Make a recursive call to estimate the size of the subtree corresponding to the selected move, then multiply this number by the inverse of the selection probability to get a tree size estimate for this position.
Update the corresponding hash table entry with the estimated tree size.
If there is a hash index collision, I keep the entry that has the largest estimated subtree size.
If there is a hash key collision (i.e. two different position+depth pairs have the same hash key), it will not be detected, which leads to sub-optimal sampling probabilities. However, this shouldn't be a big problem, because the estimate is unbiased regardless of how the probability weights are chosen.
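The unbiasedness claim can be checked by enumeration rather than sampling: if child i is picked with probability p_i = w_i / sum(w) and the estimate is size_i / p_i, the expectation is sum_i p_i * (size_i / p_i) = sum_i size_i, the true total, for any positive weights. A toy sketch (the sizes and weights below are made-up illustration, not data from the post):

```java
// Exact expected value of the one-child importance-sampling estimator,
// computed by enumerating all possible outcomes.
class UnbiasedCheck {
    static double expectedEstimate(double[] sizes, double[] weights) {
        double wTotal = 0;
        for (double w : weights) wTotal += w;
        double expectation = 0;
        for (int i = 0; i < sizes.length; i++) {
            double p = weights[i] / wTotal;     // selection probability of child i
            expectation += p * (sizes[i] / p);  // estimate value times its probability
        }
        return expectation; // equals sum of sizes, whatever the weights are
    }
}
```

The weights only affect the variance of the estimator, never its expectation, which is why an undetected key collision merely degrades the sampling probabilities.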
Michel wrote:
> Even if it is expensive in CPU time it will still give an indication if a substantial variance reduction is achievable.

I ran a test where I do 1000 random walks from the root position, then print the average value, then repeat in an infinite loop. When using uniform weights, this average-of-1000 has a standard deviation of about 1.05e17. With the algorithm I described above, after about 28 million random walks, the standard deviation of an average-of-1000 is about 4.4e16.
This means that about (1.05/0.44)^2=5.7 times fewer samples are needed for a given desired accuracy. However, the method uses a lot of hash probes, so it is still unclear if it is an improvement in terms of CPU cycles, compared to the uniform weight method.
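The weighted random selection itself is cheap. A minimal sketch (illustrative names; the perfTDynWeight code below does the same thing with wAccum and Arrays.binarySearch):

```java
import java.util.Arrays;
import java.util.Random;

// Draw a random index with probability proportional to weights[i],
// using a cumulative-sum array and a binary search.
class WeightedPick {
    static int pick(double[] weights, Random rnd) {
        double[] wAccum = new double[weights.length];
        double wTotal = 0;
        for (int i = 0; i < weights.length; i++) {
            wTotal += weights[i];
            wAccum[i] = wTotal;              // cumulative sums, e.g. {3, 4, 10}
        }
        double r = rnd.nextDouble() * wTotal; // uniform in [0, wTotal)
        int idx = Arrays.binarySearch(wAccum, r);
        // On a miss, binarySearch returns -(insertionPoint + 1); the
        // insertion point is exactly the interval that r landed in.
        if (idx < 0) idx = -(idx + 1);
        return idx;
    }
}
```

With weights {3, 1, 6}, index 0 is returned with probability 3/10, index 1 with 1/10, and index 2 with 6/10.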
The code looks like this:
    private static final class MeanVarEst {
        private int n;        // number of samples
        private double mean;  // running mean
        private double M2;    // running sum of squared deviations from the mean
        long hashKey;

        MeanVarEst() {
            reset(0);
        }

        final void reset(long hashKey) {
            this.hashKey = hashKey;
            n = 0;
            mean = 0;
            M2 = 0;
        }

        /** Welford's online update of the running mean and variance. */
        final void addSample(double x) {
            if (n < Integer.MAX_VALUE) {
                n++;
                double delta = x - mean;
                mean += delta / n;
                M2 += delta * (x - mean);
            }
        }

        final int getNSamples() {
            return n;
        }

        final double getMean() {
            return mean;
        }

        final double getVariance() {
            return (n > 1) ? M2 / (n - 1) : 0;
        }
    }
    private static final int HSIZE = 1 << 22; // 4 million hash table entries
    private MeanVarEst[] ht;

    /** Combine the zobrist key and the remaining depth into one hash key. */
    private final long getHashKey(int depth) {
        long ret = pos.zobristHash();
        return ret + depth;
    }
    /**
     * Probability weight sqrt(mean^2 + variance) for the position after
     * move m, or -1 if the hash table has no reliable information.
     */
    private final double getDynWeight(Move m, int depth) {
        UndoInfo ui = new UndoInfo();
        pos.makeMove(m, ui);
        long hKey = getHashKey(depth - 1);
        pos.unMakeMove(m, ui);
        int hIdx = (int)(hKey & (HSIZE - 1));
        MeanVarEst mv = ht[hIdx];
        if (mv.hashKey != hKey)
            return -1;
        if (doPrint && (depth == 13))
            System.out.printf("d:%2d m:%s avg:%g std:%g n:%d\n", depth, TextIO.moveToUCIString(m),
                              mv.getMean(), Math.sqrt(mv.getVariance()), mv.getNSamples());
        if (mv.getNSamples() < 10)
            return -1;
        double av = mv.getMean();
        return Math.sqrt(av * av + mv.getVariance());
    }
    /** Store a tree size sample, keeping the larger subtree on index collisions. */
    private final void updateHashTable(double nodes, int depth) {
        long hKey = getHashKey(depth);
        int hIdx = (int)(hKey & (HSIZE - 1));
        MeanVarEst mv = ht[hIdx];
        if (mv.hashKey != hKey) {
            if ((mv.hashKey != 0) && mv.getMean() > nodes)
                return; // Better entry already stored in this hash slot
            mv.reset(hKey);
        }
        mv.addSample(nodes);
    }
    private boolean doPrint = false;

    private final void doPerfTDynWeight() throws ChessParseError {
        ht = new MeanVarEst[HSIZE];
        for (int i = 0; i < ht.length; i++)
            ht[i] = new MeanVarEst();
        pos = TextIO.readFEN(TextIO.startPosFEN);
        while (true) {
            double nodes = 0;
            int N = 1000;
            // doPrint = true;
            for (int i = 0; i < N; i++) {
                nodes += perfTDynWeight(13, 0);
                doPrint = false;
            }
            System.out.printf("%.0f\n", nodes / N);
        }
    }
    private final double perfTDynWeight(int depth, int fullDepth) {
        double nodes = 0;
        MoveGen.MoveList moves = moveGen.pseudoLegalMoves(pos);
        MoveGen.removeIllegal(pos, moves);
        if ((depth == 1) || (moves.size == 0)) {
            int ret = moves.size;
            moveGen.returnMoveList(moves);
            return ret;
        }
        UndoInfo ui = new UndoInfo();
        if (fullDepth <= 0) {
            // Sample a single move with probability proportional to its weight
            int N = moves.size;
            double[] weights = new double[N];
            int avgNum = 0;
            double avgSum = 0;
            double maxWeight = 0;
            for (int i = 0; i < N; i++) {
                double w = getDynWeight(moves.m[i], depth);
                weights[i] = w;
                if (w > 0) {
                    avgSum += w;
                    avgNum++;
                    maxWeight = Math.max(maxWeight, w);
                }
            }
            if (avgNum == 0) { // No dynamic information, use equal weights
                for (int i = 0; i < N; i++)
                    weights[i] = 1;
            } else { // Use the average weight for moves with no dynamic information
                double avg = avgSum / avgNum;
                for (int i = 0; i < N; i++)
                    if (weights[i] < 0)
                        weights[i] = avg;
                // Avoid too large imbalance
                double minWeight = maxWeight / 50;
                for (int i = 0; i < N; i++)
                    weights[i] = Math.max(weights[i], minWeight);
            }
            // Select a random index proportional to weight, using the
            // cumulative weight array and a binary search
            double wTotal = 0;
            double[] wAccum = new double[N];
            for (int i = 0; i < N; i++) {
                wTotal += weights[i];
                wAccum[i] = wTotal;
            }
            double r = rnd.nextDouble() * wTotal;
            int idx = Arrays.binarySearch(wAccum, r);
            if (idx < 0) idx = -(idx + 1);
            pos.makeMove(moves.m[idx], ui);
            double tmp = perfTDynWeight(depth - 1, fullDepth - 1);
            pos.unMakeMove(moves.m[idx], ui);
            // Importance sampling: scale by the inverse selection probability
            nodes = tmp * wTotal / weights[idx];
            updateHashTable(nodes, depth);
        } else {
            // Full-width search for the first fullDepth plies
            for (int mi = 0; mi < moves.size; mi++) {
                Move m = moves.m[mi];
                pos.makeMove(m, ui);
                nodes += perfTDynWeight(depth - 1, fullDepth - 1);
                pos.unMakeMove(m, ui);
            }
        }
        moveGen.returnMoveList(moves);
        return nodes;
    }