tensorflow

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

tensorflow

Post by brtzsnr »

Google has just released a new ML library called TensorFlow: http://tensorflow.org/get_started . I decided to test it because I've been looking to improve my evaluation function. I adapted the getting-started example to optimize y = sigmoid((1-p)*w_m*x + p*w_e*x). I extracted ~125 features (in the example below I only use the piece-value features, for illustration) and ran it.

Code:

import tensorflow as tf
import numpy as np

# Each line of 'testdata': target y, game phase p, then the feature vector x.
f = open('testdata')
x_data, y_data, p_data = [], [], []
for l in f:
    a = [float(e) for e in l.split()]
    y_data.append(a[:1])     # target value
    p_data.append(a[1:2])    # phase interpolation factor
    x_data.append(a[2:9])    # features (7 in this example)

print "read %d (%d) records" % (len(x_data), len(y_data))
print "read %d inputs" % len(x_data[0])

# Separate midgame/endgame weight vectors, blended by the phase P and squashed:
# y = sigmoid(((1-p)*WM.x + p*WE.x) / 2)
WM = tf.Variable(tf.random_uniform([len(x_data[0]), 1]))
WE = tf.Variable(tf.random_uniform([len(x_data[0]), 1]))

xm = tf.matmul(x_data, WM)
xe = tf.matmul(x_data, WE)

P = tf.constant(p_data)
y = xm*(1-P) + xe*P
y = tf.sigmoid(y/2)
                
                
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.AdamOptimizer(0.1)
train = optimizer.minimize(loss)

init = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init)
                
# Fit the weights.
for step in xrange(0, 1000000):
    sess.run(train)
    if step % 10 == 0:
        print step, sess.run(loss)
    if step % 100 == 0:
        # Report the (midgame, endgame) weights, scaled to centipawns.
        l, m, e = [], sess.run(WM), sess.run(WE)
        for i in range(len(m)):
            l.append((int(m[i][0]*100), int(e[i][0]*100)))
        print step, l
For 700,000 positions this converges quickly to:

Code:

1000 0.116354
1000 [(27, 19), (73, 188), (353, 407), (379, 452), (535, 738), (1212, 1379), (10, 58)]
That is a list of (midgame, endgame) piece-value pairs:

Pawn = 73/188
Knight = 353/407
Bishop = 379/452
Rook = 535/738
Queen = 1212/1379

Looks very promising. With all features enabled I can get the loss down to 0.109577. This is impressive considering I spent a total of about two hours learning the framework and trying different functions to optimize. The best algorithm was AdamOptimizer, which converges wicked fast.

I tried using more than one layer, but the loss was not better than 0.109577.
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: tensorflow

Post by brtzsnr »

2000 hyper-bullet games show basically no change:

Score of zurichess vs basic: 791 - 794 - 415 [0.499] 2000
ELO difference: -1
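For reference, the reported Elo difference can be sanity-checked with the usual logistic model relating score fraction to rating difference; this is just a quick sketch, not necessarily the exact computation the match tool performs.

Code:

import math

def elo_diff(score):
    # Elo difference implied by a score fraction under the logistic model.
    return -400 * math.log10(1 / score - 1)

# 791 wins, 794 losses, 415 draws out of 2000 games
score = (791 + 0.5 * 415) / 2000.0    # ~0.49925
print elo_diff(score)                 # ~ -0.5, consistent with the reported -1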
Steve Maughan
Posts: 1221
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: tensorflow

Post by Steve Maughan »

So this was effectively a one-layer (i.e. linear) logistic regression?

- Steve
http://www.chessprogramming.net - Maverick Chess Engine
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: tensorflow

Post by brtzsnr »

Yes. I used TensorFlow to tune the weights in zurichess. Most engines' evaluation functions can be modeled as a single-layer NN (y = w.x), so the idea should apply easily to other chess engines. I tried using a two-layer NN with ReLU as the activation function of the hidden layer (as in Giraffe), but the minimum final loss was the same.
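For concreteness, a two-layer variant like the one described might look something like this in the same 2015-era TensorFlow style as the script in the first post, reusing its x_data, p_data and y_data; the hidden-layer width and the separate midgame/endgame branches are my assumptions, not necessarily what was actually tested.

Code:

# Hypothetical two-layer model: features -> ReLU hidden layer -> scalar eval,
# blended between a midgame and an endgame branch as in the linear version.
H = 32                                    # hidden-layer width (illustrative)
X = tf.constant(x_data)
P = tf.constant(p_data)

W1m = tf.Variable(tf.random_uniform([len(x_data[0]), H]))
W2m = tf.Variable(tf.random_uniform([H, 1]))
W1e = tf.Variable(tf.random_uniform([len(x_data[0]), H]))
W2e = tf.Variable(tf.random_uniform([H, 1]))

hm = tf.nn.relu(tf.matmul(X, W1m))        # hidden activations, midgame branch
he = tf.nn.relu(tf.matmul(X, W1e))        # hidden activations, endgame branch
y2 = tf.matmul(hm, W2m)*(1-P) + tf.matmul(he, W2e)*P
y2 = tf.sigmoid(y2/2)

loss2 = tf.reduce_mean(tf.square(y2 - y_data))
train2 = tf.train.AdamOptimizer(0.1).minimize(loss2)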
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: tensorflow

Post by jdart »

I have used AdaGrad with some success. It is also fast to converge and is very simple to code.

See https://github.com/jdart1/arasan-chess/ ... /tuner.cpp.
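For readers who haven't seen it, here is a minimal sketch of the AdaGrad update itself (in Python/numpy for consistency with the rest of the thread; it is not taken from tuner.cpp): each parameter accumulates its squared gradients, and its effective step size shrinks accordingly.

Code:

import numpy as np

def adagrad_step(w, grad, cache, lr=0.1, eps=1e-8):
    # Accumulate squared gradients; parameters with large past gradients
    # get smaller steps, so the step size only ever decreases.
    cache += grad * grad
    w -= lr * grad / (np.sqrt(cache) + eps)
    return w, cache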

--Jon
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: tensorflow

Post by matthewlai »

jdart wrote:I have used AdaGrad with some success. It is also fast to converge and is very simple to code.

See https://github.com/jdart1/arasan-chess/ ... /tuner.cpp.

--Jon
There is a newer sub-gradient algorithm called AdaDelta. It's an improvement on AdaGrad, and seems to perform a little bit better in most applications. It's what I use in Giraffe. I have also implemented AdaGrad.
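As a rough sketch of the difference (again numpy, not Giraffe's actual code): AdaDelta replaces AdaGrad's ever-growing sum of squared gradients with decaying averages, so the effective step size does not shrink toward zero.

Code:

import numpy as np

def adadelta_step(w, grad, state, rho=0.95, eps=1e-6):
    # state = (running avg of squared gradients, running avg of squared updates)
    eg2, edx2 = state
    eg2 = rho * eg2 + (1 - rho) * grad * grad
    dx = -np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps) * grad
    edx2 = rho * edx2 + (1 - rho) * dx * dx
    return w + dx, (eg2, edx2)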
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: tensorflow

Post by jdart »

Some reports I have read indicate AdaDelta is not better than AdaGrad. Or at least, which one is better may depend on the problem. See for example https://www.quora.com/Why-is-AdaDelta-n ... D-variants.

--Jon
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: tensorflow

Post by matthewlai »

jdart wrote:Some reports I have read indicate AdaDelta is not better than AdaGrad. Or at least, which one is better may depend on the problem. See for example https://www.quora.com/Why-is-AdaDelta-n ... D-variants.

--Jon
It depends on the problem. For example, in reinforcement learning, AdaDelta is much better than AdaGrad because reinforcement learning is about chasing a moving minimum, and step size shouldn't decrease as training goes on.

In any situation where the minimum moves, AdaDelta will be much better.

AdaDelta also has fewer constants that require tuning (and the constants aren't really important anyway).

In my experience, well-tuned AdaGrad performs about the same as AdaDelta for stationary tasks, but AdaDelta is always at least almost as good, and is pretty much foolproof without any tuning.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: tensorflow

Post by brtzsnr »

FWIW, here I used Adam (http://arxiv.org/pdf/1412.6980v8.pdf), which is inspired by AdaGrad and RMSProp. Compared to AdaGrad it converges a lot faster. TensorFlow doesn't provide an implementation of AdaDelta.
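For reference, the update rule from that paper in a minimal numpy sketch (the lr=0.1 default here only mirrors the learning rate used in the script above, not the paper's default):

Code:

import numpy as np

def adam_step(w, grad, state, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Decaying averages of the gradient (momentum) and of its square
    # (per-parameter scaling), both bias-corrected; t counts from 1.
    m, v = state
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v)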
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: tensorflow

Post by matthewlai »

brtzsnr wrote:FWIW, here I used Adam (http://arxiv.org/pdf/1412.6980v8.pdf), which is inspired by AdaGrad and RMSProp. Compared to AdaGrad it converges a lot faster. TensorFlow doesn't provide an implementation of AdaDelta.
Adam looked interesting. I've always wanted to give it a try.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.