tensorflow

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

tensorflow

Post by brtzsnr »

Google has just released a new ML library called TensorFlow: http://tensorflow.org/get_started . I decided to test it because I've been looking to improve my evaluation function. I adapted the getting-started example to optimize y = sigmoid((1-p)*w_m*x + p*w_e*x). I extracted ~125 features (in the example below I only use the piece-value features, for illustration) and ran it.

Code:

import tensorflow as tf
import numpy as np

# Each line of 'testdata': target y, game phase p, then the feature vector x.
f = open('testdata')
x_data, y_data, p_data = [], [], []
for l in f:
    a = [float(e) for e in l.split()]
    y_data.append(a[:1])     # target value
    p_data.append(a[1:2])    # phase interpolation factor
    x_data.append(a[2:9])    # features (7 in this example)

print "read %d (%d) records" % (len(x_data), len(y_data))
print "read %d inputs" % len(x_data[0])

# Separate midgame/endgame weight vectors, blended by the phase P and squashed:
# y = sigmoid(((1-p)*WM.x + p*WE.x) / 2)
WM = tf.Variable(tf.random_uniform([len(x_data[0]), 1]))
WE = tf.Variable(tf.random_uniform([len(x_data[0]), 1]))

xm = tf.matmul(x_data, WM)
xe = tf.matmul(x_data, WE)

P = tf.constant(p_data)
y = xm*(1-P) + xe*P
y = tf.sigmoid(y/2)
                
                
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.AdamOptimizer(0.1)
train = optimizer.minimize(loss)

init = tf.initialize_all_variables()

sess = tf.Session()
sess.run(init)
                
# Fit the weights.
for step in xrange(0, 1000000):
    sess.run(train)
    if step % 10 == 0:
        print step, sess.run(loss)
    if step % 100 == 0:
        # Report the (midgame, endgame) weights, scaled to centipawns.
        l, m, e = [], sess.run(WM), sess.run(WE)
        for i in range(len(m)):
            l.append((int(m[i][0]*100), int(e[i][0]*100)))
        print step, l
For 700,000 positions this converges quickly to:

Code:

1000 0.116354
1000 [(27, 19), (73, 188), (353, 407), (379, 452), (535, 738), (1212, 1379), (10, 58)]
That is a list of (midgame, endgame) piece-value pairs:

Pawn = 73/188
Knight = 353/407
Bishop = 379/452
Rook = 535/738
Queen = 1212/1379

Looks very promising. With all features enabled I can get the loss down to 0.109577. This is impressive considering I spent a total of about two hours learning the framework and trying different functions to optimize. The best algorithm was AdamOptimizer, which converges wicked fast.

I tried using more than one layer, but the loss was not better than 0.109577.
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: tensorflow

Post by brtzsnr »

2000 hyper-bullet games show basically no change:

Score of zurichess vs basic: 791 - 794 - 415 [0.499] 2000
ELO difference: -1
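For reference, the reported Elo difference can be sanity-checked with the usual logistic model relating score fraction to rating difference; this is just a quick sketch, not necessarily the exact computation the match tool performs.

Code:

import math

def elo_diff(score):
    # Elo difference implied by a score fraction under the logistic model.
    return -400 * math.log10(1 / score - 1)

# 791 wins, 794 losses, 415 draws out of 2000 games
score = (791 + 0.5 * 415) / 2000.0    # ~0.49925
print elo_diff(score)                 # ~ -0.5, consistent with the reported -1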
Steve Maughan
Posts: 1221
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: tensorflow

Post by Steve Maughan »

So this was effectively a one-layer (i.e. linear) logistic regression?

- Steve
http://www.chessprogramming.net - Maverick Chess Engine
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: tensorflow

Post by brtzsnr »

Yes. I used TensorFlow to tune the weights in zurichess. Most engines' evaluation functions can be modeled as a single-layer NN (y = w.x), so the idea should apply easily to other chess engines. I tried using a two-layer NN with ReLU as the activation function of the hidden layer (as in Giraffe), but the minimum final loss was the same.
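For concreteness, a two-layer variant like the one described might look something like this in the same 2015-era TensorFlow style as the script in the first post, reusing its x_data, p_data and y_data; the hidden-layer width and the separate midgame/endgame branches are my assumptions, not necessarily what was actually tested.

Code:

# Hypothetical two-layer model: features -> ReLU hidden layer -> scalar eval,
# blended between a midgame and an endgame branch as in the linear version.
H = 32                                    # hidden-layer width (illustrative)
X = tf.constant(x_data)
P = tf.constant(p_data)

W1m = tf.Variable(tf.random_uniform([len(x_data[0]), H]))
W2m = tf.Variable(tf.random_uniform([H, 1]))
W1e = tf.Variable(tf.random_uniform([len(x_data[0]), H]))
W2e = tf.Variable(tf.random_uniform([H, 1]))

hm = tf.nn.relu(tf.matmul(X, W1m))        # hidden activations, midgame branch
he = tf.nn.relu(tf.matmul(X, W1e))        # hidden activations, endgame branch
y2 = tf.matmul(hm, W2m)*(1-P) + tf.matmul(he, W2e)*P
y2 = tf.sigmoid(y2/2)

loss2 = tf.reduce_mean(tf.square(y2 - y_data))
train2 = tf.train.AdamOptimizer(0.1).minimize(loss2)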
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: tensorflow

Post by jdart »

I have used AdaGrad with some success. It is also fast to converge and is very simple to code.

See https://github.com/jdart1/arasan-chess/ ... /tuner.cpp.
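For readers who haven't seen it, here is a minimal sketch of the AdaGrad update itself (in Python/numpy for consistency with the rest of the thread; it is not taken from tuner.cpp): each parameter accumulates its squared gradients, and its effective step size shrinks accordingly.

Code:

import numpy as np

def adagrad_step(w, grad, cache, lr=0.1, eps=1e-8):
    # Accumulate squared gradients; parameters with large past gradients
    # get smaller steps, so the step size only ever decreases.
    cache += grad * grad
    w -= lr * grad / (np.sqrt(cache) + eps)
    return w, cache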

--Jon
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: tensorflow

Post by matthewlai »

jdart wrote:I have used AdaGrad with some success. It is also fast to converge and is very simple to code.

See https://github.com/jdart1/arasan-chess/ ... /tuner.cpp.

--Jon
There is a newer sub-gradient algorithm called AdaDelta. It's an improvement on AdaGrad, and seems to perform a little bit better in most applications. It's what I use in Giraffe. I have also implemented AdaGrad.
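As a rough sketch of the difference (again numpy, not Giraffe's actual code): AdaDelta replaces AdaGrad's ever-growing sum of squared gradients with decaying averages, so the effective step size does not shrink toward zero.

Code:

import numpy as np

def adadelta_step(w, grad, state, rho=0.95, eps=1e-6):
    # state = (running avg of squared gradients, running avg of squared updates)
    eg2, edx2 = state
    eg2 = rho * eg2 + (1 - rho) * grad * grad
    dx = -np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps) * grad
    edx2 = rho * edx2 + (1 - rho) * dx * dx
    return w + dx, (eg2, edx2)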
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: tensorflow

Post by jdart »

Some reports I have read indicate AdaDelta is not better than AdaGrad. Or at least, which one is better may depend on the problem. See for example https://www.quora.com/Why-is-AdaDelta-n ... D-variants.

--Jon
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: tensorflow

Post by matthewlai »

jdart wrote:Some reports I have read indicate AdaDelta is not better than AdaGrad. Or at least, which one is better may depend on the problem. See for example https://www.quora.com/Why-is-AdaDelta-n ... D-variants.

--Jon
It depends on the problem. For example, in reinforcement learning, AdaDelta is much better than AdaGrad because reinforcement learning is about chasing a moving minimum, and step size shouldn't decrease as training goes on.

In any situation where the minimum moves, AdaDelta will be much better.

AdaDelta also has fewer constants that require tuning (and the constants aren't really important anyway).

In my experience, well-tuned AdaGrad performs about the same as AdaDelta for stationary tasks, but AdaDelta is always at least almost as good, and is pretty much foolproof without any tuning.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: tensorflow

Post by brtzsnr »

FWIW, here I used Adam (http://arxiv.org/pdf/1412.6980v8.pdf), which is inspired by AdaGrad and RMSProp. Compared to AdaGrad it converges a lot faster. TensorFlow doesn't provide an implementation of AdaDelta.
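For reference, the update rule from that paper in a minimal numpy sketch (the lr=0.1 default here only mirrors the learning rate used in the script above, not the paper's default):

Code:

import numpy as np

def adam_step(w, grad, state, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Decaying averages of the gradient (momentum) and of its square
    # (per-parameter scaling), both bias-corrected; t counts from 1.
    m, v = state
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v)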
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: tensorflow

Post by matthewlai »

brtzsnr wrote:FWIW, here I used Adam (http://arxiv.org/pdf/1412.6980v8.pdf), which is inspired by AdaGrad and RMSProp. Compared to AdaGrad it converges a lot faster. TensorFlow doesn't provide an implementation of AdaDelta.
Adam looked interesting. I've always wanted to give it a try.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.