Maybe something is wrong with my network:
When I do only supervised learning:
Learning 500 examples takes about 30 seconds
1000 examples about a minute
3000 examples 15 minutes
4000 examples 99 minutes
So I don't even want to try 8000 examples.
Is this normal for neural networks, and what timing data do you have?
[It might be that I am using wrong values for the hyperparameters. I am using the Adam optimization algorithm.]
Learning time growing exponentially with number of training examples
-
- Posts: 7220
- Joined: Mon May 27, 2013 10:31 am
-
- Posts: 300
- Joined: Mon Apr 30, 2018 11:51 pm
Re: Learning time growing exponentially with number of training examples
No, this is not normal. Although I guess it depends on what exactly you are timing (time for one epoch? convergence on the training set only? something else?).
-
- Posts: 7220
- Joined: Mon May 27, 2013 10:31 am
Re: Learning time growing exponentially with number of training examples
I don't know what an epoch is. The examples form one training set. The algorithm stops when the average error for a pass over the training set is less than some value.
I can't split the training set into smaller ones, because after learning the last part the weights may have changed in a way that affects the evaluation of examples from the first part. So after learning the last part the error for the first part might have changed.
So the error is computed over all examples of the training set after the weights have been changed.
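Roughly the loop looks like this (only a sketch, not my actual code; trainOnExample and averageError stand in for whatever the real helpers are):
Code: Select all
static int TrainUntilConverged(
    Action<double[]> trainOnExample,   // one gradient step for a single example
    Func<double> averageError,         // average error over the whole set, after the updates
    double[][] examples,
    double targetError)
{
    int passes = 0;                    // one full pass over all examples = one "epoch"
    double error;
    do
    {
        var watch = System.Diagnostics.Stopwatch.StartNew();
        foreach (var example in examples)
            trainOnExample(example);

        error = averageError();        // recomputed over all examples after the updates
        passes++;
        Console.WriteLine($"pass {passes}: error {error:F4}, {watch.ElapsedMilliseconds} ms");
    } while (error > targetError);
    return passes;                     // separates time per pass from number of passes needed
}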
[By the way, in the last test learning 3500 examples took 51 minutes.]
-
- Posts: 536
- Joined: Thu Mar 09, 2006 3:01 pm
Re: Learning time growing exponentially with number of training examples
If you don't know what an "epoch" is, may I suggest you find out.
Henk wrote: ↑Sun Aug 26, 2018 8:27 pm
I don't know what epoch is. Examples form one training set. Algorithm stops when average error for pass over training set is less than some value.
Can't split training set in smaller ones. For when for instance learning of last part weights might have changed that affect evaluation of examples from first part. So after learning last part the error for first part might have changed.
So error is counted over all examples of the training set after weights had been changed.
[By the way in last test now learning 3500 examples took 51 minutes ]
Stopping the algorithm at some "average error" point does not sound very helpful.
I think what is more important is the trend in the error and how the net performs.
You should learn to see how your NN is doing with some graphical tools.
One resource I found helpful is here (look for intro posts):
https://machinelearningmastery.com/blog/
What framework are you using?
CPU or GPU tools?
It is probably unlikely in your case (although for chess, less than millions of samples will not yield much), but one thing that can happen with small sample sizes is that your net can actually "memorize" them all, which is quite fast. Then the error becomes very low (depending on your train/validation/test sample sets), so your run time will be lower. Using more samples than can be memorized is when the net starts to generalize and "learn", so reaching a lower error will take much longer.
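A rough way to check which of the two is happening is to hold out part of the samples and watch both errors every epoch. Something like this sketch (TrainOnExample and EvaluateError are placeholders for your own routines, not from any particular library):
Code: Select all
// Hold out 20% of the samples and compare training vs. validation error per epoch.
// Memorization shows up as training error falling while validation error stalls or rises.
// (Needs "using System.Linq;" for Take/Skip.)
int maxEpochs = 50;                          // fixed number of passes, just for monitoring
int split = (int)(examples.Length * 0.8);
var train = examples.Take(split).ToArray();  // 80% used for training
var valid = examples.Skip(split).ToArray();  // 20% never trained on

for (int epoch = 0; epoch < maxEpochs; epoch++)
{
    foreach (var example in train)
        TrainOnExample(example);

    double trainError = EvaluateError(train);
    double validError = EvaluateError(valid);
    Console.WriteLine($"epoch {epoch}: train {trainError:F4}  valid {validError:F4}");
}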
-
- Posts: 7220
- Joined: Mon May 27, 2013 10:31 am
Re: Learning time growing exponentially with number of training examples
I am not using any framework (except the .NET framework). I don't have GPUs. I already set the error to a ridiculously high value, say 30 centipawns or more. So it is just terribly slow. So if even the training error can't get below 30 centipawns … But that only holds for "larger" training sets. On a small training set I can get arbitrarily low errors.
Debugging time I guess.
-
- Posts: 536
- Joined: Thu Mar 09, 2006 3:01 pm
Re: Learning time growing exponentially with number of training examples
OK, no GPU, no framework...
Henk wrote: ↑Sun Aug 26, 2018 9:29 pm
I am not using any framework (except .net framework). I don't have gpu's. I already set error too ridiculously high value. Say 30 centipawns or more. So it is just terribly slow. So if even training error can't get below 30 Centipawn … But that only holds for "larger" training sets. On a small training set I can get arbitrarily low errors.
Debugging time I guess.
Then, I suggest using FANN with the GUI to test various net sizes, numbers of layers, learning rates, etc.
You can then code FANN (http://leenissen.dk/fann/wp/) in C, or use neural2d (see http://neural2d.net/?page_id=19).
When you get to a framework, I suggest Keras with TensorFlow.
It works with CPU, GPU, Windows, Linux, phones... and TensorBoard lets you see how your net is doing.
The machine learning blog will help, or you could stick with just Python.
Perhaps try something simple, like learning 3 piece endgames to play nearly perfectly.
-
- Posts: 7220
- Joined: Mon May 27, 2013 10:31 am
Re: Learning time growing exponentially with number of training examples
Because the training error is not that important, I now measured the average validation error, i.e. the error calculated over unseen positions.
Getting to
0.25 takes 2 minutes
0.2 takes 3 minutes
0.15 takes 9 minutes
0.14 takes 22 minutes
Looks not very promising.
Here error = 0.5 * x * x, where x = Tanh(value / 400) and value is measured in centipawns.
By the way, I might as well take |x|, for it is not used for training so it does not need to be a mean squared error.
Then for 0.14, |x| = 0.53. Ouch.
Atanh(0.53) = 0.59
So value = 0.59 * 400 = 236 centipawns. Terrible.
No wonder it plays random games.
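In code the conversion is just this arithmetic (Math.Atanh is not in the old .NET Framework, so the log form is used):
Code: Select all
// Convert the reported error back to centipawns:
// error = 0.5 * x * x  with  x = tanh(value / 400)
double error = 0.14;
double x = Math.Sqrt(2.0 * error);                     // |x| is about 0.53
double atanh = 0.5 * Math.Log((1.0 + x) / (1.0 - x));  // atanh(x) is about 0.59
double centipawns = atanh * 400.0;                     // about 236 centipawns
Console.WriteLine(centipawns);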
-
- Posts: 536
- Joined: Thu Mar 09, 2006 3:01 pm
Re: Learning time growing exponentially with number of training examples
Again, the trend is important, and exactly what would be expected.
Henk wrote: ↑Tue Aug 28, 2018 12:40 pm
Because training error is not that important now I measured average validation error. Error calculated over unseen positions
Getting to
0.25 takes 2 minutes
0.2 takes 3 minutes
0.15 takes 9 minutes
0.14 takes 22 minutes
Looks not very promising.
For error = 0.5 * x * x where x = Tanh(value / 400) and value is measured in centi pawns.
By the way I might as well take |x| for it is not used for training so it does not need mean square errors.
Then for 0.14 |x| = 0.53 ouch.
Atanh(0.53) = 0.59
So value = 0.59 * 400 = 236 centi pawns. Terrible.
No wonder it plays random games.
Sorry, don't know how to get an actual image inserted.
Just search for say "mse tensorboard" and look at the images.
For example:
https://machinelearningmastery.com/disp ... -in-keras/
-
- Posts: 360
- Joined: Thu Jan 22, 2015 3:21 pm
- Location: Zurich, Switzerland
- Full name: Jonathan Rosenthal
Re: Learning time growing exponentially with number of training examples
Just wanted to mention I don't think things are as bad as you feel they are, you seem to be on the right track.
Henk wrote: ↑Tue Aug 28, 2018 12:40 pm
Because training error is not that important now I measured average validation error. Error calculated over unseen positions
Getting to
0.25 takes 2 minutes
0.2 takes 3 minutes
0.15 takes 9 minutes
0.14 takes 22 minutes
Looks not very promising.
For error = 0.5 * x * x where x = Tanh(value / 400) and value is measured in centi pawns.
By the way I might as well take |x| for it is not used for training so it does not need mean square errors.
Then for 0.14 |x| = 0.53 ouch.
Atanh(0.53) = 0.59
So value = 0.59 * 400 = 236 centi pawns. Terrible.
No wonder it plays random games.
I don't understand your loss function; where is the target value?
Dividing by 400 probably doesn't make too much sense when you are working with a neural net. Keeping inputs to values between +1 and -1 is usually recommended, as that is easier for the network to learn. Dividing by 400 forces the network to produce huge weights near the output, which you don't want.
As brianr already mentioned (is there a way to turn on the names like we used to have on the forum?), the trend is what is most important. By the sound of it things are improving over time, which is very good! You can build on that; it is much harder when things remain around truly random values.
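For comparison, a standard squared-error loss with the target made explicit would look something like this (only a sketch; I am assuming the target is the tanh-scaled eval, matching the scaling you describe):
Code: Select all
// Per-example squared error with an explicit target
// (assumes target = tanh(eval / 400), i.e. the same scaling as above).
static double SquaredError(double prediction, double evalCentipawns)
{
    double target = Math.Tanh(evalCentipawns / 400.0);
    double diff = prediction - target;
    return 0.5 * diff * diff;
}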
-Jonathan
-
- Posts: 7220
- Joined: Mon May 27, 2013 10:31 am
Re: Learning time growing exponentially with number of training examples
The output value of a training example is tanh(Eval(position) / 400), giving a value between -1 and 1.
jorose wrote: ↑Tue Aug 28, 2018 2:24 pm
Just wanted to mention I don't think things are as bad as you feel they are, you seem to be on the right track.
Henk wrote: ↑Tue Aug 28, 2018 12:40 pm
Because training error is not that important now I measured average validation error. Error calculated over unseen positions
Getting to
0.25 takes 2 minutes
0.2 takes 3 minutes
0.15 takes 9 minutes
0.14 takes 22 minutes
Looks not very promising.
For error = 0.5 * x * x where x = Tanh(value / 400) and value is measured in centi pawns.
By the way I might as well take |x| for it is not used for training so it does not need mean square errors.
Then for 0.14 |x| = 0.53 ouch.
Atanh(0.53) = 0.59
So value = 0.59 * 400 = 236 centi pawns. Terrible.
No wonder it plays random games.
I don't understand your loss function, where is the target value?
Dividing by 400 probably doesn't make too much sense when you are working with a neural net, keeping inputs to values which are usually between +1 and -1 is usually recommended as being easier for the network to learn. Dividing by 400 forces the network to produce huge weights near the output, which you don't want.
As brianr already mentioned (is there a way to turn on the names like we used to have on the forum) the trend is what is most important. By the sound of it things are improving over time, which is very good! You can build on that; it is much harder when things remain around truly random values.
I divide by 400 centipawns because tanh(2) is already almost equal to 1.
For the loss function I still use mean squared error. Maybe I should change that. For the activation function I use SELU. I read that if you use SELU you don't need batch normalization, for it is self-normalizing. But computing Exp(x) is one of the slowest operations during training.
Code: Select all
static public double SELU(double x)
{
    // Scaled Exponential Linear Unit: 1.0507 * x for x >= 0,
    // and 1.0507 * 1.67326 * (exp(x) - 1) for x < 0.
    return 1.0507 * (x >= 0 ? x : 1.67326 * (Math.Exp(x) - 1));
}
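For backpropagation the derivative is also needed; a sketch with the same constants (not necessarily how it looks in my code):
Code: Select all
// Derivative of the SELU above, used during backpropagation:
// 1.0507 for x >= 0, and 1.0507 * 1.67326 * exp(x) for x < 0.
static public double SELUDerivative(double x)
{
    return 1.0507 * (x >= 0 ? 1.0 : 1.67326 * Math.Exp(x));
}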