Maybe something is wrong with my network:
When I do only supervised learning:
Learning 500 examples takes about 30 seconds
1000 examples about a minute
3000 examples 15 minutes
4000 examples 99 minutes
So I don't even want to try 8000 examples.
Is this normal for neural networks, and what timing data do you have?
[It might be that I am using wrong values for the hyperparameters. I am using the Adam optimization algorithm.]
Learning time growing exponentially with number of training examples
-
- Posts: 7220
- Joined: Mon May 27, 2013 10:31 am
-
- Posts: 300
- Joined: Mon Apr 30, 2018 11:51 pm
Re: Learning time growing exponentially with number of training examples
No, this is not normal. Although I guess it depends on what exactly you are timing (time for one epoch? convergence on the training set only? something else?).
-
- Posts: 7220
- Joined: Mon May 27, 2013 10:31 am
Re: Learning time growing exponentially with number of training examples
I don't know what an epoch is. The examples form one training set. The algorithm stops when the average error for a pass over the training set is less than some value.
I can't split the training set into smaller ones, because after learning the last part the weights may have changed in a way that affects the evaluation of examples from the first part. So after learning the last part the error for the first part might have changed.
So the error is computed over all examples of the training set after the weights have been changed.
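Roughly the loop looks like this (only a sketch, not my actual code; trainOnExample and averageError stand in for whatever the real helpers are):
Code: Select all
static int TrainUntilConverged(
    Action<double[]> trainOnExample,   // one gradient step for a single example
    Func<double> averageError,         // average error over the whole set, after the updates
    double[][] examples,
    double targetError)
{
    int passes = 0;                    // one full pass over all examples = one "epoch"
    double error;
    do
    {
        var watch = System.Diagnostics.Stopwatch.StartNew();
        foreach (var example in examples)
            trainOnExample(example);

        error = averageError();        // recomputed over all examples after the updates
        passes++;
        Console.WriteLine($"pass {passes}: error {error:F4}, {watch.ElapsedMilliseconds} ms");
    } while (error > targetError);
    return passes;                     // separates time per pass from number of passes needed
}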
[By the way, in the last test learning 3500 examples took 51 minutes.]
-
- Posts: 536
- Joined: Thu Mar 09, 2006 3:01 pm
Re: Learning time growing exponentially with number of training examples
If you don't know what an "epoch" is, may I suggest you find out.
Henk wrote: ↑Sun Aug 26, 2018 8:27 pm
I don't know what epoch is. Examples form one training set. Algorithm stops when average error for pass over training set is less than some value.
Can't split training set in smaller ones. For when for instance learning of last part weights might have changed that affect evaluation of examples from first part. So after learning last part the error for first part might have changed.
So error is counted over all examples of the training set after weights had been changed.
[By the way in last test now learning 3500 examples took 51 minutes ]
Stopping the algorithm at some "average error" point does not sound very helpful.
I think what is more important is the trend in the error and how the net performs.
You should learn to see how your NN is doing with some graphical tools.
One resource I found helpful is here (look for intro posts):
https://machinelearningmastery.com/blog/
What framework are you using?
CPU or GPU tools?
It is probably unlikely in your case (although for chess, less than millions of samples will not yield much), but one thing that can happen with small sample sizes is that your net can actually "memorize" them all, which is quite fast. Then the error becomes very low (depending on your train/validation/test sample sets), so your run time will be lower. Using more samples than can be memorized is when the net starts to generalize and "learn", so reaching a lower error will take much longer.
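A rough way to check which of the two is happening is to hold out part of the samples and watch both errors every epoch. Something like this sketch (TrainOnExample and EvaluateError are placeholders for your own routines, not from any particular library):
Code: Select all
// Hold out 20% of the samples and compare training vs. validation error per epoch.
// Memorization shows up as training error falling while validation error stalls or rises.
// (Needs "using System.Linq;" for Take/Skip.)
int maxEpochs = 50;                          // fixed number of passes, just for monitoring
int split = (int)(examples.Length * 0.8);
var train = examples.Take(split).ToArray();  // 80% used for training
var valid = examples.Skip(split).ToArray();  // 20% never trained on

for (int epoch = 0; epoch < maxEpochs; epoch++)
{
    foreach (var example in train)
        TrainOnExample(example);

    double trainError = EvaluateError(train);
    double validError = EvaluateError(valid);
    Console.WriteLine($"epoch {epoch}: train {trainError:F4}  valid {validError:F4}");
}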
-
- Posts: 7220
- Joined: Mon May 27, 2013 10:31 am
Re: Learning time growing exponentially with number of training examples
I am not using any framework (except the .NET framework). I don't have GPUs. I already set the error to a ridiculously high value, say 30 centipawns or more. So it is just terribly slow. So if even the training error can't get below 30 centipawns … But that only holds for "larger" training sets. On a small training set I can get arbitrarily low errors.
Debugging time I guess.
-
- Posts: 536
- Joined: Thu Mar 09, 2006 3:01 pm
Re: Learning time growing exponentially with number of training examples
OK, no GPU, no framework...
Henk wrote: ↑Sun Aug 26, 2018 9:29 pm
I am not using any framework (except .net framework). I don't have gpu's. I already set error too ridiculously high value. Say 30 centipawns or more. So it is just terribly slow. So if even training error can't get below 30 Centipawn … But that only holds for "larger" training sets. On a small training set I can get arbitrarily low errors.
Debugging time I guess.
Then, I suggest using FANN with the GUI to test various net sizes, numbers of layers, learning rates, etc.
You can then code FANN (http://leenissen.dk/fann/wp/) in C, or use neural2d (see http://neural2d.net/?page_id=19).
When you get to a framework, I suggest Keras with TensorFlow.
It works with CPU, GPU, Windows, Linux, phones... and TensorBoard lets you see how your net is doing.
The machine learning blog will help, or you could stick with just Python.
Perhaps try something simple, like learning 3 piece endgames to play nearly perfectly.
-
- Posts: 7220
- Joined: Mon May 27, 2013 10:31 am
Re: Learning time growing exponentially with number of training examples
Because the training error is not that important, I now measured the average validation error, i.e. the error calculated over unseen positions.
Getting to
0.25 takes 2 minutes
0.2 takes 3 minutes
0.15 takes 9 minutes
0.14 takes 22 minutes
Looks not very promising.
Here error = 0.5 * x * x, where x = Tanh(value / 400) and value is measured in centipawns.
By the way, I might as well take |x|, for it is not used for training so it does not need to be a mean squared error.
Then for 0.14, |x| = 0.53. Ouch.
Atanh(0.53) = 0.59
So value = 0.59 * 400 = 236 centipawns. Terrible.
No wonder it plays random games.
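In code the conversion is just this arithmetic (Math.Atanh is not in the old .NET Framework, so the log form is used):
Code: Select all
// Convert the reported error back to centipawns:
// error = 0.5 * x * x  with  x = tanh(value / 400)
double error = 0.14;
double x = Math.Sqrt(2.0 * error);                     // |x| is about 0.53
double atanh = 0.5 * Math.Log((1.0 + x) / (1.0 - x));  // atanh(x) is about 0.59
double centipawns = atanh * 400.0;                     // about 236 centipawns
Console.WriteLine(centipawns);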
-
- Posts: 536
- Joined: Thu Mar 09, 2006 3:01 pm
Re: Learning time growing exponentially with number of training examples
Again, the trend is important, and exactly what would be expected.
Henk wrote: ↑Tue Aug 28, 2018 12:40 pm
Because training error is not that important now I measured average validation error. Error calculated over unseen positions
Getting to
0.25 takes 2 minutes
0.2 takes 3 minutes
0.15 takes 9 minutes
0.14 takes 22 minutes
Looks not very promising.
For error = 0.5 * x * x where x = Tanh(value / 400) and value is measured in centi pawns.
By the way I might as well take |x| for it is not used for training so it does not need mean square errors.
Then for 0.14 |x| = 0.53 ouch.
Atanh(0.53) = 0.59
So value = 0.59 * 400 = 236 centi pawns. Terrible.
No wonder it plays random games.
Sorry, don't know how to get an actual image inserted.
Just search for say "mse tensorboard" and look at the images.
For example:
https://machinelearningmastery.com/disp ... -in-keras/
-
- Posts: 360
- Joined: Thu Jan 22, 2015 3:21 pm
- Location: Zurich, Switzerland
- Full name: Jonathan Rosenthal
Re: Learning time growing exponentially with number of training examples
Just wanted to mention I don't think things are as bad as you feel they are, you seem to be on the right track.
Henk wrote: ↑Tue Aug 28, 2018 12:40 pm
Because training error is not that important now I measured average validation error. Error calculated over unseen positions
Getting to
0.25 takes 2 minutes
0.2 takes 3 minutes
0.15 takes 9 minutes
0.14 takes 22 minutes
Looks not very promising.
For error = 0.5 * x * x where x = Tanh(value / 400) and value is measured in centi pawns.
By the way I might as well take |x| for it is not used for training so it does not need mean square errors.
Then for 0.14 |x| = 0.53 ouch.
Atanh(0.53) = 0.59
So value = 0.59 * 400 = 236 centi pawns. Terrible.
No wonder it plays random games.
I don't understand your loss function; where is the target value?
Dividing by 400 probably doesn't make too much sense when you are working with a neural net. Keeping inputs to values between +1 and -1 is usually recommended, as that is easier for the network to learn. Dividing by 400 forces the network to produce huge weights near the output, which you don't want.
As brianr already mentioned (is there a way to turn on the names like we used to have on the forum?), the trend is what is most important. By the sound of it things are improving over time, which is very good! You can build on that; it is much harder when things remain around truly random values.
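For comparison, a standard squared-error loss with the target made explicit would look something like this (only a sketch; I am assuming the target is the tanh-scaled eval, matching the scaling you describe):
Code: Select all
// Per-example squared error with an explicit target
// (assumes target = tanh(eval / 400), i.e. the same scaling as above).
static double SquaredError(double prediction, double evalCentipawns)
{
    double target = Math.Tanh(evalCentipawns / 400.0);
    double diff = prediction - target;
    return 0.5 * diff * diff;
}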
-Jonathan
-
- Posts: 7220
- Joined: Mon May 27, 2013 10:31 am
Re: Learning time growing exponentially with number of training examples
The output value of a training example is tanh(Eval(position) / 400), giving a value between -1 and 1.
jorose wrote: ↑Tue Aug 28, 2018 2:24 pm
Just wanted to mention I don't think things are as bad as you feel they are, you seem to be on the right track.
Henk wrote: ↑Tue Aug 28, 2018 12:40 pm
Because training error is not that important now I measured average validation error. Error calculated over unseen positions
Getting to
0.25 takes 2 minutes
0.2 takes 3 minutes
0.15 takes 9 minutes
0.14 takes 22 minutes
Looks not very promising.
For error = 0.5 * x * x where x = Tanh(value / 400) and value is measured in centi pawns.
By the way I might as well take |x| for it is not used for training so it does not need mean square errors.
Then for 0.14 |x| = 0.53 ouch.
Atanh(0.53) = 0.59
So value = 0.59 * 400 = 236 centi pawns. Terrible.
No wonder it plays random games.
I don't understand your loss function, where is the target value?
Dividing by 400 probably doesn't make too much sense when you are working with a neural net, keeping inputs to values which are usually between +1 and -1 is usually recommended as being easier for the network to learn. Dividing by 400 forces the network to produce huge weights near the output, which you don't want.
As brianr already mentioned (is there a way to turn on the names like we used to have on the forum) the trend is what is most important. By the sound of it things are improving over time, which is very good! You can build on that; it is much harder when things remain around truly random values.
I divide by 400 centipawns because tanh(2) is already almost equal to 1.
For the loss function I still use mean squared error. Maybe I should change that. For the activation function I use SELU. I read that if you use SELU you don't need batch normalization, for it is self-normalizing. But computing Exp(x) is one of the slowest operations during training.
Code: Select all
static public double SELU(double x)
{
    // Scaled Exponential Linear Unit: 1.0507 * x for x >= 0,
    // and 1.0507 * 1.67326 * (exp(x) - 1) for x < 0.
    return 1.0507 * (x >= 0 ? x : 1.67326 * (Math.Exp(x) - 1));
}
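For backpropagation the derivative is also needed; a sketch with the same constants (not necessarily how it looks in my code):
Code: Select all
// Derivative of the SELU above, used during backpropagation:
// 1.0507 for x >= 0, and 1.0507 * 1.67326 * exp(x) for x < 0.
static public double SELUDerivative(double x)
{
    return 1.0507 * (x >= 0 ? 1.0 : 1.67326 * Math.Exp(x));
}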