Henk wrote: The test (F(x+h)-F(x))/h does not work in discontinuous parts of a function. For instance if f(u) = u > 0 ? u : 0.01 * u and F(x + h) > 0 but F(x) < 0.
You can't expect equality, but you should get pretty close in most cases. If your h is judiciously chosen, the probability of hitting a corner like that will be very small.
Testing failed because of these errors, and for the bias parameter, for instance, they were not that small. When I replaced the activation function with a continuous one, the errors disappeared.
So when testing gradients I now replace my activation function with a continuous one that has the same parameters. Then I know the formulas are mostly OK.
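A minimal sketch of the problem being described, using a leaky-ReLU-style function like the one quoted above (the function names and the step size h are illustrative, not from the original posts). When x and x + h straddle the kink at zero, the finite difference averages the two slopes and the check fails even though the analytic gradient is correct:

```python
import numpy as np

def leaky_relu(u):
    # f(u) = u if u > 0 else 0.01 * u, as in the quoted post
    return np.where(u > 0, u, 0.01 * u)

def leaky_relu_grad(u):
    # analytic derivative: 1 on the positive side, 0.01 on the negative side
    return np.where(u > 0, 1.0, 0.01)

def finite_diff(f, x, h=1e-5):
    # central difference; more accurate than (f(x+h) - f(x)) / h
    return (f(x + h) - f(x - h)) / (2 * h)

# away from the kink the check passes easily
x = 0.5
print(finite_diff(leaky_relu, x), leaky_relu_grad(x))  # both ~1.0

# straddling the kink, the finite difference mixes both slopes
x = 1e-6
print(finite_diff(leaky_relu, x))  # roughly halfway between 0.01 and 1.0
```

Swapping in a smooth activation (e.g. tanh) for the gradient check, as described above, avoids this failure mode entirely.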
Found another bug that may have caused these relatively large errors, so the previous statement about the bias parameter may not be true.
Henk wrote: I looked at batch normalization, but my network does not use mini-batches; it computes the gradient of the loss over one training example.
I read that ELU or SELU might help and is faster than batch normalization. If not, I'll switch over to mini-batches.
I logged on for the sole purpose of saying this: implement mini-batching as your top priority. It will stabilize everything you're doing enormously.
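A quick sketch of why mini-batching stabilizes training (the per-example gradient here is a hypothetical noisy stand-in, not the poster's actual network): averaging B per-example gradients shrinks the noise in each update by roughly 1/sqrt(B).

```python
import numpy as np

rng = np.random.default_rng(0)

def per_example_grad():
    # hypothetical: true gradient 2.0 plus unit-variance noise
    return 2.0 + rng.normal(scale=1.0)

# single-example SGD: each update uses one noisy gradient
single = np.array([per_example_grad() for _ in range(1000)])

# mini-batch of 32: each update averages 32 per-example gradients
batch = np.array([np.mean([per_example_grad() for _ in range(32)])
                  for _ in range(1000)])

# the batch updates scatter far less around the true gradient
print(single.std(), batch.std())  # batch std is roughly single / sqrt(32)
```

Both estimators are unbiased; the mini-batch version just trades extra compute per step for much less variance, which is the stabilization referred to above.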
Using SELU now. Some say it's the best; others are less enthusiastic about it. I think I have to calculate the variance and mean of the input values and see whether it works or not.
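A sketch of that last step, under the assumption that "calculate variance and mean of the input values" means standardizing the inputs, since SELU's self-normalizing property is derived for inputs with mean 0 and variance 1 (the constants below are the published SELU alpha and scale; the data here is synthetic):

```python
import numpy as np

# standard SELU constants (self-normalizing networks)
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x):
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

# synthetic inputs that are NOT standardized
X = np.random.default_rng(1).normal(loc=5.0, scale=3.0, size=(1000, 8))

# compute mean and variance of the inputs and normalize first
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

out = selu(X_std)
print(out.mean(), out.var())  # should stay near the (0, 1) fixed point
```

Feeding the raw, unstandardized X into selu instead would push the activations away from that fixed point, which is a plausible reason SELU underwhelms in some reports.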