Henk wrote: The test (F(x+h)-F(x))/h does not work in discontinuous parts of a function. For instance if f(u) = u > 0 ? u : 0.01 * u and F(x + h) > 0 but F(x) < 0.
You can't expect equality, but you should get pretty close in most cases. If your h is judiciously chosen, the probability of hitting a corner like that will be very small.
Testing failed because of these errors, and for the bias parameter, for instance, they were not that small. When I replaced the activation function with a continuous one, the errors disappeared.
So when testing gradients I now replace my activation function with a continuous one that has the same parameters. Then I know the formulas are mostly OK.
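A minimal sketch of the problem being described, using a leaky-ReLU-style function like the one quoted above (the function names and the step size h are illustrative, not from the original posts). When x and x + h straddle the kink at zero, the finite difference averages the two slopes and the check fails even though the analytic gradient is correct:

```python
import numpy as np

def leaky_relu(u):
    # f(u) = u if u > 0 else 0.01 * u, as in the quoted post
    return np.where(u > 0, u, 0.01 * u)

def leaky_relu_grad(u):
    # analytic derivative: 1 on the positive side, 0.01 on the negative side
    return np.where(u > 0, 1.0, 0.01)

def finite_diff(f, x, h=1e-5):
    # central difference; more accurate than (f(x+h) - f(x)) / h
    return (f(x + h) - f(x - h)) / (2 * h)

# away from the kink the check passes easily
x = 0.5
print(finite_diff(leaky_relu, x), leaky_relu_grad(x))  # both ~1.0

# straddling the kink, the finite difference mixes both slopes
x = 1e-6
print(finite_diff(leaky_relu, x))  # roughly halfway between 0.01 and 1.0
```

Swapping in a smooth activation (e.g. tanh) for the gradient check, as described above, avoids this failure mode entirely.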
Found another bug that may have caused these relatively large errors, so the previous statement about the bias parameter may not be true.
Henk wrote: I looked at batch normalization, but my network does not use mini-batches; it computes the gradient of the loss over one training example.
I read that ELU or SELU might help and is faster than batch normalization. If not, I'll switch over to mini-batches.
I logged on for the sole purpose of saying this: implement mini-batching as your top priority. It will stabilize everything you're doing enormously.
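A quick sketch of why mini-batching stabilizes training (the per-example gradient here is a hypothetical noisy stand-in, not the poster's actual network): averaging B per-example gradients shrinks the noise in each update by roughly 1/sqrt(B).

```python
import numpy as np

rng = np.random.default_rng(0)

def per_example_grad():
    # hypothetical: true gradient 2.0 plus unit-variance noise
    return 2.0 + rng.normal(scale=1.0)

# single-example SGD: each update uses one noisy gradient
single = np.array([per_example_grad() for _ in range(1000)])

# mini-batch of 32: each update averages 32 per-example gradients
batch = np.array([np.mean([per_example_grad() for _ in range(32)])
                  for _ in range(1000)])

# the batch updates scatter far less around the true gradient
print(single.std(), batch.std())  # batch std is roughly single / sqrt(32)
```

Both estimators are unbiased; the mini-batch version just trades extra compute per step for much less variance, which is the stabilization referred to above.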
Using SELU now. Some say it's the best; others are less enthusiastic about it. I think I have to calculate the variance and mean of the input values and see whether it works or not.
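A sketch of that last step, under the assumption that "calculate variance and mean of the input values" means standardizing the inputs, since SELU's self-normalizing property is derived for inputs with mean 0 and variance 1 (the constants below are the published SELU alpha and scale; the data here is synthetic):

```python
import numpy as np

# standard SELU constants (self-normalizing networks)
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x):
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

# synthetic inputs that are NOT standardized
X = np.random.default_rng(1).normal(loc=5.0, scale=3.0, size=(1000, 8))

# compute mean and variance of the inputs and normalize first
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

out = selu(X_std)
print(out.mean(), out.var())  # should stay near the (0, 1) fixed point
```

Feeding the raw, unstandardized X into selu instead would push the activations away from that fixed point, which is a plausible reason SELU underwhelms in some reports.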