Deep misery

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Deep misery

Post by Henk »

AlvaroBegue wrote:
Henk wrote:The test (F(x+h) - F(x))/h does not work at a point where the function is not differentiable. For instance, if f(u) = u > 0 ? u : 0.01 * u and the argument of f is positive at x + h but negative at x, so the step crosses the kink at 0.
You can't expect equality, but you should get pretty close in most cases. If your h is judiciously chosen, the probability of hitting a corner like that will be very small.
Testing failed because of these errors. And for the bias parameter, for instance, they were not that small. When I replaced the activation function by a continuous one, these errors disappeared.

So when testing gradients I replace my activation function by a continuous one with the same parameters. Then I know the formulas are mostly OK.
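(A minimal sketch of that kind of check, in Python/NumPy rather than my actual code; the 0.01 slope and the one-sided difference come from the posts above, while the toy loss, x, target and h are made up for illustration.)

Code: Select all

# Finite-difference gradient check on one weight. Shows why (F(x+h) - F(x))/h
# disagrees with the analytic gradient when the step straddles the kink of a
# leaky-ReLU-like activation, and why a smooth activation passes the check.
import numpy as np

def leaky_relu(u):                 # f(u) = u > 0 ? u : 0.01 * u
    return np.where(u > 0, u, 0.01 * u)

def leaky_relu_grad(u):
    return np.where(u > 0, 1.0, 0.01)

def softplus(u):                   # smooth stand-in: log(1 + exp(u))
    return np.log1p(np.exp(u))

def softplus_grad(u):
    return 1.0 / (1.0 + np.exp(-u))

def check(act, act_grad, w, x=1.0, target=0.5, h=1e-5):
    # Toy loss F(w) = 0.5 * (act(w*x) - target)^2 for one training example.
    F = lambda w_: 0.5 * (act(w_ * x) - target) ** 2
    analytic = float((act(w * x) - target) * act_grad(w * x) * x)
    numeric = float((F(w + h) - F(w)) / h)
    print(f"w={w:+.8f}  analytic={analytic:+.6f}  numeric={numeric:+.6f}")

check(leaky_relu, leaky_relu_grad, w=0.3)       # away from the kink: agreement
check(leaky_relu, leaky_relu_grad, w=-0.5e-5)   # w*x and (w+h)*x straddle 0: far off
check(softplus, softplus_grad, w=-0.5e-5)       # smooth activation: agreement again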
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Deep misery

Post by Henk »

Henk wrote:
AlvaroBegue wrote:
Henk wrote:The test (F(x+h) - F(x))/h does not work at a point where the function is not differentiable. For instance, if f(u) = u > 0 ? u : 0.01 * u and the argument of f is positive at x + h but negative at x, so the step crosses the kink at 0.
You can't expect equality, but you should get pretty close in most cases. If your h is judiciously chosen, the probability of hitting a corner like that will be very small.
Testing failed because of these errors. And for the bias parameter, for instance, they were not that small. When I replaced the activation function by a continuous one, these errors disappeared.
Found another bug that may have caused these relatively large errors, so the previous statement about the bias parameter may not be true.
trulses
Posts: 39
Joined: Wed Dec 06, 2017 5:34 pm

Re: Deep misery

Post by trulses »

Henk wrote:I looked at batch normalization. But my network does not use mini-batches; it computes the gradient of the loss over one training example.

I read that ELU or SELU might help and is faster than batch normalization. If not, then I'll switch over to mini-batches.
I logged on for the sole purpose of saying this: implement mini-batching as your top priority. It's going to stabilize everything you're doing by a massive amount.
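Roughly this, as a sketch (generic Python; `params`, `examples` and `grad` are placeholders, not anything from your code): instead of stepping after every single position, average the gradient over a small batch and take one step.

Code: Select all

# Rough mini-batch SGD sketch. One parameter update per batch instead of
# per training example; the averaging damps the noise of single-example updates.
import numpy as np

def minibatch_sgd(params, examples, grad, lr=0.01, batch_size=32, epochs=1):
    examples = list(examples)
    for _ in range(epochs):
        np.random.shuffle(examples)
        for start in range(0, len(examples), batch_size):
            batch = examples[start:start + batch_size]
            # Average the per-example gradients over the batch.
            g = sum(grad(params, ex) for ex in batch) / len(batch)
            params = params - lr * g
    return params

The same loop works with momentum or Adam; only the update line changes.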
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Deep misery

Post by Henk »

trulses wrote:
Henk wrote:I looked at batch normalization. But my network does not use mini-batches; it computes the gradient of the loss over one training example.

I read that ELU or SELU might help and is faster than batch normalization. If not, then I'll switch over to mini-batches.
I logged on for the sole purpose of saying this: implement mini-batching as your top priority. It's going to stabilize everything you're doing by a massive amount.
Using SELU now. Some say it's the best; others are not so enthusiastic about it. I think I have to calculate the mean and variance of the input values and see whether it works or not.
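Something like this sketch (NumPy; the alpha and lambda constants are the published SELU ones, the input standardization is the usual recipe, and the data here is fake, not from my net):

Code: Select all

# SELU with the published constants, plus the usual standardization of the
# raw inputs (SELU's self-normalizing behaviour assumes roughly zero-mean,
# unit-variance inputs). Generic NumPy sketch, not tied to a particular net.
import numpy as np

SELU_ALPHA = 1.6732632423543772
SELU_LAMBDA = 1.0507009873554805

def selu(u):
    return SELU_LAMBDA * np.where(u > 0, u, SELU_ALPHA * np.expm1(u))

def selu_grad(u):
    return SELU_LAMBDA * np.where(u > 0, 1.0, SELU_ALPHA * np.exp(u))

def standardize(x, eps=1e-8):
    # Per-feature shift/scale of the inputs to mean 0, variance 1.
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    return (x - mean) / (std + eps), mean, std

# Quick check on fake data: mean/variance before and after.
x = np.random.rand(1000, 8) * 10.0
xs, mean, std = standardize(x)
print("before:", x.mean(), x.var(), " after:", xs.mean(), xs.var())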