Deep misery

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

Henk
Posts: 5077
Joined: Mon May 27, 2013 8:31 am

Re: Deep misery

Post by Henk » Fri Feb 09, 2018 12:50 pm

AlvaroBegue wrote:
Henk wrote:The test (F(x+h) - F(x))/h does not work where the derivative of a function is discontinuous. For instance, if f(u) = u > 0 ? u : 0.01 * u and F(x + h) > 0 but F(x) < 0.
You can't expect equality, but you should get pretty close in most cases. If your h is judiciously chosen, the probability of hitting a corner like that will be very small.
Testing failed because of these errors. And, for instance, for the bias parameter they were not that small. When I replaced the activation function by a smooth (continuously differentiable) one, these errors disappeared.

So when testing gradients I replace my activation function by a smooth one with the same parameters. Then I know the formulas are mostly OK.
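A minimal sketch of the issue (not Henk's actual code; the activation and network are made up for illustration): near the kink of a leaky ReLU the finite-difference check stays far from the analytic derivative no matter how small h is, while a smooth stand-in such as softplus passes the same check.

```python
import math

def leaky_relu(u):
    return u if u > 0 else 0.01 * u

def leaky_relu_grad(u):
    return 1.0 if u > 0 else 0.01

def softplus(u):
    # smooth stand-in with a continuous derivative everywhere
    return math.log1p(math.exp(u))

def softplus_grad(u):
    return 1.0 / (1.0 + math.exp(-u))

def numeric_grad(f, x, h=1e-5):
    # central difference: O(h^2) error when f is smooth near x
    return (f(x + h) - f(x - h)) / (2 * h)

# x sits just above the kink at 0, but x - h falls below it,
# so the difference quotient blends the two slopes (1 and 0.01):
x = 1e-7
err_kink = abs(numeric_grad(leaky_relu, x) - leaky_relu_grad(x))

# With a smooth activation at the same point, the check is tight:
err_smooth = abs(numeric_grad(softplus, x) - softplus_grad(x))
```

With these numbers `err_kink` is roughly 0.5 (the average of the two slopes versus the slope 1), while `err_smooth` is negligible, which matches Henk's observation that the errors disappear once the activation is smooth.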

Henk
Posts: 5077
Joined: Mon May 27, 2013 8:31 am

Re: Deep misery

Post by Henk » Sat Feb 10, 2018 2:30 pm

Henk wrote:
AlvaroBegue wrote:
Henk wrote:The test (F(x+h) - F(x))/h does not work where the derivative of a function is discontinuous. For instance, if f(u) = u > 0 ? u : 0.01 * u and F(x + h) > 0 but F(x) < 0.
You can't expect equality, but you should get pretty close in most cases. If your h is judiciously chosen, the probability of hitting a corner like that will be very small.
Testing failed because of these errors. And, for instance, for the bias parameter they were not that small. When I replaced the activation function by a smooth (continuously differentiable) one, these errors disappeared.
Found another bug that may have caused these relatively large errors, so my previous statement about the bias parameter may not be true.

trulses
Posts: 25
Joined: Wed Dec 06, 2017 4:34 pm

Re: Deep misery

Post by trulses » Tue Feb 13, 2018 8:38 pm

Henk wrote:I looked at batch normalization. But my network does not use mini-batches; it computes the gradient of the loss over one training example.

I read that ELU or SELU might help and be faster than batch normalization. If not, I'll switch over to mini-batches.
I logged on for the sole purpose of saying this: implement mini-batching as your top priority. It's going to stabilize everything you're doing by a massive amount.
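A minimal sketch of why mini-batching stabilizes training (not trulses' code; the one-weight linear model and data are made up for illustration): averaging the per-example gradients over a batch cuts the variance of each update step, so the weight settles instead of bouncing around.

```python
import random

random.seed(0)

def grad_one(w, x, y):
    # gradient of the squared error 0.5*(w*x - y)^2 w.r.t. w, for one example
    return (w * x - y) * x

def grad_minibatch(w, batch):
    # average the per-example gradients over the mini-batch
    return sum(grad_one(w, x, y) for x, y in batch) / len(batch)

# Noisy data scattered around the line y = 2x
data = [(x, 2 * x + random.gauss(0, 0.5))
        for x in [random.uniform(-1, 1) for _ in range(256)]]

w = 0.0
lr = 0.1
for _ in range(200):
    batch = random.sample(data, 32)   # mini-batch of 32 examples
    w -= lr * grad_minibatch(w, batch)
```

After training, `w` lands close to the true slope 2; rerunning with a batch size of 1 (Henk's current setup) gives a much noisier trajectory for the same learning rate.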

Henk
Posts: 5077
Joined: Mon May 27, 2013 8:31 am

Re: Deep misery

Post by Henk » Tue Feb 13, 2018 10:45 pm

trulses wrote:
Henk wrote:I looked at batch normalization. But my network does not use mini-batches; it computes the gradient of the loss over one training example.

I read that ELU or SELU might help and be faster than batch normalization. If not, I'll switch over to mini-batches.
I logged on for the sole purpose of saying this: implement mini-batching as your top priority. It's going to stabilize everything you're doing by a massive amount.
Using SELU now. Some say it's the best; others are not so enthusiastic about it. I think I have to calculate the mean and variance of the input values and see whether it works or not.
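A minimal sketch of SELU and the mean/variance check Henk mentions. The two constants come from the SELU paper (Klambauer et al.); everything else is illustrative, not Henk's network code. The self-normalizing property says that standard-normal inputs pushed through SELU keep a mean near 0 and a variance near 1.

```python
import math
import random

ALPHA = 1.6732632423543772    # SELU constants from the paper
LAMBDA = 1.0507009873554805

def selu(x):
    return LAMBDA * x if x > 0 else LAMBDA * ALPHA * (math.exp(x) - 1.0)

random.seed(0)
# Feed standard-normal inputs through SELU and measure the output statistics.
xs = [random.gauss(0.0, 1.0) for _ in range(200000)]
ys = [selu(x) for x in xs]
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
```

If the measured `mean` drifts away from 0 or `var` away from 1 for the actual layer inputs, that is the signal Henk is looking for that SELU's self-normalization is not holding in his network.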
