I have read in several threads that some people tune their evaluation parameters with so-called "mini-batches",
subsets of the total dataset. What is the idea, and how can it be used?
A link to an easy introduction on that topic would already be interesting.
Thanks in advance.
Using Mini-Batch for tuning
- Posts: 1871
- Joined: Sat Nov 25, 2017 2:28 pm
- Location: France

- Posts: 16
- Joined: Fri Dec 27, 2019 8:47 pm
- Full name: Jacek Dermont
Re: Using Mini-Batch for tuning
Stochastic gradient descent could be seen as mini-batch with a batch size of 1. So you iterate over your data one position at a time and update the weights after each position. In mini-batch you divide your training data into parts of size N (typically in the range 32..1024 or more) and update the weights once per mini-batch. It is usually more effective than plain SGD and more parallelizable, be it via GPU or CPU, so in practice much faster.
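A minimal sketch of what this post describes, assuming a linear model tuned with a mean-squared-error loss (the function name and parameters are hypothetical, not from any particular engine's tuner). With `batch_size=1` it reduces to classic SGD; with `batch_size=len(X)` it becomes full-batch gradient descent:

```python
import numpy as np

def minibatch_gd(X, y, lr=0.01, batch_size=32, epochs=10, rng=None):
    """Mini-batch gradient descent on a linear model with MSE loss.

    batch_size=1   -> classic stochastic gradient descent
    batch_size=len(X) -> full-batch gradient descent
    """
    rng = rng or np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)                # shuffle once per epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # gradient of mean squared error over this mini-batch only
            grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad                        # one update per mini-batch
    return w
```

Note that every epoch still visits the entire dataset; the batch size only controls how often the weights are updated along the way.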
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: Using Mini-Batch for tuning
derjack wrote: ↑Tue Jan 12, 2021 9:11 pm
> Stochastic gradient descent could be seen as mini-batch with a batch size of 1. So you iterate over your data one position at a time and update the weights after each position. In mini-batch you divide your training data into parts of size N (typically in the range 32..1024 or more) and update the weights once per mini-batch. It is usually more effective than plain SGD and more parallelizable, be it via GPU or CPU, so in practice much faster.

Splitting the training data into parts doesn't sound like it depends on the tuning algorithm itself, so what is the general idea?
I can imagine that this results in more updates of the parameter vector, so you reach a semi-good solution very fast.
Further, I would guess the challenge is to calm things down to avoid fluctuations or divergence.
In basic algorithms where I don't have something like a learning rate, I would control it with stepcount * stepsize when updating a parameter.
Additionally, one could operate on every parameter in the beginning and later pick only a subset of the parameter vector.
What do you think? Sorry, this is a completely new field for me.
- Posts: 16
- Joined: Fri Dec 27, 2019 8:47 pm
- Full name: Jacek Dermont
Re: Using Mini-Batch for tuning
Desperado wrote: ↑Tue Jan 12, 2021 9:30 pm
> Splitting the training data into parts doesn't sound like it depends on the tuning algorithm itself, so what is the general idea?

Actually it is algorithm specific, or gradient descent specific. With mini-batches you still operate on the entire data in each epoch.
Maybe you confused it with using only a subset of positions and training on them, then another subset, and so on, training separately on each subset; since they are smaller, the training would be faster.
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: Using Mini-Batch for tuning
derjack wrote: ↑Tue Jan 12, 2021 10:07 pm
> Actually it is algorithm specific, or gradient descent specific. With mini-batches you still operate on the entire data in each epoch.
>
> Maybe you confused it with using only a subset of positions and training on them, then another subset, and so on, training separately on each subset; since they are smaller, the training would be faster.

What I mean is that you accept changes during an epoch, so you update the reference fitness too (in the hope that you are on the right track).
That would speed things up at the beginning, but you would need to "cool down" the learning rate in later epochs, so you don't begin to fluctuate or diverge because of too many updates of the reference fitness.