## Using Mini-Batch for tuning

Discussion of chess software programming and technical issues.

Moderators: hgm, Dann Corbit, Harvey Williamson

Posts: 708
Joined: Mon Dec 15, 2008 10:45 am

### Using Mini-Batch for tuning

I have read in several threads that some people tune their data with so-called "mini-batches",
i.e. subsets of the total dataset. What is the idea, and how can it be used?

A link to an easy introduction to that topic would already be interesting.

xr_a_y
Posts: 1382
Joined: Sat Nov 25, 2017 1:28 pm
Location: France

### Re: Using Mini-Batch for tuning

derjack
Posts: 3
Joined: Fri Dec 27, 2019 7:47 pm
Full name: Jacek Dermont

### Re: Using Mini-Batch for tuning

Stochastic gradient descent can be seen as mini-batch training with a batch size of 1: you iterate over your data one position at a time and update the weights after each position. In mini-batch training you instead divide your training data into batches of size N (typically in the range 32..1024 or more) and update the weights once per mini-batch. This is usually more effective than plain SGD and more parallelizable, be it on GPU or CPU, so in practice much faster.
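To make the difference concrete, here is a minimal sketch (hypothetical code, not from any engine) of mini-batch gradient descent on a plain linear model with mean squared error; with batch_size=1 it degenerates to classic SGD:

```python
import numpy as np

def minibatch_gd(X, y, batch_size=64, lr=0.01, epochs=10):
    """Mini-batch gradient descent on a linear model (MSE loss).

    batch_size=1 reduces to classic SGD: one update per position.
    Larger batches average the gradient over N positions and do
    one update per mini-batch."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)              # shuffle each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            pred = X[idx] @ w
            # MSE gradient, averaged over the mini-batch
            grad = X[idx].T @ (pred - y[idx]) / len(idx)
            w -= lr * grad                      # one update per mini-batch
    return w
```

The averaged gradient is less noisy than a single-position gradient, and the matrix operations inside a batch vectorize well, which is where the speedup on GPU or CPU comes from.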

Posts: 708
Joined: Mon Dec 15, 2008 10:45 am

### Re: Using Mini-Batch for tuning

derjack wrote:
Tue Jan 12, 2021 8:11 pm
Stochastic gradient descent can be seen as mini-batch training with a batch size of 1: you iterate over your data one position at a time and update the weights after each position. In mini-batch training you instead divide your training data into batches of size N (typically in the range 32..1024 or more) and update the weights once per mini-batch. This is usually more effective than plain SGD and more parallelizable, be it on GPU or CPU, so in practice much faster.
Splitting the training data into parts doesn't sound like it depends on the tuning algorithm itself, so what is the general idea?

I can imagine that this results in more frequent updates of the parameter vector, so you reach a semi-good solution very fast.
Further, I would guess the challenge is then to calm the updates down in order to avoid fluctuations or divergence.

In basic algorithms where I don't have something like a learning rate, I would control it via step count * step size when updating a parameter.
Additionally, one could operate on every parameter in the beginning and later only pick a subset of the parameter vector.
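The "calm down over time" idea above is commonly done with a decaying step size. A minimal sketch (the schedule and parameter names are hypothetical, just one common choice):

```python
def decayed_step(step_count, base_step=0.1, decay=0.01):
    """Shrink the step size as the number of updates grows,
    so early updates move fast and later updates settle down.
    This is only one of many possible decay schedules."""
    return base_step / (1.0 + decay * step_count)
```

With decay=0.01, the step size is halved after 100 updates and keeps shrinking, which damps the fluctuations that frequent mini-batch updates would otherwise cause near a good solution.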

What do you think? Sorry, this is a completely new field for me.

derjack
Posts: 3
Joined: Fri Dec 27, 2019 7:47 pm
Full name: Jacek Dermont

### Re: Using Mini-Batch for tuning

Tue Jan 12, 2021 8:30 pm
derjack wrote:
Tue Jan 12, 2021 8:11 pm
Stochastic gradient descent can be seen as mini-batch training with a batch size of 1: you iterate over your data one position at a time and update the weights after each position. In mini-batch training you instead divide your training data into batches of size N (typically in the range 32..1024 or more) and update the weights once per mini-batch. This is usually more effective than plain SGD and more parallelizable, be it on GPU or CPU, so in practice much faster.
Splitting the training data into parts doesn't sound like it depends on the tuning algorithm itself, so what is the general idea?
Actually it is algorithm specific, or rather gradient-descent specific. With mini-batches you still operate on the entire dataset in each epoch.

Maybe you confused it with something like training on only a subset of positions, then on another subset, and so on, training separately on each subset; since the subsets are smaller, that training would be faster.
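The point that mini-batches merely regroup the data rather than discard any of it can be sketched like this (hypothetical helper, not from any tuner):

```python
import numpy as np

def epoch_batches(n_positions, batch_size, seed=0):
    """Yield index mini-batches that cover the WHOLE dataset
    exactly once (one epoch), in shuffled order. No position is
    left out; the data is only partitioned into chunks."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_positions)
    for start in range(0, n_positions, batch_size):
        yield order[start:start + batch_size]
```

Every position appears in exactly one batch per epoch, which is the difference from training separately on disjoint subsets and never revisiting the rest of the data.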

Posts: 708
Joined: Mon Dec 15, 2008 10:45 am

### Re: Using Mini-Batch for tuning

derjack wrote:
Tue Jan 12, 2021 9:07 pm