partition result into subgroups

Daniel Shawul · Post by **Daniel Shawul** » Fri Aug 31, 2012 5:19 pm

I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks

bob · Post by **bob** » Fri Aug 31, 2012 5:44 pm

Daniel Shawul wrote:I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks

My first thought is to do a simple approach to shuffling. If you have N values, generate a random number between one and N, and remove that result and stick it in partition 1. Then generate another random number between one and N-1 (since the list is one smaller) and add that to partition 1. Repeat until partition 1 is large enough to satisfy you. Then do the same for partition 2, ...

I assume there is some sort of correlation between consecutive elements that you want to spread across different partitions? Otherwise you could generate a number between 1 and N-max_partition_size and just grab N consecutive results for the first partition, then remove them and repeat.

gladius · Post by **gladius** » Fri Aug 31, 2012 5:45 pm

Daniel Shawul wrote:I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks

This is ugly, but it works.

Code: Select all

#include <algorithm>
#include <vector>
#include <cstdlib>
#include <cstdio>

std::vector<int> generate(int value, int n) {
  std::vector<int> nums;
  while (nums.size() < n - 1) {
    int r = rand() % (value / std::max(1, n / 7));
    nums.push_back(r);
    value -= r;
  }
  nums.push_back(value);
  return nums;
}

int main() {
  std::vector<int> result = generate(1234567, 55);
  int check = 0;
  for (int i = 0; i < result.size(); ++i) {
    check += result[i];
    printf("%d,", result[i]);
  }
  printf("\nCheck: %d\nSize: %d\n", check, result.size());
  return 0;
}

Daniel Shawul · Post by **Daniel Shawul** » Fri Aug 31, 2012 8:09 pm

bob wrote:
Daniel Shawul wrote:I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks
My first thought is to do a simple approach to shuffling. If you have N values, generate a random number between one and N, and remove that result and stick it in partition 1. Then generate another random number between one and N-1 (since the list is one smaller) and add that to partition 1. Repeat until partition 1 is large enough to satisfy you. Then do the same for partition 2, ...

I assume there is some sort of correlation between consecutive elements that you want to spread across different partitions? Otherwise you could generate a number between 1 and N-max_partition_size and just grab N consecutive results for the first partition, then remove them and repeat.

I think my first thought was also along those lines but meeting the raw constraint that each partition be of size N/k and the column constraint for W,D,L that the sums be equal to the total makes it complicated. I just looked into your suggestions now so I will take a closer look later and get back..

Daniel Shawul · Post by **Daniel Shawul** » Fri Aug 31, 2012 8:13 pm

Thanks a lot Gary for taking the time to present it in code! Ok so the code can partition W wins in to k groups, and similarly L losses into k groups, and finally for the losses I just take Li = N / k - Wi - Di. I think that should work unless I missed something. Will check and come back.
Thanks again
Edit: There is a problem. Say I have N(W,D,L) = 2100(1000,1000,100) and I want to divide in to 10 thus 210(100,100,10) games each then say I get 50,50 for W & D, then for L it has to be 210 - 50 - 50 = 110. But we have only a total of 100 losses.

Pio · Post by **Pio** » Fri Aug 31, 2012 8:56 pm

Hi Daniel!

I would partition the wins, W, first into the k subgroups, then you know that the number of losses, L, has to be the same for each partition.

Then I would do the same with the draws D, i.e I would partition the number of draws into the k subgroups.

One way of partitioning the wins could for example be done by realising that the problem is equivalent to put W nr of 1:s and then randomly insert k-1 nr of commas (,) between them and then counting how many 1:s are between each comma. The same can of course be done for the draws

Good luck!

Daniel Shawul · Post by **Daniel Shawul** » Fri Aug 31, 2012 9:18 pm

Hi Pio
The problem is I can't store the results in an array since I could have millions of games. Say a result of 6million(3million,1million,2million). And also each raw should sum up to N/k which means I have only 2 degrees of freedom. That means,one of W,D or L column is dependent on the others. If it was a repeated random sub-sampling then all I have to do is generate random numbers rand(N) continuously and each group gets N/k results. But I want to do k-fold cross validations test with partitions where repetitions are not allowed.
Daniel

kbhearn · Post by **kbhearn** » Fri Aug 31, 2012 9:38 pm

How fast does it need to be? If it can have some computational overhead to iterate through a random partition then you could use a pseudorandom permutation and not need to store the results of the permutation, just use it to translate indexes as you need them. Changing the key would give you a different partitioning.

i.e. translate indexes 0->N/k through a PRP, repeat for next N/k if you want another partition, the PRP guarantees that each starting index will correspond to a unique finishing index.

Pio · Post by **Pio** » Fri Aug 31, 2012 9:39 pm

Hi Daniel!

If you have space requirements you could do the same as my previous post with the exception that you only save the k-1 commas postions (uniformly randomly distributed between 0 and w + k - 2) + the two paranthesis positions (position -1 for "(" and position W + k -1 for ")" in an array a, sort the array a with the respect to the positions and then subtract a[t] - a[t-1] to get the numer of partion t.

When you have the number of wins for partition 1 you could get the number of draws for that partition and then subtract the number of draws for partition one from the total number of draws and then do the same for partition 2...

I hope I do not have made too many errors.

Good luck!

Daniel Shawul · Post by **Daniel Shawul** » Fri Aug 31, 2012 11:04 pm

kbhearn wrote:How fast does it need to be? If it can have some computational overhead to iterate through a random partition then you could use a pseudorandom permutation and not need to store the results of the permutation, just use it to translate indexes as you need them. Changing the key would give you a different partitioning.

i.e. translate indexes 0->N/k through a PRP, repeat for next N/k if you want another partition, the PRP guarantees that each starting index will correspond to a unique finishing index.

It can be slow as hell since it is done one time. Can you give an example of a PRP that can generate permutation of [0..N] integers where N can be as big as a million?

partition result into subgroups

partition result into subgroups

Re: partition result into subgroups

Re: partition result into subgroups

Re: partition result into subgroups

Re: partition result into subgroups

Re: partition result into subgroups

Re: partition result into subgroups

Re: partition result into subgroups

Re: partition result into subgroups

Re: partition result into subgroups