I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks
partition result into subgroups
Moderators: hgm, Dann Corbit, Harvey Williamson
-
Daniel Shawul
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
-
bob
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: partition result into subgroups
My first thought is to do a simple approach to shuffling. If you have N values, generate a random number between one and N, and remove that result and stick it in partition 1. Then generate another random number between one and N-1 (since the list is one smaller) and add that to partition 1. Repeat until partition 1 is large enough to satisfy you. Then do the same for partition 2, ...Daniel Shawul wrote:I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks
I assume there is some sort of correlation between consecutive elements that you want to spread across different partitions? Otherwise you could generate a number between 1 and N-max_partition_size and just grab N consecutive results for the first partition, then remove them and repeat.
-
gladius
- Posts: 568
- Joined: Tue Dec 12, 2006 10:10 am
- Full name: Gary Linscott
Re: partition result into subgroups
This is ugly, but it works.Daniel Shawul wrote:I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks
Code: Select all
#include <algorithm>
#include <vector>
#include <cstdlib>
#include <cstdio>
std::vector<int> generate(int value, int n) {
std::vector<int> nums;
while (nums.size() < n - 1) {
int r = rand() % (value / std::max(1, n / 7));
nums.push_back(r);
value -= r;
}
nums.push_back(value);
return nums;
}
int main() {
std::vector<int> result = generate(1234567, 55);
int check = 0;
for (int i = 0; i < result.size(); ++i) {
check += result[i];
printf("%d,", result[i]);
}
printf("\nCheck: %d\nSize: %d\n", check, result.size());
return 0;
}
-
Daniel Shawul
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: partition result into subgroups
I think my first thought was also along those lines but meeting the raw constraint that each partition be of size N/k and the column constraint for W,D,L that the sums be equal to the total makes it complicated. I just looked into your suggestions now so I will take a closer look later and get back..bob wrote:My first thought is to do a simple approach to shuffling. If you have N values, generate a random number between one and N, and remove that result and stick it in partition 1. Then generate another random number between one and N-1 (since the list is one smaller) and add that to partition 1. Repeat until partition 1 is large enough to satisfy you. Then do the same for partition 2, ...Daniel Shawul wrote:I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks
I assume there is some sort of correlation between consecutive elements that you want to spread across different partitions? Otherwise you could generate a number between 1 and N-max_partition_size and just grab N consecutive results for the first partition, then remove them and repeat.
-
Daniel Shawul
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: partition result into subgroups
Thanks a lot Gary for taking the time to present it in code! Ok so the code can partition W wins in to k groups, and similarly L losses into k groups, and finally for the losses I just take Li = N / k - Wi - Di. I think that should work unless I missed something. Will check and come back.
Thanks again
Edit: There is a problem. Say I have N(W,D,L) = 2100(1000,1000,100) and I want to divide in to 10 thus 210(100,100,10) games each then say I get 50,50 for W & D, then for L it has to be 210 - 50 - 50 = 110. But we have only a total of 100 losses.
Thanks again
Edit: There is a problem. Say I have N(W,D,L) = 2100(1000,1000,100) and I want to divide in to 10 thus 210(100,100,10) games each then say I get 50,50 for W & D, then for L it has to be 210 - 50 - 50 = 110. But we have only a total of 100 losses.
-
Pio
- Posts: 334
- Joined: Sat Feb 25, 2012 10:42 pm
- Location: Stockholm
Re: partition result into subgroups
Hi Daniel!
I would partition the wins, W, first into the k subgroups, then you know that the number of losses, L, has to be the same for each partition.
Then I would do the same with the draws D, i.e I would partition the number of draws into the k subgroups.
One way of partitioning the wins could for example be done by realising that the problem is equivalent to put W nr of 1:s and then randomly insert k-1 nr of commas (,) between them and then counting how many 1:s are between each comma. The same can of course be done for the draws
Good luck!
I would partition the wins, W, first into the k subgroups, then you know that the number of losses, L, has to be the same for each partition.
Then I would do the same with the draws D, i.e I would partition the number of draws into the k subgroups.
One way of partitioning the wins could for example be done by realising that the problem is equivalent to put W nr of 1:s and then randomly insert k-1 nr of commas (,) between them and then counting how many 1:s are between each comma. The same can of course be done for the draws
Good luck!
-
Daniel Shawul
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: partition result into subgroups
Hi Pio
The problem is I can't store the results in an array since I could have millions of games. Say a result of 6million(3million,1million,2million). And also each raw should sum up to N/k which means I have only 2 degrees of freedom. That means,one of W,D or L column is dependent on the others. If it was a repeated random sub-sampling then all I have to do is generate random numbers rand(N) continuously and each group gets N/k results. But I want to do k-fold cross validations test with partitions where repetitions are not allowed.
Daniel
The problem is I can't store the results in an array since I could have millions of games. Say a result of 6million(3million,1million,2million). And also each raw should sum up to N/k which means I have only 2 degrees of freedom. That means,one of W,D or L column is dependent on the others. If it was a repeated random sub-sampling then all I have to do is generate random numbers rand(N) continuously and each group gets N/k results. But I want to do k-fold cross validations test with partitions where repetitions are not allowed.
Daniel
-
kbhearn
- Posts: 411
- Joined: Thu Dec 30, 2010 4:48 am
Re: partition result into subgroups
How fast does it need to be? If it can have some computational overhead to iterate through a random partition then you could use a pseudorandom permutation and not need to store the results of the permutation, just use it to translate indexes as you need them. Changing the key would give you a different partitioning.
i.e. translate indexes 0->N/k through a PRP, repeat for next N/k if you want another partition, the PRP guarantees that each starting index will correspond to a unique finishing index.
i.e. translate indexes 0->N/k through a PRP, repeat for next N/k if you want another partition, the PRP guarantees that each starting index will correspond to a unique finishing index.
-
Pio
- Posts: 334
- Joined: Sat Feb 25, 2012 10:42 pm
- Location: Stockholm
Re: partition result into subgroups
Hi Daniel!
If you have space requirements you could do the same as my previous post with the exception that you only save the k-1 commas postions (uniformly randomly distributed between 0 and w + k - 2) + the two paranthesis positions (position -1 for "(" and position W + k -1 for ")" in an array a, sort the array a with the respect to the positions and then subtract a[t] - a[t-1] to get the numer of partion t.
When you have the number of wins for partition 1 you could get the number of draws for that partition and then subtract the number of draws for partition one from the total number of draws and then do the same for partition 2...
I hope I do not have made too many errors.
Good luck!
If you have space requirements you could do the same as my previous post with the exception that you only save the k-1 commas postions (uniformly randomly distributed between 0 and w + k - 2) + the two paranthesis positions (position -1 for "(" and position W + k -1 for ")" in an array a, sort the array a with the respect to the positions and then subtract a[t] - a[t-1] to get the numer of partion t.
When you have the number of wins for partition 1 you could get the number of draws for that partition and then subtract the number of draws for partition one from the total number of draws and then do the same for partition 2...
I hope I do not have made too many errors.
Good luck!
-
Daniel Shawul
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: partition result into subgroups
It can be slow as hell since it is done one time. Can you give an example of a PRP that can generate permutation of [0..N] integers where N can be as big as a million?kbhearn wrote:How fast does it need to be? If it can have some computational overhead to iterate through a random partition then you could use a pseudorandom permutation and not need to store the results of the permutation, just use it to translate indexes as you need them. Changing the key would give you a different partitioning.
i.e. translate indexes 0->N/k through a PRP, repeat for next N/k if you want another partition, the PRP guarantees that each starting index will correspond to a unique finishing index.