partition result into subgroups

Discussion of chess software programming and technical issues.

Moderators: hgm, Dann Corbit, Harvey Williamson

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

partition result into subgroups

Post by Daniel Shawul »

I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: partition result into subgroups

Post by bob »

Daniel Shawul wrote:I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks
My first thought is to do a simple approach to shuffling. If you have N values, generate a random number between one and N, and remove that result and stick it in partition 1. Then generate another random number between one and N-1 (since the list is one smaller) and add that to partition 1. Repeat until partition 1 is large enough to satisfy you. Then do the same for partition 2, ...

I assume there is some sort of correlation between consecutive elements that you want to spread across different partitions? Otherwise you could generate a number between 1 and N-max_partition_size and just grab N consecutive results for the first partition, then remove them and repeat.
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: partition result into subgroups

Post by gladius »

Daniel Shawul wrote:I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks
This is ugly, but it works.

Code: Select all

#include <algorithm>
#include <vector>
#include <cstdlib>
#include <cstdio>

std&#58;&#58;vector<int> generate&#40;int value, int n&#41; &#123;
  std&#58;&#58;vector<int> nums;
  while &#40;nums.size&#40;) < n - 1&#41; &#123;
    int r = rand&#40;) % &#40;value / std&#58;&#58;max&#40;1, n / 7&#41;);
    nums.push_back&#40;r&#41;;
    value -= r;
  &#125;
  nums.push_back&#40;value&#41;;
  return nums;
&#125;

int main&#40;) &#123;
  std&#58;&#58;vector<int> result = generate&#40;1234567, 55&#41;;
  int check = 0;
  for &#40;int i = 0; i < result.size&#40;); ++i&#41; &#123;
    check += result&#91;i&#93;;
    printf&#40;"%d,", result&#91;i&#93;);
  &#125;
  printf&#40;"\nCheck&#58; %d\nSize&#58; %d\n", check, result.size&#40;));
  return 0;
&#125;
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: partition result into subgroups

Post by Daniel Shawul »

bob wrote:
Daniel Shawul wrote:I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks
My first thought is to do a simple approach to shuffling. If you have N values, generate a random number between one and N, and remove that result and stick it in partition 1. Then generate another random number between one and N-1 (since the list is one smaller) and add that to partition 1. Repeat until partition 1 is large enough to satisfy you. Then do the same for partition 2, ...

I assume there is some sort of correlation between consecutive elements that you want to spread across different partitions? Otherwise you could generate a number between 1 and N-max_partition_size and just grab N consecutive results for the first partition, then remove them and repeat.
I think my first thought was also along those lines but meeting the raw constraint that each partition be of size N/k and the column constraint for W,D,L that the sums be equal to the total makes it complicated. I just looked into your suggestions now so I will take a closer look later and get back..
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: partition result into subgroups

Post by Daniel Shawul »

Thanks a lot Gary for taking the time to present it in code! Ok so the code can partition W wins in to k groups, and similarly L losses into k groups, and finally for the losses I just take Li = N / k - Wi - Di. I think that should work unless I missed something. Will check and come back.
Thanks again
Edit: There is a problem. Say I have N(W,D,L) = 2100(1000,1000,100) and I want to divide in to 10 thus 210(100,100,10) games each then say I get 50,50 for W & D, then for L it has to be 210 - 50 - 50 = 110. But we have only a total of 100 losses.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: partition result into subgroups

Post by Pio »

Hi Daniel!

I would partition the wins, W, first into the k subgroups, then you know that the number of losses, L, has to be the same for each partition.

Then I would do the same with the draws D, i.e I would partition the number of draws into the k subgroups.

One way of partitioning the wins could for example be done by realising that the problem is equivalent to put W nr of 1:s and then randomly insert k-1 nr of commas (,) between them and then counting how many 1:s are between each comma. The same can of course be done for the draws :)

Good luck!
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: partition result into subgroups

Post by Daniel Shawul »

Hi Pio
The problem is I can't store the results in an array since I could have millions of games. Say a result of 6million(3million,1million,2million). And also each raw should sum up to N/k which means I have only 2 degrees of freedom. That means,one of W,D or L column is dependent on the others. If it was a repeated random sub-sampling then all I have to do is generate random numbers rand(N) continuously and each group gets N/k results. But I want to do k-fold cross validations test with partitions where repetitions are not allowed.
Daniel
kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 4:48 am

Re: partition result into subgroups

Post by kbhearn »

How fast does it need to be? If it can have some computational overhead to iterate through a random partition then you could use a pseudorandom permutation and not need to store the results of the permutation, just use it to translate indexes as you need them. Changing the key would give you a different partitioning.

i.e. translate indexes 0->N/k through a PRP, repeat for next N/k if you want another partition, the PRP guarantees that each starting index will correspond to a unique finishing index.
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: partition result into subgroups

Post by Pio »

Hi Daniel!

If you have space requirements you could do the same as my previous post with the exception that you only save the k-1 commas postions (uniformly randomly distributed between 0 and w + k - 2) + the two paranthesis positions (position -1 for "(" and position W + k -1 for ")" in an array a, sort the array a with the respect to the positions and then subtract a[t] - a[t-1] to get the numer of partion t.

When you have the number of wins for partition 1 you could get the number of draws for that partition and then subtract the number of draws for partition one from the total number of draws and then do the same for partition 2...

I hope I do not have made too many errors.

Good luck!
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: partition result into subgroups

Post by Daniel Shawul »

kbhearn wrote:How fast does it need to be? If it can have some computational overhead to iterate through a random partition then you could use a pseudorandom permutation and not need to store the results of the permutation, just use it to translate indexes as you need them. Changing the key would give you a different partitioning.

i.e. translate indexes 0->N/k through a PRP, repeat for next N/k if you want another partition, the PRP guarantees that each starting index will correspond to a unique finishing index.
It can be slow as hell since it is done one time. Can you give an example of a PRP that can generate permutation of [0..N] integers where N can be as big as a million?