kbhearn wrote:How fast does it need to be? If it can have some computational overhead to iterate through a random partition then you could use a pseudorandom permutation and not need to store the results of the permutation, just use it to translate indexes as you need them. Changing the key would give you a different partitioning.
i.e. translate indexes 0->N/k through a PRP, repeat for next N/k if you want another partition, the PRP guarantees that each starting index will correspond to a unique finishing index.
It can be slow as hell since it is done one time. Can you give an example of a PRP that can generate permutation of [0..N] integers where N can be as big as a million?
you can use a 3 round feistel network as a generic PRP with a set size of any even number of bits.
Then you can convert the n-bit PRP to a PRP from 0...N by running any index > N through the PRP again until you get an index < N (hence you want to keep the order close).
i.e. if 1 million was your N, then you'd use 20 bits for your base PRP.
and if index 3 happens to map to 1031000 which is outside your set, then you'd rerun it and map from 1031000 instead, ultimately it results in a PRP with an arbitrary range - at worst you have to run the PRP an average of 4 times per index if your N is (2^a) + 1 which would require a base PRP of size 2^(a+2) when a is even. when a is odd it'd just be 2^(a+1)
Daniel Shawul wrote:I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks
Maybe I missed something but I don't see why you can't store millions of indices in an array. If you mean billions of games I can understand your concern, but 4MB for 1 million games doesn't seem like a problem to me.
Actually this seems to be the best approach. Picking one at a time and decreasing either W , D or L depending on where the picked result fall. I can not be sure till I code it but it sure sounds like much easier than shuffling through the whole list at the beginning
Thanks.
Edit Indeed this works well. Moral of story is if something doesn't occur to you, take some fresh air
Daniel Shawul wrote:I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks
Maybe I missed something but I don't see why you can't store millions of indices in an array. If you mean billions of games I can understand your concern, but 4MB for 1 million games doesn't seem like a problem to me.
Actually I don't have large number of games in mind, so I was mentioning million/billions to limit people from coming up with array solutions. In practise W,D,L could be anything an integer or long long could hold. Anyway it turns out I and many others have been looking at it from the wrong perspective. What Bob suggested is simple and a clear winner.
Daniel Shawul wrote:Thanks a lot Gary for taking the time to present it in code! Ok so the code can partition W wins in to k groups, and similarly L losses into k groups, and finally for the losses I just take Li = N / k - Wi - Di. I think that should work unless I missed something. Will check and come back.
Thanks again Edit: There is a problem. Say I have N(W,D,L) = 2100(1000,1000,100) and I want to divide in to 10 thus 210(100,100,10) games each then say I get 50,50 for W & D, then for L it has to be 210 - 50 - 50 = 110. But we have only a total of 100 losses.
Sorry, I should have been a bit more verbose . The idea was to call it once each for w/d/l. This method is marginally more efficient, as it only takes O(k) operations, where k = buckets. The other method takes O(n) operations, where n = # games.
I think there is some kind of systematic error with this method. I did few runs and it seems wins decrease from sample 1 to sample 10, and losses increase. My guess is rand() is running out of cycles or I made a mistake somewhere
Indeed running out of cycles was the problem. Pseudo random number generator with big cycles such as Mersenne twister should be used. Well this turned itself into an interesting problem afterall. Complete code for the curios
#include <stdlib.h>
#include <stdio.h>
void seed_rand(unsigned long);
unsigned long gen_rand();
void gen(int* r,int* rk,int Nk) {
int N = 0;
for(int j = 0;j < 3;j++)
N += r[j];
for(int i = 0;i < Nk;i++) {
int v = gen_rand() % N;
//int v = rand() % N;
int total = 0;
for(int j = 0;j < 3;j++) {
total += r[j];
if(total > v) {
rk[j]++;
r[j]--;
N--;
break;
}
}
}
}
int main() {
int r[3];
int Nk = 4000;
int rk[10][3];
seed_rand(1);
//srand(1);
for(int k = 0;k < 10;k++) {
r[0] = 10000;r[1] = 20000;r[2] = 10000;
for(int i = 0;i < 10;i++) {
rk[i][0] = rk[i][1] = rk[i][2] = 0;
gen(r,rk[i],Nk);
printf("%d %d %d\n",rk[i][0],rk[i][1],rk[i][2]);
}
printf("========\n");
}
return 0;
}
/* Period parameters */
#define N 624
#define M 397
#define MATRIX_A 0x9908b0dfUL /* constant vector a */
#define UPPER_MASK 0x80000000UL /* most significant w-r bits */
#define LOWER_MASK 0x7fffffffUL /* least significant r bits */
static unsigned long mt[N]; /* the array for the state vector */
static int mti=N+1; /* mti==N+1 means mt[N] is not initialized */
/* initializes mt[N] with a seed */
void seed_rand(unsigned long s)
{
mt[0]= s & 0xffffffffUL;
for (mti=1; mti<N; mti++) {
mt[mti] =
(1812433253UL * (mt[mti-1] ^ (mt[mti-1] >> 30)) + mti);
/* See Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier. */
/* In the previous versions, MSBs of the seed affect */
/* only MSBs of the array mt[]. */
/* 2002/01/09 modified by Makoto Matsumoto */
mt[mti] &= 0xffffffffUL;
/* for >32 bit machines */
}
}
/* generates a random number on [0,0xffffffff]-interval */
unsigned long gen_rand()
{
unsigned long y;
static unsigned long mag01[2]={0x0UL, MATRIX_A};
/* mag01[x] = x * MATRIX_A for x=0,1 */
if (mti >= N) { /* generate N words at one time */
int kk;
if (mti == N+1) /* if init_genrand() has not been called, */
seed_rand(5489UL); /* a default initial seed is used */
for (kk=0;kk<N-M;kk++) {
y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
mt[kk] = mt[kk+M] ^ (y >> 1) ^ mag01[y & 0x1UL];
}
for (;kk<N-1;kk++) {
y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
mt[kk] = mt[kk+(M-N)] ^ (y >> 1) ^ mag01[y & 0x1UL];
}
y = (mt[N-1]&UPPER_MASK)|(mt[0]&LOWER_MASK);
mt[N-1] = mt[M-1] ^ (y >> 1) ^ mag01[y & 0x1UL];
mti = 0;
}
y = mt[mti++];
/* Tempering */
y ^= (y >> 11);
y ^= (y << 7) & 0x9d2c5680UL;
y ^= (y << 15) & 0xefc60000UL;
y ^= (y >> 18);
return y;
}
Now the results look more random with no discernible pattern
Daniel Shawul wrote:I seem to have a mental block with permutation stuff and can't figure out what seems to be easy problem.. Say I have N games with WDL results so that N = W + D + L. How to partition this into k sub-groups of N1(W1,D1,L1) so that they add up to N(W,D,L). I thought of random shuffling of N indices and then picking up Nk = N / k elements for each group. But that requires the indices to be stored in an array. That could be very big if I have thousands/millions of games so I want to avoid that. How do you generate n random integers that add up to some value ?
Thanks
Maybe I missed something but I don't see why you can't store millions of indices in an array. If you mean billions of games I can understand your concern, but 4MB for 1 million games doesn't seem like a problem to me.
Actually I don't have large number of games in mind, so I was mentioning million/billions to limit people from coming up with array solutions. In practise W,D,L could be anything an integer or long long could hold. Anyway it turns out I and many others have been looking at it from the wrong perspective. What Bob suggested is simple and a clear winner.
I see, in that case using an array is unjustified. Your implementation of Bob's suggestion is a nice solution for moderate sizes of N, but for really large values it would be more efficient to utilize a function that can sample the hypergeometric distribution. See for example: http://svn.r-project.org/R/trunk/src/nmath/rhyper.c
Thanks Peter. Indeed that is what it is , a multivariate version of hyper geometric distribution. Do you have the 'hyper' function that you used. I am trying to keep a list of utility functions for combinatorics so that I don't have to rewrite them again. Edit: I see it is probably the rhyper function from the svn link you gave. In that case I am afraid I can't use it because I have 6 variables. For home WDL and away WDL. One more thing I needed to take care of was that when N is not divisible by k, the last sub-group should take what ever is left over.