caution about random number seeding

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

caution about random number seeding

Post by Daniel Shawul »

I run into two random number seeding problems on a cluster that I felt worth mentioning to warn others in the future. I use time(NULL) to pick a seed.

Caution 1 : That is not enough when you are running multiple versions of the program on a cluster. I was trying to do parallel montecarlo simulation and that gave me same simulations from all the procesors. Hence seed it with porcessor id as well. srand(time(NULL) * (get_pid() + 1));

Caution 2 : If you are doing a tree spliting operation, the hash keys in your history stack need to be the same ! So before generation of the hash keys seed the random number to a known constant. First save the current seed, then set it to a known constant, then generate your hash keys, then seed it back to the saved seed. In books and other places you would need different random numbers in each run.

That is two bugs solved related to seeding a random number.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: caution about random number seeding

Post by bob »

Daniel Shawul wrote:I run into two random number seeding problems on a cluster that I felt worth mentioning to warn others in the future. I use time(NULL) to pick a seed.

Caution 1 : That is not enough when you are running multiple versions of the program on a cluster. I was trying to do parallel montecarlo simulation and that gave me same simulations from all the procesors. Hence seed it with porcessor id as well. srand(time(NULL) * (get_pid() + 1));

Caution 2 : If you are doing a tree spliting operation, the hash keys in your history stack need to be the same ! So before generation of the hash keys seed the random number to a known constant. First save the current seed, then set it to a known constant, then generate your hash keys, then seed it back to the saved seed. In books and other places you would need different random numbers in each run.

(1) is well-known. (2) I don't get in why would you want to seed the PRNG to produce < 1000 Zobrist randoms??? Use the default seeding to initialize, then re-seed if you want to change the stream to alter book randomness or whatever...

That is two bugs solved related to seeding a random number.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: caution about random number seeding

Post by Daniel Shawul »

(1) is well-known.
(2) I don't get in why would you want to seed the PRNG to produce < 1000 Zobrist randoms??? Use the default seeding to initialize, then re-seed if you want to change the stream to alter book randomness or whatever...
Take it easy...this is just a cautionary note for whoever needs it.

I prefer calculating it on the fly for convenience. I have many games implemented with different board size , pieces, holdings
which are known only at run time.And also many places which require constant or variable seeds. books, montecarlo search, go string hashing etc...

So to me the question is Why do you precalculate ???
rand() is good enough for chess and you can seed from a constant to get the same stream.
To each his own.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: caution about random number seeding

Post by bob »

Daniel Shawul wrote:
(1) is well-known.
(2) I don't get in why would you want to seed the PRNG to produce < 1000 Zobrist randoms??? Use the default seeding to initialize, then re-seed if you want to change the stream to alter book randomness or whatever...
Take it easy...this is just a cautionary note for whoever needs it.

I prefer calculating it on the fly for convenience. I have many games implemented with different board size , pieces, holdings
which are known only at run time.And also many places which require constant or variable seeds. books, montecarlo search, go string hashing etc...

So to me the question is Why do you precalculate ???
rand() is good enough for chess and you can seed from a constant to get the same stream.
To each his own.
By "pre-calculate" I mean do it when you start the engine. There is no reasonable explanation for using a seed there, since you want the same numbers every time you start up, assuming you use a zobrist signature for book position recognition and such...

It is not easy to come up with a better seed than the default, generally, as it has been pretty thoroughly tested. If you want to re-seed (such as one might want in a cluster-based monte-carlo simulation), then it makes perfect sense to do so.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: caution about random number seeding

Post by Daniel Shawul »

It seems that the thread id should be mixed in there too. But anyway I think a linear congruential generator like LCG is not recommended for monte-carlo simulation, similar to cryptography ,so I have to drop it. I remember now that I was forced to use mersenne twister even for majic generation the brute fore way...

Another problem is reproducing the monte carlo simulation itself. For debugging purposes, I think I need to either save the used seeds for each processor Or use a constant seed for each. But the later might give a bad result. srand(1) is the default and gauranted to be good. But I don't think srand(0), srand(2)...srand(id) are good so I can't use that.. If this works, I actually don't need to use time for seed atleast for the MC part.
Maybe I can generate random seeds and broadcast them to each processor. But the communication will not be started by the time I start generating hash_keys, reading books etc... But if that is possible it would save me the trouble of saving the seeds for each processor.

Edit: Another need for the parallel MC simulation is to require a simulation by 1 processor to be same as by two,three or more processors. So that means each processor should sample a unique section of the same pool of random numbers. I don't think C has this kind of stuff but MPI has. Note that choosing distinct seeds at start up do not avoid overlap of generated sequence! So I am going to use MPI for the MC part and use rand() with time seed for the rest.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: caution about random number seeding

Post by bob »

Daniel Shawul wrote:It seems that the thread id should be mixed in there too. But anyway I think a linear congruential generator like LCG is not recommended for monte-carlo simulation, similar to cryptography ,so I have to drop it. I remember now that I was forced to use mersenne twister even for majic generation the brute fore way...

Another problem is reproducing the monte carlo simulation itself. For debugging purposes, I think I need to either save the used seeds for each processor Or use a constant seed for each. But the later might give a bad result. srand(1) is the default and gauranted to be good. But I don't think srand(0), srand(2)...srand(id) are good so I can't use that.. If this works, I actually don't need to use time for seed atleast for the MC part.
Maybe I can generate random seeds and broadcast them to each processor. But the communication will not be started by the time I start generating hash_keys, reading books etc... But if that is possible it would save me the trouble of saving the seeds for each processor.

Edit: Another need for the parallel MC simulation is to require a simulation by 1 processor to be same as by two,three or more processors. So that means each processor should sample a unique section of the same pool of random numbers. I don't think C has this kind of stuff but MPI has. Note that choosing distinct seeds at start up do not avoid overlap of generated sequence! So I am going to use MPI for the MC part and use rand() with time seed for the rest.
I give this as a programming assignment in a course I teach. The Monte-Carlo part of the assignment is not that hard. But I require _identical_ results regardless of how many nodes they use, which makes life interesting for the random number generation issue...

It is simple enough to take a good PRNG, and if you are going to use 16 nodes max, then you can use 16 starting seeds. And for a single node test, you take the first N/16 randoms from the first seed, then then next N/16 from the next, until you have N randoms. non-power-of-2 numbers of nodes make this even more complicated, but the idea works for any PRNG that can be seeded.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: caution about random number seeding

Post by Daniel Shawul »

Like I have pointed out already, that approach has serious problems. Even though you seed them from different constant, (1) it is not gauranteed to be independent (2) it has serious correlation problems. It is easy to implement however.
Parallel random number generation is a whole new field by itself. I can only find packages SPRNG and RngStream that address part of the issue. So I guess for now I am stuck with seed from different numbers as I do not want to use that..
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: caution about random number seeding

Post by Daniel Shawul »

I feel that initializing from time(NULL) at start up is a bad idea if you need reproducibility.
Hence something like,

Code: Select all

     srand&#40;processor_rank + 1&#41;  ->first processor gets srand&#40;1&#41; which is the default and "strongest"
                                  the rest srand&#40;2&#41;,srand&#40;3&#41; ...
This gives reproducibility if the same number of processors are used for each run. If you want reproducibility
with different processors you have to sample from the same pool of random numbers. With seeding
parallel streams may be highly correlated. So it may be better to drop the seeding idea all in all and use SPRNG or somethng similar for best MC simulations.

Then for books or randomizing of root moves, reseed with time and continue the "constant" stream after
the particular job is done. This requires re-seeding everytime a book is probed or a root move is randomized since we decided to
start from constant seed at startup. If you need "variable" random numbers inside search tree, this
is probably a bad idea though.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: caution about random number seeding

Post by bob »

Daniel Shawul wrote:Like I have pointed out already, that approach has serious problems. Even though you seed them from different constant, (1) it is not gauranteed to be independent (2) it has serious correlation problems. It is easy to implement however.
Parallel random number generation is a whole new field by itself. I can only find packages SPRNG and RngStream that address part of the issue. So I guess for now I am stuck with seed from different numbers as I do not want to use that..
If you seed properly, there is no correlation if you use a good PRNG. Certainly not a LCG type. Best bet is to always test random numbers if the results are important. That's part of the programming assignment here. There are a number of pretty straight-forward tests one can do to feel fairly confident that the numbers are reasonable.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: caution about random number seeding

Post by Daniel Shawul »

If you seed properly, there is no correlation if you use a good PRNG. Certainly not a LCG type. Best bet is to always test random numbers if the results are important. That's part of the programming assignment here. There are a number of pretty straight-forward tests one can do to feel fairly confident that the numbers are reasonable.
Mersenne twister with its big period should be fine. Also it uses an LCG to modify the passed seed
hence calling srand with processor rank should be fine. But LCGs like rand() has 2**32, divide that by N. Also correlation reduces the period say by atleast 1/5th , so overall only a small section
of the period is available to a single processor.

I think reproducibility for _timed_ MC simulations is somewhat difficult because you wouldn't know how many
simulations will be carried out in advance (i.e N) and also different processors may do unequal number of simulations (i.e not
necessarily N/16) . I have dropped that requirement for now. Anyway what could go wrong in a parallel montecarlo ?!