Dann Corbit wrote:I just completed calculation using mm of Elo for junkbase (more than 10 million games, of which more than 9.5 million had recognizable termination).
It used 1 GB of RAM and took several hours.
The cost of parsing the PGN file should be linear in the file size. The cost of computation is not related to the number of games, only to the number of players.
The memory requirement of bayeselo is quadratic in the number of players, and the computation time is cubic in the number of players.
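To illustrate where those costs come from, here is a minimal sketch of the minorization-maximization (MM) iteration for Bradley-Terry ratings. This is an illustration only, not bayeselo's actual code; the function and variable names are made up.

```python
import math

def mm_ratings(wins, iters=200):
    """wins[i][j] = number of games player i won against player j.

    The P x P pair-count matrix is why memory grows quadratically in
    the number of players P; each MM sweep visits every pair, and the
    number of sweeps needed to converge also grows with P, which is
    roughly where the cubic running time comes from."""
    p = len(wins)
    gamma = [1.0] * p                       # latent strengths
    for _ in range(iters):
        updated = []
        for i in range(p):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (gamma[i] + gamma[j])
                        for j in range(p) if j != i)
            updated.append(total_wins / denom if denom else gamma[i])
        scale = sum(updated)
        gamma = [g * p / scale for g in updated]   # normalize
    # convert strengths to Elo differences relative to player 0
    return [400 * math.log10(g / gamma[0]) for g in gamma]
```

Note that the number of games only enters while filling the `wins` matrix, which matches the point above: parsing is linear in file size, while the iteration cost depends only on the number of players.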
I can't use locks because I use third-party software (cutechess-cli), which writes the PGNs without locks.
However, now that I have separate files for each processor, all problems are solved. For the first time I got 40k games with none lost!
I failed to notice this problem because it rarely happens at time controls longer than 40/30 sec.
Daniel
Daniel Shawul wrote:I can't use locks because I use third-party software (cutechess-cli), which writes the PGNs without locks.
However, now that I have separate files for each processor, all problems are solved. For the first time I got 40k games with none lost!
I failed to notice this problem because it rarely happens at time controls longer than 40/30 sec.
Daniel
The question to answer is "what is the probability that any two games will end at exactly the same time, so that you get two different instances of the thing trying to open the same file and write to it at the same time?" The answer is, the more games you play in parallel, the greater the probability. The faster the games, the greater the probability.
I originally had a similar problem since I can produce 100K games an hour at reasonable time controls, and 1M games an hour at ultra-fast times. I would almost never get exactly the right number of games reported by BayesElo until I solved that.
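The per-file workaround described above can be sketched like this: each worker process appends only to its own PGN, and the files are concatenated once the run is over, so no two writers ever race on the same file. The file names below are made up for illustration.

```python
import glob

def merge_pgns(pattern="games_worker*.pgn", out="all_games.pgn"):
    # Each worker wrote its own file, so every game survives intact;
    # a single reader can now merge them with no locking at all.
    with open(out, "w") as dst:
        for path in sorted(glob.glob(pattern)):
            with open(path) as src:
                dst.write(src.read())
```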
Dann Corbit wrote:I just completed calculation using mm of Elo for junkbase (more than 10 million games, of which more than 9.5 million had recognizable termination).
It used 1 GB of RAM and took several hours.
The cost of parsing the PGN file should be linear in the file size. The cost of computation is not related to the number of games, only to the number of players.
The memory requirement of bayeselo is quadratic in the number of players, and the computation time is cubic in the number of players.
Rémi
That is one of the reasons I can handle 1M games in 30 seconds. Most of the time is spent reading the PGN. I typically have a dozen or two different "players" (mainly different versions of Crafty against the standard gauntlet list), so the computational time is pretty minimal overall. In fact, it would run much faster, but I access the PGN through NFS, which is not so fast. I could run on the I/O node (which I have at times), and it is blazingly fast since the disks are speedy and there is no network delay.
Daniel Shawul wrote:I can't use locks because I use third-party software (cutechess-cli), which writes the PGNs without locks.
However, now that I have separate files for each processor, all problems are solved. For the first time I got 40k games with none lost!
I failed to notice this problem because it rarely happens at time controls longer than 40/30 sec.
Daniel
The question to answer is "what is the probability that any two games will end at exactly the same time, so that you get two different instances of the thing trying to open the same file and write to it at the same time?" The answer is, the more games you play in parallel, the greater the probability. The faster the games, the greater the probability.
I originally had a similar problem since I can produce 100K games an hour at reasonable time controls, and 1M games an hour at ultra-fast times. I would almost never get exactly the right number of games reported by BayesElo until I solved that.
I use my own tester, which manages many games simultaneously on a single machine. I don't have any issues because it's single-threaded and uses libev, an event library.
Of course this is not scalable, but I think the limitations are pretty minor: I can set up 48 simultaneous games on my quad, for instance, and it will play the same number of games per minute as if I set up 4.
My tester won't work on a cluster (unless, of course, I use rsh or ssh to make the connections), but then the communication between the programs and the tester is over the network. I'm sure all of this would work fine if the network were fast and reliable and I was not asking for 50 games a second.
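The single-threaded, event-driven design can be sketched with Python's standard selectors module standing in for libev. The echo subprocesses below are stand-ins for real engine processes; one loop multiplexes all of their pipes without threads or locks.

```python
import selectors, subprocess, sys

def run_games(n):
    # One thread multiplexes n "engine" pipes via the OS event API.
    sel = selectors.DefaultSelector()
    running = []
    for i in range(n):
        p = subprocess.Popen(
            [sys.executable, "-c", f"print('game {i} finished')"],
            stdout=subprocess.PIPE)
        sel.register(p.stdout, selectors.EVENT_READ, p)
        running.append(p)
    results = []
    while running:
        for key, _ in sel.select():
            line = key.fileobj.readline()
            if line:                      # engine produced a result line
                results.append(line.decode().strip())
            else:                         # EOF: this game's process is done
                sel.unregister(key.fileobj)
                key.data.wait()
                running.remove(key.data)
    return sorted(results)
```

Because the loop only wakes up when some pipe is readable, oversubscribing the machine (48 games on a quad) costs almost nothing in the tester itself; the engines, not the tester, are the bottleneck.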
From my experience it doesn't have a limit. However, I run it from wine under Linux and don't give it a PGN file, but instead feed it the PGN via stdin through a pipe.
jshriver wrote:From my experience it doesn't have a limit. However, I run it from wine under Linux and don't give it a PGN file, but instead feed it the PGN via stdin through a pipe.
Why not download the source, compile, and run it directly without wine? I use it all the time myself and it has compiled cleanly on all sorts of linux distros including our specialized cluster (ROCKS) version.