SCID filesize limitations?

jshriver
Posts: 1105
Joined: Wed Mar 08, 2006 8:41 pm
Location: Morgantown, WV, USA

SCID filesize limitations?

Post by jshriver » Fri Sep 09, 2011 2:27 am

Greetings,

I was planning to convert a lot of my PGN files into SCID format for analysis and tweaking, and have hit a problem. Whenever I open a new database and try to import from a single file, I keep getting the error "Error opening file". It's not a permission issue: I can cat, vi, or otherwise access the file. So I'm guessing it's a file size limit.

The only file I was able to work with was 441 MB; the rest are about 1.5-4 GB each.

I have a LOT of data, over 140 GB of PGN, so I'd prefer not to break the files down even further.

-Josh

Don
Posts: 5106
Joined: Tue Apr 29, 2008 2:27 pm

Re: SCID filesize limitations?

Post by Don » Fri Sep 09, 2011 3:14 am

I'll bet it's a memory issue. 441 MB is not likely to be an actual file size limit, so it must be that only so much can be held in memory and indexed at once.

If it's just a one-time thing to get them imported, just use pgn-extract to make smaller files, import them, then delete the small files. You probably have no other reasonable choice.

jshriver
Posts: 1105
Joined: Wed Mar 08, 2006 8:41 pm
Location: Morgantown, WV, USA

Re: SCID filesize limitations?

Post by jshriver » Fri Sep 09, 2011 3:16 am

I was just writing a Perl script (lost my original) to do just that: split a PGN into smaller chunks.

I didn't know pgn-extract could do that. Do you have the CLI args handy, by chance? Meanwhile I'll dig through the man page and try.

Thanks for the tip :)
-Josh

Don
Posts: 5106
Joined: Tue Apr 29, 2008 2:27 pm

Re: SCID filesize limitations?

Post by Don » Fri Sep 09, 2011 3:28 am

pgn-extract is pretty awesome and can do a lot. You can strip comments and clean up incorrect SAN notation and other things.

pgn-extract -#1000 big.pgn

This gets you 1.pgn, 2.pgn, 3.pgn, etc., with 1000 games per file. I would count how many games are in the big file to estimate how many games you want in each smaller file:

grep -c '^\[Result ' big.pgn
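
The counting and the chunk-size arithmetic can be scripted. This is only a minimal sketch: it assumes every game opens with a line starting with [Event (well-formed files should give the same count for [Result tags), and the function names are made up for illustration.

```shell
# Count games by counting [Event tags.
# Assumption: each game begins with a line starting with '[Event '.
count_games() {
    grep -c '^\[Event ' "$1"
}

# Games per chunk needed to split TOTAL games into at most PARTS files
# (integer division, rounded up).
games_per_chunk() {
    total=$1
    parts=$2
    echo $(( (total + parts - 1) / parts ))
}

# Example (not run here): split big.pgn into about 20 files.
#   n=$(games_per_chunk "$(count_games big.pgn)" 20)
#   pgn-extract -#"$n" big.pgn
```

Rounding up means the last file may hold fewer games than the others, but you never end up with more files than you asked for.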

jshriver
Posts: 1105
Joined: Wed Mar 08, 2006 8:41 pm
Location: Morgantown, WV, USA

Re: SCID filesize limitations?

Post by jshriver » Fri Sep 09, 2011 3:30 am

Thanks, that was quicker than I could have figured it out via man :) And I agree pgn-extract is awesome; it's near the top of my chess utilities.

jwes
Posts: 778
Joined: Sat Jul 01, 2006 5:11 am

Re: SCID filesize limitations?

Post by jwes » Fri Sep 09, 2011 2:26 pm

IIRC, SCID has a limit of 16M games per database.

stevenaaus
Posts: 602
Joined: Wed Oct 13, 2010 7:44 am
Location: Australia

Use "pgnscid" command-line tool.

Post by stevenaaus » Sat Sep 10, 2011 9:26 am

Using the "pgnscid" command-line utility is much more reliable, and has always worked for me. pgnscid PGNFILE makes a Scid database out of a PGN file.
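
With many PGN files to convert, a loop saves typing. This is a rough sketch, assuming pgnscid is on the PATH, is called as pgnscid PGNFILE, and exits non-zero on failure. The PGNSCID variable and the convert_all name are placeholders, not part of Scid.

```shell
# Convert each PGN file given as an argument, one database per file.
# PGNSCID is a hypothetical override for the converter binary.
convert_all() {
    converter=${PGNSCID:-pgnscid}
    failed=0
    for f in "$@"; do
        # Assumption: the converter exits non-zero when it cannot convert.
        if ! "$converter" "$f"; then
            echo "conversion failed: $f" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every file converted.
    [ "$failed" -eq 0 ]
}

# Example (not run here):
#   convert_all [0-9]*.pgn
```

The loop carries on after a failure, so one bad file does not stop the rest of the batch.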

Opening large PGNs from Scid directly can be troublesome. It's definitely a Tcl memory issue, but exactly what is happening is unclear to me.

Scid databases are limited to 16 million games. The actual limit is 16777214, and Scid vs. PC svn will allow this many games. Otherwise, the only limit on file size comes from the operating system: 2 GB or 4 GB on FAT32, I think.
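
To check whether a collection fits in one database, compare its game count against that limit. This is a small sketch: 16777214 is the figure quoted above (it works out to 2^24 - 2), and the function names are hypothetical.

```shell
# Maximum games per Scid database, as quoted in this thread.
SCID_MAX_GAMES=$(( (1 << 24) - 2 ))   # 16777214

# Succeeds if the given game count fits in a single database.
fits_in_one_database() {
    [ "$1" -le "$SCID_MAX_GAMES" ]
}

# Number of databases needed for the given game count (rounded up).
databases_needed() {
    echo $(( ($1 + SCID_MAX_GAMES - 1) / SCID_MAX_GAMES ))
}
```

For a very large collection, databases_needed gives the minimum number of databases to spread the games over.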

stevenaaus
Posts: 602
Joined: Wed Oct 13, 2010 7:44 am
Location: Australia

Re: Use "pgnscid" command-line tool.

Post by stevenaaus » Sat Sep 10, 2011 12:08 pm

I don't think this is relevant to your problem, but just noting: Pascal did try to remove some file size limits at one stage, but gave up. These release notes refer to the size of the game file (.sg4):
"Before reaching this limit, the size of the game file could exceed 2 GB, which was not possible on 32 bit systems. So now some of Scid's IO handle large files (64 bit I/O system calls even on 32 bit systems) (but the limit for the game file is still 4 GB, given that games offsets are saved as unsigned integers)."
But the changes got reverted:
scid-4.2.2
==========
This maintenance release reverts to 32 bits file I/O. Large file handling across many platforms seems a bit hard to achieve ...
On my test bases, maxed.sg4 with 16.777 million games is 1400522061 bytes, and the junkbase game file (jbase.sg4) is 1790846546 bytes.
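
The 2 GB and 4 GB figures in those notes are what signed and unsigned 32-bit offsets give you. Here is a quick sketch for checking a game file's size against them, assuming a shell with 64-bit arithmetic (bash on a modern system). The function name is made up for illustration.

```shell
LIMIT_SIGNED_32=$(( (1 << 31) - 1 ))     # 2147483647: largest signed 32-bit value
LIMIT_UNSIGNED_32=$(( (1 << 32) - 1 ))   # 4294967295: largest unsigned 32-bit value

# Print which 32-bit limit, if any, a size in bytes exceeds.
classify_size() {
    if [ "$1" -gt "$LIMIT_UNSIGNED_32" ]; then
        echo "over 4 GB limit"
    elif [ "$1" -gt "$LIMIT_SIGNED_32" ]; then
        echo "over 2 GB limit"
    else
        echo "ok"
    fi
}

# Example (not run here); wc -c reports the size in bytes:
#   classify_size $(wc -c < jbase.sg4)
```

Both test-base sizes quoted above fall under the 2 GB mark.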

Adam Hair
Posts: 3201
Joined: Wed May 06, 2009 8:31 pm
Location: Fuquay-Varina, North Carolina

Re: Use "pgnscid" command-line tool.

Post by Adam Hair » Sun Sep 11, 2011 1:29 pm

What I have found to be strange is that Scid has trouble with the PGNs from the CCRL site. The 40/40 PGN without comments is ~360 MB. Scid chokes at game 262,144 when I try to open the PGN (262,144 = 2^18). If I split the PGN, there are no problems opening the parts. I have the same problem with the 40/4 PGN. I don't have problems with other, larger PGNs.

kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 3:48 am

Re: Use "pgnscid" command-line tool.

Post by kbhearn » Mon Sep 12, 2011 3:51 am

Scid is not a wonderfully forgiving PGN parser. Games with missing result codes or with nonstandard starting positions are two things I've seen cause it to choke on an entire file. If the tool you're using to split it corrects syntax, it could be solving Scid's parsing problem.
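
One way to test that theory is to scan a file for such games before importing it. This is a minimal sketch, assuming each game starts with a line beginning [Event and that a nonstandard start shows up as a [FEN tag. It checks only the Result tag, not the termination marker at the end of the movetext, and check_pgn is a made-up name.

```shell
# Report games that have no [Result tag or that carry a [FEN tag.
# Assumption: each game begins with a line starting with '[Event '.
check_pgn() {
    awk '
        # Print findings for the game just finished, if there was one.
        function report() {
            if (game > 0 && !has_result) print "game " game ": missing Result tag"
            if (game > 0 && has_fen)     print "game " game ": has FEN tag (nonstandard start)"
        }
        /^\[Event /  { report(); game++; has_result = 0; has_fen = 0 }
        /^\[Result / { has_result = 1 }
        /^\[FEN /    { has_fen = 1 }
        END          { report() }
    ' "$1"
}

# Example (not run here):
#   check_pgn big.pgn
```

Game numbers count from 1 in file order, so a reported number can be compared with the game at which an import stops.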
