Scid error on CCRL pgn file imports

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Harvey Williamson, bob

FlavusSnow
Posts: 89
Joined: Thu Apr 01, 2010 3:28 am
Location: Omaha, NE

Re: Scid error on CCRL pgn file imports

Post by FlavusSnow » Wed Jan 25, 2012 10:18 pm

It's not a specific game. I've split the offending PGN file into 9 files, each containing approximately 90,000 games. Regardless of the order in which you import them, the import fails on the third file (at game number 2^18). The error happens on both Windows and Linux. My best guess is that a variable size is being exceeded in the headers for white and black ELO, because the only difference between CEGT files and CCRL files is that CCRL headers include these ELO entries.

tmokonen
Posts: 1020
Joined: Sun Mar 12, 2006 5:46 pm
Location: Vancouver

Re: Scid error on CCRL pgn file imports

Post by tmokonen » Wed Jan 25, 2012 11:56 pm

I think this happens with the CCRL PGN file because of the way the rounds are numbered. As an experiment, I wrote a quick and dirty program to replace the Round tags in the CCRL PGN file with [Round "?"]. SCID loaded all the games from the modified PGN file, but only 262,144 games from the unmodified one. There must be a limit in SCID on the number of rounds or subrounds per tournament.
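
The experiment described above can be reproduced with a short script. This is only a minimal sketch in Python, not the original program: the function name `neutralize_rounds` and the command-line usage are made up for illustration, and it assumes each Round tag sits on its own line, as in typical PGN exports.

```python
import re
import sys

# Matches a whole PGN Round tag line such as: [Round "12.3.45"]
ROUND_TAG = re.compile(r'^\[Round\s+"[^"]*"\]\s*$')


def neutralize_rounds(lines):
    """Yield PGN lines with every Round tag line replaced by [Round "?"]."""
    for line in lines:
        if ROUND_TAG.match(line):
            yield '[Round "?"]\n'
        else:
            yield line


# Hypothetical usage: python fix_rounds.py input.pgn output.pgn
if __name__ == "__main__" and len(sys.argv) == 3:
    src, dst = sys.argv[1], sys.argv[2]
    # latin-1 maps every byte to a character, so unusual bytes pass through unchanged
    with open(src, encoding="latin-1") as fin, \
         open(dst, "w", encoding="latin-1") as fout:
        fout.writelines(neutralize_rounds(fin))
```

The script streams the file line by line, so memory use stays flat even for a PGN with several hundred thousand games. Note that it discards the original Round values.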

Adam Hair
Posts: 3201
Joined: Wed May 06, 2009 8:31 pm
Location: Fuquay-Varina, North Carolina

Re: Scid error on CCRL pgn file imports

Post by Adam Hair » Thu Jan 26, 2012 1:04 am

FlavusSnow wrote: It's not a specific game. I've split the offending PGN file into 9 files, each containing approximately 90,000 games. Regardless of the order in which you import them, the import fails on the third file (at game number 2^18). The error happens on both Windows and Linux. My best guess is that a variable size is being exceeded in the headers for white and black ELO, because the only difference between CEGT files and CCRL files is that CCRL headers include these ELO entries.
I removed those tags and it still failed for me.

Adam Hair
Posts: 3201
Joined: Wed May 06, 2009 8:31 pm
Location: Fuquay-Varina, North Carolina

Re: Scid error on CCRL pgn file imports

Post by Adam Hair » Thu Jan 26, 2012 1:05 am

tmokonen wrote: I think this happens with the CCRL PGN file because of the way the rounds are numbered. As an experiment, I wrote a quick and dirty program to replace the Round tags in the CCRL PGN file with [Round "?"]. SCID loaded all the games from the modified PGN file, but only 262,144 games from the unmodified one. There must be a limit in SCID on the number of rounds or subrounds per tournament.
I did not try that, but I will.

Norm Pollock
Posts: 1017
Joined: Thu Mar 09, 2006 3:15 pm
Location: Long Island, NY, USA

Re: Scid error on CCRL pgn file imports

Post by Norm Pollock » Thu Jan 26, 2012 1:27 am

I have imported CCRL 40/40 and 40/4, and CEGT 40/20 and 40/4, many times into SCID. I never had a crash. But I did use pgnscid.exe (which, for my personal convenience, I renamed to pgn2scid.exe).

By using pgnscid.exe, the conversion to si4 is done outside of SCID. Therefore SCID just has to open a scid file. When you import a pgn file into SCID, you are asking SCID to do the conversion AND the opening.

There is probably a bug in SCID with regard to importing PGN files. Until that is corrected, I think using pgnscid.exe is the way to go.

tmokonen
Posts: 1020
Joined: Sun Mar 12, 2006 5:46 pm
Location: Vancouver

Re: Scid error on CCRL pgn file imports

Post by tmokonen » Thu Jan 26, 2012 1:39 am

Norm Pollock wrote: I have imported CCRL 40/40 and 40/4, and CEGT 40/20 and 40/4, many times into SCID. I never had a crash. But I did use pgnscid.exe (which, for my personal convenience, I renamed to pgn2scid.exe).

By using pgnscid.exe, the conversion to si4 is done outside of SCID. Therefore SCID just has to open a scid file. When you import a pgn file into SCID, you are asking SCID to do the conversion AND the opening.

There is probably a bug in SCID with regard to importing PGN files. Until that is corrected, I think using pgnscid.exe is the way to go.
Even with pgnscid, I had problems. The error message I got indicates that it is indeed a problem with the rounds. I used the version of pgnscid that came with Scid Vs. PC 4.6.

Code: Select all

C:\Temp\cegtallblitz>pgnscid CCRL-404.[808885].pgn
Converting file CCRL-404.[808885].pgn to Scid database CCRL-404.[808885]:
Errors/warnings will be written to CCRL-404.[808885].err.

  [0% 10   20   30   40   50   60   70   80   90  100]
  [..
ERROR: Too many round names!  The maximum allowable number is 262143.
Aborting pgnscid; try using a smaller PGN file.
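
The limit quoted in the error message, 262143, equals 2^18 - 1, which would be consistent with the import failing at game 262,144 if every game carries its own unique Round value. One way to check a PGN file against that limit before converting it is to count its distinct Round values. The following is a minimal sketch in Python under that assumption; `count_distinct_rounds` and `MAX_ROUND_NAMES` are illustrative names, not part of Scid or pgnscid, and the script only looks at Round tags that start a line.

```python
import re
import sys

# Limit taken from the pgnscid error message above (equals 2**18 - 1).
MAX_ROUND_NAMES = 262143

# Captures the value of a PGN Round tag, e.g. [Round "12.3.45"] -> 12.3.45
ROUND_TAG = re.compile(r'^\[Round\s+"([^"]*)"\]')


def count_distinct_rounds(lines):
    """Return the number of distinct Round tag values in an iterable of PGN lines."""
    seen = set()
    for line in lines:
        m = ROUND_TAG.match(line)
        if m:
            seen.add(m.group(1))
    return len(seen)


# Hypothetical usage: python count_rounds.py games.pgn
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="latin-1") as f:
        n = count_distinct_rounds(f)
    print(f"{n} distinct Round names (limit {MAX_ROUND_NAMES})")
    if n > MAX_ROUND_NAMES:
        print("This file has more distinct Round names than the quoted limit.")
```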

Adam Hair
Posts: 3201
Joined: Wed May 06, 2009 8:31 pm
Location: Fuquay-Varina, North Carolina

Re: Scid error on CCRL pgn file imports

Post by Adam Hair » Thu Jan 26, 2012 2:41 am

tmokonen wrote:
Norm Pollock wrote: I have imported CCRL 40/40 and 40/4, and CEGT 40/20 and 40/4, many times into SCID. I never had a crash. But I did use pgnscid.exe (which, for my personal convenience, I renamed to pgn2scid.exe).

By using pgnscid.exe, the conversion to si4 is done outside of SCID. Therefore SCID just has to open a scid file. When you import a pgn file into SCID, you are asking SCID to do the conversion AND the opening.

There is probably a bug in SCID with regard to importing PGN files. Until that is corrected, I think using pgnscid.exe is the way to go.
Even with pgnscid, I had problems. The error message I got indicates that it is indeed a problem with the rounds. I used the version of pgnscid that came with Scid Vs. PC 4.6.

Code: Select all

C:\Temp\cegtallblitz>pgnscid CCRL-404.[808885].pgn
Converting file CCRL-404.[808885].pgn to Scid database CCRL-404.[808885]:
Errors/warnings will be written to CCRL-404.[808885].err.

  [0% 10   20   30   40   50   60   70   80   90  100]
  [..
ERROR: Too many round names!  The maximum allowable number is 262143.
Aborting pgnscid; try using a smaller PGN file.
Great! That explains why I can import the 40/40 database into Scid after editing all the Round names to ?.

Adam Hair
Posts: 3201
Joined: Wed May 06, 2009 8:31 pm
Location: Fuquay-Varina, North Carolina

Re: Scid error on CCRL pgn file imports

Post by Adam Hair » Thu Jan 26, 2012 2:57 am

stevenaaus wrote:
Adam Hair wrote:Actually, the problem occurs at the 262,144th game, which is 2^18 (as Kirill pointed out to me once).
Yes - this is interesting. Where is the download link?
The fact that Scid can import more games than this perhaps means that the CCRL program has something funny at this number.

Have you tried the pgnscid command-line tool? It comes with Scid and creates an si4 from the command line. It's more reliable for some reason.
You can find the link to the smaller database on this page:
http://computerchess.org.uk/ccrl/4040/games.html

There are approximately 370,000 games in that database.

stevenaaus
Posts: 602
Joined: Wed Oct 13, 2010 7:44 am
Location: Australia

Re: Scid error on CCRL pgn file imports

Post by stevenaaus » Thu Jan 26, 2012 9:19 am

OK... I remember seeing something about this before.
I've made myself a big PGN with non-unique ROUND names, and can reproduce the bug.

The current behaviour leaves a broken 262144-game database, which is not fixable by "scidt -N database", but I've made some changes and it now leaves the db in a better state, from which "scidt -N database" recovers all except the last game. Perhaps the last game should be deleted.

Anyway, I wonder what the best solution is. 2^18 round names seems reasonable to me. And some hack like

Code: Select all

if ($site == "CCRL") Round = "";
is unacceptable because of the slowdown.

Perhaps there is some satisfactory solution.

Adam Hair
Posts: 3201
Joined: Wed May 06, 2009 8:31 pm
Location: Fuquay-Varina, North Carolina

Re: Scid error on CCRL pgn file imports

Post by Adam Hair » Thu Jan 26, 2012 12:35 pm

stevenaaus wrote: OK... I remember seeing something about this before.
I've made myself a big PGN with non-unique ROUND names, and can reproduce the bug.

The current behaviour leaves a broken 262144-game database, which is not fixable by "scidt -N database", but I've made some changes and it now leaves the db in a better state, from which "scidt -N database" recovers all except the last game. Perhaps the last game should be deleted.

Anyway, I wonder what the best solution is. 2^18 round names seems reasonable to me. And some hack like

Code: Select all

if ($site == "CCRL") Round = "";
is unacceptable because of the slowdown.

Perhaps there is some satisfactory solution.
The reason for the unique Round names (numbers) is that they are used to identify each game. Games from the published database can be traced back to the submitted PGN, which has been useful lately. Norm Pollock recently noticed some discrepancies in the databases, and it was easy to determine who submitted those games (me :oops: ) and what mistake was made. Given this, and given that the PGN specification sets no limit on the number of Round names, I do not think that the CCRL will change its method.

Of course, I am not expecting you to change Scid vs PC if you feel 2^18 is enough Round names.
