Scid error on CCRL pgn file imports

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

FlavusSnow
Posts: 89
Joined: Thu Apr 01, 2010 5:28 am
Location: Omaha, NE

Re: Scid error on CCRL pgn file imports

Post by FlavusSnow »

It's not a specific game. I've split the offending PGN file into 9 files, each containing approximately 90,000 games. Regardless of the order in which you import them, it errors on the third import (game number 2^18). The error happens on both Windows and Linux. My best guess is that some variable size is being exceeded by the White and Black Elo headers, because the only difference between the CEGT files and the CCRL files is that the CCRL headers include these Elo entries.
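The arithmetic behind "always the third import" is simple if each split file is taken as roughly 90,000 games. Two files stay below game 2^18 and three files pass it, so the failure point falls inside the third import whatever the order:

```latex
2^{18} = 262\,144, \qquad
2 \times 90\,000 = 180\,000 \;<\; 262\,144 \;\le\; 3 \times 90\,000 = 270\,000
```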
tmokonen
Posts: 1296
Joined: Sun Mar 12, 2006 6:46 pm
Location: Kelowna
Full name: Tony Mokonen

Post by tmokonen »

I think the reason why this happens with the CCRL PGN file is because of the way the rounds are numbered. I tried an experiment, and wrote a quick and dirty program to replace the Round tags in the CCRL PGN file with [Round "?"], and SCID loaded all the games in the modified PGN file, whereas SCID loaded only 262,144 games from the unmodified PGN file. There must be a limit in SCID as to the number of rounds or subrounds per tournament.
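A minimal Python sketch of the kind of rewrite described above, replacing every Round tag value with "?". This is not tmokonen's actual program. It assumes each tag pair sits on its own line and that the file decodes as Latin-1; the function and file names are hypothetical.

```python
import re

# Matches a PGN Round tag pair such as [Round "123.4"] on its own line.
ROUND_TAG = re.compile(r'^\[Round\s+"[^"]*"\]\s*$')


def blank_round_tags(lines):
    """Yield PGN lines with every Round tag value replaced by "?"."""
    for line in lines:
        if ROUND_TAG.match(line):
            yield '[Round "?"]\n'
        else:
            yield line


def rewrite_file(src_path, dst_path):
    """Copy a PGN file, blanking all Round tags (hypothetical helper)."""
    # Latin-1 maps every byte to a character, so decoding never fails.
    with open(src_path, encoding="latin-1") as src, \
         open(dst_path, "w", encoding="latin-1") as dst:
        dst.writelines(blank_round_tags(src))
```

For example, `rewrite_file("CCRL-404.[808885].pgn", "CCRL-404-noround.pgn")` would write a copy with anonymous rounds (the output name is made up). The trade-off is that the per-game Round numbers are lost.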
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Post by Adam Hair »

FlavusSnow wrote: It's not a specific game. I've split the offending PGN file into 9 files, each containing approximately 90,000 games. Regardless of the order in which you import them, it errors on the third import (game number 2^18). The error happens on both Windows and Linux. My best guess is that some variable size is being exceeded by the White and Black Elo headers, because the only difference between the CEGT files and the CCRL files is that the CCRL headers include these Elo entries.
I removed those tags and it still failed for me.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Post by Adam Hair »

tmokonen wrote:I think the reason why this happens with the CCRL PGN file is because of the way the rounds are numbered. I tried an experiment, and wrote a quick and dirty program to replace the Round tags in the CCRL PGN file with [Round "?"], and SCID loaded all the games in the modified PGN file, whereas SCID loaded only 262,144 games from the unmodified PGN file. There must be a limit in SCID as to the number of rounds or subrounds per tournament.
I did not try that, but I will.
Norm Pollock
Posts: 1056
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Post by Norm Pollock »

I have imported CCRL 40/40 and 40/4, and CEGT 40/20 and 40/4, into SCID many times and never had a crash. But I did use pgnscid.exe (which, for my personal convenience, I renamed to pgn2scid.exe).

With pgnscid.exe, the conversion to the .si4 format is done outside SCID, so SCID only has to open an existing Scid database. When you import a PGN file into SCID, you are asking SCID to do the conversion AND the opening.

There probably is a bug in SCID with regards to importing pgn files. Until that is corrected, I think using pgnscid.exe is the way to go.
tmokonen
Posts: 1296
Joined: Sun Mar 12, 2006 6:46 pm
Location: Kelowna
Full name: Tony Mokonen

Post by tmokonen »

Norm Pollock wrote: I have imported CCRL 40/40 and 40/4, and CEGT 40/20 and 40/4, many times into SCID. Never had a crash. But I did use pgnscid.exe (which for my personal convenience, I renamed to pgn2scid.exe).

By using pgnscid.exe, the conversion to si4 is done outside of SCID. Therefore SCID just has to open a scid file. When you import a pgn file into SCID, you are asking SCID to do the conversion AND the opening.

There probably is a bug in SCID with regards to importing pgn files. Until that is corrected, I think using pgnscid.exe is the way to go.
Even with pgnscid, I had problems. The error message I got indicates that it is indeed a problem with the rounds. I used the version of pgnscid that came with Scid vs. PC 4.6.

C:\Temp\cegtallblitz>pgnscid CCRL-404.[808885].pgn
Converting file CCRL-404.[808885].pgn to Scid database CCRL-404.[808885]:
Errors/warnings will be written to CCRL-404.[808885].err.

  [0% 10   20   30   40   50   60   70   80   90  100]
  [..
ERROR: Too many round names!  The maximum allowable number is 262143.
Aborting pgnscid; try using a smaller PGN file.
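Before importing, one can check whether a PGN file would hit this limit by counting its distinct Round values. A minimal Python sketch, assuming pgnscid deduplicates round names as exact strings (an assumption; the error message does not say so). The names below are hypothetical. The reported maximum, 262,143, equals 2^18 - 1, which is consistent with (but does not prove) an 18-bit round-name index.

```python
import re
from typing import Iterable, Set

# Limit quoted in the pgnscid error message; equal to 2**18 - 1.
MAX_ROUND_NAMES = 262143

# Captures the value of a Round tag pair at the start of a line.
ROUND_VALUE = re.compile(r'^\[Round\s+"([^"]*)"\]')


def distinct_round_names(lines: Iterable[str]) -> Set[str]:
    """Collect the distinct Round tag values in a PGN stream."""
    names: Set[str] = set()
    for line in lines:
        m = ROUND_VALUE.match(line)
        if m:
            names.add(m.group(1))
    return names


def fits_round_limit(lines: Iterable[str], limit: int = MAX_ROUND_NAMES) -> bool:
    """True if the stream has no more distinct Round values than `limit`."""
    return len(distinct_round_names(lines)) <= limit
```

Passing an open file object, e.g. `distinct_round_names(open(path, encoding="latin-1"))`, gives the count for a whole file. When every game has a unique Round value, this count equals the number of games.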
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Post by Adam Hair »

tmokonen wrote:
Norm Pollock wrote: I have imported CCRL 40/40 and 40/4, and CEGT 40/20 and 40/4, many times into SCID. Never had a crash. But I did use pgnscid.exe (which for my personal convenience, I renamed to pgn2scid.exe).

By using pgnscid.exe, the conversion to si4 is done outside of SCID. Therefore SCID just has to open a scid file. When you import a pgn file into SCID, you are asking SCID to do the conversion AND the opening.

There probably is a bug in SCID with regards to importing pgn files. Until that is corrected, I think using pgnscid.exe is the way to go.
Even with pgnscid, I had problems. The error message I got indicates that it is indeed a problem with the rounds. I used the version of pgnscid that came with Scid vs. PC 4.6.

C:\Temp\cegtallblitz>pgnscid CCRL-404.[808885].pgn
Converting file CCRL-404.[808885].pgn to Scid database CCRL-404.[808885]:
Errors/warnings will be written to CCRL-404.[808885].err.

  [0% 10   20   30   40   50   60   70   80   90  100]
  [..
ERROR: Too many round names!  The maximum allowable number is 262143.
Aborting pgnscid; try using a smaller PGN file.
Great! That explains why I can import the 40/40 database into Scid after changing all the Round names to "?".
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Post by Adam Hair »

stevenaaus wrote:
Adam Hair wrote:Actually, the problem occurs at the 262,144th game, which is 2^18 (as Kirill pointed out to me once).
Yes, this is interesting. Where is the download link?
The fact that Scid can import more games than this perhaps means that the CCRL program has something funny at this number.

Have you tried the pgnscid command-line tool? It comes with Scid and creates an si4 database from the command line. It's more reliable for some reason.
You can find the link to the smaller database on this page:
http://computerchess.org.uk/ccrl/4040/games.html

There are approximately 370,000 games in that database.
stevenaaus
Posts: 608
Joined: Wed Oct 13, 2010 9:44 am
Location: Australia

Post by stevenaaus »

OK... I remember seeing something about this before.
I've made myself a big PGN with non-unique ROUND names and can reproduce the bug.
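One hypothetical way to build a reproduction file is to give each game its own Round value, as the CCRL numbering effectively does. The tag values, function names and default game count below are invented for illustration and are not taken from stevenaaus's test; the seven tags follow the PGN Seven Tag Roster order.

```python
def synthetic_pgn(num_games: int):
    """Yield minimal PGN games, each with a distinct Round value."""
    for i in range(1, num_games + 1):
        yield (
            '[Event "Round-limit test"]\n'
            '[Site "?"]\n'
            '[Date "????.??.??"]\n'
            f'[Round "{i}"]\n'
            '[White "A"]\n'
            '[Black "B"]\n'
            '[Result "1/2-1/2"]\n'
            '\n'
            '1. e4 e5 1/2-1/2\n'
            '\n'
        )


def write_synthetic_pgn(path: str, num_games: int = 2**18 + 1) -> None:
    """Write a test file one game past 2**18 distinct round names."""
    with open(path, "w", encoding="ascii") as out:
        out.writelines(synthetic_pgn(num_games))
```

Each game is tiny, so a file with 2^18 + 1 games stays in the tens of megabytes.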

The current behaviour leaves a broken 262,144-game database that cannot be fixed by "scidt -N database". I've made some changes, and it now leaves the database in a better state, from which "scidt -N database" recovers all games except the last one. Perhaps the last game should be deleted.

Anyway, I wonder what the best solution is. 2^18 round names seems reasonable to me, and a hack like

if ($site == "CCRL") Round = "";
is unacceptable because of the slowdown.

Perhaps there is some satisfactory solution.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Post by Adam Hair »

stevenaaus wrote: OK... I remember seeing something about this before.
I've made myself a big PGN with non-unique ROUND names and can reproduce the bug.

The current behaviour leaves a broken 262,144-game database that cannot be fixed by "scidt -N database". I've made some changes, and it now leaves the database in a better state, from which "scidt -N database" recovers all games except the last one. Perhaps the last game should be deleted.

Anyway, I wonder what the best solution is. 2^18 round names seems reasonable to me, and a hack like

if ($site == "CCRL") Round = "";
is unacceptable because of the slowdown.

Perhaps there is some satisfactory solution.
The reason for the unique Round names (numbers) is that they are used to identify each game, so games in the published database can be traced back to the submitted PGN. This has been useful lately: Norm Pollock recently noticed some discrepancies in the databases, and it was easy to determine who submitted those games (me :oops: ) and what mistake was made. Given this, and given that the PGN specification places no limit on the number of Round names, I do not think that the CCRL will change its method.

Of course, I am not expecting you to change Scid vs PC if you feel 2^18 is enough Round names.