new junkbase

Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: Optimal compression

Post by Gian-Carlo Pascutto »

I'm almost certain that the last time I tried this it was way more efficient to zip up the SCID files than to zip up the PGN.
Dann Corbit
Posts: 12791
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Optimal compression

Post by Dann Corbit »

Gian-Carlo Pascutto wrote:I'm almost certain that the last time I tried this it was way more efficient to zip up the SCID files than to zip up the PGN.
Besides, the files are available both ways (bzip2-compressed SCID files and bzip2-compressed PGN).

Code:

 Directory of C:\dannfast\e_drive\ward-ftp\FTPRoot\pub\scid

05/20/2009  01:05 AM       581,821,018 jbase.sg3.bz2
05/20/2009  01:03 AM       208,961,026 jbase.si3.bz2
05/20/2009  01:01 AM         6,394,724 jbase.sn3.bz2
               3 File(s)    797,176,768 bytes

Code:


 Directory of C:\dannfast\e_drive\ward-ftp\FTPRoot\pub\a-openings
05/26/2009  06:50 PM           998,832 A00.pgn.bz2
...
05/26/2009  06:55 PM           211,656 A99.pgn.bz2
             542 File(s)    273,578,222 bytes

 Directory of C:\dannfast\e_drive\ward-ftp\FTPRoot\pub\b-openings
05/26/2009  06:55 PM            45,867 B00$01.pgn.bz2
...
05/26/2009  07:04 PM             9,599 B99y.pgn.bz2
            1177 File(s)    391,056,381 bytes
 Directory of C:\dannfast\e_drive\ward-ftp\FTPRoot\pub\c-openings
05/26/2009  07:04 PM           258,702 C00$11.pgn.bz2
...
05/26/2009  07:11 PM           667,610 C99.pgn.bz2
            1085 File(s)    232,974,751 bytes

 Directory of C:\dannfast\e_drive\ward-ftp\FTPRoot\pub\d-openings
05/26/2009  07:11 PM            20,923 D00$05.pgn.bz2
...
05/26/2009  07:17 PM           310,115 D99.pgn.bz2
             932 File(s)    225,090,299 bytes

 Directory of C:\dannfast\e_drive\ward-ftp\FTPRoot\pub\e-openings
05/26/2009  07:17 PM           255,175 E00.pgn.bz2
...
05/26/2009  07:20 PM         1,759,921 E99.pgn.bz2
             558 File(s)    177,728,416 bytes

The compressed SCID files take up about 800 MB in total.
The compressed PGN files take up about 1300 MB in total.
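
The two totals can be checked by summing the byte counts from the directory listings. This is only a small sketch; the dictionary and function names are made up for illustration, and "MB" is assumed to mean 10**6 bytes.

```python
# Byte totals copied from the directory listings in this post.
scid_bytes = {
    "jbase.sg3.bz2": 581_821_018,
    "jbase.si3.bz2": 208_961_026,
    "jbase.sn3.bz2": 6_394_724,
}
pgn_bytes = {
    "a-openings": 273_578_222,
    "b-openings": 391_056_381,
    "c-openings": 232_974_751,
    "d-openings": 225_090_299,
    "e-openings": 177_728_416,
}


def total_mb(sizes):
    """Return the summed size in decimal megabytes (10**6 bytes)."""
    return sum(sizes.values()) / 1_000_000


scid_mb = total_mb(scid_bytes)
pgn_mb = total_mb(pgn_bytes)
# Prints: SCID: 797 MB, PGN: 1300 MB, ratio 1.63
print(f"SCID: {scid_mb:.0f} MB, PGN: {pgn_mb:.0f} MB, ratio {pgn_mb / scid_mb:.2f}")
```

So the compressed PGN set is roughly 1.6 times the size of the compressed SCID set.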

So if you already use SCID, then get the SCID files. The download will be faster and you can turn it into PGN if you like. If you don't use SCID, then get the PGN files.
Philippe

Re: Optimal compression

Post by Philippe »

I ran a test. I compressed the three files (jbase.sg3, jbase.sn3 and jbase.si3) with NanoZip and ended up with a 631 MB file instead of 760 MB. It took 1 hour.
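
For anyone who wants to repeat this kind of comparison, here is a rough sketch. NanoZip is not available from Python's standard library, so `lzma` is used here purely as a stand-in for "a stronger compressor than bzip2"; the sample data and function names are hypothetical. The 760 MB figure seems to be the 797,176,768-byte bzip2 total expressed in units of 2**20 bytes, but that is an assumption.

```python
import bz2
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Compress the same bytes with two stdlib codecs and report the sizes."""
    return {
        "raw": len(data),
        "bz2": len(bz2.compress(data, compresslevel=9)),
        "lzma": len(lzma.compress(data, preset=9)),  # stand-in, not NanoZip
    }


def saving(old: float, new: float) -> float:
    """Fractional saving of `new` relative to `old`."""
    return 1 - new / old


# Hypothetical sample: a short PGN-like fragment repeated many times.
sample = b'[Result "1-0"]\n1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 1-0\n' * 2000
sizes = compressed_sizes(sample)
print(sizes)

# Saving reported in the post: 631 MB instead of 760 MB.
print(f"{saving(760, 631):.1%} smaller")
```

With the numbers from the post, the saving works out to about 17%, at the cost of an hour of compression time.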
Rolf
Posts: 6081
Joined: Fri Mar 10, 2006 11:14 pm
Location: Munster, Nuremberg, Princeton

Re: new junkbase

Post by Rolf »

Third part of my test:

The top program with the most games in the junk is...

SHREDDER vers. 8!!!

With approx. 176000 games.

Booot and Bright also account for a huge share of the games. So does Fritz 8, or its Deep version.

Forget about RYBKA. It's represented by only some 10000 games across a few versions.

---------------------------------------------------

So besides the title of junk, we can conclude that it's biased junk.

It's comparable to the long tradition at CSS, the CB-supporting forum in Germany, where for years Fritz and others won all "private" tournaments while Rybka wasn't even mentioned.

It's so sad to see.

It goes without saying that I already made sure that no games from my career are in that junkbase. Sorry.
-Popper and Lakatos are good but I'm stuck on Leibowitz
Dann Corbit
Posts: 12791
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: new junkbase

Post by Dann Corbit »

There are 81392 Zappa games
There are 22065 ZapChess games
There are 156677 Naum games
There are 215323 Rybka games
There are 569977 Shredder games
There are 670012 Fritz games

I think that the counts are mostly a function of how long the programs have been around.

But the biggest problem with the database is the inconsistency in naming of the players (IMO).

For instance, there are 6545 distinct player names that contain the substring 'shredder'.
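
A case-insensitive substring count like that could be sketched as follows. The family list, function names and example player names are all made up for illustration; they are not taken from the database, and this is not necessarily how the counts above were produced.

```python
from collections import Counter

# Hypothetical engine families to look for; extend as needed.
FAMILIES = ("shredder", "fritz", "rybka", "naum", "zappa", "zapchess")


def family_of(player):
    """Return the first family whose name occurs in the player string, or None."""
    lowered = player.lower()
    for fam in FAMILIES:
        if fam in lowered:
            return fam
    return None


def distinct_names_per_family(players):
    """Count distinct player spellings per engine family."""
    distinct = {p.strip() for p in players}
    return Counter(f for f in map(family_of, distinct) if f is not None)


# Made-up example names showing the kind of inconsistency described.
names = ["Shredder 8", "Deep Shredder 11 x64", "shredder 8", "Fritz 8", "Kasparov, G"]
counts = distinct_names_per_family(names)
print(counts)
```

Here three different spellings end up in the 'shredder' family, which is the naming problem in miniature. Plain substring matching is crude, so a real cleanup would probably need a hand-maintained alias table.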