
new junkbase
Moderator: Ras
Re: new junkbase
I've downloaded it, but I can't open it with my Scid version. What could be wrong?

-
- Posts: 12791
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: new junkbase
ArmyBridge wrote: I've downloaded it, but I can't open it with my Scid version. What could be wrong?
I assume it has been decompressed with bzip2 via:
bzip2 -d <file 1>
bzip2 -d <file 2>
bzip2 -d <file 3>
If that is the case, then maybe there are too many games. My version of Scid is specially built to accommodate 20 million games or so.
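For anyone without a bzip2 binary at hand, the same decompression step can be sketched in Python. This is only an illustration: the function name `decompress_bz2` is made up here, the actual file names are not given in the post, and unlike `bzip2 -d` this sketch leaves the compressed original in place.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(src: str) -> Path:
    """Decompress a .bz2 file next to itself and return the new path.

    Hypothetical helper: roughly what `bzip2 -d` does, except that the
    compressed original is kept.
    """
    src_path = Path(src)
    if src_path.suffix != ".bz2":
        raise ValueError(f"expected a .bz2 file, got {src_path.name}")
    dst_path = src_path.with_suffix("")  # drop the trailing ".bz2"
    with bz2.open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # copy in chunks, so memory use stays small
    return dst_path
```

Calling it once per downloaded file mirrors the three `bzip2 -d` commands above.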
Re: new junkbase
ArmyBridge wrote: I've downloaded it, but I can't open it with my Scid version. What could be wrong?
You might try this version of Scid:
http://cap.connx.com/chess-engines/new- ... igscid.zip
-
- Posts: 6081
- Joined: Fri Mar 10, 2006 11:14 pm
- Location: Munster, Nuremberg, Princeton
Re: new junkbase
Dann Corbit wrote: You might try this version of Scid:
http://cap.connx.com/chess-engines/new- ... igscid.zip
Two comments. It's beyond me why I can't simply get the offered Scid base with all the games. Why do I need a command-line tool just to begin extracting the Scid files? What is the benefit of such tricky high-tech stuff if we just want to get a junk base?
I for one just wanted to take a look. Please give me a chance to extract the base and then run a few tests on the alleged 7.5 million games.
I already use a whole collection of more than 9 million games: the CB Bigbase with 4.3 million games and more than 5 million computer chess games, including server games.
Why do you get only 7.5 million games in total?
Sorry if the comments and questions look disrespectful. That is not my intention. I just saw a contradiction between junk data and the high-tech tools needed to deal with it. It makes no scientific sense to me. The method should always be chosen to match the type of data. Very basic science stuff.
I would beg you for an executable (exe or com), and for a program that can handle the 7.5 million games... Why Scid if it's so limited? So far I have got the games in PGN only up to 2.5 million, up to ECO code B22.
Just a third point of criticism:
In the stats for the first 2.5 million games, it looks as if the A40 and B20 games make up a very large share of the whole database. Is that reasonable to assume, or is it a consequence of the machine games chosen? For human chess that doesn't look representative, IMO.
Sorry for all the stuff.
-Popper and Lakatos are good but I'm stuck on Leibowitz
Re: new junkbase
Rolf wrote: Two comments. It's beyond me why I can't simply get the offered Scid base with all the games. Why do I need a command-line tool just to begin extracting the Scid files? What is the benefit of such tricky high-tech stuff if we just want to get a junk base?
I package this way because it is the most compact. The bandwidth required goes from thousands of GB/day to hundreds of GB.
Rolf wrote: I for one just wanted to take a look. Please give me a chance to extract the base and then run a few tests on the alleged 7.5 million games. I already use a whole collection of more than 9 million games: the CB Bigbase with 4.3 million games and more than 5 million computer chess games, including server games.
I guess that your games are better than my games. It is pointless for you to download them. It is mostly just for people who do not have a commercial chess database system.
Rolf wrote: Why do you get only 7.5 million games in total?
Because that is what I have left after I de-dupe my collection. As to why only 7.5 million games? That is how many I have collected. I can't think of any other explanation.
Rolf wrote: Sorry if the comments and questions look disrespectful. That is not my intention. I just saw a contradiction between junk data and the high-tech tools needed to deal with it. [...] In the stats for the first 2.5 million games, it looks as if the A40 and B20 games make up a very large share of the whole database. Is that reasonable to assume, or is it a consequence of the machine games chosen? For human chess that doesn't look representative, IMO.
According to my examination of the classification, there is wide dispersion of ECO categories, with ECO B being the most popular and ECO E being the least, but there are still one million ECO E games.
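The question of how the games spread over the ECO categories can be checked directly on the PGN text. A minimal sketch, assuming every game carries a standard `[ECO "..."]` header tag. The function names are hypothetical and this is not necessarily how the classification in this thread was produced.

```python
import re
from collections import Counter

# Matches header lines such as: [ECO "B22"]
ECO_TAG = re.compile(r'^\[ECO\s+"([A-E])(\d{2})', re.MULTILINE)


def eco_letter_counts(pgn_text: str) -> Counter:
    """Count games per ECO volume (A to E)."""
    return Counter(m.group(1) for m in ECO_TAG.finditer(pgn_text))


def eco_code_counts(pgn_text: str) -> Counter:
    """Count games per full ECO code, e.g. A40 or B20."""
    return Counter(m.group(1) + m.group(2) for m in ECO_TAG.finditer(pgn_text))
```

`eco_code_counts(text).most_common(10)` would show whether codes such as A40 and B20 really dominate a given file. Games without an ECO tag are simply not counted.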
Re: new junkbase
You can also get the individual categorized files as PGN compressed by bzip2 here:
http://cap.connx.com/a-openings/
http://cap.connx.com/b-openings/
http://cap.connx.com/c-openings/
http://cap.connx.com/d-openings/
http://cap.connx.com/e-openings/
You can decompress bzip2-compressed files with bzip2, 7-Zip or PowerArchiver (among others).
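A bzip2-compressed PGN file can also be read without unpacking it to disk first. The sketch below counts the games in such a file by counting `[Event ...]` header lines. It assumes each game starts with an Event tag and that the text is Latin-1 encoded, as the PGN standard specifies. The function name and any file name passed to it are hypothetical.

```python
import bz2


def count_games(path: str) -> int:
    """Count games in a bzip2-compressed PGN file, streaming line by line."""
    games = 0
    # Text mode decodes on the fly; errors="replace" tolerates stray bytes.
    with bz2.open(path, "rt", encoding="latin-1", errors="replace") as pgn:
        for line in pgn:
            if line.startswith("[Event "):
                games += 1
    return games
```

This gives a quick check that a download is complete before importing it into a database program.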
Re: new junkbase
Dann Corbit wrote: I package this way because it is the most compact. [...]
Your answers are not really what I expected, but they still tell me something, so now I understand the purpose of the base.
As I said, the base has many faults:
1) a mixture of computer and human games
2) name headers incorrect in many cases
So the danger for a beginner is the unsorted, meaningless and confusingly high number of games.
For someone who already has 5 million games from a well-edited database, the extra 2.5 million from the junkbase are pointless, because there is no hidden value.
Most of these extra games are computer games, but for that sort of game there are original engine databases in abundance, and better edited.
But again, a closer look has revealed to me that a layman who starts mixing the games and running statistics ends up in total confusion, because the results are often wrong.
Now you could reply that you have warned about the junk, so what is the point of such criticism?
Here I come back to my criticism of method versus data quality. In my experience, Scid is practically a copy of the professional ChessBase. You can run dozens of nice features and then get the results. So Scid is high class, IMO. But being free does not make it sensible to use with such data garbage. On the contrary, Scid suggests that the data can't be so bad, because you get so many nice results. Yes, but the data and the results are false.
So in a way I must take back my former criticism. Scid itself is fantastic. OK, I don't need it, because CB is better if you have years of experience with it, so that it's just a habit or an expectation you look for. No, the terrible emptiness lies in the sheer volume of database junk. How could a layman, a beginner or a young talent on a small budget see through the dangers of the huge base? Must he not be impressed by the extra 2.5 million "games"? Not if he listens to an expert who has checked with the different search tools in Scid itself.
We had this debate a couple of months ago. Copies of a game can have different results, different headers and a different number of moves, and it is still the same game. At that point your statistics are destroyed.
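The duplicate problem described here (the same game stored with different headers, results or lengths) can be illustrated with a small sketch. It ignores headers and results entirely and treats two games as the same when the move list of one is a prefix of the other. The function names and the `min_plies` threshold of 20 are assumptions made for illustration, variations in parentheses are not handled, and this is not the de-duplication method used for the junkbase or by Scid.

```python
import re

_COMMENT = re.compile(r"\{[^}]*\}|;[^\n]*")  # {brace} and ;rest-of-line comments
_MOVE_NUMBER = re.compile(r"\d+\.+")         # "12." or "12..."
_NAG = re.compile(r"\$\d+")                  # numeric annotation glyphs
_RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def san_moves(movetext: str) -> list[str]:
    """Reduce PGN movetext to a bare list of SAN moves."""
    text = _COMMENT.sub(" ", movetext)
    text = _MOVE_NUMBER.sub(" ", text)
    text = _NAG.sub(" ", text)
    # Drop the result token and strip check/annotation suffixes like + # ! ?
    return [t.rstrip("+#!?") for t in text.split() if t not in _RESULTS]


def same_game(a: str, b: str, min_plies: int = 20) -> bool:
    """True if one move list is a prefix of the other and both are long enough."""
    moves_a, moves_b = san_moves(a), san_moves(b)
    shorter = min(len(moves_a), len(moves_b))
    return shorter >= min_plies and moves_a[:shorter] == moves_b[:shorter]
```

The length threshold exists because very short games share opening moves by coincidence. Where to set it is a judgment call, and too low a value would merge distinct games that follow the same theory.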
Perhaps a computer chess freak just needs the sheer size, but a human player would be far better served by some expert comments. And an advanced player who wants to prepare with modern tools is cheated by the huge amount of data.
A real chess talent MUST start with the best data imaginable before he begins to memorize and learn the theory. A future scientist must study with the best readers and at the best institutes. For a certain kind of quiz, Wikipedia could be the alternative. But the deeper you go, the more time you waste with such a tool. This isn't my opinion, it's the truth seen from an academic perspective.
(I come back to my earlier impression that the junkbase had ONLY 2.5 million games in total. That was my mistake, because I had started by exporting the Scid base to PGN for ALL games at once. That wasn't possible the way I did it. After I split the base into the five ECO parts in Scid and then exported the games, everything went fine. Also, the impression that the ECO B part stopped at B22 was wrong, because that was simply where my games ended after the failed export.)
Since you are an expert in CC, I wish that you wouldn't offer such garbage. The underlying chess deserves a better presentation, perhaps for people in computer chess who have never had the chance to experience the true beauty of the game of chess. The moment you process chess with machines, you no longer have the chance to feel the beauty. It degenerates into numbers. But in the case of the junkbase even the numbers are ugly.
I hope that my feedback now allows you to forgive the tone of my first impression. The value of your whole chess collection is clear.
A tiny idea at the end, for the collections and for chess in general, also in relation to copyright.
You can't expect to learn something in chess, as a human player, if you just digest the played moves without commentary. We are not machines. Human chess players need more than just moves; they need the ideas behind the moves. Nobody should forget that computers are just collectors of moves and don't understand chess moves. It's no contradiction that computers win today, because that is due to the constant exactness of their moves, against which no amateur player has a chance.
Optimal compression
Hello Dann,
Thank you for this gift.
In a post you say: "I package this way because it is the most compact."
I have a suggestion for compacting your 7 million games. Leave them as PGN file(s) and use a high-compression program. NanoZip is one of them.
Look here: http://nanozip.net Why this one? Because of this: http://compressionratings.com/rating_sum.html
I think you will get smaller files by compressing the PGN files as text with it.
Re: Optimal compression
Philippe wrote: I think you will have smaller files by compacting text files with it from PGN files.
Scid uses a special algorithm that considers which moves are legal from a position. The result is that "the move encoding format is very compact: most moves take only one byte of storage" (according to the user manual). I don't think a program compressing text files can do better, although to be certain one would have to run an experiment.
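The experiment mentioned above could start from a rough measurement like the following sketch, which reports how many bytes per half-move a PGN text needs raw and after general-purpose compression. Python's standard `bz2` and `lzma` modules stand in for NanoZip here, which is an assumption of convenience. The function names are hypothetical, and the ply count is only approximate (variations are not handled).

```python
import bz2
import lzma
import re

_RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def count_plies(pgn_text: str) -> int:
    """Approximate number of half-moves in a PGN text."""
    # Keep only movetext lines; header tags start with "[".
    body = "\n".join(l for l in pgn_text.splitlines() if not l.startswith("["))
    body = re.sub(r"\{[^}]*\}", " ", body)  # drop {comments}
    body = re.sub(r"\d+\.+", " ", body)     # drop move numbers
    return sum(1 for tok in body.split() if tok not in _RESULTS)


def bytes_per_ply(pgn_text: str) -> dict[str, float]:
    """Bytes per half-move for the raw text and two compressed forms."""
    raw = pgn_text.encode("latin-1", errors="replace")
    plies = count_plies(pgn_text)
    if plies == 0:
        raise ValueError("no moves found")
    return {
        "raw": len(raw) / plies,
        "bz2": len(bz2.compress(raw, 9)) / plies,
        "lzma": len(lzma.compress(raw, preset=9)) / plies,
    }
```

The comparison with the manual's "one byte per move" figure is only indicative: the compressed PGN sizes here include the header tags, while the quoted figure concerns move encoding alone. A fair test would need a large file, since compressors do poorly on tiny inputs.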