FICS Data

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: FICS Data

Post by beachknight »

jshriver wrote:I checked, and right now my raw data streams from FICS over the past 3 years come to 61 GB. So I'm open to suggestions on what people would like grabbed from them.
Wow. Then may I please have:

all standard and blitz games,
min 10 moves, no dupes, no variant games

:)
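A rough sketch of how such a filter could look in Python, in case it helps. It assumes the export is plain PGN text, that each game starts with an [Event "..."] tag, and that the Event tag names the game type (for example "FICS rated blitz game"); none of that is confirmed for this data. The function names are made up for illustration, and "dupe" is taken to mean same players, same date, and same movetext.

```python
import re

WANTED_TYPES = ("standard", "blitz")  # assumption: the Event tag names the game type


def split_games(pgn_text):
    """Split PGN text into games, assuming each game starts with an [Event tag."""
    return [g.strip() for g in re.split(r"(?m)^(?=\[Event )", pgn_text) if g.strip()]


def headers(game):
    """Return the tag pairs of one game as a dict."""
    return dict(re.findall(r'(?m)^\[(\w+) "([^"]*)"\]', game))


def movetext(game):
    """Return the moves of one game with tag lines and {comments} removed."""
    body = re.sub(r"(?m)^\[.*\]\s*$", "", game)
    body = re.sub(r"\{[^}]*\}", "", body)
    return " ".join(body.split())


def full_moves(game):
    """Count move-number tokens such as '12.' (but not '12...')."""
    return len(re.findall(r"\b\d+\.(?!\.)", movetext(game)))


def filter_games(pgn_text, min_moves=10):
    """Keep wanted game types with enough moves, dropping exact duplicates."""
    seen = set()
    kept = []
    for game in split_games(pgn_text):
        tags = headers(game)
        event = tags.get("Event", "").lower()
        if not any(t in event for t in WANTED_TYPES):
            continue  # variant or other game type
        if full_moves(game) < min_moves:
            continue  # too short
        key = (tags.get("White"), tags.get("Black"), tags.get("Date"), movetext(game))
        if key in seen:
            continue  # duplicate under the assumed definition
        seen.add(key)
        kept.append(game)
    return kept
```

This does no legality checking at all; it only looks at tags and counts move numbers.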

Best,
hi, merhaba, hallo HT
jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: FICS Data

Post by jshriver »

Sounds good. I can't sleep, so I'm working on fics2008 right now. I have to wait 15 minutes between downloads, but everything should be done and uploaded sometime tomorrow. Once it's up I'll post the URLs on the board.

It's also payday :) With RAM being cheap, it's about time I upgraded this little box from 1 GB to maybe 4 or 8. That should make crunching these datasets a lot quicker.

-Josh
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: FICS Data

Post by jwes »

jshriver wrote:I checked, and right now my raw data streams from FICS over the past 3 years come to 61 GB. So I'm open to suggestions on what people would like grabbed from them.
I'd like to see it run through pgn-extract, very short games removed, and split into a few large chunks. I'd prefer split by elo, but split into openings would be good also. Pgn-extract claims to do both.
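For the split by Elo, here is a minimal Python sketch of the bucketing step, independent of pgn-extract. It assumes each game carries the optional WhiteElo and BlackElo tags with integer values; the band width of 200 and the function names are arbitrary choices for illustration.

```python
import re
from collections import defaultdict


def elo_band(game, width=200):
    """Return a label like '1800-1999' from the average of both players' Elo."""
    tags = dict(re.findall(r'(?m)^\[(\w+) "([^"]*)"\]', game))
    try:
        avg = (int(tags["WhiteElo"]) + int(tags["BlackElo"])) // 2
    except (KeyError, ValueError):
        return "unrated"  # tag missing or not a number
    low = avg // width * width
    return f"{low}-{low + width - 1}"


def group_by_band(games, width=200):
    """Group an iterable of PGN game strings by Elo band."""
    bands = defaultdict(list)
    for game in games:
        bands[elo_band(game, width)].append(game)
    return dict(bands)
```

Each band could then be written to its own file, which gives a few large chunks rather than many small ones.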
beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: FICS Data

Post by beachknight »

jwes wrote:
jshriver wrote:I checked, and right now my raw data streams from FICS over the past 3 years come to 61 GB. So I'm open to suggestions on what people would like grabbed from them.
I'd like to see it run through pgn-extract, very short games removed, and split into a few large chunks. I'd prefer split by elo, but split into openings would be good also. Pgn-extract claims to do both.
Hi Wes,

My step-by-step questions:

How would you proceed with such a huge amount of data?

chunk size: which is better? 250 MB? or 500 MB?

very short games: minimum number of moves? 4, 7 or 10?

type of games: standard, rapid and blitz together or separate?

split: by elo or eco? which is easier?

Hope this helps Joshua,

Best,
hi, merhaba, hallo HT
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: FICS Data

Post by jdart »

There are some invalid moves (not standard SAN) in the file. Not very many, it seems, but some:

Illegal move: Qd1d3
Illegal move: Qd1d3
Illegal move: Qc8f5+
Illegal move: Qc8f5+
Illegal move: Qe2f2
Illegal move: Qe2f2
Illegal move: Qb2b3
Illegal move: Qb2b3
Illegal move: Bf5e6
Illegal move: Bf5e6
Illegal move: Qc7b6+
Illegal move: Qc7b6+
Illegal move: Qb1c2
Illegal move: Qb1c2
Illegal move: Qg1f2
Illegal move: Qg1f2
Illegal move: Qf1e2
Illegal move: Qf1e2
Illegal move: Qe1d2
Illegal move: Qe1d2
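
Tokens of this shape are easy to pick out with a regular expression. A small Python sketch follows; the pattern and function name are illustrative only. It flags piece moves that spell out both the file and the rank of the origin square, and it says nothing about whether the move is legal or whether that much disambiguation was actually required in the position.

```python
import re

# Piece letter, full origin square, optional capture, target square, optional check/mate.
FULL_ORIGIN = re.compile(r"^[KQRBN][a-h][1-8]x?[a-h][1-8][+#]?$")


def has_full_origin(token):
    """True if a move token names both file and rank of the origin square."""
    return bool(FULL_ORIGIN.match(token))
```

Running every move token through has_full_origin would list the candidates to check by hand or with a real move generator.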
jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: FICS Data

Post by jshriver »

Unfortunately my parser isn't very smart in this regard. It assumes that whatever move was made on the server is legal (no real chess rule checking, just dumping from one stream into another, formatted as PGN).

I believe that's where pgn-extract or Scid comes in: they can clean out the bad games.
jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: FICS Data

Post by jshriver »

All of Mr. Taner's files are up on my site now:

http://olympuschess.com/fics/Taner/

Enjoy.
-Josh
beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: FICS Data

Post by beachknight »

Thanks.

I see your zipped data is smaller than mine.
It's better to download the bz2 versions to save some bandwidth.

Best,
hi, merhaba, hallo HT
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: FICS Data

Post by jdart »

I think they're invalid, not necessarily illegal. For example "Qd1d3" can't be a SAN-compliant move. It should be one of "Qd3", "Qdd3" or "Q1d3" - with the last 2 variants being used only if there's more than one queen that could move to d3.

--Jon
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: FICS Data

Post by Zach Wegner »

Qd1d3 can be valid.

[d]8/8/8/3Q4/8/8/8/3Q1Q2 w - -
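
To check this kind of case mechanically, here is a small Python sketch of the SAN disambiguation rule for queen moves. It is a simplification: it only looks at queens of the same colour, treats any piece as a blocker, ignores pins and king safety, and never writes the capture "x". The function names are made up for illustration.

```python
DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]


def parse_board(fen):
    """Map (file 0-7, rank 1-8) to piece letter from the board field of a FEN."""
    board = {}
    for row_index, row in enumerate(fen.split()[0].split("/")):
        file_index = 0
        for ch in row:
            if ch.isdigit():
                file_index += int(ch)
            else:
                board[(file_index, 8 - row_index)] = ch
                file_index += 1
    return board


def square(name):
    """Convert a square name like 'd3' to (file 0-7, rank 1-8)."""
    return (ord(name[0]) - ord("a"), int(name[1]))


def queen_reaches(board, src, dst):
    """True if a queen on src can slide to dst without jumping over a piece."""
    for dx, dy in DIRS:
        x, y = src
        while True:
            x, y = x + dx, y + dy
            if not (0 <= x < 8 and 1 <= y <= 8):
                break
            if (x, y) == dst:
                return True
            if (x, y) in board:
                break
    return False


def san_queen_move(fen, src, dst):
    """Return the SAN for a non-capturing queen move with minimal disambiguation."""
    board = parse_board(fen)
    s, d = square(src), square(dst)
    rivals = [p for p, ch in board.items()
              if ch == board[s] and p != s and queen_reaches(board, p, d)]
    if not rivals:
        prefix = ""
    elif all(p[0] != s[0] for p in rivals):
        prefix = src[0]   # file alone is unique
    elif all(p[1] != s[1] for p in rivals):
        prefix = src[1]   # rank alone is unique
    else:
        prefix = src      # both file and rank are needed
    return "Q" + prefix + dst


print(san_queen_move("8/8/8/3Q4/8/8/8/3Q1Q2 w - -", "d1", "d3"))  # Qd1d3
```

In the position above, the queen on d5 shares the d-file with d1 and the queen on f1 shares the first rank, so neither "Qdd3" nor "Q1d3" is unambiguous and the full origin square is required.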