FICS Data

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

FICS Data

Post by jshriver »

I've been running a game observation bot on FICS for just about 3 years. I cleaned up the parser and, after all this time, consolidated the raw data and ran it through the parser to get a nice big 3.5 GB PGN file.

If anyone wants it I've put it on my website.
It's pretty much unfiltered raw PGN: games at all Elo ratings, unrated games, even guest games. The only thing I filtered out was variants, though I kept my raw streams for processing later.

http://olympuschess.com/fics-1.pgn.bz2

Hope releasing this into the wild will help other people besides myself.
It's definitely a fun dataset. If anyone finds any problems please let me know. I wish I could have run this through Scid for error checking, but my machine isn't powerful enough. It took my parser (written in Perl) almost 2 weeks to crunch the raw data.

-Josh

P.S. I just checked and this file has 4,182,419 games.
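
For anyone who wants to check the game count without unpacking the archive, here is a minimal sketch in Python. It assumes every game carries exactly one [Event ...] tag line (true of the sample game later in this thread, but not verified for the whole file); the function name count_games is just for illustration.

```python
import bz2


def count_games(path):
    """Count games in a bzip2-compressed PGN file.

    Assumption: each game has exactly one line starting with '[Event '.
    """
    count = 0
    # Text mode decompresses as a stream, so the 3.5 GB file is never
    # held in memory. latin-1 accepts any byte, so odd handles or
    # comments cannot raise a decoding error.
    with bz2.open(path, "rt", encoding="latin-1") as fh:
        for line in fh:
            if line.startswith("[Event "):
                count += 1
    return count
```

Calling count_games("fics-1.pgn.bz2") on the downloaded file should be comparable to the figure quoted above, provided the one-tag-per-game assumption holds.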
Last edited by jshriver on Sun Nov 15, 2009 1:54 am, edited 2 times in total.
jshriver

Re: FICS Data

Post by jshriver »

For those interested, I'm also rewriting the bot because of some account limitations on FICS. Right now Oannes (the account name) looks for running games, gets a history, observes, and pops the data off (no processing). The major drawback is that FICS only lets you observe up to 10 games at a time, so the bot is losing A LOT of the overall game traffic.

To get around this, I'm keeping a list of user accounts; right now I have over 9k usernames. The new bot will grab data from their histories, maintain an internal table of previous games, and cycle accordingly. FICS history numbering goes from 1-99 or 0-99, I forget which. But since there are only 10 games in the history it'll never overlap.
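
The cycling idea described above could be sketched roughly like this in Python. This is not the actual bot: fetch_history is a hypothetical callback standing in for whatever talks to FICS, and the game key is assumed to be something both players' histories share (for example players plus end time), so a game seen under one name is not stored again under the other.

```python
from collections import deque


def cycle_histories(usernames, fetch_history, seen=None, passes=1):
    """Visit each user in turn and keep only games not seen before.

    fetch_history(user) is a hypothetical callback returning a list of
    (game_key, raw_text) tuples; the FICS side is deliberately left out.
    'seen' is the internal table of previously recorded game keys.
    """
    seen = set() if seen is None else seen
    new_games = []
    queue = deque(usernames)
    for _ in range(passes * len(queue)):
        user = queue.popleft()
        for game_key, raw_text in fetch_history(user):
            if game_key not in seen:
                seen.add(game_key)
                new_games.append((user, game_key, raw_text))
        queue.append(user)  # rotate so the next pass starts over
    return new_games
```

Passing the same seen set into later calls is what keeps repeat passes from recording a game twice.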

In my tests so far, a single pass through the names has accumulated about 72 MB of raw data, resulting in 36,489 games. That goes to show how much my original bot has been missing all this time.

I plan to keep releasing game dumps for other people to use for learning, opening books, or whatever.

Hope this helps.
-Josh
Hart

Re: FICS Data

Post by Hart »

I have been looking for something like this for a while. Does this include human-engine games as well?

Thanks.
jshriver

Re: FICS Data

Post by jshriver »

Sure it does :) It's pretty much everything that's visible to users. If you have a specific account you're looking for, I can do a quick grep to see which and how many of its games are in this first release.
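
For readers who would rather run that kind of per-account lookup themselves, a rough Python equivalent of the grep might look like the sketch below. It assumes the standard [White "..."] and [Black "..."] tag lines and lower-cases names on the assumption that FICS handles are case-insensitive; games_per_player is an illustrative name.

```python
import bz2
import re
from collections import Counter

# Matches the two player tag lines of a PGN header.
PLAYER_TAG = re.compile(r'^\[(White|Black) "(.*)"\]')


def games_per_player(path):
    """Return a Counter mapping lower-cased handle -> number of games."""
    counts = Counter()
    with bz2.open(path, "rt", encoding="latin-1") as fh:
        for line in fh:
            match = PLAYER_TAG.match(line)
            if match:
                counts[match.group(2).lower()] += 1
    return counts
```

Looking up a single handle is then a dictionary access on the returned Counter, which gives 0 for accounts that never appear.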

-Josh
P.S. I changed the file name and the link in my initial post above because I wanted to give it some kind of version number.
Hart

Re: FICS Data

Post by Hart »

My download will complete in 40 minutes, but I am curious: how many games are there, and what percentage are human-engine?

Thanks again.
jshriver

Re: FICS Data

Post by jshriver »

Hope it works well for you. By the way, this is bzip2 compressed. If others would like, I could put a zip version up as well.

I also added MD5 and SHA-1 checksums:

http://olympuschess.com/fics-1.bz2.sh1
http://olympuschess.com/fics-1.bz2.md5
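
A minimal sketch for checking a download against those files in Python, assuming they are in the usual md5sum/sha1sum layout (hex digest first, then the file name). That layout is an assumption, and the function names are illustrative.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a large archive is never fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, checksum_text, algorithm="sha1"):
    """Compare against the contents of a checksum file.

    Assumption: the first whitespace-separated field is the hex digest.
    """
    expected = checksum_text.split()[0].lower()
    return file_digest(path, algorithm) == expected
```

Pass algorithm="md5" together with the text of the .md5 file to check the other sum.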

-Josh
Hart

Re: FICS Data

Post by Hart »

Did you try 7zip/LZMA with a large word/dictionary size? I just took 60,000 pseudo-random games and reduced them to 13% of their original size, compared to your 27%, so you might be able to cut this file size in half.
jshriver

Re: FICS Data

Post by jshriver »

I tried 7zr with maximum compression and it only sliced off a little more than bzip2 with -9.

I can put it up as well upon request.
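
Anyone who wants to repeat the bzip2 versus LZMA comparison on their own sample of games can do it with Python's standard library. This is only a sketch: it uses the xz container from the lzma module rather than 7zr itself, so the numbers will not match either poster's exactly, and the default preset is kept moderate because the highest presets need several hundred MB of memory.

```python
import bz2
import lzma


def compression_ratios(data, lzma_preset=lzma.PRESET_DEFAULT):
    """Return compressed size / original size for bzip2 -9 and LZMA.

    data is a bytes object holding a sample of PGN text.
    """
    original = len(data)
    return {
        "bz2": len(bz2.compress(data, compresslevel=9)) / original,
        "lzma": len(lzma.compress(data, preset=lzma_preset)) / original,
    }
```

Passing lzma_preset=9 | lzma.PRESET_EXTREME selects the largest built-in dictionary, which is the closest analogue here to the "large dictionary" suggestion; bzip2 at -9 works on blocks of about 900 kB, so it cannot exploit repetition further apart than that.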
-Josh
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: FICS Data

Post by jwes »

After running it through pgn-extract, importing it into Scid, and eliminating duplicates and very short games, I ended up with 1,601,772 games.
Here is an example of a badly formatted game.

[White "Flesch"]
[Black "Vadasz"]
[WhiteElo "0"]
[BlackElo "0"]
[Date "2006.10.18"]
[Event "None"]
[Site "FICS"]
[Round "0"]


1 ...
1 Bxh7+
1 Nf5
1 Qh5
1 Rd7
1 Rg7+
1 Rxd4
1 d5
1 none Kxf6Kxh7Nxg3Qc5Qc6Qe7Rad8Rxa3Rxd2Rxe4cxd5gxf5gxh5
2 Ba3
2 Bxc6
2 Bxd2
2 Bxf5
2 Ne4+
2 Nxg3
2 Qa1#
2 Qf6
2 Qxd2
2 Re5
2 Rxd6
2 Rxe4
2 bxa3
2 h4 Bd5Bxc6+Bxd6Kf7Nb3+Ng5Qd8Qf3Qh4Rd2Rxd6Rxg3+
3 Bxc6
3 Kb1
3 Kg1
3 Ne7+
3 Qf6
3 Qxe2
3 Rg7+
3 Rh4+
3 hxg3 Bxc6+Ke8Kg6Kg8Ncd2+Nh3#Qxe7Rd1+Rxg3+gxf6
4 Kc2
4 Kf1
4 Kg1
4 Qxd1
4 Qxh7+
4 Rg4+
4 Rh8+ Kh7Kh8Kxh7Kxh8Nd4+Rxd3Rxe2e2+
5 Be3
5 Bxf6#
5 Kxd2
5 Qh6+
5 Rg4
5 Rh5+
5 Rxg7+
5 bxc5 Bxe3#Kg8Kh8Nxe2Qxb2Rg2+
6 Kf1
6 Kh1
6 Qh6#
6 Qxg7#
6 Rf8+
6 Rh8#
6 bxa3 Bg8Ng3Rxg3#e2+
7 Ke1
7 Rfxg8# Rg1+
1-0
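
A game like the one above can be flagged with a purely syntactic check on the movetext. The Python sketch below is one possible approach, not what pgn-extract or Scid do: it accepts only move numbers with a dot, SAN-shaped tokens, and result markers, and reports everything else. It does not check that moves are legal.

```python
import re

# SAN-shaped token: castling, or optional piece, optional disambiguation,
# optional capture, target square, optional promotion, optional check/mate.
SAN = re.compile(
    r"^(O-O(-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?)[+#]?[!?]*$"
)
MOVE_NUMBER = re.compile(r"^\d+\.(\.\.)?$")
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}
COMMENT = re.compile(r"\{[^}]*\}")
GLUED_NUMBER = re.compile(r"(\d+\.(?:\.\.)?)")


def bad_tokens(movetext):
    """Return movetext tokens that are not move numbers, SAN, or a result."""
    text = COMMENT.sub(" ", movetext)      # drop {...} comments
    text = GLUED_NUMBER.sub(r"\1 ", text)  # split "1.e4" into "1. e4"
    return [
        tok for tok in text.split()
        if not (SAN.match(tok) or MOVE_NUMBER.match(tok) or tok in RESULTS)
    ]
```

On the sample above, the bare move numbers without a dot, the word "none", and the run-together strings such as Kxf6Kxh7Nxg3... would all be reported, so any game with a non-empty result could be set aside for inspection.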
jshriver

Re: FICS Data

Post by jshriver »

Thanks for the input, I will take a look at my data and the parser.

I had seen a problem similar to that when I was testing. Occasionally, as the original bot was recording "style 12" lines from the FICS server, it might miss a beat and lose a line: it wasn't transmitted, or it got lost somehow. That results in holes in the game data.
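
One way such holes could be detected is to turn each recorded position into a ply index and look for jumps. The Python sketch below assumes the parser has already extracted (move number, side to move) from each style 12 line for a single game, in arrival order; that extraction step and the name find_gaps are illustrative, not part of the original bot.

```python
def find_gaps(positions):
    """Return (previous_ply, next_ply) pairs where positions were skipped.

    positions: list of (move_number, side_to_move) tuples for one game,
    with side_to_move given as "W" or "B", in the order received.
    """
    gaps = []
    prev_ply = None
    for move_number, side in positions:
        # Ply 0 is White to play move 1, ply 1 is Black to play move 1, ...
        ply = 2 * (move_number - 1) + (0 if side == "W" else 1)
        if prev_ply is not None and ply > prev_ply + 1:
            gaps.append((prev_ply, ply))
        prev_ply = ply
    return gaps
```

A backward step, as after a takeback, is not treated as a gap here; only forward jumps of more than one ply are reported.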

It didn't seem frequent enough to be a big concern, but the line with the random garbage (or moves from other games) within a single move section is troubling.

Will look at it this weekend.
Any other concerns are appreciated.

-Josh