FICS Data

beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: FICS Data

Post by beachknight »

jwes wrote:After running it through pgn-extract, importing it into scid and eliminating duplicates and very short games, I ended up with 1601772 games.
Hi,

I have tried a few utilities to load this huge PGN file,
without success so far.

pgn-extract, on the other hand, seems to process it quickly.

Which switches did you use in pgn-extract?

best,
hi, merhaba, hallo HT
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: FICS Data

Post by jwes »

beachknight wrote:Which switches did you use in pgn-extract?
I just used -o and -l. I also wrote a script that split the original PGN file into 20 smaller files, because the first time I tried pgn-extract it choked with an out-of-memory error.
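The splitting script itself was not posted, so the following is only a minimal sketch of one way to do it, not jwes's actual script. It assumes every game starts with an [Event ...] tag line (true of the example game later in this thread, but not guaranteed for every PGN source), and the function name and output file names are made up for illustration. The input is read line by line, so the whole file never has to fit in memory.

```python
import os

def split_pgn(src_path, out_dir, parts=20):
    """Distribute the games in src_path round-robin over `parts` files.

    ASSUMPTION: a new game begins at each line starting with "[Event ".
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, f"part_{i + 1:02d}.pgn") for i in range(parts)]
    # latin-1 never fails to decode, so odd bytes in player names pass through.
    outs = [open(p, "w", encoding="latin-1") for p in paths]
    game_no = -1
    try:
        with open(src_path, encoding="latin-1") as src:
            for line in src:
                if line.startswith("[Event "):
                    game_no += 1
                # Anything before the first game goes to the first file.
                outs[max(game_no, 0) % parts].write(line)
    finally:
        for f in outs:
            f.close()
    return paths

# Hypothetical usage:
# split_pgn("fics.pgn", "fics_parts", parts=20)
```

Each part can then be fed to pgn-extract separately.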
beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: FICS Data

Post by beachknight »

jwes wrote:I just used -o and -l. I also wrote a script that split the original PGN file into 20 smaller files, because the first time I tried pgn-extract it choked with an out-of-memory error.
Ok. Almost the same procedure.

After deleting games of at most 2 moves (i.e. at most 4 half-moves), I got 1 642 622 games.

The next job is removing dupes.
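For readers who want to reproduce the short-game filter without a database program, here is a rough sketch. It is not the tool used above; the helper names are hypothetical, games are assumed to start with an [Event ...] tag, and the movetext parsing is deliberately simple (it ignores variations in parentheses, which FICS game logs normally do not contain).

```python
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def split_games(pgn_text):
    """Split PGN text into games, assuming each one starts with an [Event tag."""
    starts = [m.start() for m in re.finditer(r"^\[Event ", pgn_text, flags=re.M)]
    ends = starts[1:] + [len(pgn_text)]
    return [pgn_text[s:e].strip() for s, e in zip(starts, ends)]

def count_half_moves(game_text):
    """Count half-moves in one game's movetext (tag lines are skipped)."""
    movetext = " ".join(ln for ln in game_text.splitlines() if not ln.startswith("["))
    movetext = re.sub(r"\{[^}]*\}", " ", movetext)  # drop {comments}
    movetext = re.sub(r"\d+\.+", " ", movetext)      # drop move numbers
    return sum(1 for tok in movetext.split()
               if tok not in RESULTS and not tok.startswith("$"))

def drop_short_games(pgn_text, max_short=4):
    """Keep only games with more than max_short half-moves."""
    return [g for g in split_games(pgn_text) if count_half_moves(g) > max_short]
```

With max_short=4 this drops games of 2 moves or fewer, matching the threshold described above.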

Best,
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: FICS Data

Post by jwes »

It looks like some of the games are suicide chess.
Here is an example.
[Event "None"]
[Site "FICS"]
[Date "2007.11.6"]
[Round "0"]
[White "nessegrev"]
[Black "fenlurker"]
[Result "0-1"]
[WhiteElo "1843"]
[BlackElo "2170"]

1. e3 b6 2. Ba6 Nxa6 3. Qh5 Nb8 4. Qxh7 Rxh7 5. h3 Rxh3 6. Rxh3 f6 7. Rh6
gxh6 8. Nh3 Kf7 9. Ng5+ hxg5 10. a3 c5 11. c3 Na6 12. b4 cxb4 13. axb4 Nxb4
14. Rxa7 Rxa7 15. cxb4 Ra3 16. Bxa3 g4 17. g3 b5 18. Bc1 Ke6 19. f4 gxf3
20. Kd1 Kd6 21. d3 Kc6 22. d4 Kb7 23. Bb2 Qa5 24. bxa5 e5 25. dxe5 fxe5 26.
Bxe5 Nf6 27. Bxf6 Ba3 28. Nxa3 d6 29. Nxb5 f2 30. Nxd6+ Ka7 31. Nxc8+ f1=B
32. Nxa7 Bb5 33. Nxb5 0-1
MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: FICS Data

Post by MattieShoes »

The suicide games are probably caused by the game-type letter: on FICS, suicide and standard are both represented by the letter s, but one is capitalized and the other is lowercase.
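To illustrate how that could happen, here is a small sketch of a case-blind comparison letting both game types through. Which letter is which is an assumption here (lowercase for standard, uppercase for suicide); the post does not say, and how the data set was actually collected is unknown.

```python
# ASSUMPTION (not stated in the thread): lowercase "s" is standard chess and
# uppercase "S" is suicide. Verify against the FICS documentation before use.
STANDARD_CODE = "s"
SUICIDE_CODE = "S"

def is_standard(type_code):
    """Case-sensitive test: only the standard-chess letter passes."""
    return type_code == STANDARD_CODE

def is_standard_case_blind(type_code):
    """The suspected bug: ignoring case lets suicide games through as well."""
    return type_code.lower() == STANDARD_CODE.lower()

codes = ["s", "S", "b"]  # hypothetical type codes; "b" stands for any other type
strict = [c for c in codes if is_standard(c)]
sloppy = [c for c in codes if is_standard_case_blind(c)]
```

Here strict keeps only the standard code, while sloppy keeps the suicide code too.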
beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: FICS Data

Post by beachknight »

jwes wrote:It looks like some of the games are suicide chess.
I don't know how to eliminate them. On my side,
removing the dupes seems almost impossible.

An example: the same game appears with two different results,
1-0 and 0-1. But... the final position is a draw! I'd
adjudicate that game! :)

Plus: it is impossible to take time forfeits into account.

Once I run duplicate removal, I predict that
only half of the games will remain.

Best,
beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: FICS Data

Post by beachknight »

Another problem: How many half-moves should I take into account?
10 (i.e. 5 moves)? 20 (i.e. 10 moves)?

Best,
beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: FICS Data

Post by beachknight »

beachknight wrote:Another problem: How many half-moves should I take into account?
10 (i.e. 5 moves)? 20 (i.e. 10 moves)?
I took 10 moves, i.e. 20 half-moves.

That resulted in a game db with 1 555 352 games,
still including games with no result.
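One possible reading of the 20 half-move setting is that two games count as duplicates when the players and the first 20 half-moves match. The sketch below follows that reading; it is an assumption, not a description of the database program's actual duplicate check. The result is left out of the key on purpose, so the same game stored once as 1-0 and once as 0-1 collapses to a single entry. The tuple layout and function name are made up for illustration.

```python
def dedupe(games, plies=20):
    """Keep the first game seen for each (white, black, opening moves) key.

    games: iterable of (white, black, moves), where moves is a list of
    SAN strings. ASSUMPTION: matching players plus matching first `plies`
    half-moves means "duplicate"; the result is deliberately ignored.
    """
    seen = set()
    kept = []
    for white, black, moves in games:
        key = (white, black, tuple(moves[:plies]))
        if key not in seen:
            seen.add(key)
            kept.append((white, black, moves))
    return kept
```

A smaller plies value merges more games, including distinct games that merely share an opening, so the choice between 10 and 20 half-moves trades missed duplicates against false ones.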

Thank you, Joshua.

My research db is enriched now and contains
more than 17 million games.

Best,
beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: FICS Data

Post by beachknight »

I forgot to add:

I removed all relayed games whose player names
start with FM, GM, IM, WFM, WGM or WIM, because
these are most probably already included
in my research db (rdb).
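A prefix filter of this kind could be sketched as follows. It assumes the title prefix appears at the start of the White or Black tag value; the function name is hypothetical. An ordinary handle that happens to begin with the same capital letters would also be dropped, so this is a blunt instrument.

```python
import re

TITLE_PREFIXES = ("FM", "GM", "IM", "WFM", "WGM", "WIM")
PLAYER_TAG = re.compile(r'^\[(White|Black) "([^"]*)"\]', re.M)

def is_relayed(game_text):
    """True if either player name starts with one of the title prefixes.

    ASSUMPTION: relayed games are recognisable purely by this name prefix.
    The comparison is case-sensitive.
    """
    names = [name for _tag, name in PLAYER_TAG.findall(game_text)]
    return any(name.startswith(TITLE_PREFIXES) for name in names)
```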

Best,
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: FICS Data

Post by jdart »

Can you post your cleaned data set back to Joshua so others can benefit? I'd offer to host it, but while I have the bandwidth, I don't have the disk space.

Personally, I am also generally not interested in short blitz and lightning games, or in games from players rated below about 1600 on FICS, so for my own use at least I'd cull those.

--Jon
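A rating cull along those lines could look like the sketch below. It relies on the WhiteElo and BlackElo tags seen in the example game earlier in the thread, and it assumes a game is dropped if either player is below the threshold or has no numeric rating; both choices and the function name are illustrative only. The blitz/lightning cull is not covered, since the example game carries no time-control tag to filter on.

```python
import re

ELO_TAG = re.compile(r'^\[(WhiteElo|BlackElo) "(\d+)"\]', re.M)

def both_rated_at_least(game_text, min_elo=1600):
    """True only if both players have a numeric rating >= min_elo.

    ASSUMPTION: games with a missing or non-numeric rating are culled too.
    """
    elos = dict(ELO_TAG.findall(game_text))
    if len(elos) < 2:
        return False
    return all(int(value) >= min_elo for value in elos.values())
```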