FICS Data

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: FICS Data

Post by beachknight »

Zach Wegner wrote:Qd1d3 can be valid.

[d]8/8/8/3Q4/8/8/8/3Q1Q2 w - -
Ooouch. Beautiful.

We have just found another exam question
for engine authors or programmers:

How does your engine handle this position?

;)

Best,
hi, merhaba, hallo HT
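For anyone who wants to check the exam question by hand, here is a minimal sketch in Python (not taken from any engine in this thread) of why the move needs full-square disambiguation. All three white queens can reach d3, one rival shares the d-file with d1 and the other shares the first rank, so neither "Qdd3" nor "Q1d3" is unique. The helper names `parse_placement`, `queen_reaches`, and `san_queen_move` are made up for illustration, and the sketch ignores checks, captures, pins, and whether the destination is occupied.

```python
FILES = "abcdefgh"

def parse_placement(fen):
    """Return {square: piece} from the piece-placement field of a FEN string."""
    board = {}
    ranks = fen.split()[0].split("/")
    for rank_index, row in enumerate(ranks):
        rank = 8 - rank_index          # FEN lists rank 8 first
        file_index = 0
        for ch in row:
            if ch.isdigit():
                file_index += int(ch)  # run of empty squares
            else:
                board[FILES[file_index] + str(rank)] = ch
                file_index += 1
    return board

def queen_reaches(board, src, dst):
    """True if a queen on src can slide to dst with no piece in between."""
    df = FILES.index(dst[0]) - FILES.index(src[0])
    dr = int(dst[1]) - int(src[1])
    if (df, dr) == (0, 0) or not (df == 0 or dr == 0 or abs(df) == abs(dr)):
        return False
    step_f = (df > 0) - (df < 0)
    step_r = (dr > 0) - (dr < 0)
    f, r = FILES.index(src[0]) + step_f, int(src[1]) + step_r
    while (FILES[f], r) != (dst[0], int(dst[1])):
        if FILES[f] + str(r) in board:
            return False               # path is blocked
        f, r = f + step_f, r + step_r
    return True

def san_queen_move(board, src, dst):
    """Shortest unambiguous SAN for a white queen move (no check/capture marks)."""
    rivals = [sq for sq, piece in board.items()
              if piece == "Q" and sq != src and queen_reaches(board, sq, dst)]
    if not rivals:
        return "Q" + dst
    if all(sq[0] != src[0] for sq in rivals):
        return "Q" + src[0] + dst      # file alone is enough
    if all(sq[1] != src[1] for sq in rivals):
        return "Q" + src[1] + dst      # rank alone is enough
    return "Q" + src + dst             # need both file and rank

board = parse_placement("8/8/8/3Q4/8/8/8/3Q1Q2 w - -")
print(san_queen_move(board, "d1", "d3"))
```

The same function gives a shorter form for the other two queens, since each of them differs from both rivals in either file or rank.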
jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: FICS Data

Post by jshriver »

That's a good point. Those games might have been a variant (suicide, wild, or whatever) that I didn't properly clean.

-Josh
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: FICS Data

Post by jwes »

beachknight wrote:
jwes wrote:
jshriver wrote:Checked, and right now my raw data stream from FICS over the past 3 years is 61 GB. So I'm open to suggestions on what people would like to have grabbed from it.
I'd like to see it run through pgn-extract, very short games removed, and split into a few large chunks. I'd prefer split by elo, but split into openings would be good also. Pgn-extract claims to do both.
Hi Wes,

My step-by-step questions:

How would you proceed with such a huge amount of data?

chunk size: which is better? 250 MB? or 500 MB?

very short games: minimum number of moves? 4, 7 or 10?

type of games: standard, rapid and blitz together or separate?

split: by elo or eco? which is easier?

Hope this helps Joshua,

Best,
I don't have any strong feelings on any of these questions.
250 MB chunks might be better.
Minimum 7 moves + shorter games that end in mate, though a 2 move minimum gains nearly all the benefit. If there is a way to distinguish between resignation and disconnection, then I would also like to filter out games that end by disconnection up to 12 moves.
I would like games split by time control.
Splitting by eco is slightly easier, but pgn-extract supports both.
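The filtering rule above could be sketched like this in Python. This is only an illustration of the stated preferences (7-move minimum, shorter games kept if they end in mate, disconnections dropped up to 12 moves). The function name `keep_game` and its parameters are hypothetical, and it assumes the move count and the way the game ended have already been extracted from the data.

```python
def keep_game(full_moves, ended_in_mate=False, ended_by_disconnection=False,
              min_moves=7, disconnect_cutoff=12):
    """Decide whether a game survives the short-game filter.

    full_moves             -- number of full moves played
    ended_in_mate          -- True if the game ended in checkmate
    ended_by_disconnection -- True if the game ended because a player dropped
    """
    # Disconnections are dropped up to a higher move limit than other games.
    if ended_by_disconnection and full_moves <= disconnect_cutoff:
        return False
    # Anything at or above the minimum length is kept.
    if full_moves >= min_moves:
        return True
    # Shorter games are kept only when they end in mate.
    return ended_in_mate

print(keep_game(4, ended_in_mate=True))
print(keep_game(10, ended_by_disconnection=True))
```

Setting `min_moves=2` gives the looser variant mentioned above.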
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: FICS Data

Post by michiguel »

beachknight wrote:
Zach Wegner wrote:Qd1d3 can be valid.

[d]8/8/8/3Q4/8/8/8/3Q1Q2 w - -
Ooouch. Beautiful.

We have just found another exam question
for engine authors or programmers:

How does your engine handle this position?

;)

Best,
Gaviota handles it by warning the user, without crashing:


setboard 8/8/8/3Q4/8/8/8/3Q1Q2 w - -
Error (wrong FEN or EPD): 8/8/8/3Q4/8/8/8/3Q1Q2 w - -
tellusererror Error loading FEN:
8/8/8/3Q4/8/8/8/3Q1Q2 w - -
Miguel
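One thing the FEN above certainly lacks is kings. Whether that is the exact reason Gaviota rejects it is an assumption on my part, but a king-count check is a common sanity test before accepting a position. A minimal Python sketch, with made-up function names:

```python
def king_counts(fen):
    """Return (white_kings, black_kings) from the piece-placement field."""
    placement = fen.split()[0]
    return placement.count("K"), placement.count("k")

def has_one_king_each(fen):
    """True if the position has exactly one king per side."""
    return king_counts(fen) == (1, 1)

print(king_counts("8/8/8/3Q4/8/8/8/3Q1Q2 w - -"))
print(has_one_king_each("8/8/8/3Q4/8/8/8/3Q1Q2 w - -"))
```

A tool that only needs to replay or disambiguate moves can skip this check. An engine that searches for checkmate generally cannot.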
jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: FICS Data

Post by jshriver »

I wrote this little script as a quick hack to split pgn files if anyone finds it useful.

Usage: ./pgnchop.pl file.pgn 1000

Will create:
file.pgn.0 .. file.pgn.X in 1000 game chunks.

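#!/usr/bin/perl
use strict;
use warnings;

my ($file, $games) = @ARGV;
die "Usage: $0 file.pgn games_per_slice\n"
    unless defined $file && defined $games && $games =~ /^[1-9]\d*$/;

my $count = 0;   # games written to the current slice
my $slice = 0;   # index of the current slice file
my $out;         # current output handle, opened only when needed

# Source pgn
open(my $in, '<', $file) or die "Cannot open $file: $!";

print "Chopping $file into $games game slices:\n";

while (my $line = <$in>) {
    # Open the next slice only when there is a line to write,
    # so no empty file is left behind after the last game.
    if (!defined $out) {
        open($out, '>', "$file.$slice") or die "Cannot open $file.$slice: $!";
        print "Writing to $file.$slice\n";
    }
    print {$out} $line;

    # A result token outside the [Result "..."] tag marks the end of a game.
    if ($line !~ /Result/ && $line =~ m{1-0|0-1|1/2-1/2}) {
        if (++$count == $games) {
            close($out) or die "Cannot close $file.$slice: $!";
            undef $out;
            $slice++;
            $count = 0;
        }
    }
}
close($out) if defined $out;
close($in);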
beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: FICS Data

Post by beachknight »

jshriver wrote:I wrote this little script as a quick hack to split pgn files if anyone finds it useful.

Is this script meant to be used on the 60+ GB of data?

Best,
hi, merhaba, hallo HT
jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: FICS Data

Post by jshriver »

It should be. The logic is pretty simple. It basically:

* Checks how many games to put in each chunk.
* Reads the file one line at a time.
* If it comes across a score that is not in a PGN "Result" tag, increases the game counter.
** If the counter hits the chunk size, closes the file and opens a new one with the next slice id.
* Repeats until end of file :)

Not great, but it seems to work for me. I fed it a 2 GB PGN and it worked fine.
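For readers who prefer Python, the steps above can be sketched as follows. This is a rough equivalent, not the script used on the FICS data. It reads lazily, so memory use stays at one line no matter how large the input is. The names `assign_slices` and `split_pgn` are made up, and the `latin-1` encoding is an assumption chosen so that odd bytes in player names or comments do not raise decode errors.

```python
import re

# Game-terminating result tokens, as in the Perl script.
RESULT_TOKEN = re.compile(r"1-0|0-1|1/2-1/2")

def assign_slices(lines, games_per_slice):
    """Yield (slice_index, line) for each input line, reading lazily."""
    slice_index = 0
    games_in_slice = 0
    for line in lines:
        yield slice_index, line
        # A result outside the [Result "..."] tag marks the end of a game.
        if "Result" not in line and RESULT_TOKEN.search(line):
            games_in_slice += 1
            if games_in_slice == games_per_slice:
                slice_index += 1
                games_in_slice = 0

def split_pgn(path, games_per_slice):
    """Write path.0, path.1, ... holding games_per_slice games each."""
    out, current = None, None
    with open(path, encoding="latin-1") as src:
        for slice_index, line in assign_slices(src, games_per_slice):
            if slice_index != current:
                # Switch files only when a line for the new slice arrives,
                # so no empty trailing file is created.
                if out:
                    out.close()
                out = open(f"{path}.{slice_index}", "w", encoding="latin-1")
                current = slice_index
            out.write(line)
    if out:
        out.close()
```

Calling `split_pgn("file.pgn", 1000)` would produce the same `file.pgn.0 .. file.pgn.X` naming as the Perl version.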
beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: FICS Data

Post by beachknight »

jshriver wrote:It should be. The logic is pretty simple. It basically:

* Checks how many games to put in each chunk.
* Reads the file one line at a time.
* If it comes across a score that is not in a PGN "Result" tag, increases the game counter.
** If the counter hits the chunk size, closes the file and opens a new one with the next slice id.
* Repeats until end of file :)

Not great, but it seems to work for me. I fed it a 2 GB PGN and it worked fine.
Excellent. Let's wait for the outcome. I predict some 24m games.

Best,

Hint: 4 GB --> 1.6m+ games, so 60 GB --> 24m+ games :)
hi, merhaba, hallo HT
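The estimate is a straight proportion: 1.6m games per 4 GB is 0.4m games per GB, and 0.4m x 60 = 24m. A tiny Python sketch of the same arithmetic, with a made-up function name. It assumes the raw stream has about the same games-per-GB density as the 4 GB sample, which may not hold if the raw data contains more than game records.

```python
def estimate_games(total_gb, sample_gb=4.0, sample_games=1_600_000):
    """Scale a known sample (size, game count) linearly to a larger data set."""
    return round(total_gb * sample_games / sample_gb)

print(estimate_games(60))  # 24000000
print(estimate_games(61))  # 24400000
```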
jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: FICS Data

Post by jshriver »

Here is an updated sample of what I'm outputting.

Still want to refine it though.

[White "plink"]
[Black "zzzzzzzflash"]
[WhiteElo "2369"]
[BlackElo "1492"]
[Result "1-0"]
[Date "2009.1.22"]
[Event "None"]
[Site "FICS"]
[Round "0"]

1. d4 {0:00} e6 {0:00}
2. c4 {0:00} Bb4+ {0:08}
3. Nc3 {0:00} Bxc3+ {0:03}
4. bxc3 {0:37} Nf6 {1:48}
5. e4 {0:37} O-O {0:23}
6. e5 {0:34} Ne8 {0:31}
7. Bd3 {0:31} d6 {0:08}
8. Nf3 {0:30} c6 {0:16}
9. Bxh7+ {0:28} Kxh7 {0:12}
10. Ng5+ {0:15} Kg8 {0:28}
11. Qh5 {0:26} Qxg5 {1:37}
12. Bxg5 {0:00} f6 {0:32}
13. exf6 {0:31} Nxf6 {0:07}
14. Qg6 {0:20} Ng4 {0:43}
15. Be7 {0:28} Nxf2 {0:27}
16. Bxf8 {0:26} Kxf8 {0:19}
17. O-O {0:05} Nd7 {0:11}
18. Rxf2+ {0:19} Nf6 {0:05}
19. Rxf6+ {0:00} gxf6 {0:08}
20. Qxf6+ {0:00} Ke8 {0:17}
21. Rf1 {0:00} Kd7 {0:17}
22. Qg7+ {0:00}
{Black resigns} 1-0
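A minimal Python sketch of how a consumer might read the move-time comments back out of this format. It assumes every move carries exactly one `{m:ss}` comment, that the value is the time spent on that move, and that the comments alternate White, Black. The function names are made up.

```python
import re

# Matches comments such as {0:37} or {1:48}; {Black resigns} does not match.
MOVE_TIME = re.compile(r"\{(\d+):(\d{2})\}")

def move_times(movetext):
    """Return the {m:ss} comments as seconds, in the order they appear."""
    return [int(m) * 60 + int(s) for m, s in MOVE_TIME.findall(movetext)]

def time_per_side(movetext):
    """Return (white_seconds, black_seconds), assuming strict alternation."""
    times = move_times(movetext)
    return sum(times[0::2]), sum(times[1::2])

sample = """4. bxc3 {0:37} Nf6 {1:48}
5. e4 {0:37} O-O {0:23}
22. Qg7+ {0:00}
{Black resigns} 1-0"""
print(move_times(sample))
print(time_per_side(sample))
```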
jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: FICS Data

Post by jshriver »

Basically I can add time controls and game type now (just not sure where to put them).

I added move times, and also added a comment saying why the game ended. This was a request, since it'll say "X resigned", "X ran out of time", or whatever.

-Josh
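On where to put the time control: the PGN standard defines an optional `TimeControl` tag, and its increment form is `"base+increment"` with both values in seconds. A minimal Python sketch follows, assuming the raw data gives the control FICS-style as minutes plus an increment in seconds (for example "5 0"). The function name is made up. The game type has no dedicated standard tag that I know of, so putting it in the `Event` value (currently "None") is just one option.

```python
def time_control_tag(minutes, increment_seconds):
    """Build a PGN TimeControl tag from minutes and increment in seconds."""
    base_seconds = minutes * 60
    return f'[TimeControl "{base_seconds}+{increment_seconds}"]'

print(time_control_tag(5, 0))
print(time_control_tag(2, 12))
```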