Looking for EPD's of the first 10 moves of games

Rebel
Posts: 6995
Joined: Thu Aug 18, 2011 12:04 pm

Looking for EPD's of the first 10 moves of games

Post by Rebel »

For the creation of a new ultra-short Polyglot book (only the first 10 moves), I extended the existing EPD database of 14 million unique opening positions by 2.5 million to 16.5 million. Before analyzing those with SF12 I would like to have 20 million.

Does anyone have large EPD opening sets, or large PGNs with the more unusual openings?
90% of coding is debugging, the other 10% is writing bugs.
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Looking for EPD's of the first 10 moves of games

Post by Dann Corbit »

Rebel wrote: Mon Nov 09, 2020 12:56 am For the creation of a new ultra-short Polyglot book (only the first 10 moves), I extended the existing EPD database of 14 million unique opening positions by 2.5 million to 16.5 million. Before analyzing those with SF12 I would like to have 20 million.

Does anyone have large EPD opening sets, or large PGNs with the more unusual openings?
I don't have a ply count with my EPD positions.
But I have a lot of games.
I guess if you used CCRL and CEGT and TCEC and OMCorr and TWIC that would give you plenty.
CCRL and CEGT together are about 2.5 million games.
I have about a million correspondence games.
TWIC is smaller, but there are a lot of interesting openings in there.

If you just want volume there is Lichess and Playchess
Lichess has 1,620,424,788 standard rated games now.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Rebel
Posts: 6995
Joined: Thu Aug 18, 2011 12:04 pm

Re: Looking for EPD's of the first 10 moves of games

Post by Rebel »

Dann Corbit wrote: Mon Nov 09, 2020 1:55 am
I don't have a ply count with my EPD positions.
But I have a lot of games.
I guess if you used CCRL and CEGT and TCEC and OMCorr and TWIC that would give you plenty.
CCRL and CEGT together are about 2.5 million games.
I have about a million correspondence games.
TWIC is smaller, but there are a lot of interesting openings in there.

If you just want volume there is Lichess and Playchess
Lichess has 1,620,424,788 standard rated games now.
I use all of these but haven't turned to Lichess yet; that will be fun. But I am very much interested in your one million correspondence games, because correspondence players often try unusual openings.
pohl4711
Posts: 2439
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: Looking for EPD's of the first 10 moves of games

Post by pohl4711 »

Rebel wrote: Mon Nov 09, 2020 9:26 am
I use all of these but haven't turned to Lichess yet; that will be fun. But I am very much interested in your one million correspondence games, because correspondence players often try unusual openings.
I saw that you linked my website on your project site (a huge honor for me - you are one of my computer chess heroes!), but "only" the Download & Links section. Until now, that section did not offer a download of the AB-testrun games played since the beginning of 2020. That download was only on my main site, so I am afraid you missed it. 485000 games have been played in 2020 so far... A huge amount of data.
Here is the direct download link:
https://www.sp-cc.de/files/archive_hert_2020.zip

And do you know Thomas Zipproth's awesome LittleBlitzerGUI tools? Even if you don't use that GUI, the download includes a very cool tool: pgn2epd
The tool takes a PGN file and builds an EPD file with the FEN codes of the end positions of all games stored in it. So, if you cut all games in a PGN file after move 10 and run that tool, it will build the EPD of all end positions after move 10 very quickly and automatically.
https://www.sp-cc.de/files/zipproth_lbg_tools.zip
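
The "cut after move 10" step can be done with plain text processing before the file goes to a PGN-to-EPD converter. The following is only a minimal Python sketch and is not part of the LittleBlitzerGUI tools: the name `truncate_movetext` is made up for illustration, and it assumes each game starts from the normal start position with White to move and that only the movetext (no tag lines) is passed in.

```python
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def truncate_movetext(movetext, max_moves=10):
    """Return the first max_moves full moves of a PGN movetext as SAN text."""
    # Drop brace comments and rest-of-line ';' comments.
    text = re.sub(r"\{[^}]*\}", " ", movetext)
    text = re.sub(r";[^\n]*", " ", text)
    # Drop variations, innermost parentheses first, until none are left.
    while True:
        stripped = re.sub(r"\([^()]*\)", " ", text)
        if stripped == text:
            break
        text = stripped
    sans = []
    for token in text.split():
        # Remove a move number such as "1." or "7...", also when glued to the move.
        token = re.sub(r"^\d+\.+", "", token)
        # Skip empty leftovers, NAGs like "$1" and the game result.
        if not token or token.startswith("$") or token in RESULTS:
            continue
        sans.append(token)
        if len(sans) == 2 * max_moves:
            break
    # Re-number the kept plies: "1. e4 e5 2. Nf3 ..."
    parts = []
    for i in range(0, len(sans), 2):
        parts.append(f"{i // 2 + 1}. " + " ".join(sans[i:i + 2]))
    return " ".join(parts)
```

For example, `truncate_movetext("1. e4 {best} e5 2. Nf3 (2. f4 exf4) Nc6 3. Bb5 a6 1-0", max_moves=2)` gives `1. e4 e5 2. Nf3 Nc6`. A result token such as `*` may need to be appended again if the converter insists on one.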
Rebel
Posts: 6995
Joined: Thu Aug 18, 2011 12:04 pm

Re: Looking for EPD's of the first 10 moves of games

Post by Rebel »

Added 12 million positions from a Lichess database of 2200+ Elo, so I now have 28 million unique positions from the first 10 moves, twice as many as the first USB database of 2 years ago.
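
Selecting games by rating only needs the PGN tag pairs. A minimal Python sketch follows. It assumes that "2200+" means both players are rated at least 2200 (the post does not say whether it was both players, one player, or an average) and that each game carries the standard `WhiteElo` and `BlackElo` tags. The function names are hypothetical.

```python
import re

TAG_RE = re.compile(r'^\[(\w+)\s+"([^"]*)"\]')

def read_tags(game_text):
    """Collect the PGN tag pairs of one game into a dict."""
    tags = {}
    for line in game_text.splitlines():
        m = TAG_RE.match(line.strip())
        if m:
            tags[m.group(1)] = m.group(2)
    return tags

def min_elo(tags):
    """Lower of the two ratings, or None if either is missing or not a number."""
    try:
        return min(int(tags["WhiteElo"]), int(tags["BlackElo"]))
    except (KeyError, ValueError):
        return None

def keep_game(game_text, threshold=2200):
    """True if both players are rated at least `threshold` (assumed meaning of 2200+)."""
    elo = min_elo(read_tags(game_text))
    return elo is not None and elo >= threshold
```

Games with a missing or "?" rating are dropped rather than guessed at.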

The 14 million USB book analyzed with Stockfish 10 at 1000ms has 250+ downloads and 22 likes, not much, but enough for an update.

Making a new 28 million USB book at 1000ms with Stockfish 12 (estimated Elo 3200) would certainly be a good improvement and would take 8 days to complete.

But I wonder if there is interest in making it a community project, splitting the job into parts of (say) 1 million positions at a higher time control or (even better) fixed depth.
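
To make the split concrete, here is a rough Python sketch. It cuts a sequence of EPD lines into parts of a fixed size and estimates the analysis time for a given number of engine instances running in parallel. The names `split_into_parts` and `wall_clock_days` are made up for this example, and the estimate ignores engine startup and I/O overhead.

```python
from itertools import islice

def split_into_parts(lines, part_size=1_000_000):
    """Yield consecutive lists of at most part_size lines."""
    it = iter(lines)
    while True:
        part = list(islice(it, part_size))
        if not part:
            return
        yield part

def wall_clock_days(positions, ms_per_position, workers=1):
    """Estimated days to analyze all positions with `workers` parallel engines."""
    seconds = positions * ms_per_position / 1000 / workers
    return seconds / 86_400
```

At 1000 ms per position, 28 million positions are about 324 days of engine time on a single instance, so a figure of 8 days corresponds to roughly 40 positions being analyzed in parallel. That parallelism is an inference from the numbers in the post, not something stated there.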
Rebel
Posts: 6995
Joined: Thu Aug 18, 2011 12:04 pm

Re: Looking for EPD's of the first 10 moves of games

Post by Rebel »

pohl4711 wrote: Mon Nov 09, 2020 12:54 pm
I saw that you linked my website on your project site (a huge honor for me - you are one of my computer chess heroes!), but "only" the Download & Links section. Until now, that section did not offer a download of the AB-testrun games played since the beginning of 2020. That download was only on my main site, so I am afraid you missed it. 485000 games have been played in 2020 so far... A huge amount of data.
Here is the direct download link:
https://www.sp-cc.de/files/archive_hert_2020.zip

And do you know Thomas Zipproth's awesome LittleBlitzerGUI tools? Even if you don't use that GUI, the download includes a very cool tool: pgn2epd
The tool takes a PGN file and builds an EPD file with the FEN codes of the end positions of all games stored in it. So, if you cut all games in a PGN file after move 10 and run that tool, it will build the EPD of all end positions after move 10 very quickly and automatically.
https://www.sp-cc.de/files/zipproth_lbg_tools.zip
Thanks Stefan, I will look into those.
Rebel
Posts: 6995
Joined: Thu Aug 18, 2011 12:04 pm

Re: Looking for EPD's of the first 10 moves of games

Post by Rebel »

The HERT PGN added 154,983 unique positions to the database; good to have these types of openings.
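
Counting how many positions a new file really adds comes down to comparing the position part of each EPD line. A minimal Python sketch, assuming a position is identified by the first four EPD fields (piece placement, side to move, castling rights, en passant square) and that any opcodes after those fields should be ignored. `merge_unique` is a hypothetical name, and this is not necessarily how the database in this thread is built.

```python
def merge_unique(existing_keys, new_lines):
    """Add unseen positions to existing_keys (a set) and return how many were new."""
    added = 0
    for line in new_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Key on the position only, so differing opcodes do not create duplicates.
        key = " ".join(fields[:4])
        if key not in existing_keys:
            existing_keys.add(key)
            added += 1
    return added
```

One caveat: tools differ in when they write the en passant square (always after a double pawn push, or only when a capture is possible), so the same position coming from two tools can end up with two different keys unless that field is normalized first.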
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Looking for EPD's of the first 10 moves of games

Post by Dann Corbit »

Rebel wrote: Mon Nov 09, 2020 9:26 am But I am very much interested in your one million correspondence games, because correspondence players often try unusual openings.
See:
http://talkchess.com/forum3/viewtopic.php?f=2&t=75739