-
- Posts: 239
- Joined: Fri Jul 06, 2018 4:23 pm
- Full name: Jonathan Cremers
How to find original games in big database
I have a small database (say 300000 games) of games truncated to 40 ply, with player names and game statistics. The games come from an obk book made from a big database of about 800000 games. I converted the book to PGN with obk2bin, so the player names and game statistics got lost in the process. I was able to replace the games with the original games (with player names and game statistics) by first truncating the original games to 40 ply, then adding the databases together and searching for doubles in Chessbase 15. But the games are still only 40 ply. I want to find the complete games.
Is there some way to achieve this? I have tried the find duplicates options in Chess Assistant, but I find them very confusing. I also tried find twins in SCID, without any luck.
-
- Posts: 91
- Joined: Sat Nov 02, 2019 6:42 pm
- Full name: ɹǝƃɹǝqǝᗡ ǝɔnɹꓭ
Re: How to find original games in big database
For each game, I would fast forward to the 40th move and search for all games matching the position at that truncation point.
This would require a very short script using something like pgn-extract or scid.
-
- Posts: 239
- Joined: Fri Jul 06, 2018 4:23 pm
- Full name: Jonathan Cremers
Re: How to find original games in big database
I don't know how to write scripts. I'm not a programmer. I can only do some very basic stuff with command line apps. There are also some shorter games included I would like to be able to search for. Most games are 40 ply, but some are shorter, I think because the original games where also shorter.
-
- Posts: 91
- Joined: Sat Nov 02, 2019 6:42 pm
- Full name: ɹǝƃɹǝqǝᗡ ǝɔnɹꓭ
Re: How to find original games in big database
> very basic stuff with command line apps
This is all we need.
pgn-extract truncated-games.pgn -C -F > note-final-positions.pgn
Then extract the final positions using your most comfortable method.
-
- Posts: 4833
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: How to find original games in big database
Jonathan003 wrote: ↑Sun Apr 19, 2020 7:40 pm
> I have a small database, (let's say 300000 games), with truncated games from length 40ply, with player names and game statistics. The games are from a obk book, made from a big database with about 800000 games. I converted to pgn with obk2bin. So the player names and game statics got lost in the process. I was able to replace the games by the original games, (with player names and game statics), by first truncating the original games to 40ply. Than adding the databases together and search for doubles in Chessbase 15. But the games are still only 40 ply. I want to find the complete games.
> Is there some way to achieve this? I have tried the find duplicates options in Chess Assistant, but I find them very confusing. I also tried the find twins with SCID but without any luck.
Try these command lines using pgn-extract.
1. To extract from big.pgn the games that match the games in obk.pgn with fewer than 40 plies, and save them in extract1.pgn:
pgn-extract -U --fuzzydepth 0 -oextract1.pgn obk.pgn big.pgn
2. To do the same for the games in obk.pgn with 40 plies, and save them in extract2.pgn:
pgn-extract -U --fuzzydepth 40 -oextract2.pgn obk.pgn big.pgn
When I tested this method on a small number of games, it worked.
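To see how many games each step matched, the game counts of the input and output files can be compared. This is only a minimal sketch: count_games is a hypothetical helper name, and it assumes every game starts with an [Event "..."] tag, which is how pgn-extract writes its output.

```shell
# Hypothetical helper: count the games in a PGN file.
# Assumption: each game begins with an [Event "..."] tag line.
count_games() {
    grep -c '^\[Event ' "$1"
}
```

For example, run count_games obk.pgn, count_games extract1.pgn and count_games extract2.pgn. If the extracted files together hold clearly more games than obk.pgn, several full games probably share the same opening line.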
-
- Posts: 239
- Joined: Fri Jul 06, 2018 4:23 pm
- Full name: Jonathan Cremers
Re: How to find original games in big database
Thanks for the recommendations. I tried the method Ferdy described.
The result is not perfect. A lot of sidelines are also included in the results. I use obk2bin to convert an obk book to pgn. Then I convert to cbh in Chessbase 15, search for games with a ? and delete these games, so only the main lines remain in the database. So I don't like it if sidelines are added again.
-
- Posts: 239
- Joined: Fri Jul 06, 2018 4:23 pm
- Full name: Jonathan Cremers
Re: How to find original games in big database
What to do next?
Here is an example of what 'note-final-positions' looks like:
[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "?"]
[Black "?"]
[Result "1/2-1/2"]
[PlyCount "40"]
1. d4 Nf6 2. c4 e6 3. Nc3 Bb4 4. Qc2 d5 5. a3 Bxc3+ 6. Qxc3 Ne4 7. Qc2 c5
8. dxc5 Nc6 9. cxd5 exd5 10. e3 Qa5+ 11. b4 Nxb4 12. axb4 Qxa1 13. Bb5+ Kf8
14. Ne2 a6 15. Bd3 Bd7 16. f3 Ba4 17. Qb2 Qxb2 18. Bxb2 Ng5 19. Nd4 Bd7 20.
Kf2 f6 { "r4k1r/1p1b2pp/p4p2/2Pp2n1/1P1N4/3BPP2/1B3KPP/7R w - - 0 21" }
1/2-1/2
[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "?"]
[Black "?"]
[Result "1/2-1/2"]
[PlyCount "40"]
1. d4 Nf6 2. c4 e6 3. Nc3 Bb4 4. Qc2 d5 5. a3 Bxc3+ 6. Qxc3 Ne4 7. Qc2 c5
8. dxc5 Nc6 9. cxd5 exd5 10. e3 Qf6 11. f3 Qh4+ 12. g3 Nxg3 13. Qf2 Nf5 14.
Qxh4 Nxh4 15. b4 a6 16. Kf2 Ne5 17. Bb2 f6 18. Rd1 Be6 19. Ne2 Bf7 20. Rg1
Nc4 { "r3k2r/1p3bpp/p4p2/2Pp4/1Pn4n/P3PP2/1B2NK1P/3R1BR1 w kq - 6 21" }
1/2-1/2
[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "?"]
[Black "?"]
[Result "1/2-1/2"]
[PlyCount "40"]
1. d4 Nf6 2. c4 e6 3. Nc3 Bb4 4. Qc2 d5 5. a3 Bxc3+ 6. Qxc3 Ne4 7. Qc2 c5
8. dxc5 Nc6 9. cxd5 exd5 10. Nf3 Qf6 11. e3 Bg4 12. Be2 O-O 13. O-O Rfe8
14. Bd2 d4 15. Rad1 Nxd2 16. Rxd2 dxe3 17. Rd6 Re6 18. fxe3 Rxd6 19. cxd6
Bxf3 20. Bxf3 Qxd6 { "r5k1/pp3ppp/2nq4/8/8/P3PB2/1PQ3PP/5RK1 w - - 0 21" }
1/2-1/2
[Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "?"]
[Black "?"]
[Result "1/2-1/2"]
[PlyCount "40"]
I got these messages: 'String length 76 is to long for the line length of 75'
What does it mean? Is there something wrong with the pgn output from obk2bin?
I know that if I convert a bin book to pgn with polyglot-tolerant, the output has many transpositional errors: threefold repetitions that were not in the original games, like knights and bishops going back to their starting positions, rooks moving back and forth, etc.
I have not found these errors in the output of obk2bin so far.
-
- Posts: 7216
- Joined: Mon May 27, 2013 10:31 am
Re: How to find original games in big database
...
Last edited by Henk on Thu Apr 23, 2020 1:29 pm, edited 1 time in total.
-
- Posts: 2487
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: How to find original games in big database
Jonathan003 wrote: ↑Thu Apr 23, 2020 12:57 pm
> I got these messages 'String length 76 is to long for the line length of 75'
> What does it mean?
Just what it says. The string is too long, so you need to allow a longer line length.
Check the -w argument for pgn-extract:
Source: https://www.cs.kent.ac.uk/people/staff/ ... lp.html#-w
Output line length (-w or --linelength)
The -w flag allows an approximate line length to be set for output. Normally games are output with lines up to a maximum of 75 characters. Use the -w flag if you want longer output lines. For instance, you might want all the moves of a game to appear on a single line. You would get this effect by specifying -w1000 (say):
pgn-extract -w1000 file.pgn
If some games are more than 1000 characters long then just increase the value.
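One way to check whether the chosen line length was large enough is to compare the number of games with the number of lines that hold a complete FEN comment. This is a rough sketch under assumptions: check_fen_lines is a hypothetical helper name, the input is the output of pgn-extract -C -F, and every game starts with an [Event "..."] tag.

```shell
# Hypothetical helper: report the game count and the number of lines
# that contain a complete { "FEN" } comment.
# Assumption: the file was written by pgn-extract with -C -F.
check_fen_lines() {
    games=$(grep -c '^\[Event ' "$1" || true)
    fens=$(grep -c '{ "[^"]*" }' "$1" || true)
    echo "games=$games complete_fen_lines=$fens"
}
```

For example, check_fen_lines note-final-positions.pgn. If the two numbers differ, some comments were probably wrapped across lines, and a larger -w value may help.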
Rasmus Althoff
https://www.ct800.net
-
- Posts: 91
- Joined: Sat Nov 02, 2019 6:42 pm
- Full name: ɹǝƃɹǝqǝᗡ ǝɔnɹꓭ
Re: How to find original games in big database
Jonathan003 wrote: ↑Thu Apr 23, 2020 12:57 pm
> What to do next?
> Here is an example how 'note-final-positions' looks like:
I like the Linux command line, so something like this:
pgn-extract truncated-games.pgn -C -F | awk -F \" '/{/ {print $2}' > fenlist.txt
> What to do next?
Loop through the fenlist and extract the complete games from some MegaComplete pgn file:
pgn-extract truncated-games.pgn -C -F | awk -F \" '/{/ {print $2}' | while read FEN; do pgn-extract -Tf"$FEN" MegaComplete.pgn; done
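The awk filter above works line by line, so it can miss a FEN whose comment got wrapped onto a second line. A minimal sketch of a variant that joins the lines first and also drops repeated positions: extract_fens is a hypothetical helper name, and it assumes the only { "..." } comments in the input are the ones added by -F (other comments being stripped by -C).

```shell
# Hypothetical helper: read pgn-extract -C -F output on stdin and
# print each distinct final-position FEN once, one per line.
# Assumption: the only { "..." } comments are the FEN comments from -F.
extract_fens() {
    tr '\n' ' ' |                  # join wrapped lines into one
        grep -o '{ *"[^"]*" *}' |  # keep only the { "FEN" } comments
        cut -d'"' -f2 |            # take the text between the quotes
        sort -u                    # drop repeated positions
}
```

It could replace the awk step like this: pgn-extract truncated-games.pgn -C -F | extract_fens > fenlist.txt. Removing repeated positions first means the loop over MegaComplete.pgn does not search for the same position twice.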