Duplicate positions??

David Dahlem
Posts: 900
Joined: Wed Mar 08, 2006 8:06 pm

Duplicate positions??

Post by David Dahlem » Mon Jun 18, 2007 10:17 pm

I'm in the process of creating my own "perfect" book for the Arena and CB GUIs. I started with a collection of opening lines in PGN format and used Pgn-Extract to clean the file and remove duplicate lines. However, Pgn-Extract only removes lines with exactly the same moves. Many lines reach the same final position through a different move order.

How do I find and remove the lines that reach the same final position through a different move order?
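
To make the limitation concrete, here is a minimal Python sketch of what removing only exact move duplicates amounts to. It is not how Pgn-Extract is implemented; the helper names `move_tokens` and `drop_exact_duplicates` are made up for illustration, and the sketch assumes move numbers are separated from moves by spaces and that the movetext has no comments or variations.

```python
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def move_tokens(movetext):
    """Return the SAN moves only, dropping move numbers and the result marker."""
    return tuple(
        tok for tok in movetext.split()
        if tok not in RESULTS and not re.fullmatch(r"\d+\.+", tok)
    )

def drop_exact_duplicates(lines):
    """Keep the first occurrence of each distinct move sequence."""
    seen = set()
    kept = []
    for line in lines:
        key = move_tokens(line)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

lines = [
    "1. e4 e5 2. Nf3 Nc6",
    "1. e4 e5 2. Nf3 Nc6 1/2-1/2",  # same moves, so it is dropped
    "1. Nf3 Nc6 2. e4 e5",          # transposition, so it survives
]
print(drop_exact_duplicates(lines))
```

Because the key is the move sequence itself, the third line is kept even though it ends in the same position as the first.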

Thanks
Dave

SalvoSpit

Re: Duplicate positions??

Post by SalvoSpit » Tue Jun 19, 2007 1:38 pm

Hi Dave :) ,


First, use PgnScanner 0.75 by Gabriel Guillory (http://transversale.fr/pgnscanner/pgnscanner_eng.htm) and type:

verbose on
open Perfect.pgn
dbl -ply=99 -occ=1 -out=doubles.pgn
exit

This detects partial or full doubles up to a given ply. Doubles by transposition are included, so the games found need not be exactly identical: the move order can differ. A string such as "there are X other doubles until ply=Y" is added to the "Annotator" PGN tag of the selected games.


Now you can use pgn-extract to build clean.pgn. Type:

pgn-extract -llogfile.txt -D -oclean.pgn perfect.pgn doubles.pgn


Ciao :) ,
Salvo

David Dahlem

Re: Duplicate positions??

Post by David Dahlem » Tue Jun 19, 2007 2:21 pm

Hi Salvo

Thank you very much. I already have PgnScanner 0.75. I'll try your suggestion shortly.

Regards
Dave

David Dahlem

Re: Duplicate positions??

Post by David Dahlem » Tue Jun 19, 2007 3:10 pm

Hi Salvo

For some reason, this doesn't work for me. I followed your instructions exactly, but zero doubles were found. I even manually copied a game in Perfect.pgn so that there were two exact copies of the same game, and PgnScanner still didn't find any doubles.

Pgn-Extract will find exact doubles, but not lines that differ in their moves yet end in the same position, such as this simple example:

1. e4 e5 2. Nf3 Nc6
1. Nf3 Nc6 2. e4 e5

Different moves but the same final position. I know there are many such dupes in Perfect.pgn. I suppose it wouldn't hurt to have these duplicate-position lines in my book; they would just make it unnecessarily large. After all, a "Perfect" opening book needs to be perfect. :-)
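
One way to think about catching such dupes is to compare final positions instead of moves. The sketch below is only an illustration under a big assumption: that the final position of each line is already available as a FEN string from some tool. The FEN strings here were written by hand for the example above, and `position_key` and `drop_transpositions` are hypothetical helper names. The key keeps only piece placement, side to move and castling rights. The en passant field and the two move counters are dropped, because they differ between the two move orders even though the boards match.

```python
def position_key(fen):
    """Keep piece placement, side to move and castling rights only."""
    placement, side, castling = fen.split()[:3]
    return (placement, side, castling)

def drop_transpositions(lines):
    """lines: iterable of (movetext, final_fen). Keep the first line per position."""
    seen = set()
    kept = []
    for movetext, fen in lines:
        key = position_key(fen)
        if key not in seen:
            seen.add(key)
            kept.append(movetext)
    return kept

lines = [
    ("1. e4 e5 2. Nf3 Nc6",
     "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3"),
    ("1. Nf3 Nc6 2. e4 e5",
     "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq e6 0 3"),
]
print(drop_transpositions(lines))
```

Ignoring the en passant field is a simplification: it would also merge two positions where an en passant capture is really available in only one of them, which may or may not matter for an opening book.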

Regards
Dave

SalvoSpit

Re: Duplicate positions??

Post by SalvoSpit » Tue Jun 19, 2007 7:43 pm

Hi Dave :) ,

If the games are in this form:

[Event "?"]
[Site "?"]
[Date "2007.06.11"]
[White "?"]
[Black "?"]
[Result "1/2-1/2"]

1. e4 e5 2. Nf3 Nc6 1/2-1/2

[Event "?"]
[Site "?"]
[Date "2007.06.11"]
[White "?"]
[Black "?"]
[Result "1/2-1/2"]

1. Nf3 Nc6 2. e4 e5 1/2-1/2


You can use SCID:
1 - File-->New-->Perfect.si3
2 - Window-->Maintenance window-->Delete twin games-->Set only these options: First 4 letters only, All games in the database, shorter game. Unflag all other options.
3 - Press the Delete games button-->Press OK-->Press Close
4 - Window-->Maintenance window-->Compact database-->compact game file-->Press OK
5 - Close SCID
6 - Reopen the new Perfect.si3
7 - Tools-->Export all filter games

You now have the clean perfect.pgn
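
For anyone who would rather script a check than click through a GUI, games in the tag-plus-movetext form shown above can be split apart with a few lines of Python. This is a rough sketch, not a full PGN parser and not part of SCID: `split_games` is a made-up name, and it assumes one tag per line and movetext without comments or variations.

```python
import re

TAG_RE = re.compile(r'\[(\w+)\s+"(.*)"\]')

def split_games(pgn_text):
    """Yield (tags, movetext) for each game in a simple PGN string."""
    tags, moves = {}, []
    for raw in pgn_text.splitlines():
        line = raw.strip()
        match = TAG_RE.fullmatch(line)
        if match:
            if moves:  # a tag line after movetext starts the next game
                yield tags, " ".join(moves)
                tags, moves = {}, []
            tags[match.group(1)] = match.group(2)
        elif line:
            moves.append(line)
    if tags or moves:
        yield tags, " ".join(moves)

sample = """[Event "?"]
[Result "1/2-1/2"]

1. e4 e5 2. Nf3 Nc6 1/2-1/2

[Event "?"]
[Result "1/2-1/2"]

1. Nf3 Nc6 2. e4 e5 1/2-1/2
"""
games = list(split_games(sample))
for tags, movetext in games:
    print(tags["Result"], "|", movetext)
```

Each movetext string can then be fed to whatever duplicate test is preferred.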

Ciao :) ,
Salvo

David Dahlem

Re: Duplicate positions??

Post by David Dahlem » Tue Jun 19, 2007 10:13 pm

Hi Salvo.

I don't currently have Scid, but I'll download it and try your suggestion.

Thanks
Dave

SalvoSpit

Re: Duplicate positions??

Post by SalvoSpit » Wed Jun 20, 2007 5:25 am

