Modular opening book SF analysed 87417 pos., beta-1

Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Stefan will be happy ... first beta!

Post by Frank Quisinsky »

Hi Ferdinand,

OK, I found my mistake too.
A short error check will make it a bit better.

Best
Frank

Re: But ... have a look here!

Post by Frank Quisinsky »

OK, again ...

You wrote in your readme:
"Duplicates in filtered_big.pgn file are also removed."

I did not see that at first, sorry!

---

I have:
87.417 games + 4.728 games in update database = 92.145 games.

pgn-select
with parameter

Code: Select all

pgn-extract --fuzzydepth 0 --duplicates dupes.pgn --output unique-alpha.pgn alpha.pgn

Result = 30.220 games without duplicates

Or with the tool by Norm:

Code: Select all

pgnFin alpha.pgn

you need pgn-extract available
it creates outF.epd

I then trim it, and add id

epdtrim outF.epd
idopcode outT.epd
copy idlist inlist
epdInsert inlist outT.epd

creates outN.epd
then I rename outN.epd to whatever you want

Result = 30.238 games without duplicates

Based on the SF analysis I rejected
9.391 games (counting duplicates) plus 50 by hand (very rare lines with unusual combinations).

Result = 9.441 rejected with duplicates, or 4.924 without duplicates

...

Now ...
If I copy the 30.220 games (a file called test.pgn) and the *.epd with the 87.417 SF analyses into your directory, I get this result:


24.173 positions in filtered_test.pgn
And this can't be right.
It must be 26.629 ...


Because:
87.417 in alpha.pgn - 9.441 rejected = 77.976 with duplicates, and
77.976 with duplicates + 4.728 update games (with duplicates) = 82.704 final.

82.704 with duplicates = 26.629 without duplicates (after running the tool by Norm),
and everything SF flagged has been rejected.
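
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the thousands separators are dropped. Note that the second sum only comes out at 82.704 with 4.728 update games, the figure given at the top of the post.

```python
# Counts quoted in the post (thousands separators removed).
alpha_games = 87417      # games in alpha.pgn
rejected_by_sf = 9391    # rejected after the SF analysis, counting duplicates
rejected_by_hand = 50    # rejected manually
update_games = 4728      # games in the update database

rejected = rejected_by_sf + rejected_by_hand
after_reject = alpha_games - rejected
final_with_duplicates = after_reject + update_games

print(rejected)               # 9441
print(after_reject)           # 77976
print(final_with_duplicates)  # 82704
```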

But your tool does not give 26.629, it gives ... 24.173!!

Code: Select all

start
refepd, alpha.epd
refpgn, test.pgn
minscorecp, -30
maxscorecp, +50
mincntqueen, 0
maxcntqueen, 2
end

Best
Frank

So it would be better at first not to remove the duplicates (at most via a parameter in criteria.txt); then we can check where the mistake is once we have the 4x output with *.epd. I think so ... I hope I am right.
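
For readers who want to experiment with such a criteria block, here is a minimal Python sketch of how a file in this shape could be read. The layout ("start", then "key, value" lines, then "end") is inferred only from the block quoted in this post; how the real tool parses criteria.txt is not known, and the function name parse_criteria is made up for this example.

```python
def parse_criteria(text):
    """Parse a 'start ... end' block of 'key, value' lines into a dict."""
    criteria = {}
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "start":
            inside = True
        elif line == "end":
            inside = False
        elif inside and "," in line:
            key, value = (part.strip() for part in line.split(",", 1))
            # Limits such as "-30" or "+50" become ints; file names stay strings.
            try:
                criteria[key] = int(value)
            except ValueError:
                criteria[key] = value
    return criteria


example = """start
refepd, alpha.epd
refpgn, test.pgn
minscorecp, -30
maxscorecp, +50
mincntqueen, 0
maxcntqueen, 2
end"""

criteria = parse_criteria(example)
print(criteria["refepd"], criteria["minscorecp"], criteria["maxscorecp"])
```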

I have to drive to work now ...
I can answer in the evening!
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: But ... have a look here!

Post by Ferdy »

Frank Quisinsky wrote:OK, again ...

You wrote in your readme:
"Duplicates in filtered_big.pgn file are also removed."

I did not see that at first, sorry!

---

I have:
87.417 games + 4.728 games in update database = 92.145 games.

pgn-select
with parameter

Code: Select all

pgn-extract --fuzzydepth 0 --duplicates dupes.pgn --output unique-alpha.pgn alpha.pgn

Result = 30.220 games without duplicates

From my calculation:

Code: Select all

alpha.pgn        = 87417
upd_a00-e99.pgn  = 4728
alpha-1.pgn      = 87417 + 4728 = 92145

Run pgn-extract to get unique games from alpha-1.pgn, comparing the end position only.

Code: Select all

pgn-extract --fuzzydepth 0 --duplicates dupes.pgn --output unique-alpha-1.pgn alpha-1.pgn
unique-alpha-1.pgn = 30236

Yours : 30220
Mine : 30236

Could you verify your result?
Norm Pollock
Posts: 1056
Joined: Thu Mar 09, 2006 4:15 pm
Location: Long Island, NY, USA

Re: But ... have a look here!

Post by Norm Pollock »

Faux en passant in end positions could be causing the difference. Try using the --nofaux option with pgn-extract.

Re: But ... have a look here!

Post by Ferdy »

Frank Quisinsky wrote:OK, again ...

I looked at the EPD file called _00001-87417-analysis.epd, which has 87417 EPD lines; let us call it alpha.epd. This file contains duplicates. Here is one of them:

Code: Select all

rn1qk1nr/pbpp2pp/1p2p3/3P4/1bP1p3/2NB4/PP3PPP/R1BQK1NR w KQkq -

There are 12 entries for it with different ce values.

Code: Select all

rn1qk1nr/pbpp2pp/1p2p3/3P4/1bP1p3/2NB4/PP3PPP/R1BQK1NR w KQkq - id "10299"; ce -30; acd 29; acs 30; acn 304539901; pv Lxe4 ;

Code: Select all

rn1qk1nr/pbpp2pp/1p2p3/3P4/1bP1p3/2NB4/PP3PPP/R1BQK1NR w KQkq - id "10311"; ce -15; acd 29; acs 30; acn 306083854; pv Lxe4 Dh4 De2 Sf6 Lf3 O-O dxe6 Lc6 Ld2 Te8 g3 Lxf3 Sxf3 Dg4 h3 Dxe6 Dxe6+ Txe6+ Kf1 Lxc3 Lxc3 Sc6 Sd4 Sxd4 Lxd4 Kf7 Td1 ;

Code: Select all

rn1qk1nr/pbpp2pp/1p2p3/3P4/1bP1p3/2NB4/PP3PPP/R1BQK1NR w KQkq - id "10320"; ce -13; acd 27; acs 30; acn 303890551; pv Lxe4 Dh4 De2 Sf6 Lf3 O-O g3 Dd4 Kf1 La6 Sb5 Lxb5 cxb5 Sxd5 Sh3 Df6 Kg2 c6 Lg5 Df7 Sf4 a6 a4 axb5 axb5 Txa1 Txa1 ;

Code: Select all

rn1qk1nr/pbpp2pp/1p2p3/3P4/1bP1p3/2NB4/PP3PPP/R1BQK1NR w KQkq - id "10566"; ce -31; acd 27; acs 30; acn 299206348; pv Lxe4 Dh4 Ld3 exd5 Sf3 De7+ Le3 dxc4 Lxc4 Lxc3+ bxc3 Sc6 O-O O-O-O Te1 Df8 Sd4 Sf6 Sxc6 dxc6 Dc2 Kb8 f3 Da3 Lb3 The8 Lf2 ;

[...] more

How does your tool handle this? Is this position included in alpha.pgn or beta-1.pgn? Which value should be used when filtering by ce?
These duplicates make the calculation complicated.

My tool searches for ce values inside a given window, say -30/+50, then lets pgn-extract find the games and remove duplicates by end-position matching. In this case my tool accepts this EPD as good, because at least one entry has a ce within -30/+50, even though another entry has ce -31.
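
The behaviour described here can be illustrated with a short Python sketch. It is not the code of the actual tool: the helper names parse_epd and positions_in_window are invented for this example, the EPD lines are shortened versions of the ones shown above, and the parser assumes no semicolons inside quoted opcode values.

```python
from collections import defaultdict


def parse_epd(line):
    """Split one EPD line into (position, opcodes); position = first four fields."""
    fields = line.split(None, 4)
    position = " ".join(fields[:4])
    opcodes = {}
    if len(fields) > 4:
        for op in fields[4].split(";"):
            op = op.strip()
            if op:
                name, _, value = op.partition(" ")
                opcodes[name] = value.strip().strip('"')
    return position, opcodes


def positions_in_window(epd_lines, lo, hi):
    """Return positions that have at least one ce value inside [lo, hi]."""
    ce_by_position = defaultdict(list)
    for line in epd_lines:
        position, opcodes = parse_epd(line)
        if "ce" in opcodes:
            ce_by_position[position].append(int(opcodes["ce"]))
    return {pos for pos, values in ce_by_position.items()
            if any(lo <= ce <= hi for ce in values)}


pos = "rn1qk1nr/pbpp2pp/1p2p3/3P4/1bP1p3/2NB4/PP3PPP/R1BQK1NR w KQkq -"
lines = [
    pos + ' id "10299"; ce -30; acd 29;',
    pos + ' id "10311"; ce -15; acd 29;',
    pos + ' id "10320"; ce -13; acd 27;',
    pos + ' id "10566"; ce -31; acd 27;',
]

accepted = positions_in_window(lines, -30, 50)
print(pos in accepted)  # True: three of the four entries are inside the window
```

With this "at least one entry in the window" rule the position is kept even though id 10566 scores -31, which is exactly the case discussed in this post.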

It seems to me that the reference EPD should be unique. If there is more than one entry for the same position with different ce values, the entries should be identified by the analyzing engine, for example:

Code: Select all

rn1qk1nr/pbpp2pp/1p2p3/3P4/1bP1p3/2NB4/PP3PPP/R1BQK1NR w KQkq - id "10299"; ce -30; acd 29; acs 30; acn 304539901; pv Lxe4 ; Ae "Sf8";
rn1qk1nr/pbpp2pp/1p2p3/3P4/1bP1p3/2NB4/PP3PPP/R1BQK1NR w KQkq - id "10311"; ce -15; acd 29; acs 30; acn 306083854; pv Lxe4 Dh4; Ae "K10";

That is the same EPD with different ce values, but each entry has its own Ae opcode.

But not as below: same engine, same EPD, different ce values.

Code: Select all

rn1qk1nr/pbpp2pp/1p2p3/3P4/1bP1p3/2NB4/PP3PPP/R1BQK1NR w KQkq - id "10299"; ce -30; acd 29; acs 30; acn 304539901; pv Lxe4 ; Ae "Sf8";

rn1qk1nr/pbpp2pp/1p2p3/3P4/1bP1p3/2NB4/PP3PPP/R1BQK1NR w KQkq - id "10299"; ce -36; acd 29; acs 30; acn 314539901; pv Lxe4 ; Ae "Sf8";
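
A check for this situation could look like the following Python sketch. It groups entries by position and by the Ae opcode proposed above (Ae is the suggestion from this post, not a standard EPD opcode) and reports every group that carries more than one distinct ce. The function name conflicting_entries and the regular expressions are assumptions made for the example.

```python
import re
from collections import defaultdict

CE_RE = re.compile(r"\bce\s+([+-]?\d+)\s*;")
AE_RE = re.compile(r'\bAe\s+"([^"]*)"\s*;')


def conflicting_entries(epd_lines):
    """Map (position, engine) -> sorted ce values, only where the values differ."""
    seen = defaultdict(set)
    for line in epd_lines:
        ce = CE_RE.search(line)
        if ce is None:
            continue  # no evaluation on this line
        ae = AE_RE.search(line)
        engine = ae.group(1) if ae else None
        position = " ".join(line.split()[:4])
        seen[(position, engine)].add(int(ce.group(1)))
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


pos = "rn1qk1nr/pbpp2pp/1p2p3/3P4/1bP1p3/2NB4/PP3PPP/R1BQK1NR w KQkq -"

different_engines = [
    pos + ' id "10299"; ce -30; acd 29; acs 30; acn 304539901; pv Lxe4 ; Ae "Sf8";',
    pos + ' id "10311"; ce -15; acd 29; acs 30; acn 306083854; pv Lxe4 Dh4; Ae "K10";',
]
same_engine = [
    pos + ' id "10299"; ce -30; acd 29; acs 30; acn 304539901; pv Lxe4 ; Ae "Sf8";',
    pos + ' id "10299"; ce -36; acd 29; acs 30; acn 314539901; pv Lxe4 ; Ae "Sf8";',
]

print(conflicting_entries(different_engines))  # {} : one ce per engine
print(conflicting_entries(same_engine))        # the Sf8 entries disagree
```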

Re: But ... have a look here!

Post by Frank Quisinsky »

Hi Ferdinand,

Same EPD ... different values!

That should be clear ...
I am using 4 cores with 4x Hyperthreading.
The final results will never be the same.

Indeed bad ... I have been thinking about this problem for a long time!
But with more cores and hyperthreading I get a clearly better result in 30 seconds per move.

Best
Frank

Re: But ... have a look here!

Post by Frank Quisinsky »

Hi Ferdinand,

Example:
If I have the same EPD 4 times with 4 different values ...
reject limits +0.50 / -0.30

1. Value = -0.20
2. Value = -0.25
3. Value = -0.30
4. Value = -0.31

After the Stockfish analysis I rejected only 1 of the 4, the one with value = -0.31, because I do this by hand, using the game number information in the ChessBase GUI. I created CBH database files from the PGN file with 87.417 games. PGN and EPD have the same game numbers! With epdOrder by Norm I can sort the EPD file by ce and then delete the games by number, by hand, in the CBH file.

The reject information can be found in my database v1.03 file, in the Stockfish subdirectory: reject

In other words ... during this work I cannot see that the position is in the database 4 times.

After rejecting what Stockfish found and adding the update database of 4.728 games, I built the beta-1.pgn file.

beta-1.pgn (82.704 games) now still contains the game three times, because it was removed only once.

That is indeed a problem, yes!
Because it would be better to reject 4/4.

Maybe that is possible with your program?

If an EPD occurs more than once in the database, delete all copies if any one of them is outside the limits in criteria.txt.
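
The rule asked for here (reject 4/4 as soon as one copy is outside the limits) could be sketched in Python as follows. This is only an illustration: the function name ids_to_reject, the record layout (game number, position, ce in centipawns) and the game numbers 101 to 105 are made up for the example.

```python
from collections import defaultdict


def ids_to_reject(records, lo, hi):
    """records: iterable of (game_id, position, ce).

    Reject every game whose position has at least one ce outside [lo, hi]."""
    by_position = defaultdict(list)
    for game_id, position, ce in records:
        by_position[position].append((game_id, ce))
    rejected = []
    for entries in by_position.values():
        if any(ce < lo or ce > hi for _, ce in entries):
            rejected.extend(game_id for game_id, _ in entries)
    return sorted(rejected)


# The four values from the example above, in centipawns, plus one unrelated position.
records = [
    (101, "position A", -20),
    (102, "position A", -25),
    (103, "position A", -30),
    (104, "position A", -31),
    (105, "position B", 10),
]

print(ids_to_reject(records, -30, 50))  # [101, 102, 103, 104]
```

All four copies of position A are rejected because one of them is at -31, while the unrelated position B stays in.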

---

Now Komodo is analysing the database without duplicates, so I will not have this problem again. After Komodo, all other engines will also analyse the database without duplicates.

You wrote:
These duplicates make the calculation complicated.

I know that ... I just did not think about it at first.

Best
Frank

It is much easier to do it this way.
Forget the 87.417 alpha database.

The new main database is beta-1.pgn, the database after the Stockfish analysis, with 26.619 games without duplicates or 82.704 with duplicates. When Komodo is finished we will also have 26.619 positions in EPD with ce, because Komodo did not analyse everything ... only the smaller database without duplicates.

Maybe it makes more sense to work with the beta-1 database and compare the results from your tool against it, rather than against the alpha.pgn database with or without the update I created.

I don't know!

In around 4 days Komodo will be done and I can create the beta-2 file; Houdini will be next.

Best
Frank

Re: But ... have a look here!

Post by Frank Quisinsky »

Hi Norm,

--nofaux?
I think you mean --nofauxep.

But how can I use that?

pgn-extract --nofauxep ... and then?
I am trying out different combinations!

At the moment there are differences in the final results!

30.238 with your tools is right
30.220 with pgn-extract is wrong

18 games are missing!
How can I find the 18 games, and how can I get the right result with pgn-extract?

The hint is great!!

I think that if I work with --nofauxep I have to do it in two steps, and not in one step in combination with ...

pgn-extract --fuzzydepth 0 --duplicates dupes.pgn --output unique-beta-1.pgn beta-1.pgn
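
One possible way to locate the 18 missing entries, sketched in Python: assuming both results are available as EPD lists with one end position per line (that export step is an assumption and is not shown), a set difference on the first four EPD fields lists the positions that appear in one result but not in the other. The names position_key and only_in_first are invented for this example.

```python
def position_key(epd_line):
    """Use the first four EPD fields (board, side, castling, ep) as the key."""
    return " ".join(epd_line.split()[:4])


def only_in_first(first_lines, second_lines):
    """Positions that occur in first_lines but not in second_lines, in file order."""
    second = {position_key(line) for line in second_lines if line.strip()}
    missing, seen = [], set()
    for line in first_lines:
        if not line.strip():
            continue
        key = position_key(line)
        if key not in second and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing


# Tiny stand-in lists; in practice these would be read from the two EPD files.
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -"
after_e4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPPPPPP/RNBQKBNR b KQkq -"

norm_result = [start + ' id "1";', after_e4 + ' id "2";']
pgn_extract_result = [start + " ce 5;"]

print(only_in_first(norm_result, pgn_extract_result))
```

If the en passant field differs between the two files for otherwise identical positions, those positions will show up in this list, which would fit the hint about faux en passant.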

Best
Frank

Re: But ... have a look here!

Post by Frank Quisinsky »

Hi Ferdinand,

Again ... I reject by hand in the ChessBase GUI.
It is possible that I made mistakes there!

Example:
Instead of game number 60212, I rejected game number 60112.

If so, I have all the games I rejected in another database (I can check that later).
Komodo will find such mistakes and will give me the same "bad lines" Stockfish found again.

After all ...
That is possible, but I think I am working without many mistakes here. I cannot do it any other way because no tools are available for it.

In reality, the first good database will be available after the Komodo analysis.

Maybe you should test your new tool with beta-1.pgn, or with beta-2.pgn, which will be available shortly ... in 3-4 days.

Best
Frank

Re: But ... have a look here!

Post by Ferdy »

Norm Pollock wrote:Faux en passant in end positions could be causing the difference. Try using the --nofaux option with pgn-extract.

I tried running it with the --nofauxep option but I get the same numbers. Note that the FEN I use comes from the already existing analyzed EPD file. I don't know whether the tool Frank used to generate the analyzed EPD from the PGN file takes the undefined ep square into account.
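
To make positions comparable regardless of how the generating tool wrote the ep square, the en passant field could be normalised before matching. The Python sketch below is a simplified check and makes no claim about how pgn-extract implements --nofauxep: it only looks for a pawn of the side to move next to the pawn that just advanced, and ignores pins and checks. The names expand_rank and normalise_ep are invented for this example.

```python
def expand_rank(rank):
    """'3pP3' -> '...pP...' (one character per square)."""
    return "".join("." * int(ch) if ch.isdigit() else ch for ch in rank)


def normalise_ep(position):
    """Set the ep field to '-' when no pawn of the side to move could capture.

    position: the first four EPD/FEN fields. Pins and checks are ignored."""
    placement, side, castling, ep = position.split()[:4]
    if ep == "-":
        return position
    ranks = [expand_rank(r) for r in placement.split("/")]  # ranks[0] is rank 8
    file_idx = "abcdefgh".index(ep[0])
    # A capturing pawn stands on rank 5 (white to move) or rank 4 (black to move).
    pawn, rank_no = ("P", 5) if side == "w" else ("p", 4)
    row = ranks[8 - rank_no]
    neighbours = [f for f in (file_idx - 1, file_idx + 1) if 0 <= f < 8]
    if any(row[f] == pawn for f in neighbours):
        return position
    return " ".join([placement, side, castling, "-"])


# After 1.e4: no black pawn on d4 or f4, so the ep square is "faux".
faux = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPPPPPP/RNBQKBNR b KQkq e3"
# Black pawn on d4 next to the e4 pawn: the ep square is kept.
real = "rnbqkbnr/ppp1pppp/8/8/3pP3/8/PPPP1PPP/RNBQKBNR b KQkq e3"

print(normalise_ep(faux))
print(normalise_ep(real))
```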