James,
thanks for your analysis and your valid question. We basically answered the point about near-doubles in our previous post:
http://www.talkchess.com/forum/viewtopi ... =&start=20
There we tried to explain the whole situation and why it is almost impossible to check 5.2 million games one by one. The percentage of near-doubles is very small, and as we have learnt, sometimes a near-double is not a double at all but a genuinely different game with just one extra move. In other cases two people recorded the same game and their scores differ by +/- one move. I wonder whether anybody has run a similar search on the most famous databases, such as Mega from CB or Huge from CA.
To your question:
However doing a default double search in chessbase including similar names finds 1258 doubles. Having the same game with a small difference in the name (perhaps only 1 letter) undermines the quality claim a little.
Database creators use different deduplication methods and cannot even agree among themselves (and then try to convince your customers). It is a chicken-and-egg discussion. So here is our technical answer:
If we exclude the headers and deduplicate only the body of each game (did you include the headers in your dedup process?), we eliminate the huge problems that come with header-based deduplication. In our experience, whenever you include headers in the deduplication process you will always find doubles. Always. Even with a simple query. It is a never-ending process if you want to achieve real ZERO duplicates.
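To make the body-only idea concrete, here is a minimal Python sketch. It is an illustration only, not the tool we actually use: the function names (split_games, body_key, dedupe_by_body) and the sample games are made up, and it assumes plain PGN text with simple {comments} and no nested variations.

```python
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def split_games(pgn_text):
    """Split PGN text into (headers, movetext) pairs."""
    games, headers, moves = [], [], []
    for line in pgn_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            if moves:  # a header line after movetext starts a new game
                games.append((headers, " ".join(moves)))
                headers, moves = [], []
            headers.append(line)
        elif line:
            moves.append(line)
    if headers or moves:
        games.append((headers, " ".join(moves)))
    return games

def body_key(movetext):
    """Reduce movetext to the bare moves, ignoring comments, numbering and result."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)  # drop {comments}
    text = re.sub(r"\d+\.+", " ", text)         # drop move numbers like "12." or "12..."
    tokens = [t for t in text.split()
              if t not in RESULTS and not t.startswith("$")]  # drop result and NAGs
    return " ".join(tokens)

def dedupe_by_body(games):
    """Keep the first game for each distinct move sequence; headers are ignored."""
    seen, unique = set(), []
    for headers, movetext in games:
        key = body_key(movetext)
        if key not in seen:
            seen.add(key)
            unique.append((headers, movetext))
    return unique

# Hypothetical sample: games 1 and 2 differ only in the spelling of a name,
# game 3 has one extra move and is therefore treated as a different game.
SAMPLE = """
[White "Smith, J"]
[Black "Jones, A"]

1. e4 e5 2. Nf3 Nc6 1-0

[White "Smith, John"]
[Black "Jones, A"]

1.e4 e5 2.Nf3 Nc6 {time} 1-0

[White "Smith, J"]
[Black "Jones, A"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 1-0
"""

unique = dedupe_by_body(split_games(SAMPLE))
print(len(unique), "unique games")
```

Because the key is built from the moves alone, a one-letter difference in a player name cannot produce a double, while a game that is one move longer is kept as a separate game.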
James's second question:
Also the base seems to include correspondence games - so quantity-wise shouldn't it be compared to megabase + correspondence 2009?
Since this is far from the original topic of this thread (A00 - Irregular openings), we will use your question as the basis for another valid discussion. We are opening a new post (the link will be added here later) with the subject "Difference in quality between correspondence games and practical games" and will try to answer your question there.
Best Regards,
Alexander Horvath, SIM ICCF (ELO 2474)
http://www.openingmaster.com