We are moving in circles and not getting anywhere. Your statements are very vague and not supported by facts. The examples shown in the posts above are irrelevant, and let me tell you why. I will put the facts on the table once and for all, and people on this forum can then discuss the QUALITY. Thank God users can read both parties and their statements; otherwise they would probably give up after reading 2-3 of your posts. And to answer your question of why I am the only one replying to all the posts: as head of the project, I decided to be responsible for the overall PR. Having said that, let's look at the facts:
FACT 1: Same or similar games?
There are many examples like this in every database, not only OM, and we know about it. Some games are identical to each other except for 1-2 moves. The thing is, there is more than one ''game recorder'', and they all do their job. Believe it or not, they sometimes do it differently, and after 10-20 years nobody knows who was right. We have also come across Grand Masters who played the same opening 2 years later, made 2 moves more than before, and drew again. These were two different games, yet somebody looking from a distance may say it is the same game with a mistake in the date or the number of moves. You cannot judge the quality of the database on these games. There is no algorithm that can tell whether two such games are the same. We de-duplicate only on the body of the game (the exact moves) and ignore the headers, so the algorithm is totally independent of the headers. If such games are inside, it is not a mistake of the database; we cannot manually go through 5.2 million games looking for these examples. Whether or not such games are present has no effect on the quality of the database.
Quote from Dann: That said, this database looks to be better than average. It also has a number of games that are not in Chessbase. But some of the ones that are there look suspect. For example, here is a game that is near-identical to one in the Chessbase Correspondence database (2006), but Opening Master is missing the last two moves, has a different name for the White player, and a different date:
[Event "URS-ch08 corr6768"]
[Site "USSR"]
[Date "1967.??.??"]
[Round "?"]
[White "Sokolsky, Aleksey P"]
[Black "Zagorovsky, Mikhail Pavlovich"]
[Result "0-1"]
[ECO "A00"]
[PlyCount "60"]
[EventDate "1967.??.??"]
[EventType "corr"]
[EventCountry "URS"]
[Source "ChessBase"]
[SourceDate "2000.04.19"]
1. b4 e5 2. Bb2 f6 3. e4 Bxb4 4. Bc4 Nc6 5. f4 d6 6. c3 Ba5 7. Ne2 Qe7 8. O-O Bb6+ 9. Kh1 Bd7 10. d4 O-O-O 11. Nd2 Nh6 12. Bd5 Na5 13. a4 f5 14. Nc4 exd4 15. cxd4 fxe4 16. Nxa5 Bxa5 17. Bc3 Bxc3 18. Nxc3 c6 19. Bxe4 d5 20. Bf3 Rhe8 21. a5 Nf5 22. Rb1 Ne3 23. Qb3 Bf5 24. Rbe1 Qf6 25. a6 b6 26. Rc1 Nxf1 27. Nxd5 Rxd5 28. Qxd5 Ng3+ 29. hxg3 Re1+ 30. Rxe1 cxd5 0-1
[Event "USSR corr"]
[Site "USSR corr"]
[Date "1990.??.??"]
[Round "?"]
[White "Sokolovski, R."]
[Black "Zagorovsky, Mikhail Pavlovich"]
[Result "0-1"]
[ECO "A00"]
[PlyCount "58"]
[EventDate "1990.??.??"]
[Source "Opening Master"]
[SourceDate "2008.09.09"]
1. b4 e5 2. Bb2 f6 3. e4 Bxb4 4. Bc4 Nc6 5. f4 d6 6. c3 Ba5 7. Ne2 Qe7 8. O-O Bb6+ 9. Kh1 Bd7 10. d4 O-O-O 11. Nd2 Nh6 12. Bd5 Na5 13. a4 f5 14. Nc4 exd4 15. cxd4 fxe4 16. Nxa5 Bxa5 17. Bc3 Bxc3 18. Nxc3 c6 19. Bxe4 d5 20. Bf3 Rhe8 21. a5 Nf5 22. Rb1 Ne3 23. Qb3 Bf5 24. Rbe1 Qf6 25. a6 b6 26. Rc1 Nxf1 27. Nxd5 Rxd5 28. Qxd5 Ng3+ 29. hxg3 Re1+ 0-1
As mentioned by Robin Smith in his post above, there is even a third source. (By the way, it is such an honor to hear from Robin in our threads, and his book is inspirational for anybody who would like to analyze chess games using a computer.)
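To make the body-only comparison described under FACT 1 concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the actual code used by OM, CB or CA: the function names `moves` and `body_key` are hypothetical, and the normalization (dropping comments, move numbers, the result token and extra whitespace) is only one possible choice.

```python
import hashlib
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def moves(movetext: str) -> list[str]:
    """Return the bare move tokens of a PGN body (hypothetical helper)."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)   # drop {comments}
    text = re.sub(r"\d+\.+", " ", text)          # drop move numbers like "15." or "15..."
    return [tok for tok in text.split() if tok not in RESULTS]

def body_key(movetext: str) -> str:
    """Key built from the moves only; headers never enter the key."""
    return hashlib.sha1(" ".join(moves(movetext)).encode("utf-8")).hexdigest()

# Same moves, different spacing: same key.
a = body_key("1. b4 e5 2. Bb2 f6 0-1")
b = body_key("1.b4 e5 2.Bb2   f6 0-1")
# One extra move: different key, so an exact body comparison does not flag it.
c = body_key("1. b4 e5 2. Bb2 f6 3. e4 0-1")
```

With a key like this, the two games quoted above (60 plies against 58 plies) get different keys, which is why an exact body comparison does not report them as duplicates.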
FACT 2: ELO assigned to players before the measurement
Extract from the article on our web page: The analysis of an open game is a technical as well as an ethical matter. In his book (Smith, R.: Modern Chess Analysis, Gambit 2004), ICCF Grand Master and US Champion Robin Smith opened up, for the first time, the dilemma of using computer analysis in correspondence chess. Computer analysis is no longer the domain of correspondence chess players only. We all know that even the best players use a team of advisers and computer analysis so that they have novelties ready in the openings, or unexpected moves ready in already-played novelties. In this book Smith showed many weaknesses of the engines and their inability to select among the results of their analysis. Therefore you need to strictly differentiate supporting analysis, which helps a chess player verify that he has not made any blunders (big mistakes) or check whether the set of known moves is complete. Dynamic or interactive analysis, by contrast, simulates a possible course of the game, but nobody knows whether the players will actually play it; only the future will show.
Reply: Let us inform you that a few years back various top players within the association made a general effort to assign ELOs even to players who were active before ELO measurement existed. They simulated the historical games and, based on the outcome, assigned ELOs to the players. We know very well that you can't have an ELO when there was no ELO system, but as you now see, you can actually have an ''assigned ELO''. And this is the case here. The date is 2003 because the assignment was most likely done in 2003, even though we are talking about a game from 1895. Again, if the chess recorders wrote it like this, we have no possibility to go game by game through 5.2 million games. The integrity check was performed and the quality of the database remains.
Quote from Dann: You are right about correspondence games but I am also concerned about the player and date info. Here is another error: this game was played in 1895 (not 2003!): The ratings are also bogus (no FIDE rating system was in effect in 1895).
FACT 3 : PGN Extract by Barnes or any other freeware programs
Reply: I already mentioned this in the posts above. Using freeware programs just creates a mess. Look at the games you posted: don't you think it's a little bit suspicious when a header in CB turns into a disaster in the PGN, which you then export further? Each of our games DOES HAVE A HEADER (please provide the name / number of a game which doesn't). As for the classification into A00p, r, q: sorry, I still don't see the point. There is no such classification. But perhaps I misunderstood; could you please clarify once more?
For your information, about the deduplication process using CA 9.1: when you run deduplication in CA on a sample file, you get zero duplicates. You export to PGN and import into CB, and guess what, in CB you find many duplicates. This is just for your information about the tools you are using. CB and CA have different de-duplication algorithms, therefore we deduplicate strictly WITHOUT HEADERS. The discrepancy is caused by the headers and the algorithm setup. One more cherry: in CA you run deduplication on the body of the game only and you would think nothing else can happen; then you add the headers or parts of them and bingo, suddenly you have plenty of duplicates again. After spending time on analysis, people learn not to use freeware programs or any ''PGN Extracts by XYZ'', simply because they don't do the job professionally. I won't comment on SCID either; its only advantage is that it's free.
Quote from Dann about performing his analysis by exporting our data with:
Chess Assistant 9.1
Scid
Pgn-Extract by Barnes
The mistakes are in your database. I unloaded the data as PGN from ChessBase so that it could be processed by other programs. ChessBase is wrong. I examined the dups and they definitely were dups.
I used Scid, ChessAssistant, and PGN-Extract to remove the dups.
FACT 3 (continued) : Deduplication process in CA
Quote from Dann : Try the following experiment yourself:
1. Unload the programs using ChessBase itself as PGN or convert the archive to a decompressed CB file.
2. Load the data into ChessAssistant 9.1
3. Run the check for duplicates function.
4. Look at the duplicate games. There will be a huge pile of them but if you examine them one by one you will see very clearly that they are duplicates. The header problems are header problems. Fix them or don't -- I don't care.
Reply: In the previous reply we tried to explain the deduplication process in brief (in brief, I repeat, because one could write huge manuals or books on this). The advantage of the deduplication process in CB is that you receive one file containing the duplicated games. The game with the incomplete header or information is crossed out and the one with the complete header stays. You clearly see both games. In CA you receive two files, one for trash and one deduplicated. Now talk about transparency in comparing the games. We have great experience in comparing the two programs in deduplication, however we still haven't found the ultimate truth. Please sort our file as follows:
1) sort according to moves
2) find the games which have the same number of moves
3) compare the players
This should convince you. Please run your deduplication process and tell us exactly which 2 games are duplicates. You don't need to list thousands of games. Just 5 examples will do (e.g. game number 418 is equal to game number 2,345, or game 9,034 is equal to game 23,450, etc.). But please work with correctly extracted PGNs and give us equal games with the same names, same dates, same bodies, etc.
PS: as you know, in CA the first name and the last name are in one field.
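The three steps requested above (sort by moves, group games with the same number of moves, compare the players) could be sketched roughly as follows. This is only an illustration under assumptions: the dictionary keys `number`, `white`, `black` and `moves`, and the function names `surname` and `duplicate_pairs`, are hypothetical, and names are compared by the part before the first comma because first and last name share one field.

```python
from collections import defaultdict
from itertools import combinations

def surname(name: str) -> str:
    """Lower-cased text before the first comma of a "Last, First" field."""
    return name.split(",")[0].strip().lower()

def duplicate_pairs(games):
    """Report pairs of games with identical moves.

    Each game is a dict with hypothetical keys:
    number (int), white (str), black (str), moves (list of move strings).
    Returns tuples (number_a, number_b, same_players).
    """
    groups = defaultdict(list)
    for game in games:
        # Steps 1 and 2: group by number of moves and by the moves themselves.
        groups[(len(game["moves"]), tuple(game["moves"]))].append(game)

    pairs = []
    for key in sorted(groups):
        for a, b in combinations(groups[key], 2):
            # Step 3: compare the players of games whose bodies are equal.
            same_players = (surname(a["white"]) == surname(b["white"])
                            and surname(a["black"]) == surname(b["black"]))
            pairs.append((a["number"], b["number"], same_players))
    return pairs

# Made-up sample data, only to show the shape of the report.
sample = [
    {"number": 418, "white": "Sokolsky, Aleksey P",
     "black": "Zagorovsky, Mikhail Pavlovich", "moves": ["b4", "e5"]},
    {"number": 2345, "white": "Sokolsky, A.",
     "black": "Zagorovsky, M.", "moves": ["b4", "e5"]},
    {"number": 9034, "white": "Sokolsky, A.",
     "black": "Zagorovsky, M.", "moves": ["b4", "e5", "Bb2"]},
]
report = duplicate_pairs(sample)
```

A report in this form gives exactly the kind of answer asked for: concrete pairs of game numbers, plus a flag saying whether the player names also agree.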
FACT 4 : Rolf syndrome
1) Rybka has nothing to do with our business, we only provided it to our members as articles
http://www.openingmaster.com/index.php? ... -Articles/
2) Your every statement is about our ''bad'' quality, yet you don't support it with any facts, just vague statements and some psychological bogus. Now, you are a very respected member of this team room; that is a fact, and 3,889 posts demonstrate it. You mentioned yourself that you like to give other members a hard time and that not many survived your psychological torture. Well, so far we reply to you because we have facts which we can demonstrate. However, sooner or later this goes in circles and we will start referring to the previous posts. We do appreciate your feedback, but please be constructive and provide FACTS or statements which are supported by facts. As said above, the times are gone when somebody said something and the crowd followed. Now people check both sides.
3) Be assured that the first quality database was made to be used by me, so I made sure it was tuned to add value. To those players who don't have that much time to collect games, we offer a free or paid service so they can download the same thing we are using. Those people will know better about the quality and can decide on their own and, apologies for saying so, with or without Rolf.
4) On every forum there are people who post, post and post, everything and everywhere, just to be visible. And talking in general terms is just time consuming. If you noticed, we offer three types of databases on our web site: OM main, OM 2500 ELO and OM 2300 ELO. Every player can choose whichever suits him/her best.
Conclusion: if you have a specific question, please raise it specifically, and please without vague statements about quality.
As you see, this is still written in calm language, as I and the whole team around OM respect any opinion. However, we also have limits of respect, in business and in private.
Best Regards,
Alexander Horvath, SIM ICCF
http://www.openingmaster.com