Alexander
Using the default search, the first 3 of the 1254 doubles listed are games:
27778/28283
49660/49724
45026/45030
James
						A00 - Irregular Openings / Orangutan-Sokolsky
Moderator: Ras
- 
				James Constance
- Posts: 358
- Joined: Wed Mar 08, 2006 8:36 pm
- Location: UK
- 
				budfit
Re: A00 - Irregular Openings / Orangutan-Sokolsky - FACTS on
Hi James,
thanks for the effort on this; let me clarify all three samples.
27778/28283 - comment: this is not an exact double. The first game has 23 moves and the second game has 22 moves. This could be a mistake by two different recorders; we cannot control this unless we go one by one through the entire 5.2 million games. With this ''default'' process you would find thousands of doubles in both CB and CA.
45026/45030 - comment: in both games the 24th move is different, therefore no double. What is your default setting?
49660/49724 - comment: in both games the 40th move is different, therefore no double.
See, this was our point before. There are no pseudo doubles or quasi doubles. You need the same game body and similar headers; then we can talk about a ''real double''. We can go on and on through all 1254 games, but most likely the result would be similar. Deduplication itself is a science, and we spent an enormous amount of time figuring out the best way (although it is still not perfect).
And a special message to Rolf:
Before you take somebody else's theory as your own, do some investigation yourself. Since the very beginning you have only been nodding along with what other members said and never did your own analysis, which ''good reporters'' usually do.
Quote from Rolf: Again, you are large because you still have too much doubles. So that isnt quality either. Admit it; TWIC and Megabase are better. So much better. Largely.
You have your opinion on the ChessBase Mega Database, and that is fine. They are big, they have been around for a long time... so has Microsoft Windows... Until somebody challenges that, chess players don't have much choice.
We only offer an alternative. That's all. It's up to each chess player to decide on his/her own.
PS: for your info, TWIC is not a database; it is a collection of separate files. Somebody would need to organize it into a database, deduplicate, filter, analyze and then offer it as a final database product.
Best Regards,
Alexander Horvath, SIM ICCF
http://www.openingmaster.com
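The rule stated in this post ("same body of the game and similar headers") can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Opening Master's actual algorithm: the function names, the "surname plus first initial" header rule, and the three shortened toy records are all hypothetical, and PGN comments or variations are not handled.

```python
import re
from collections import defaultdict

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def plies(movetext):
    """Split PGN movetext into plies, dropping move numbers and the result."""
    out = []
    for token in movetext.split():
        token = re.sub(r"^\d+\.+", "", token)  # strip "12." or a leading "12..."
        if token and token not in RESULTS:
            out.append(token)
    return out

def name_key(name):
    """Hypothetical header rule: surname plus first initial (ASCII names only)."""
    name = re.sub(r"\(.*?\)", " ", name.lower())  # drop tags such as "(NOR)"
    words = re.sub(r"[^a-z ]", " ", name).split()
    if not words:
        return ""
    return words[0] + (" " + words[1][0] if len(words) > 1 else "")

def strict_doubles(games):
    """Group indices of games whose players match loosely and whose bodies match exactly."""
    groups = defaultdict(list)
    for i, g in enumerate(games):
        key = (name_key(g["White"]), name_key(g["Black"]), tuple(plies(g["Moves"])))
        groups[key].append(i)
    return [ids for ids in groups.values() if len(ids) > 1]

# Shortened toy records, for illustration only.
games = [
    {"White": "Bianchi G", "Black": "Hoffman A",
     "Moves": "1. b4 e5 2. Bb2 Bxb4 1/2-1/2"},
    {"White": "Bianchi, Guillermo", "Black": "Hoffman, Alejandro S",
     "Moves": "1. b4 e5 2. Bb2 Bxb4 1/2-1/2"},
    {"White": "Bianchi, Guillermo", "Black": "Hoffman, Alejandro S",
     "Moves": "1. b4 e5 2. Bb2 f6 1/2-1/2"},
]
print(strict_doubles(games))  # [[0, 1]]
```

Under this reading, a single differing ply anywhere in the body puts two records in different groups, which is why none of the three quoted pairs counts as a double under the strict rule.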
- 
				James Constance
- Posts: 358
- Joined: Wed Mar 08, 2006 8:36 pm
- Location: UK
Re: A00 - Irregular Openings / Orangutan-Sokolsky - FACTS on
Alexander
The default setting is the same as yours, but it looks for similar names and similar moves. I've quoted the 6 games below so that others can see what we are talking about.
James
[Event "Norway ch NM97"]
[Site "corr NPSF"]
[Date "1997.??.??"]
[Round "?"]
[White "Aasum, Anker"]
[Black "Gundersen, Helge Folmer"]
[Result "0-1"]
[ECO "A00"]
[WhiteElo "2196"]
[BlackElo "2320"]
[PlyCount "46"]
[EventDate "1997.??.??"]
[EventType "tourn (corr)"]
[EventCountry "NOR"]
[Source "Opening Master"]
[SourceDate "2008.09.09"]
1. Nc3 d5 2. e4 d4 3. Nce2 e5 4. Ng3 Be6 5. Nf3 f6 6. Be2 g6 7. b3 Bg7 8. a4
Ne7 9. Ba3 Nbc6 10. h4 Qd7 11. h5 Bh6 12. Qb1 Bf4 13. Nf1 g5 14. g3 g4 15. gxf4
gxf3 16. Bxf3 d3 17. f5 dxc2 18. Qxc2 Nd4 19. Qc3 Bxb3 20. Bd1 Bxd1 21. Rxd1
Qxa4 22. Bxe7 Nc2+ 23. Ke2 Qxe4+ 0-1
[Event "corr NOR-ch"]
[Site "NPSF"]
[Date "1997.??.??"]
[Round "?"]
[White "Aasum Anker (NOR)"]
[Black "Gundersen Helge Folmer (NOR)"]
[Result "0-1"]
[ECO "A00"]
[WhiteElo "2090"]
[PlyCount "44"]
[EventDate "1997.??.??"]
[Source "Opening Master"]
[SourceDate "2008.09.09"]
1. Nc3 d5 2. e4 d4 3. Nce2 e5 4. Ng3 Be6 5. Nf3 f6 6. Be2 g6 7. b3 Bg7 8. a4
Ne7 9. Ba3 Nbc6 10. h4 Qd7 11. h5 Bh6 12. Qb1 Bf4 13. Nf1 g5 14. g3 g4 15. gxf4
gxf3 16. Bxf3 d3 17. f5 dxc2 18. Qxc2 Nd4 19. Qc3 Bxb3 20. Bd1 Bxd1 21. Rxd1
Qxa4 22. Bxe7 Nc2+ 0-1
[Event "Capablanca mem"]
[Site "Havana"]
[Date "1971.??.??"]
[Round "5"]
[White "Bilek, Istvan"]
[Black "Geller, Efim P"]
[Result "0-1"]
[ECO "A00"]
[WhiteElo "2500"]
[BlackElo "2630"]
[PlyCount "80"]
[EventDate "1971.03.??"]
[EventType "tourn"]
[EventRounds "15"]
[EventCountry "CUB"]
[EventCategory "9"]
[Source "Opening Master"]
[SourceDate "2008.09.09"]
1. g3 d5 2. Bg2 c6 3. d3 Nf6 4. Nd2 e5 5. e4 Bc5 6. Ngf3 O-O 7. O-O Re8 8. h3
Nbd7 9. Kh2 a5 10. exd5 cxd5 11. d4 exd4 12. Nb3 Bb6 13. Nbxd4 Ne4 14. c3 Ndf6
15. Bf4 Nh5 16. Bd2 Bd7 17. Be1 Nhf6 18. a4 Rc8 19. Ng1 Nd6 20. Nge2 Nc4 21. b3
Nd6 22. Qd3 Nde4 23. Rd1 Qe7 24. f3 Nc5 25. Qc2 Bc7 26. Kh1 Nh5 27. Bf2 Qg5 28.
f4 Qh6 29. Nf5 Bxf5 30. Qxf5 Ne4 31. Bxe4 dxe4 32. Be3 g6 33. Qg5 Qxg5 34. fxg5
Nxg3+ 35. Nxg3 Bxg3 36. Rd7 Rxc3 37. Rfxf7 Be5 38. Bf4 Rf3 39. Rxh7 Rxf4 40.
Rh6 Re6 0-1
[Event "Memorial J.Capablanca"]
[Site "Habana (Cuba)"]
[Date "1971.??.??"]
[Round "5"]
[White "Bilek Istvan (HUN)"]
[Black "Geller Efim P (RUS)"]
[Result "0-1"]
[ECO "A00"]
[WhiteElo "2490"]
[BlackElo "2455"]
[PlyCount "80"]
[EventDate "1971.??.??"]
[Source "Opening Master"]
[SourceDate "2008.09.09"]
1. g3 d5 2. Bg2 c6 3. d3 Nf6 4. Nd2 e5 5. e4 Bc5 6. Ngf3 O-O 7. O-O Re8 8. h3
Nbd7 9. Kh2 a5 10. exd5 cxd5 11. d4 exd4 12. Nb3 Bb6 13. Nbxd4 Ne4 14. c3 Ndf6
15. Bf4 Nh5 16. Bd2 Bd7 17. Be1 Nhf6 18. a4 Rc8 19. Ng1 Nd6 20. Nge2 Nc4 21. b3
Nd6 22. Qd3 Nde4 23. Rd1 Qe7 24. f3 Nc5 25. Qc2 Bc7 26. Kh1 Nh5 27. Bf2 Qg5 28.
f4 Qh6 29. Nf5 Bxf5 30. Qxf5 Ne4 31. Bxe4 dxe4 32. Be3 g6 33. Qg5 Qxg5 34. fxg5
Nxg3+ 35. Nxg3 Bxg3 36. Rd7 Rxc3 37. Rfxf7 Be5 38. Bf4 Rf3 39. Rxh7 Rxf4 40.
Rh6 e3 0-1
[Event "Bs As."]
[Site "?"]
[Date "1985.??.??"]
[Round "?"]
[White "Bianchi G"]
[Black "Hoffman A"]
[Result "1/2-1/2"]
[ECO "A00"]
[PlyCount "47"]
[EventDate "1985.??.??"]
[Source "Opening Master"]
[SourceDate "2008.09.09"]
1. b4 e5 2. Bb2 Bxb4 3. Bxe5 Nf6 4. c4 O-O 5. Nf3 Re8 6. e3 d5 7. Bb2 Bf8 8.
cxd5 Nxd5 9. Bc4 Be6 10. Be2 c5 11. O-O Nc6 12. Nc3 Rc8 13. Nxd5 Bxd5 14. Bc3
Be7 15. Qb1 Rb8 16. Qb2 Bf8 17. a4 Qe7 18. Rfc1 Red8 19. Ne5 Qg5 20. Bf1 Rbc8
21. f4 Qe7 22. Re1 Nxe5 23. Bxe5 Be4 24. Rad1 1/2-1/2
[Event "Buenos Aires"]
[Site "Buenos Aires ARG"]
[Date "1985.??.??"]
[Round "10"]
[White "Bianchi, Guillermo"]
[Black "Hoffman, Alejandro S"]
[Result "1/2-1/2"]
[ECO "A00"]
[WhiteElo "2355"]
[BlackElo "2270"]
[PlyCount "47"]
[EventDate "1985.??.??"]
[Source "Opening Master"]
[SourceDate "2008.09.09"]
1. b4 e5 2. Bb2 Bxb4 3. Bxe5 Nf6 4. c4 O-O 5. Nf3 Re8 6. e3 d5 7. Bb2 Bf8 8.
cxd5 Nxd5 9. Bc4 Be6 10. Be2 c5 11. O-O Nc6 12. Nc3 Rc8 13. Nxd5 Bxd5 14. Bc3
Be7 15. Qb1 Rb8 16. Qb2 Bf8 17. a4 Qe7 18. Rfc1 Red8 19. Ne5 Qg5 20. Bf1 Rbc8
21. f4 Qe7 22. Re1 Nxe5 23. Bxe5 Be4 24. Red1 1/2-1/2
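To make the disagreement over these six games concrete, here is a minimal Python sketch that classifies a pair of move lists as exact, truncated, or differing only in the final ply. The category names and the `compare` function are my own assumptions, not the behaviour of any particular database tool, and only the closing moves of each quoted pair are used as input.

```python
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def plies(movetext):
    """Split PGN movetext into plies, dropping move numbers and the result."""
    out = []
    for token in movetext.split():
        token = re.sub(r"^\d+\.+", "", token)  # strip "12." or a leading "12..."
        if token and token not in RESULTS:
            out.append(token)
    return out

def compare(a, b):
    """Classify two movetexts; the labels are illustrative, not a standard."""
    pa, pb = plies(a), plies(b)
    if pa == pb:
        return "exact"
    short, long_ = sorted((pa, pb), key=len)
    if long_[:len(short)] == short:
        return "prefix"            # one record stops earlier than the other
    if len(pa) == len(pb) and pa[:-1] == pb[:-1]:
        return "last-ply-differs"  # same game up to the final ply
    return "different"

# Closing moves of the three quoted pairs.
print(compare("21. Rxd1 Qxa4 22. Bxe7 Nc2+ 23. Ke2 Qxe4+ 0-1",
              "21. Rxd1 Qxa4 22. Bxe7 Nc2+ 0-1"))       # prefix
print(compare("39. Rxh7 Rxf4 40. Rh6 Re6 0-1",
              "39. Rxh7 Rxf4 40. Rh6 e3 0-1"))          # last-ply-differs
print(compare("23. Bxe5 Be4 24. Rad1 1/2-1/2",
              "23. Bxe5 Be4 24. Red1 1/2-1/2"))         # last-ply-differs
```

A strict body match accepts only the "exact" class, while a "similar moves" search would presumably also flag the other two classes; whether those should be merged automatically or reviewed by hand is the point under discussion.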
- 
				budfit
Re: A00 - Irregular Openings / Orangutan-Sokolsky - FACTS on
James, 
looking at the games you provided, we see different bodies and different numbers of moves. In your file there is even one game which doesn't have any ''mutant'' and appears only once, yet in your default setting it is removed.
Again, we appreciate very much your opinion and we see your point. Our rules are strict on method of dedup, the two bodies must match.
Best Regards,
Alexander Horvath, SIM ICCF
http://www.openingmaster.com
- 
				BubbaTough
- Posts: 1154
- Joined: Fri Jun 23, 2006 5:18 am
Re: A00 - Irregular Openings / Orangutan-Sokolsky - FACTS on
Quote from Alexander: Again, we appreciate very much your opinion and we see your point. Our rules are strict on method of dedup, the two bodies must match.
I appreciate your strict rule for the automatic removal of duplicates. In the cases identified above, however, a human can clearly tell they are duplicates (even if there is a different last move in one case). It would be a service if you manually removed such duplicates when you come across them by applying human judgement with regard to which to remove (and when to remove one). It would increase the quality of your DB, and help eliminate complaints about the artificial inflation of the size of your DB. Alas, human sculpting is usually a required step when introducing claims of DB quality.
-Sam
- 
				budfit
Re: A00 - Irregular Openings / Orangutan-Sokolsky - FACTS on
Hi Sam, 
from the practical point of view it is impossible to search the entire DB for ''quasi double'' even though we don't like this terminology - because there is either a double or there is not. We are listening to your statements and to others. The analysis which was performed by James removed many singles as well. He found 1,200 of these ''doubles'' which is 2% from the A00 provided sample.
So far we have to rely on algorithms and stick to our strict rules. This comparison has not been made on CB and CA; the same results (if not worse) would be found there too.
Best Regards,
Alexander Horvath, SIM ICCF
http://www.openingmaster.com
- 
				BubbaTough
- Posts: 1154
- Joined: Fri Jun 23, 2006 5:18 am
Re: A00 - Irregular Openings / Orangutan-Sokolsky - FACTS on
Quote from Alexander: from the practical point of view it is impossible to search the entire DB for ''quasi double'' even though we don't like this terminology - because there is either a double or there is not.
Quite sensible. Perhaps a pragmatic approach would be to simply provide a mechanism for users to report issues (such as duplicates) and only manually adjust things (such as by removing a duplicate) when a user complains. That way your time investment is not that large, and over time the database will improve. Also, I noticed back in my shareware days that users will forgive a lot (and more often pay for services) if it feels like they are participatory in building a better experience. Almost every time I fixed a bug or added a feature sent to me by a user, that user ended up paying me for the game.
-Sam