dannyb wrote: ↑Sun Sep 29, 2019 10:48 pm
chrisw wrote: ↑Sun Sep 29, 2019 11:55 am
The data speaks for itself, both positively and negatively.
Exactly, and since the data tell us about the origins of the engines, this thread belongs in the Engine Origins subforum, especially since this is not a certified tool of any kind.
I'd be intrigued to know what a "certified tool" is and where the "certifying authority" is located.
I can certify, however, that we applied the scientific method:
The Similarity Report used an entirely unbiased engine selection process and an unbiased, established EPD test suite. From these it produced, via a transparent and verifiable engine move selection process at d=1, a correlation matrix showing the percentage of same-move selections across the 135 engines tested.
The testing data is known, the engines are known, the procedure is known, the process is repeatable and verifiable.
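To make the procedure concrete, here is a minimal sketch of how a same-move percentage matrix could be computed. It assumes each engine's chosen move for every test position has already been collected into a list, in the same position order. The function name, the data layout and the engine names are hypothetical illustrations, not the code used for the report.

```python
from itertools import combinations


def similarity_matrix(moves_by_engine):
    """Return {(engine_a, engine_b): percent of positions with the same move}.

    moves_by_engine maps an engine name to its list of chosen moves,
    one per test position, all lists in the same position order.
    (Hypothetical layout, assumed for illustration only.)
    """
    names = sorted(moves_by_engine)
    lengths = {len(moves_by_engine[name]) for name in names}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("every engine needs the same non-empty list of positions")
    n_positions = lengths.pop()

    matrix = {}
    for a, b in combinations(names, 2):
        # Count the positions where both engines picked the identical move.
        same = sum(m1 == m2 for m1, m2 in zip(moves_by_engine[a], moves_by_engine[b]))
        matrix[(a, b)] = 100.0 * same / n_positions
    return matrix


# Toy data: three made-up engines, four made-up positions.
example = {
    "engine_a": ["e2e4", "g1f3", "d2d4", "c2c4"],
    "engine_b": ["e2e4", "g1f3", "d2d4", "b1c3"],
    "engine_c": ["d2d4", "g1f3", "e2e4", "c2c4"],
}
print(similarity_matrix(example))
```

Because the inputs are just lists of moves, anyone with the same engines and the same test suite can rerun the count and check the percentages.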
and no one knows exactly how to interpret the results. Someone said the line should be drawn at 60% and yet Crafty and Fruit have a 60% sim result.
The graphic presentation of force-directed network graphs as a 'time' sequence based on similarity, plus colour coding by Elo, allows the observer to get his/her own feel for the results data, and gets away from the arbitrary drawing of lines in the sand that has been used in the past to classify engines as derivative or clone or whatever.
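As a rough illustration of the idea behind such graphs, here is a minimal spring-based layout, one simple form of force-directed placement. It assumes similarity is given as a percentage per engine pair and treats `1 - similarity/100` as the distance each pair "wants" to be apart, so similar engines drift together without any cutoff line being drawn. The function name, parameters and engine names are hypothetical, and this is not the layout algorithm used for the report's graphs.

```python
import math


def spring_layout(names, similarity_pct, iterations=300, step=0.1):
    """Place engines in 2D so that similar engines end up close together.

    similarity_pct maps (name_i, name_j), with name_i earlier in `names`,
    to a same-move percentage. (Hypothetical layout, for illustration.)
    """
    n = len(names)
    # Deterministic start: spread the engines evenly around a unit circle.
    pos = {
        name: [math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)]
        for k, name in enumerate(names)
    }
    for _ in range(iterations):
        for i in range(n):
            for j in range(i + 1, n):
                a, b = names[i], names[j]
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                target = 1.0 - similarity_pct[(a, b)] / 100.0
                # Spring force: pull together if too far apart, push apart if too close.
                pull = step * (dist - target) / dist
                pos[a][0] += pull * dx
                pos[a][1] += pull * dy
                pos[b][0] -= pull * dx
                pos[b][1] -= pull * dy
    return pos


# Toy data: engine_a and engine_b are very similar, engine_c is not.
names = ["engine_a", "engine_b", "engine_c"]
sims = {
    ("engine_a", "engine_b"): 90.0,
    ("engine_a", "engine_c"): 30.0,
    ("engine_b", "engine_c"): 30.0,
}
layout = spring_layout(names, sims)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

In a real plot each point could additionally be coloured by Elo; that step is left out here.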
Anyone that can read the source code can see how totally unrelated they are.
The problems with source code comparison, compared to similarity comparison by results, are several:
1. It is an inherently experimenter-biased process.
2. It can't compare everything, so there is little big-picture comparison (which we achieve via Simex), and bias in the choice of comparator engines.
3. It is highly subjective.
4. It is heavily biased to protect experienced programmers against the less experienced. Experienced programmers using ideas from other programs will, by virtue of their experience, be coding in their own style, which will likely look nothing like the style of the used-idea engine, and may well find ways of incorporating the used idea into some other, already coded idea. Inexperienced programmers are more likely to be influenced by the coding structure of the place they found the idea in, and this is more likely to be reflected in the resulting code.
5. Simex is not interested in the actual coding of ideas; it detects (quite sensitively imo) the usage of comparable ideas and the tying of those ideas together in comparable ways.
6. Conversely, it also detects the opposite process, the usage or addition of original ideas which unsurprisingly reduce similarity.
7. Finally, Simex also contains information about which engine series an engine's evaluation is NOT linked/connected to. This is revealed in the big-picture analysis which we are able to show in force-directed network graphs.
I've taken a look in the Engine Origins subforum and there are many threads with similarity tests, dendrograms and so on. So, such threads have always been moved there.