I suggest the following process:
Step 1: Take a big PGN of chess games between humans and have 2 top engines (for example Stockfish and Lc0) analyze all the positions in the games for 0.1 seconds each.
Step 2: Take only the positions where the engines disagree about which side is better (one says at least +0.2 for White while the other says at least +0.2 for Black) and give them 10 seconds to analyze each of those positions.
Step 3: Include only the positions where they still disagree about which side is better, so the disagreement is probably not caused by some missed tactic.
Questions:
1) What percentage of positions do they disagree on?
2) What is the biggest difference in evaluations going to be?
3) Is it a good idea to test engines from the positions with the biggest disagreement?
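The filtering rule from the steps above can be sketched as plain code. This is only an illustration: the function names are mine, the evaluations are assumed to be centipawn scores from White's point of view (so 0.2 pawns = 20 centipawns), and actually producing those scores would require running the engines separately.

```python
# Sketch of the disagreement filter from steps 1-3. Evaluations are in
# centipawns from White's point of view; 0.2 pawns = 20 centipawns.
# All names here are illustrative, not from any particular library.

def engines_disagree(eval_a_cp: int, eval_b_cp: int, threshold_cp: int = 20) -> bool:
    """True if one engine says White is better by at least the threshold
    while the other says Black is better by at least the threshold."""
    return (eval_a_cp >= threshold_cp and eval_b_cp <= -threshold_cp) or \
           (eval_a_cp <= -threshold_cp and eval_b_cp >= threshold_cp)

def filter_disagreements(positions):
    """positions: iterable of (fen, eval_a_cp, eval_b_cp) tuples from the
    fast 0.1-second pass; keep only the positions worth the slower
    10-second recheck in step 2."""
    return [fen for fen, a, b in positions if engines_disagree(a, b)]
```

The same filter would then be applied again to the 10-second evaluations to implement step 3.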
disagreement between engines idea for testing
Uri Blass
Posts: 10098
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv, Israel
gordonr
Posts: 190
Joined: Thu Aug 06, 2009 8:04 pm
Location: UK
Re: disagreement between engines idea for testing
Interesting idea.
How to account for differences in evaluation scales? For example, I believe that Stockfish evaluations can be inflated more than LC0 evaluations.
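One common way to put differently scaled evaluations on a comparable footing is to map centipawn scores onto an expected score (win probability) with a logistic curve, and compare engines on that scale instead. This is only a sketch: the slope constant below is a hypothetical choice for illustration, not an official conversion used by either Stockfish or Lc0.

```python
import math

# Hypothetical logistic slope per centipawn, chosen for illustration only.
K = 0.004

def cp_to_expected_score(cp: int) -> float:
    """Map a centipawn evaluation (White's point of view) to a 0..1
    expected score for White via a logistic curve."""
    return 1.0 / (1.0 + math.exp(-K * cp))

def normalized_gap(cp_a: int, cp_b: int) -> float:
    """Disagreement between two engines measured on the expected-score
    scale, which dampens the effect of one engine inflating its scores."""
    return abs(cp_to_expected_score(cp_a) - cp_to_expected_score(cp_b))
```

With something like this, a +0.5 versus +1.5 disagreement deep in a winning position counts for less than the same raw gap around equality, which may be closer to what the filtering idea intends.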