disagreement between engines idea for testing

Uri Blass
Posts: 10098
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Post by Uri Blass »

I suggest the following process.
Step 1: Take a big PGN of chess games between humans and have 2 top engines (for example, Stockfish and Lc0) analyze every position in the games for 0.1 seconds each.

Step 2: Keep only the positions where they disagree about which side is better (one says at least 0.2 for White while the other says at least 0.2 for Black), and give the engines 10 seconds to analyze each of those positions.
Step 3: Keep only the positions where they still disagree about which side is better, so that the disagreement is probably not caused by some missed tactic.
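
The filtering process above could be sketched roughly as follows. This is only an illustration, not a tested tool: the `Evaluator` callables are hypothetical stand-ins for whatever code actually runs each engine, and they are assumed to return a score in pawns from White's point of view. Applying the same 0.2 threshold to the 10-second pass is also an assumption, since the post only says the engines must "still disagree".

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface: (fen, seconds) -> evaluation in pawns,
# always from White's point of view. How the engine is actually
# driven (UCI, a library, etc.) is left out of this sketch.
Evaluator = Callable[[str, float], float]


def disagree(eval_a: float, eval_b: float, threshold: float = 0.2) -> bool:
    """True if one evaluation favours White and the other favours Black,
    each by at least `threshold` pawns."""
    return (eval_a >= threshold and eval_b <= -threshold) or (
        eval_a <= -threshold and eval_b >= threshold
    )


def persistent_disagreements(
    positions: Iterable[str],
    engine_a: Evaluator,
    engine_b: Evaluator,
    quick: float = 0.1,
    slow: float = 10.0,
    threshold: float = 0.2,
) -> List[Tuple[str, float, float]]:
    """Return positions where the engines disagree at both time controls,
    sorted so that the largest evaluation gap comes first."""
    kept = []
    for fen in positions:
        # Cheap first pass: skip positions with no quick disagreement.
        if not disagree(engine_a(fen, quick), engine_b(fen, quick), threshold):
            continue
        # Longer second pass: keep only disagreements that survive.
        a, b = engine_a(fen, slow), engine_b(fen, slow)
        if disagree(a, b, threshold):
            kept.append((fen, a, b))
    kept.sort(key=lambda item: abs(item[1] - item[2]), reverse=True)
    return kept
```

Dividing the number of returned positions by the total number analyzed would give the percentage asked about in question 1, and the first entry of the sorted list would be the biggest difference asked about in question 2.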

Questions:
1) On what percentage of positions do they disagree?
2) How large is the biggest difference in evaluations going to be?
3) Is it a good idea to test engines starting from the positions with the biggest disagreement?
gordonr
Posts: 190
Joined: Thu Aug 06, 2009 8:04 pm
Location: UK

Re: disagreement between engines idea for testing

Post by gordonr »

Interesting idea.

How would you account for differences in evaluation scales? For example, I believe that Stockfish's evaluations can be more inflated than Lc0's.
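
One possible way to approach the scale question, offered purely as a sketch, is to map each engine's evaluation to an expected score before comparing. The logistic curve and the per-engine `scale` constants below are assumptions made for illustration. They are not taken from either engine, and real values would have to be fitted from game results.

```python
import math


def expected_score(eval_pawns: float, scale: float) -> float:
    """Map a White-relative evaluation in pawns to an expected score
    in (0, 1) using a logistic curve. `scale` is a hypothetical
    per-engine constant: a larger value flattens inflated evaluations."""
    return 1.0 / (1.0 + math.exp(-eval_pawns / scale))


def disagree_normalized(
    eval_a: float,
    scale_a: float,
    eval_b: float,
    scale_b: float,
    margin: float = 0.05,
) -> bool:
    """True if, after normalization, one engine expects White to score
    at least 0.5 + margin and the other at most 0.5 - margin."""
    score_a = expected_score(eval_a, scale_a)
    score_b = expected_score(eval_b, scale_b)
    return (score_a >= 0.5 + margin and score_b <= 0.5 - margin) or (
        score_a <= 0.5 - margin and score_b >= 0.5 + margin
    )
```

With this kind of normalization, the disagreement threshold is expressed in expected score rather than in pawns, so an engine with a wider evaluation range would not trigger disagreements more easily than one with a narrower range.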