Engine vs. engine rating difference
-
Fritz 0
- Posts: 144
- Joined: Fri Mar 11, 2022 12:10 pm
- Full name: Branislav Đošić
Engine vs. engine rating difference
It has been said that the range of engine ratings is expanded compared to the range of human ratings. I understand this when we compare the same engine at different search depths. For example, Komodo 14 level 21 vs. Komodo 14 level 20 (8 ply vs. 7 ply) shows a 160-170 Elo difference, while the real difference (in human terms) is estimated to be 114 Elo. One extra ply, with everything else equal, means much more to an engine than to a human; that is understandable. But if we compare, for instance, Dragon and Stockfish both at 12 ply, why would the difference be expanded? I'm not saying this isn't true; I just don't understand the reason for it and would appreciate it if someone explained it to me.
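To make the numbers in the question concrete, here is a minimal Python sketch of the standard logistic Elo model, in which a rating difference d corresponds to an expected score of 1 / (1 + 10^(-d/400)). It compares the 160-170 Elo engine gap (using 165 as an assumed midpoint) with the 114 Elo human estimate and prints the implied ratio. The function names are hypothetical, chosen for illustration.

```python
import math

def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_from_score(score: float) -> float:
    """Inverse of expected_score: Elo difference implied by a match score in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

engine_gap = 165.0  # assumed midpoint of the 160-170 Elo engine-vs-engine gap
human_gap = 114.0   # estimated gap in human terms, as quoted in the post

print(f"engine gap {engine_gap:.0f} Elo -> expected score {expected_score(engine_gap):.3f}")
print(f"human gap  {human_gap:.0f} Elo -> expected score {expected_score(human_gap):.3f}")
print(f"implied contraction factor: {human_gap / engine_gap:.2f}")
```

`elo_from_score` is the direction used when an engine-vs-engine match result is turned into a rating difference; the two functions are exact inverses of each other.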
-
lkaufman
- Posts: 5942
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: Engine vs. engine rating difference
Fritz 0 wrote on Wed Apr 27, 2022 10:55 am:
"It has been said that the range of engine ratings is expanded compared to the range of human ratings. I understand this when we compare the same engine at different search depths. For example, Komodo 14 level 21 vs. Komodo 14 level 20 (8 ply vs. 7 ply) shows a 160-170 Elo difference, while the real difference (in human terms) is estimated to be 114 Elo. One extra ply, with everything else equal, means much more to an engine than to a human; that is understandable. But if we compare, for instance, Dragon and Stockfish both at 12 ply, why would the difference be expanded? I'm not saying this isn't true; I just don't understand the reason for it and would appreciate it if someone explained it to me."
If one engine is significantly stronger than the other at a fixed depth like 12 ply, and both use NNUE, the gap is probably mostly due to differences in pruning, reductions, or extensions. So the stronger engine may effectively be seeing one ply deeper in most lines, even if both report "12 ply", which is really an iteration count rather than the actual depth of every line.
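The point that "12 ply" is an iteration count rather than a uniform depth can be illustrated with a toy model. The Python sketch below searches a uniform tree to a nominal depth under a hypothetical late-move-reduction rule (moves from index `reduce_from` onward are searched one ply shallower) and counts the actual plies at which leaves are reached. The tree shape, branching factor, and reduction rule are all assumptions for illustration; they do not describe how Dragon or Stockfish actually prune.

```python
from collections import Counter
from typing import Optional

def leaf_plies(depth: int, branching: int, reduce_from: Optional[int],
               ply: int = 0, counts: Optional[Counter] = None) -> Counter:
    """Count leaves by the actual ply at which the search stops.

    depth: remaining nominal depth (the iteration number at the root).
    branching: number of moves at every node (uniform toy tree).
    reduce_from: hypothetical LMR rule; moves with index >= reduce_from are
        searched one ply shallower when depth >= 2. None disables reductions.
    """
    if counts is None:
        counts = Counter()
    if depth <= 0:
        counts[ply] += 1
        return counts
    for move_index in range(branching):
        reduced = reduce_from is not None and move_index >= reduce_from and depth >= 2
        leaf_plies(depth - 1 - (1 if reduced else 0), branching, reduce_from, ply + 1, counts)
    return counts

def mean_ply(counts: Counter) -> float:
    """Average actual leaf ply, weighted by the number of leaves."""
    return sum(p * n for p, n in counts.items()) / sum(counts.values())

# Same nominal depth (8), three hypothetical reduction policies.
for rule in (None, 3, 1):
    c = leaf_plies(depth=8, branching=4, reduce_from=rule)
    print(f"reduce_from={rule}: leaf plies {min(c)}..{max(c)}, mean {mean_ply(c):.2f}")
```

With reductions switched on, some lines stop several plies short of the nominal depth while the first-move line always reaches it. Two engines that both report the same depth can therefore differ in how deeply they actually examine most lines.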
Note that when the search depth drops to one ply and the Elo differences between levels come from varying amounts of Variety or randomness, the opposite may be true: engine ratings may be contracted compared to human ratings. At least, that's how it looks to me now.
Komodo rules!
-
Fritz 0
- Posts: 144
- Joined: Fri Mar 11, 2022 12:10 pm
- Full name: Branislav Đošić
Re: Engine vs. engine rating difference
lkaufman wrote on Wed Apr 27, 2022 5:41 pm:
"If one engine is significantly stronger than the other at a fixed depth like 12 ply, and both use NNUE, the gap is probably mostly due to differences in pruning, reductions, or extensions. So the stronger engine may effectively be seeing one ply deeper in most lines, even if both report '12 ply', which is really an iteration count rather than the actual depth of every line.
"Note that when the search depth drops to one ply and the Elo differences between levels come from varying amounts of Variety or randomness, the opposite may be true: engine ratings may be contracted compared to human ratings. At least, that's how it looks to me now."
So it boils down to search depth. Does that mean all rating differences between engines are expanded (regardless of the reason for the difference) and are really smaller in human terms? If so, is there a universal formula to convert them to human rating differences?
-
lkaufman
- Posts: 5942
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: Engine vs. engine rating difference
Fritz 0 wrote on Wed Apr 27, 2022 8:33 pm:
"So it boils down to search depth. Does that mean all rating differences between engines are expanded (regardless of the reason for the difference) and are really smaller in human terms? If so, is there a universal formula to convert them to human rating differences?"
Well, if you are comparing NNUE engines with ancient engines, then the evaluation is also very different, and the effect is not the same; it is also quite different for very weak engines. I used to say that Elo differences on engine rating lists should be contracted by 25% for human interpretation, and I still think that's reasonably close to the truth for most engines above 2000 Elo or so. But there is some evidence that a better evaluation (as in NNUE) pays off more against humans than against dumber but deeper-searching engines, so it's not a perfect guide.
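The 25% rule of thumb can be written down directly. The Python sketch below scales an engine-list Elo difference by 0.75 and refuses to apply it when the weaker engine is rated below about 2000, the range where the rule is said to stop being reasonable. The function name and the hard 2000 cutoff are assumptions for illustration, and the rule itself is only approximate, especially when NNUE engines are compared with older ones.

```python
def human_equivalent_gap(engine_gap: float, weaker_rating: float,
                         contraction: float = 0.25, min_rating: float = 2000.0) -> float:
    """Rough human-terms Elo gap from an engine-vs-engine Elo gap.

    Rule of thumb from the thread: contract engine-list differences by about 25%.
    Only intended for engines rated above roughly 2000 Elo (assumed hard cutoff here).
    """
    if weaker_rating < min_rating:
        raise ValueError("rule of thumb is only suggested for engines above ~2000 Elo")
    return engine_gap * (1.0 - contraction)

# Hypothetical example: a 200 Elo gap on an engine list, weaker engine rated 2800.
print(human_equivalent_gap(200.0, weaker_rating=2800.0))  # prints 150.0
```

For comparison, applying the same 0.75 factor to the 160-170 Elo Komodo level gap from the opening post gives 120-127.5 Elo, somewhat above the 114 estimate quoted there, assuming those levels fall in the range where the rule applies at all.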