Viz wrote: ↑Wed Oct 30, 2024 10:00 am
A 90% draw ratio is a big understatement; it would be 99% if we went for balanced lines.
Which boils down to the statement: chess performance has turned out to be very difficult to measure these days, so we started measuring something else.
A lot about battles between chess entities is being ignored these days when it comes to computer chess competition in "modern" rating lists, such as:
1.) Opening books
2.) Book learning
3.) Position learning
4.) Pondering
5.) Performance against worse opponents
Viz wrote: ↑Wed Oct 30, 2024 6:40 pm
Spoiler - it's not interesting. And in fact a lot depends on sheer luck, much more than matches against top engines.
This is just not really true. I have provided some evidence of how Stockfish sucks against Crafty, for example, if we include 1-5 again (3 just for the sake of completeness; no one seems to have worked on it in recent years, if at all).
I don't know how deep you are into this topic, as there is also some truth in what you write. One major difficulty for Stockfish is that it goes for the Berlin with Black and seems very happy to reach equality fast. This is probably largely coincidence, given the very small eval difference to something like the Sicilian Paulsen, where it would kill Crafty.
Anyway: the top humans are fighting the exact same obstacles (logical, given their use of engines in preparation), and they are forced to play the real game of chess after all, while engines have abandoned that game and play UHO games now, which may make rating differences easier to measure. The only problem: this is a different game, and one that is irrelevant for humans.
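To make the measurement point a bit more concrete, here is a rough sketch of how an Elo difference and its error margin could be estimated from a match result. It uses the standard logistic Elo formula and a normal approximation for the 95% interval; the function name `elo_from_wdl` and all game counts are made up for illustration, they are not results from any rating list.

```python
import math

def elo_from_wdl(wins, draws, losses):
    """Estimate the Elo difference and a rough 95% interval from a match.

    Assumes independent games and a normal approximation; this is a
    simplification, not how any particular rating list does it.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n

    # Per-game variance of the score (outcomes are 1, 0.5 and 0 points).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score

    def to_elo(s):
        # Clamp to avoid log of zero or division by zero at 0% / 100%.
        s = min(max(s, 1e-9), 1.0 - 1e-9)
        return -400.0 * math.log10(1.0 / s - 1.0)

    return to_elo(score), to_elo(score - 1.96 * se), to_elo(score + 1.96 * se)

# Hypothetical counts: a drawish balanced book vs. an unbalanced (UHO-style) book.
for label, wdl in [("balanced", (10, 985, 5)), ("unbalanced", (400, 400, 200))]:
    elo, low, high = elo_from_wdl(*wdl)
    print(f"{label}: {elo:+.1f} Elo, 95% interval [{low:+.1f}, {high:+.1f}]")
```

With the invented balanced counts the interval still includes zero after 1000 games, while the invented unbalanced counts give an interval clearly above zero. That is the sense in which UHO makes differences easier to measure, and it says nothing about whether the measured game is still the same game.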