ydebilloez wrote: ↑Fri Jan 09, 2026 10:54 am
I see a strong correlation between the two graphs, so I have reposted them next to each other.
I am sorry to have used the word 'wrong' instead of 'miscalibrated'. You are absolutely right. At long time controls, which means high node counts, the relative difference between the engines is much smaller. It probably also means that in classical games, the Elo edge of engines over human players would be much smaller, due to the draw rate.
The bottom graph you quoted is wonky (biased) because I was selectively narrowing the data to illustrate a curiosity. To your point, here's a better one.
The hill-shaped graph is derived from this one by taking each curve's difference relative to Stockfish 10, with the mass of lower curves discarded. The older versions of Stockfish do not start at the very left-hand side because they lack the granularity to be measured at such low node counts per move. The very lowest curve, running from 16 to 25 on the x-axis, is Glaurung 1.01, the predecessor of the Stockfish family.
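If anyone wants to reproduce that transformation, here's a minimal sketch in Python, assuming the measurements live in a table of (engine, log2 nodes per move, measured Elo) rows; the file name and column names are my invention, not the actual data format.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical input: one row per measurement, with columns
# "engine", "log2_nodes" (x-axis), and "elo".
df = pd.read_csv("elo_vs_nodes.csv")

# Pivot so each engine version becomes a column, indexed by node count.
curves = df.pivot(index="log2_nodes", columns="engine", values="elo")

# The "hill" plot: every engine's Elo minus Stockfish 10's Elo at the
# same node count. Rows where SF10 has no measurement become all-NaN
# after the subtraction and are dropped.
diffs = curves.sub(curves["Stockfish 10"], axis=0).dropna(how="all")

diffs.plot()
plt.xlabel("log2(nodes per move)")
plt.ylabel("Elo relative to Stockfish 10")
plt.show()
```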
Let's find a sponsor and a super-GM to play 100 classical games against SF 17...
A long shot, but with sufficient money, maybe.
For a Man-Machine calibration event (and there hasn't been one of those in a long time), my proposal would be this. Set up a family of engines in strength steps of 100 Elo from 2400 to 3200, estimated as best as can be done. Offer players a betting scenario with the leeway to choose their opponent: the stronger the opponent they choose to play, the more money they earn from a draw or a win. Players can choose a different opponent before the start of each round. I'd propose starting with Blitz and Rapid events to see if the format catches on. Promote the event as a fun exhibition match: who can come out on top versus the machines? Motivate the players with a big money board. Because larger winnings are possible by playing stronger engines, a come-from-behind victory is always possible. (In Yasser Seirawan's voice: "Oooh. Nakamura goes big in the final round! Can he pass Carlsen?") Additional prize money is awarded for finishing first through third among the human players.
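To make the stakes concrete, here's a toy payout function; every constant in it is invented for illustration (a real event would tune the base prize and the growth rate), but it shows how a gutsy engine choice in the final round can overturn the standings.

```python
BASE_PRIZE = 1000  # invented: payout for beating the weakest (2400) engine

def payout(engine_elo: int, result: float) -> float:
    """result: 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    # Assumed scaling: each 100-Elo step up doubles the stakes.
    multiplier = 2 ** ((engine_elo - 2400) / 100)
    return BASE_PRIZE * multiplier * result

# A mere draw against the 3200 engine pays far more than a win at 2400:
print(payout(2400, 1.0))  # 1000.0
print(payout(3200, 0.5))  # 128000.0
```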
I wouldn't use Stockfish either. Its UCI_Elo setting algorithm is, ah, somewhere between "nice try" and brain-dead. I'd train specialty Leela nets to play like Grandmasters at those specified levels.
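For reference, the setting I'm complaining about is exposed through the standard UCI options UCI_LimitStrength and UCI_Elo; here's a minimal python-chess sketch of how you'd engage it (the engine path is a placeholder). How well that nominal 2400 maps to actual 2400-level play is exactly the calibration question.

```python
import chess
import chess.engine

# Placeholder path: point it at a local Stockfish binary.
engine = chess.engine.SimpleEngine.popen_uci("./stockfish")

# UCI_LimitStrength must be enabled for UCI_Elo to take effect.
engine.configure({"UCI_LimitStrength": True, "UCI_Elo": 2400})

board = chess.Board()
result = engine.play(board, chess.engine.Limit(time=1.0))
print(result.move)

engine.quit()
```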