lkaufman wrote:
Another issue is that the programming of engines, as well as the play of human grandmasters, is aimed to maximize score with draws counting as 1/2, rather than just number of wins (although wins are sometimes used as a tiebreak). WILO might be better mathematically, but it does not correspond to the actual scoring of tournaments. This is not a minor issue. Suppose Komodo (or even Carlsen) reaches a middlegame position with a half-pawn advantage or so. He has to decide between retaining queens with let's say a 60% winning chance, a 20% losing chance, and a 20% drawing chance. Or he can simplify to an endgame with a 24% winning chance, a 75% drawing chance, and a 1% losing chance (i.e. a gross blunder or flag fall). In any normal tournament or match, he should keep queens on (assuming a neutral tournament/match situation) to maximize his expected score. But to maximize WILO, he should trade queens. Komodo has code to try to avoid simplifying in such a situation (maybe not very effective, but that's irrelevant); if we wanted to maximize WILO we would have to make significant program changes. In my view, we would have to return to the old practice of replaying draws until someone wins to justify switching to WILO. Elimination tournaments with playoffs at faster time limits to break ties are a version of this, but then you are rating blitz games together with slow ones. This is also my objection to Bayes Elo; it also makes an assumption that does not correspond to normal match/tournament scoring.
Valid objection. It stems from the fact that Elo assumes a given number of games N, so one has to optimise W-L for fixed N. WILO does not assume a fixed number of games, so one has to optimise W/L every time. If you assume a fixed N in WILO, the W-L and W/L descriptions are equivalent. So rating lists built from games played under Elo scoring, which optimise W-L for a given number of games, shouldn't be used as WILO rating lists with the draws cleaned out, because optimising Elo with fixed N and WILO with floating N' leads to different playing goals. Only games played to optimise WILO should be used, and at present there are none. A fixed N for WILO might be adopted (replaying drawn games, which would bring the goals back on track).

Your point is valid: it's a slightly different game, and both humans and engines would have to adjust to a possibly better rating system for chess like WILO. Also, there are many cases where the goals in chess differ depending on the format: tournament, match, RR with 2 opponents, RR with 40 opponents, Swiss, knock-out, tie-breaks, Elo gap of 200, Elo gap of 1000, and so on. For each, the "evals" of both humans and engines have to adjust.
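
To make the conflict in the quoted example concrete, here is a rough sketch that computes both objectives for the two choices. The probabilities are the ones from the quote; the function names are made up for illustration, and treating WILO as rewarding the plain W/L ratio with draws discarded is an assumption based on the description above, not a definition of the rating system.

```python
# Probabilities come from the quoted example. Function names are
# illustrative, and "WILO rewards W/L with draws discarded" is an assumption.

def expected_score(win: float, draw: float) -> float:
    """Conventional scoring: a win counts 1, a draw 1/2, a loss 0."""
    return win + 0.5 * draw


def win_loss_ratio(win: float, loss: float) -> float:
    """Ratio of wins to losses, ignoring draws entirely."""
    return win / loss


# (win, draw, loss) probabilities for the two choices in the example.
options = {
    "keep queens": (0.60, 0.20, 0.20),
    "trade queens": (0.24, 0.75, 0.01),
}

for name, (win, draw, loss) in options.items():
    print(f"{name}: expected score {expected_score(win, draw):.3f}, "
          f"W/L {win_loss_ratio(win, loss):.1f}")
```

Keeping queens gives the higher expected score (about 0.70 against 0.615), while trading gives the far higher W/L (about 24 against 3), so under these assumptions the two objectives really do point at opposite moves.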