I started my ratinglist-testrun of Arasan 23.1 (avx2, of course). If I also see a clear regression, I will report it (and abort the testrun)...
What I can say is that the annoying message cutechess-cli prints after each game played by Arasan, "EngineProcess: Process destroyed while engine still running", is not fixed. Arasan is the only engine for which this warning is printed. Again, and again, and again... Very annoying!
Testrun of Arasan 23.1 aborted: Ed Schroeder measured a huge regression relative to Arasan 23.0.1, and so do I: -25 Elo for Arasan 23.1 after 350 games compared to Arasan 23.0.1.
Here 23.1 is beating 23.0.1 handsomely, but only a couple of games have been played. However, the network should be set explicitly: by default the engine does not use one and displays a message to that effect.
EDIT: They look about equal now, but too few games have been played to form an opinion.
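To put "too few games" into numbers, here is a minimal sketch of how an Elo difference and an approximate 95% error margin can be estimated from a win/draw/loss record. The function names and the W/D/L counts in the usage note are illustrative assumptions and are not results from this thread. It uses the standard logistic Elo formula and a normal approximation for the score.

```python
import math

def elo_from_score(score: float) -> float:
    """Convert a score fraction into an Elo difference (logistic model)."""
    # Clamp so that 0% or 100% scores do not cause a division by zero.
    score = max(1e-9, min(1.0 - 1e-9, score))
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, lower, upper) for a match result.

    z = 1.96 corresponds to a roughly 95% interval under a normal
    approximation of the mean score.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    stderr = math.sqrt(variance / games)
    return (
        elo_from_score(score),
        elo_from_score(score - z * stderr),
        elo_from_score(score + z * stderr),
    )
```

For example, `elo_with_margin(100, 150, 100)` describes a hypothetical even 350-game match. The interval it returns is tens of Elo wide, which is why a difference of a few dozen Elo needs a long run before it can be trusted.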
I am also doing a little more testing. FYI, Linux is the primary development and testing environment now, and 23.1 scored about +30 Elo above 23.0 in that environment (2000 games). However, a short test I did on Windows showed 23.0 scoring above 23.1 (not significant, though). Puzzling to me since "bench" shows 23.1 is a little faster, and the new code + network scored well on Linux.
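Whether a short test like this is "significant" is often judged with the likelihood of superiority (LOS). The sketch below uses the common approximation based only on wins and losses (draws carry no information about which side is stronger). The function name is hypothetical, and no game counts from this thread are assumed.

```python
import math

def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that the first engine is the stronger one.

    Uses the normal approximation
    LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))).
    """
    decisive = wins + losses
    if decisive == 0:
        # No decisive games: the result says nothing either way.
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))
```

A small lead in a short match, such as a hypothetical 12 wins to 10 losses, stays far below the 95% level that is commonly used as a threshold, whereas the same ratio over many more games would not.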
My tests show that arasan-d9-10-20211029.nnue (23.1) is clearly worse than arasan-d8-9-20210827.nnue (23.0.1).
The speed gain of 23.1 seems to be bigger in the Linux build than in the Windows build, so the weaker network eats up the gain in the Windows build but not (completely) in the Linux one.
jdart wrote: Sat Nov 13, 2021 9:20 pm
I am also doing a little more testing. FYI, Linux is the primary development and testing environment now, and 23.1 scored about +30 Elo above 23.0 in that environment (2000 games). However, a short test I did on Windows showed 23.0 scoring above 23.1 (not significant, though). Puzzling to me since "bench" shows 23.1 is a little faster, and the new code + network scored well on Linux.
To be sure about the regression, I played more than 2200 games (vs. 7 opponents). In my ratinglist, it would look like this: