Guenther wrote: ↑Fri Oct 15, 2021 7:51 pm
You can see it achieved the same rating as Rustic 2 Alpha, so either it has a big regression (unlikely), or it doesn't like your testing.

Joking aside, maybe you played at a very fast TC to get so many games so quickly, and it is not really good at this?
I did play at a very fast time control (10s+0.1s) which I do with all of my tests, and the estimated ratings I achieve are normally close to what CCRL also measures. Princhess didn't have any time forfeits. It did, however, miss a lot of wins and draws by being unable to convert, checkmate, or hold the draw. As said, I've seen the engine not being able to deliver mate with a queen+ material advantage against a bare king. I've even seen it throw away games where it is up more than a rook in an endgame, losing enough material to get mated itself.
So yes, I do think it could be a regression; or it is indeed an engine that does not function well at super-fast time controls. (I've seen it uses MCTS; maybe this method of searching doesn't work well for fast time controls. I don't know; didn't try it yet. But I plan, at some point, to add MCTS to Rustic as an experimental alternative search feature.) I cannot see how fast it is, because it doesn't put out any statistics (yet) apart from a single PV-line (without depth or speed). And, the TT is mostly already at 100% within the very first few seconds. I run with a 16 MB TT to test replacement schemes at this super-fast TC; but I can imagine the thousands / millions (?) of MCTS playouts fill up the TT extremely fast.
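To put a rough number on how quickly a 16 MB TT can fill up, here is a back-of-the-envelope sketch. The entry size (16 bytes) and the store rate (1 million stores per second) are assumptions for illustration only; I don't know the actual values for Princhess, since it doesn't report them. The estimate also assumes every store lands in a fresh slot, so it gives the fastest possible fill time.

```python
def tt_entries(size_mb: int, entry_bytes: int) -> int:
    """Number of entries that fit in a TT of size_mb mebibytes."""
    return size_mb * 1024 * 1024 // entry_bytes


def seconds_to_fill(size_mb: int, entry_bytes: int, stores_per_second: int) -> float:
    """Fastest possible time to reach 100% fill, assuming no slot is reused."""
    return tt_entries(size_mb, entry_bytes) / stores_per_second


# Hypothetical figures: 16-byte entries, 1 million stores per second.
entries = tt_entries(16, 16)
fill_time = seconds_to_fill(16, 16, 1_000_000)
print(f"{entries} entries, full after about {fill_time:.2f} s")
```

With those assumed figures the table holds about a million entries and could be full in roughly a second, which fits with seeing 100% almost immediately at this time control.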
Mergi wrote: ↑Fri Oct 15, 2021 8:06 pm
As to Marcel's last point about not being able to checkmate even with a huge material advantage - in testing, to save time, most CC games are usually adjudicated when a certain advantage is reached, so just getting into a winning position is all that matters. In my testing I adjudicate when an 8-point advantage is reached. And I remember seeing that for LC0 testing they end the game when a 5-point advantage is reached. I'd assume CCRL testers do something similar, as otherwise endgames might take a huge amount of time to conclude. So that's probably where the big Elo disparity comes from, if you let every game reach its ultimate conclusion (checkmate/draw) in your testing.
Also plausible. I don't adjudicate games in my testing, so an engine that cannot convert wins or deliver mate loses points for it. Just getting into a winning position is not enough; the engine actually has to win it.
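To show how much difference adjudication can make, here is a minimal sketch of a score-based adjudication rule. It is not how any particular tester or tool implements it; the function name, the `hold_plies` parameter, and the sample scores are made up for illustration. Only the 8-point threshold comes from the post above.

```python
from typing import Optional, Sequence


def adjudicate(evals: Sequence[float],
               threshold: float = 8.0,
               hold_plies: int = 1) -> Optional[int]:
    """Return the ply index at which the game would be adjudicated, or None.

    evals: score after each ply, in pawns, from White's point of view.
    The game is stopped once the same side has been ahead by at least
    `threshold` for `hold_plies` consecutive plies.
    """
    streak = 0
    leader = 0  # +1 = White ahead, -1 = Black ahead, 0 = nobody
    for ply, score in enumerate(evals):
        if abs(score) >= threshold:
            side = 1 if score > 0 else -1
            streak = streak + 1 if side == leader else 1
            leader = side
            if streak >= hold_plies:
                return ply
        else:
            streak = 0
            leader = 0
    return None


# Hypothetical game: White reaches +9, then throws the advantage away.
scores = [0.5, 3.0, 9.2, 9.5, 4.0, -1.0]
print(adjudicate(scores))                # stopped at ply 2: counted as a White win
print(adjudicate(scores, threshold=10))  # None: the game is played to the end
```

With adjudication at 8 points, the sample game is scored as a win at ply 2. Without it, the game continues and the later collapse decides the result, which is exactly the case where an engine that can't convert loses points.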
So there can be several conclusions. Either:
- This engine is not good at fast time controls
- It has a massive regression with regard to converting wins, checkmates, and holding draws, losing many points
- It has always had problem 2, but it goes unnoticed because of game adjudication.