mmt wrote: ↑Mon Feb 17, 2020 5:08 pm
Thanks for looking! I definitely have to take a close look at these time-forfeit games. I can create cleaned-up versions of the PGN files without PV.
Guenther wrote: ↑Mon Feb 17, 2020 9:45 am
A first quick look also revealed (as I supposed) that most of SF/2's wins vs. SF were only due to crappy openings starting at +2 or so.
A large book has its minuses. Since both sides are played, it shouldn't be too big of a deal for getting the right Elo (Ordo) difference.
This is wrong and will defeat the whole purpose of the test. Bad openings cannot be cured by playing them for both sides.
That is a widespread but illogical opinion. The only thing it does is help the presumed weaker program push its score closer to equality.
Also, long openings generally make these statistical tests unreliable for various reasons.
I noticed e.g. 120+ and 140+ games with early threefold repetitions before move 30! in your PGN files (same effect: pushing the weaker one towards equality),
sometimes even directly after book end. IMO long books should be abolished altogether for serious statistical tests.
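To get a rough count of such early draws without replaying the games, something like the following might do. It is only a sketch: it assumes standard PGN with an `[Event ...]` tag opening each game and engine output inside `{...}` comments, and it flags any draw that ends before move 30. It does not check whether the draw really was a threefold repetition, so treat the result as a list of candidates. The function name `early_draws` is made up for this example, and it needs Python 3.7+ because it splits on a zero-width match.

```python
import re

def early_draws(pgn_text, max_move=30):
    """Return (white, black, last_move_number) for draws that ended before max_move."""
    hits = []
    # A new game starts at every line beginning with the [Event tag.
    for game in re.split(r'(?m)^(?=\[Event )', pgn_text):
        if not game.strip():
            continue
        tags = dict(re.findall(r'(?m)^\[(\w+) "([^"]*)"\]', game))
        if tags.get("Result") != "1/2-1/2":
            continue
        # Remove tag lines and {...} comments (PV, eval, time) before counting moves.
        movetext = re.sub(r'(?m)^\[.*\]\s*$', '', game)
        movetext = re.sub(r'\{[^}]*\}', '', movetext)
        numbers = [int(n) for n in re.findall(r'\b(\d+)\.', movetext)]
        if numbers and max(numbers) < max_move:
            hits.append((tags.get("White"), tags.get("Black"), max(numbers)))
    return hits
```

Calling `early_draws(open("games.pgn").read())` (file name is a placeholder) returns the candidate games, which can then be inspected by hand.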
mmt wrote: ↑Mon Feb 17, 2020 5:08 pm
Guenther wrote: ↑Mon Feb 17, 2020 9:45 am
Last but not least, ofc Alyan was correct that using 20GB hash per program added extreme noise and errors too, because loading 20GB of hash in a 10s or 5s game might already use up most of the base time and slow things down (even 64 or 128MB would be enough here).
No, this is not the case with Arena. I can set up a 1s/0 tournament with a 20 GB cache and it works fine, whether the engines are loaded beforehand or not.
Well, in one file there were seven unterminated games, which Arena replayed automatically for you. In some of them the second program may have
crashed, or didn't move at all on the first move after book end.
mmt wrote: ↑Mon Feb 17, 2020 5:08 pm
Guenther wrote: ↑Mon Feb 17, 2020 9:45 am
For this experiment cutechess-cli would be the only serious and scientific way to do it (plus a much better balanced opening file with not too many plies).
Yeah, I will try Cute Chess CLI. Any large book recommendations?
No, see above, I would use an opening file of 6-12 plies (at max!).
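As a rough illustration of such a setup, here is a small Python helper that only assembles a cutechess-cli command line. The engine paths, engine names, book file name, round count and output file are placeholders, and the option spellings (`st=`, `option.Hash=`, `-openings ... plies=`, `-repeat`) are written from memory of the cutechess-cli help text, so please check them against `cutechess-cli -help` for your version before relying on this.

```python
import shlex

def build_cutechess_command(full_cmd, half_cmd, book_path, rounds=500):
    """Assemble an argument list for a full-time vs. half-time engine match."""
    return [
        "cutechess-cli",
        # Fixed time per move: 0.5s for the normal engine, 0.25s for the handicapped one.
        "-engine", f"cmd={full_cmd}", "name=SF", "st=0.5",
        "-engine", f"cmd={half_cmd}", "name=SF_half", "st=0.25",
        # Shared settings: UCI protocol and a small hash table.
        "-each", "proto=uci", "option.Hash=128",
        # Short opening file, capped at 12 plies.
        "-openings", f"file={book_path}", "format=pgn", "order=random", "plies=12",
        # Two games per opening with colours swapped (this does not fix unbalanced lines).
        "-repeat", "-games", "2", "-rounds", str(rounds),
        "-pgnout", "sf_vs_sf_half.pgn",
    ]

if __name__ == "__main__":
    cmd = build_cutechess_command("./stockfish", "./stockfish", "book_6ply.pgn")
    print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

The printed line can be pasted into a shell, or the list can be passed directly to `subprocess.run`.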
Moreover, I now have some doubts about how to set the tc at all. Most programs now have too-clever time management, and I noticed that
in crucial positions the program that should use half of the time sometimes actually used more time than the other.
Just out of curiosity I ran a match myself until today between SF and SFx2 (half time) on my slow hardware with 1 CPU each, at a very fast tc with a fixed time per move: 1 move/0.5s vs. 1 move/0.25s (128MB hash) in cutechess-cli with a 6-ply general book, and the diff was around 160 rating points.
(Need to calculate average depth for midgame to compare)
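To put the 160 points into perspective, the standard logistic Elo formula converts between a rating difference and an expected score. The sketch below uses only that textbook formula; rating tools such as Ordo may fit ratings differently, so the numbers are an approximation, and the function names are made up for this example.

```python
import math

def expected_score(elo_diff):
    """Expected score (0..1) of the stronger side for a given Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_diff_from_score(score):
    """Elo difference implied by a match score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

if __name__ == "__main__":
    print(round(expected_score(160), 3))
```

Under this formula a 160-point difference corresponds to the full-time version scoring roughly 71.5% of the points.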
Even here I find artifacts of asymmetric time usage, and I am not sure how much noise this adds to the outcome.
There is also an effect I completely forgot (but it is very rare), which we talked about here long ago.
Sometimes seeing further than your opponent is even a disadvantage, because you see more and more how much worse your position might become,
defend against something the other side won't see at all, and play suboptimally against the time-handicapped program until it even wins.
This and the asymmetric time usage explain most of the wins the handicapped version can make against the non-handicapped one.
(plus bad book lines ofc)
Moreover, there is contempt in SF too, which will also influence this test...