I have devised many, sometimes fairly dubious, methods to get a "microscope" that discerns differences with statistical significance using maybe 100 times less CPU time than people usually spend. Often these were specially built opening suites with an order of magnitude more resolving power than regular suites. Now I have devised a kind of "dubious microscope" for checking two parameters in Komodo: King Safety and Dynamism. The idea came to me while I was comparing the opening performance of Komodo with that of Stockfish dev under some peculiar conditions. The opening suite was a short 3-mover GM2600 book, trimmed to keep only balanced openings; that part is not peculiar. The peculiarity was the adjudication rules: a "sudden draw" once 11 moves have been played (so only the opening phase is played), and a "sudden win" if both engines agree that the 80cp threshold has been reached. At the time controls used, 80cp is borderline for Komodo between a win and a draw, about 50%/50%. An evaluation of 80cp means for Komodo roughly a 75% performance in self-games.
These "special" adjudication rules cannot be used with Cutechess-Cli, whose win adjudication is a bit broken: if one engine alone considers it is losing by more than the threshold, Cutechess adjudicates the game as a loss. This usually doesn't matter when the win threshold is set as usual, at say 400cp or 700cp. In that usual case, if an engine cheats and never shows a losing score, it will still soon be mated in more than 99.9% of cases. But here, if it is losing and cheats, the game will be adjudicated as a draw at 11 moves. LittleBlitzer GUI adjudicates wins correctly (it has other issues: side-reversed games are broken in Round-Robin, so one has to use Gauntlet mode instead, plus an en passant bug and problems with the 50-move rule, but all of these are irrelevant here). So I used LittleBlitzer GUI.
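The two adjudication rules can be written down as a small decision function. This is only a sketch of the rules as described in this post, not code from LittleBlitzer or Cutechess-Cli; the function name, the `both_must_agree` flag and the use of `>=` for "threshold reached" are my own assumptions. Scores are in centipawns from each engine's own point of view.

```python
from typing import Optional

WIN_THRESHOLD_CP = 80   # "sudden win" threshold from the post
DRAW_MOVE_LIMIT = 11    # "sudden draw" after this many moves played


def adjudicate(moves_played: int, white_cp: int, black_cp: int,
               both_must_agree: bool = True) -> Optional[str]:
    """Return "1-0", "0-1", "1/2-1/2", or None if the game goes on.

    white_cp / black_cp are the latest scores reported by each engine,
    each from its own point of view (hypothetical interface).
    """
    white_sees_win = white_cp >= WIN_THRESHOLD_CP
    white_sees_loss = white_cp <= -WIN_THRESHOLD_CP
    black_sees_win = black_cp >= WIN_THRESHOLD_CP
    black_sees_loss = black_cp <= -WIN_THRESHOLD_CP

    if both_must_agree:
        # Rule used in this test: winner and loser must both show the score.
        if white_sees_win and black_sees_loss:
            return "1-0"
        if black_sees_win and white_sees_loss:
            return "0-1"
    else:
        # One-sided variant described above: the losing engine's score alone decides.
        if black_sees_loss:
            return "1-0"
        if white_sees_loss:
            return "0-1"

    if moves_played >= DRAW_MOVE_LIMIT:
        return "1/2-1/2"
    return None
```

With `both_must_agree=True`, a game where one engine shows +95 and the other only -20 keeps going and ends as a draw at move 11. With the one-sided variant, an engine that alone reports -90 loses on the spot, which is the behaviour that makes such a low threshold unusable there.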
First I observed that some Komodo settings performed better against Stockfish in the openings than the default settings. I arrived at King Safety of 70 instead of the default 73, and Dynamism of 110 instead of the default 130, as performing best against Stockfish over the 11 moves of the opening phase after the short balanced book, with a "sudden win" adjudicated when both engines show more than 80cp. Then I let Komodo "mod" play against Komodo default at 0.4s/move:
Games Completed = 5000 of 5000 (Avg game length = 9.275 sec)
Settings = Gauntlet/64MB/400ms per move/M 80cp for 1 moves, D 11 moves/EPD:C:\LittleBlitzer\3moves_GM_04_T.epd(817)
Time = 6964 sec elapsed, 0 sec remaining
1. Komodo 11.2.2 64-bit KS=70 Dyn=110 2511.5/5000 107-84-4809 (L: m=0 t=0 i=0 a=84) (D: r=1 i=0 f=0 s=0 a=4808) (tpm=407.3 d=15.52 nps=1112068)
2. Komodo 11.2.2 64-bit Default 2488.5/5000 84-107-4809 (L: m=0 t=0 i=0 a=107) (D: r=1 i=0 f=0 s=0 a=4808) (tpm=407.2 d=15.47 nps=1119832)
To test how the modified parameters scale, I played the same match at double the time control, close to bullet, 0.8s/move:
Games Completed = 5000 of 5000 (Avg game length = 18.357 sec)
Settings = Gauntlet/64MB/800ms per move/M 80cp for 1 moves, D 11 moves/EPD:C:\LittleBlitzer\3moves_GM_04_T.epd(817)
Time = 13449 sec elapsed, 0 sec remaining
1. Komodo 11.2.2 64-bit KS=70 Dyn=110 2522.0/5000 99-55-4846 (L: m=0 t=0 i=0 a=55) (D: r=1 i=0 f=0 s=0 a=4845) (tpm=804.3 d=16.86 nps=1113683)
2. Komodo 11.2.2 64-bit Default 2478.0/5000 55-99-4846 (L: m=0 t=0 i=0 a=99) (D: r=1 i=0 f=0 s=0 a=4845) (tpm=804.5 d=16.77 nps=1121626)
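As a rough check of how significant these two results are, one can compute the likelihood of superiority (LOS) from the decisive games only. The sketch below uses the common normal approximation based on wins and losses; it is not something LittleBlitzer reports, and the function name is my own.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that the first engine is the stronger one.

    Normal approximation on decisive games; draws are ignored because
    they carry no information about which side is better.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


# Win/loss counts of Komodo "mod" taken from the two result tables above.
for label, wins, losses in [("0.4s/move", 107, 84), ("0.8s/move", 99, 55)]:
    print(f"{label}: LOS = {likelihood_of_superiority(wins, losses):.4f}")
```

By this approximation the 0.4s/move match gives an LOS of about 95%, and the 0.8s/move match gives one above 99.9%.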
The Komodo "mod" seems to scale well against Komodo default, with the strength difference increasing. The real benefit to Komodo in full games is smaller than the 3.06 Elo points measured at 0.8s/move (2522.0/5000). A win is adjudicated at roughly 75% expected performance, not 100%, so about half of the adjudicated wins would eventually be draws. The expected improvement of Komodo 11.2.2 is therefore only about 1.5 Elo points. But the difference scales well and might be larger at LTC. That would be extremely hard to measure.
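The arithmetic behind the 3.06 and 1.5 Elo figures can be reproduced with the standard logistic Elo formula. In this sketch, valuing an adjudicated win at 0.75 points follows the "roughly 75%" estimate above, and valuing an adjudicated loss at 0.25 is my assumption by symmetry. The function names are made up for illustration.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))


def discounted_score(wins: int, losses: int, draws: int,
                     win_value: float = 0.75) -> float:
    """Score fraction when an adjudicated win is worth only win_value points.

    An adjudicated loss is worth 1 - win_value (assumed symmetric),
    and a draw is worth 0.5 as usual.
    """
    games = wins + losses + draws
    points = wins * win_value + losses * (1.0 - win_value) + draws * 0.5
    return points / games


# 0.8s/move match: Komodo "mod" scored 99 wins, 55 losses, 4846 draws.
raw_elo = elo_from_score(2522.0 / 5000)
full_game_elo = elo_from_score(discounted_score(99, 55, 4846))
print(f"as adjudicated:      {raw_elo:.2f} Elo")
print(f"full-game estimate:  {full_game_elo:.2f} Elo")
```

Discounting the adjudicated results this way halves the win surplus, which is where the estimate of about 1.5 Elo comes from.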
Several issues remain uncontrolled: the middlegame phase, which matters even more than the opening in determining the outcome, and unbalanced positions, sharp positions, king attacks and so on, which are absent from short balanced openings. So the gain is restricted to using the "mod" values of King Safety and Dynamism only for the 10 or so moves out of a short balanced book; otherwise it is prudent to use the default values. Whether the "mod" values can be used more generally remains to be seen. I may try middlegame 10-11 movers and unbalanced starting positions of the order of 100cp. This dubious "microscope" is useful because one can check for tiny Elo differences in a matter of several hours instead of weeks. But it doesn't cover the whole game.