Possible gain of ELO points for Komodo 11.2.2

Laskos · Post by **Laskos** » Sun Sep 24, 2017 1:48 pm

Komodo 11.2.2 has the default parameters "King Safety" set at 73 and "Dynamism" set at 130. It is hard to find their optimum by playing regular games, from regular openings to the normal end of the game, adjudicated in the usual manner or without any adjudication. The games are pretty long, 60-80 moves on average, and many games are needed to discern reasonable values of these two parameters with statistical significance. If one uses hyper-fast time controls, one has to check the scaling, because both parameters scale non-trivially with the time control. In practice one needs huge CPU time to measure, say, the scaling from hyper-fast to bullet TC for a pretty small ELO difference.

I devised many, sometimes fairly dubious, methods to have a "microscope" for discerning the differences with statistical significance, using maybe 100 times less CPU time than people usually do. Often they were specially built opening suites, with an order of magnitude higher resolving power than regular suites. Now I have devised a kind of "dubious microscope" for checking these two parameters in Komodo: King Safety and Dynamism. It came to me as I was testing the opening performance of Komodo against that of Stockfish dev, in some peculiar conditions. The opening suite was a short 3-mover GM2600 book, trimmed to have balanced openings. This is not that peculiar. The peculiarity was the adjudication rules: "sudden draw" at 11 moves played (so only the opening phase is played), and "sudden win" if both engines agree that the threshold of 80cp was reached. At the time controls used, the threshold of 80cp is borderline for Komodo between Win and Draw, about 50%/50%. An evaluation of 80cp means for Komodo roughly a performance of 75% in self-games.

These "special" adjudication rules are impossible to use with Cutechess-Cli, which has a somewhat broken Win adjudication: if one engine alone considers that it is losing by more than the threshold, Cutechess adjudicates the game as a Loss. This usually doesn't matter when the threshold for Wins is set as usual, at say 400cp or 700cp. In that usual case, if an engine cheats and never shows a losing score, it will anyway soon be mated in more than 99.9% of cases. But here, if it is losing and cheats, the game will be adjudicated as a draw at 11 moves. LittleBlitzer GUI adjudicates wins correctly (it has some other issues: side reversal is broken in Round-Robin, so one has to use Gauntlet mode instead, there is an en passant bug, and there are 50-move rule problems, but all of these are irrelevant here). So, I used LittleBlitzer GUI.
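To make the two adjudication behaviours described above concrete, here is a minimal Python sketch. It is not code from LittleBlitzer or Cutechess-Cli; the function name `adjudicate`, its parameters and the sample evaluations are all hypothetical, and it only encodes the rules as stated in this post ("sudden win" at 80cp, "sudden draw" at 11 moves).

```python
from typing import List, Tuple

def adjudicate(evals_cp: List[Tuple[int, int]],
               win_cp: int = 80,
               draw_moves: int = 11,
               require_both: bool = True) -> Tuple[str, int]:
    """Return (result, move number) for a short adjudicated game.

    evals_cp[i] holds (score_a, score_b) after move i+1, each score in
    centipawns from that engine's own point of view.
    require_both=True  : win only if both engines agree (rule used in the post).
    require_both=False : a single engine reporting a losing score is enough
                         (the one-sided behaviour the post attributes to Cutechess-Cli).
    Assumes the game is not decided over the board before the move limit.
    """
    for move_no, (a, b) in enumerate(evals_cp[:draw_moves], start=1):
        a_sees_win, a_sees_loss = a >= win_cp, a <= -win_cp
        b_sees_win, b_sees_loss = b >= win_cp, b <= -win_cp
        if require_both:
            if a_sees_win and b_sees_loss:
                return ("A wins", move_no)
            if b_sees_win and a_sees_loss:
                return ("B wins", move_no)
        else:
            if b_sees_loss:
                return ("A wins", move_no)
            if a_sees_loss:
                return ("B wins", move_no)
    # Nobody crossed the threshold in time: "sudden draw".
    return ("draw", draw_moves)

# Hypothetical evaluations: B is pessimistic one move before A agrees.
sample = [(20, -15), (45, -50), (60, -90), (95, -110)]
print(adjudicate(sample, require_both=True))
print(adjudicate(sample, require_both=False))
```

With the sample data the two rules disagree on the move at which the game ends, which is the kind of difference that matters when the threshold is as low as 80cp.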

First I observed better performance against Stockfish in the openings with some Komodo settings than with the default settings. I arrived at King Safety of 70 instead of the default 73, and Dynamism of 110 instead of the default 130, as performing best against Stockfish in 11 moves of the opening phase after the short balanced book, with the adjudication rule "sudden Win" at more than 80cp shown by both engines. Then I let Komodo "mod" play against Komodo default at 0.4s/move:

Games Completed = 5000 of 5000 (Avg game length = 9.275 sec)
Settings = Gauntlet/64MB/400ms per move/M 80cp for 1 moves, D 11 moves/EPD:C:\LittleBlitzer\3moves_GM_04_T.epd(817)
Time = 6964 sec elapsed, 0 sec remaining
 1.  Komodo 11.2.2 64-bit KS=70 Dyn=110     	2511.5/5000	107-84-4809  	(L: m=0 t=0 i=0 a=84)	(D: r=1 i=0 f=0 s=0 a=4808)	(tpm=407.3 d=15.52 nps=1112068)
 2.  Komodo 11.2.2 64-bit Default          	 2488.5/5000	84-107-4809  	(L: m=0 t=0 i=0 a=107)	(D: r=1 i=0 f=0 s=0 a=4808)	(tpm=407.2 d=15.47 nps=1119832)

The difference turns out to be statistically significant: 1.60 +/- 1.39 ELO points (pentanomial error margins for 95% confidence).
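As a rough cross-check of the figure above, the Elo difference can be recomputed from the win/loss/draw counts in the log. The sketch below is an assumption-laden simplification: it uses the plain trinomial (per-game) variance, because the pentanomial margin quoted in the post needs the outcomes of the individual game pairs, which the LittleBlitzer summary does not show. The function names are hypothetical.

```python
import math

def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_with_trinomial_margin(wins: int, losses: int, draws: int, z: float = 1.96):
    """Return (elo, margin) using the per-game win/draw/loss variance.

    z = 1.96 corresponds to a 95% confidence interval.
    This is NOT the pentanomial estimate used in the post.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    variance = (wins * (1.0 - score) ** 2
                + losses * (0.0 - score) ** 2
                + draws * (0.5 - score) ** 2) / n
    std_err = math.sqrt(variance / n)
    low = elo_from_score(score - z * std_err)
    high = elo_from_score(score + z * std_err)
    return elo_from_score(score), (high - low) / 2.0

# Counts from the 0.4s/move log: 107 wins, 84 losses, 4809 draws.
elo, margin = elo_with_trinomial_margin(107, 84, 4809)
print(f"{elo:+.2f} +/- {margin:.2f} Elo (trinomial)")
```

The point estimate agrees with the 1.60 quoted above. The trinomial margin comes out wider than the quoted pentanomial 1.39, which is what one would typically expect when each opening is played with sides reversed and the pairing is accounted for.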

To test the scaling of the modified parameters, I played the same match at twice the time control, almost bullet, 0.8s/move:

Games Completed = 5000 of 5000 (Avg game length = 18.357 sec)
Settings = Gauntlet/64MB/800ms per move/M 80cp for 1 moves, D 11 moves/EPD:C:\LittleBlitzer\3moves_GM_04_T.epd(817)
Time = 13449 sec elapsed, 0 sec remaining
 1.  Komodo 11.2.2 64-bit KS=70 Dyn=110     	2522.0/5000	99-55-4846  	(L: m=0 t=0 i=0 a=55)	(D: r=1 i=0 f=0 s=0 a=4845)	(tpm=804.3 d=16.86 nps=1113683)
 2.  Komodo 11.2.2 64-bit Default          	 2478.0/5000	55-99-4846  	(L: m=0 t=0 i=0 a=99)	(D: r=1 i=0 f=0 s=0 a=4845)	(tpm=804.5 d=16.77 nps=1121626)

3.06 +/- 1.22 ELO points (pentanomial error margins for 95% confidence).

The Komodo "mod" seems to scale well against Komodo default, increasing the strength difference. The real benefit to Komodo from this in full games is smaller than 3.06 ELO points. A win is adjudicated at roughly 75% performance, not 100%, so about half of the adjudicated Wins would eventually be Draws. So, the expected improvement of Komodo 11.2.2 is only about 1.5 ELO points. But this difference is scaling well and might be larger at LTC. That would be extremely hard to measure.
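The discount from 3.06 to about 1.5 ELO points can be written out as a short calculation. This is only a sketch of the reasoning above, under the stated assumption that an adjudicated win is worth about 0.75 of a point in a full game instead of 1.0. The function names are hypothetical.

```python
import math

def elo_from_score(score: float) -> float:
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))

def discounted_elo(points: float, games: int, win_performance: float = 0.75) -> float:
    """Elo estimate when an adjudicated win is worth win_performance points.

    A win worth 0.75 instead of 1.0 is only half as far above a draw (0.5),
    so the score's deviation from 50% shrinks by (0.75 - 0.5) / 0.5 = 0.5.
    """
    score = points / games
    shrink = (win_performance - 0.5) / 0.5
    return elo_from_score(0.5 + (score - 0.5) * shrink)

# Score of the "mod" version in the 0.8s/move match: 2522.0 out of 5000.
raw = elo_from_score(2522.0 / 5000)
adjusted = discounted_elo(2522.0, 5000)
print(f"raw {raw:.2f} Elo, discounted {adjusted:.2f} Elo")
```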

Several issues remain uncontrolled: the middlegame phase, which is even more important than the opening phase in determining the outcome, as well as unbalanced positions, sharp positions, King attacks and so on, which are absent in the short balanced openings. So, the gain is restricted to using the "mod" values of King Safety and Dynamism only for 10 or so moves out of a short balanced book; otherwise it is prudent to use the default values. Whether the "mod" values can be used more generally remains to be seen. I will maybe try middlegame 10-11 movers and unbalanced starting positions of the order of 100cp. This dubious "microscope" is useful because one can check for tiny ELO differences in a matter of several hours instead of weeks. But it doesn't cover the whole game.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Mon Sep 25, 2017 5:43 am

I am always sceptical about tuning large groups of parameters in a single patch, as it is not clear what the precise contribution of individual terms is.

For example, by raising King Safety, one should be affecting at least a number of separate terms according to a predefined formula, where it is not clear which ones should go up and which down, but probably all go up or down.

Still, this is a valid tuning approach and only points out how much still has to be gained in engines by 'random' tuning of a larger set of parameters.

SF has already completely discarded that approach (they are tuning just 1 term at a time), and that is one of the reasons for the slow pace of SF as of late.

Laskos · Post by **Laskos** » Mon Sep 25, 2017 3:26 pm

I am getting neutral to positive results for other sections of the game too:

King Safety 70 Dynamism 110
vs
Default

Unbalanced Openings

Games Completed = 5000 of 5000 (Avg game length = 18.235 sec)
Settings = Gauntlet/64MB/800ms per move/M 150cp for 1 moves, D 11 moves/EPD:C:\LittleBlitzer\4plies_90_100_T.epd(2500)
Time = 13370 sec elapsed, 0 sec remaining
 1.  Komodo 11.2.2 64-bit mod     	2514.0/5000	131-103-4766  	(L: m=0 t=0 i=0 a=103)	(D: r=0 i=0 f=0 s=0 a=4766)	(tpm=804.2 d=16.63 nps=1099606)
 2.  Komodo 11.2.2 64-bit         	2486.0/5000	103-131-4766  	(L: m=0 t=0 i=0 a=131)	(D: r=0 i=0 f=0 s=0 a=4766)	(tpm=804.5 d=16.54 nps=1107694)

1.95 +/- 1.38 ELO points (pentanomial error margins for 95% confidence)

Balanced Midgame

Games Completed = 5000 of 5000 (Avg game length = 12.765 sec)
Settings = Gauntlet/64MB/800ms per move/M 90cp for 1 moves, D 8 moves/EPD:C:\LittleBlitzer\Mid_Ultrabalanced_T.epd(3789)
Time = 9468 sec elapsed, 0 sec remaining
 1.  Komodo 11.2.2 64-bit mod     	2494.5/5000	213-224-4563  	(L: m=0 t=0 i=0 a=224)	(D: r=281 i=0 f=0 s=0 a=4282)	(tpm=799.6 d=20.55 nps=1449244)
 2.  Komodo 11.2.2 64-bit         	2505.5/5000	224-213-4563  	(L: m=0 t=0 i=0 a=213)	(D: r=281 i=0 f=0 s=0 a=4282)	(tpm=799.5 d=20.46 nps=1452004)

-0.76 +/- 2.55 ELO points (pentanomial error margins for 95% confidence)

Unbalanced Midgame

Games Completed = 5000 of 5000 (Avg game length = 12.196 sec)
Settings = Gauntlet/64MB/800ms per move/M 170cp for 1 moves, D 8 moves/EPD:C:\LittleBlitzer\Mid_08_10.epd(1933)
Time = 9060 sec elapsed, 0 sec remaining
 1.  Komodo 11.2.2 64-bit mod     	2532.5/5000	549-484-3967  	(L: m=0 t=0 i=0 a=484)	(D: r=8 i=0 f=0 s=0 a=3959)	(tpm=802.4 d=18.80 nps=1430199)
 2.  Komodo 11.2.2 64-bit         	2467.5/5000	484-549-3967  	(L: m=0 t=0 i=0 a=549)	(D: r=8 i=0 f=0 s=0 a=3959)	(tpm=802.2 d=18.72 nps=1433446)

4.52 +/- 2.23 ELO points (pentanomial error margins for 95% confidence)

And halfway through a run of 5000 almost fully fledged games at 0.8s/move, I get an encouraging gain of 6 +/- 6 ELO points so far for King Safety = 70, Dynamism = 110. "Fully fledged" means that the adjudication rules and openings are the following: "Sudden Draw" at 40 moves and "Sudden Win" at 150cp. 150cp corresponds to about a 95% win percentage in Komodo self-games at this time control. Openings are balanced 2-movers. I am waiting for the error margins to decrease. If I get something sensible (say 6 +/- 4 ELO), I will run a longer match of normal games with these settings at 60''+ 0.6''.

shrapnel · Post by **shrapnel** » Mon Sep 25, 2017 7:09 pm

King Safety 70 Dynamism 110

I tried those settings a long time ago....didn't help. In fact, nothing did. No amount of tweaking will help a weak engine. Waiting for Lefler to update.
Though I agree with you that Default Dynamism 130 always did seem over-optimistic.

Laskos · Post by **Laskos** » Mon Sep 25, 2017 9:19 pm

shrapnel wrote:
King Safety 70 Dynamism 110
I tried those settings a long time ago....didn't help. In fact, nothing did. No amount of tweaking will help a weak engine. Waiting for Lefler to update.
Though I agree with you that Default Dynamism 130 always did seem over-optimistic.

I got a barely significant result in "almost fully fledged games" in 5000 games at 0.8s/move:

+5.62 +/- 5.13 ELO points (95% confidence) for my settings against default (pentanomial error margins). On the other hand, in full, normal games at 60''+ 0.6'', the results so far are inconclusive but not that good: -7 +/- 13 ELO points after some 700 games. The test will continue for much longer. It is possible that my "dubious microscope" is indeed dubious; maybe the endgames, which I didn't check, are to blame.

About engines: I like Stockfish, Komodo and Houdini very much. They are all impressive. I use them for analysis and as toy models in peculiar conditions. Special features like "Contempt" and "Tactical Mode" are to be played with for fun. I am not into those blitz games in the Playchess Engine Room. I am also interested in TCEC, and an astute observer can see from the results so far that all three top engines have very reasonable chances to win TCEC. Look at FGRL of today: Komodo seems to scale a bit better than SF and Houdini on single core. SMP scaling seems comparable between the three. If with my settings I can add 3-4 ELO points to Komodo, I would be very satisfied. It doesn't seem to be the case, but I will let it play more of these normal games overnight to make sure.
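On the error margins quoted in this post: a back-of-the-envelope way to judge how many games a given margin needs is to assume that the margin shrinks like one over the square root of the number of games, with the draw rate and conditions unchanged. The sketch below encodes only that assumption; the function name is hypothetical.

```python
import math

def games_for_margin(games_played: int, margin_now: float, margin_target: float) -> int:
    """Estimate the total games needed to reach margin_target.

    Assumes the error margin scales as 1 / sqrt(number of games),
    i.e. the same per-game variance as in the games already played.
    """
    return math.ceil(games_played * (margin_now / margin_target) ** 2)

# 5000 games gave +/- 5.13 Elo; how many for +/- 4 Elo under the same conditions?
print(games_for_margin(5000, 5.13, 4.0))
```

Under this assumption, going from +/- 5.13 to +/- 4 ELO points takes somewhat more than 8000 games in total, which is why small differences in full games are so expensive to confirm.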

mjlef · Post by **mjlef** » Tue Sep 26, 2017 3:32 am

Just letting you know that we are following your results. Full game testing of the parameters you selected did not show an overall gain for us, but they did inspire some related values which did seem to help.

Laskos · Post by **Laskos** » Tue Sep 26, 2017 3:40 am

Yes, in full games at 60''+ 0.6'' (bullet) I got a regression of -7 +/- 4 Elo points, and I have stopped the test now. What remains is that in the first 10 moves of the game from a short opening book, the settings did improve the outcome with statistical significance (by a small ELO amount), albeit with artificial adjudications.

mjlef · Post by **mjlef** » Tue Sep 26, 2017 3:46 am

We have not tried to confirm that yet, but it does seem that some game-stage-related internal modifications could be beneficial. There are a lot of ideas for things to try, so thanks for letting us know. For example, we could start off with a lower king safety early in the game, let it peak at some stage, and then diminish it. Most programs "peak" it in the opening or early game stages, but the program should probably concentrate on better piece development before shifting to attacking the king.
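Purely as an illustration of the "start lower, peak, then diminish" idea, here is a toy Python sketch of a game-stage-dependent multiplier. Nothing in it reflects Komodo's actual code: the phase measure, the piecewise-linear shape and every number (0.85, 1.0, 0.7, peak at 0.35) are invented for the example.

```python
def king_safety_scale(phase: float,
                      start: float = 0.85,
                      peak: float = 1.0,
                      end: float = 0.7,
                      peak_phase: float = 0.35) -> float:
    """Hypothetical multiplier for the king safety term.

    phase: 0.0 = initial position, 1.0 = late endgame (an assumed measure,
    e.g. derived from the material left on the board).
    The multiplier rises linearly from `start` to `peak` at `peak_phase`,
    then falls linearly to `end`.
    """
    if not 0.0 <= phase <= 1.0:
        raise ValueError("phase must be within [0, 1]")
    if phase <= peak_phase:
        t = phase / peak_phase
        return start + t * (peak - start)
    t = (phase - peak_phase) / (1.0 - peak_phase)
    return peak + t * (end - peak)

def effective_king_safety(base: int, phase: float) -> int:
    """Scale a base King Safety setting (e.g. the default 73) by game stage."""
    return round(base * king_safety_scale(phase))

for phase in (0.0, 0.2, 0.35, 0.7, 1.0):
    print(phase, effective_king_safety(73, phase))
```

Whether any such shape helps is exactly what would have to be tested in full games.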

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Tue Sep 26, 2017 10:15 am

Laskos wrote:
shrapnel wrote:
King Safety 70 Dynamism 110
I tried those settings a long time ago....didn't help. In fact, nothing did. No amount of tweaking will help a weak engine. Waiting for Lefler to update.
Though I agree with you that Default Dynamism 130 always did seem over-optimistic.
I got a barely significant result in "almost fully fledged games" in 5000 games at 0.8s/move:

+5.62 +/- 5.13 ELO points (95% confidence) for my settings against default (pentanomial error margins). On the other hand, in full, normal games, the results as of now at 60''+ 0.6'' are inconclusive, but not that good, -7 +/- 13 ELO points after some 700 games, but the test will continue for much longer. It is possible that my "dubious microscope" is indeed dubious, maybe the endgames, which I didn't check, are to blame. About engines: I like very much Stockfish, Komodo and Houdini. They are all impressive. I use them for analysis and as toy models in peculiar conditions. Special features like "Contempt" and "Tactical Mode" are to be played with for fun. I not into those blitz games on Playchess Engine Room. I am also interested in TCEC, and if one is an astute observer, some results emerge by now that all three top engines have very sensible chances to win TCEC. Look at FGRL of today: Komodo seems to scale a bit better than SF and Houdini on single core. SMP scaling seems comparable between the three. If with my settings I can add 3-4 ELO points to Komodo, I would be very satisfied. It doesn't seem to be the case, but I will leave it play more of these normal games overnight to make it sure.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Tue Sep 26, 2017 10:23 am

I wish Kai could produce more of those 'wrong' tests of his...
