Komodo Dragon 2.5 released.

Post by **lkaufman** » Sat Oct 02, 2021 10:57 pm

dkappe wrote: ↑Sat Oct 02, 2021 10:17 pm
lkaufman wrote: ↑Sat Oct 02, 2021 10:10 pm The quality of multipv with MCTS engines depends greatly on whether there are multiple moves of similar merit or not. If one move is pretty clearly best, the quality of the other lines will be low. But if, say, three moves are essentially too close to call, then the quality of the analysis of those three should be comparable. That is why I think multipv with MCTS is a good combination: you get quality analysis of the lines that have merit.
I was just speaking about lc0, not Komodo/Dragon MCTS. There you would sometimes get a 3rd or 4th “best” move that would permit mate.

That might also happen with Komodo MCTS if there were only two decent moves in the position, for example if a knight was attacked and had only two squares to go to. If all the other moves lose a piece, it might well miss that one of them also gets it mated. The search is very limited for clearly inferior moves.
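A toy illustration of why clearly inferior moves get so little search. It assumes a textbook UCB1 selection rule applied at the root only, with fixed, noise-free move values expressed as expected scores; this is not Komodo's actual selection formula, which isn't given in the thread. The three near-equal moves end up sharing almost all of the visits, while the two moves that drop material get only a small fraction, far too little to notice that one of them might even allow mate.

```python
import math
from typing import List

def ucb1_visit_counts(values: List[float], iterations: int,
                      c: float = math.sqrt(2)) -> List[int]:
    """Root-only toy bandit. Each move has a fixed, noise-free value in [0, 1]
    (its expected score). Selection uses textbook UCB1, an assumption: it is
    not Komodo's actual selection formula."""
    n = len(values)
    visits = [1] * n              # visit every move once so averages exist
    totals = list(values)         # summed values per move after those visits
    for t in range(n, iterations):
        log_t = math.log(t)
        ucb = [totals[i] / visits[i] + c * math.sqrt(log_t / visits[i])
               for i in range(n)]
        best = ucb.index(max(ucb))
        visits[best] += 1
        totals[best] += values[best]
    return visits

# Three near-equal moves, plus two that (say) drop a piece.
visits = ucb1_visit_counts([0.60, 0.59, 0.58, 0.20, 0.10], iterations=10_000)
print(visits)
```

Real MCTS engines back up noisy evaluations and often add a policy prior (for example the PUCT rule used by Lc0), so the exact counts differ, but the shape is the same: near-ties share the effort, clear losers are starved.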

Post by **Uri Blass** » Sat Oct 02, 2021 11:02 pm

lkaufman wrote: ↑Sat Oct 02, 2021 5:36 pm
Michel wrote: ↑Sat Oct 02, 2021 8:19 am
pohl4711 wrote: ↑Sat Oct 02, 2021 7:55 am
Joerg Oster wrote: ↑Fri Oct 01, 2021 7:33 pm
pohl4711 wrote: ↑Fri Oct 01, 2021 3:19 pm
lkaufman wrote: ↑Fri Sep 24, 2021 4:30 pm KomodoChess has just released Dragon 2.5 at KomodoChess.com.
Huge experimental round-robin tournament (10500 games, 3min+1sec) with 3 engines (Stockfish 14, KomodoDragon 2.5 and KomodoDragon 2.5 MCTS), each with 5 different MultiPV settings (1, 2, 3, 5 and 7, where 1 is the normal, default playing mode). Goal: measure how much Elo is lost by calculating more than one PV line, and whether Dragon 2.5 MCTS loses less Elo than the AlphaBeta engines when MultiPV is 3 or higher...
I think the results are pretty interesting, especially when you use engines for analyzing human games in MultiPV mode.

https://www.sp-cc.de/experiments.htm
Well, MultiPV mode in Stockfish is unnecessarily expensive.
With MCTS you always have full MultiPV info available by default!

If you're interested I can provide a SF version with a small change which
should give noticeably better results.
At the moment, I am watching the Chess Tour Finals with Carlsen on YouTube, where Stockfish is used for analysis with pv=3. That reminded me that analyzing human chess is one of the most important uses of the top engines. And my experiment has shown very clearly that MCTS is better here, because pv=7 is possible without any Elo loss. So I think it is a real pity that MCTS is not part of the official Stockfish. MCTS in the next official SF release is what chess players really need, IMHO.
This is not related to MCTS. It is due to the fact that the SF developers have chosen to give equal resources to all PVs in a MultiPV search.
It's true that giving fewer resources to later moves in MultiPV (for example, by more reduction) would raise the quality of the best move chosen, but MCTS does this much better for two reasons. 1. It is totally free, not just less costly. 2. If two or more moves are nearly tied for best, they will get nearly equal resources; only significantly worse moves will get significantly fewer. With A/B you only have the order of the moves to go by, not how much they differ. MCTS handles MultiPV exactly the way it should be handled, with the quality of analysis proportional to the merit of the move, not just to its position on a list.
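For contrast, a simplified sketch of the usual alpha-beta MultiPV scheme described above: each extra line costs another full root search over the moves not yet reported, so the work grows roughly in proportion to the MultiPV setting no matter how far apart the scores are. The `search_root` callback is a hypothetical stand-in for a real root search, and this is not Stockfish's actual code.

```python
from typing import Callable, List, Tuple

def multipv_iteration(
    root_moves: List[str],
    multipv: int,
    search_root: Callable[[List[str]], Tuple[str, int]],
) -> List[Tuple[str, int]]:
    """Collect `multipv` lines by running one full root search per line.
    PV k is the best of the moves not already used by PVs 0..k-1."""
    remaining = list(root_moves)
    lines: List[Tuple[str, int]] = []
    for _ in range(min(multipv, len(remaining))):
        best, score = search_root(remaining)   # full search for every line
        lines.append((best, score))
        remaining.remove(best)                  # exclude it from the next pass
    return lines

# Usage with fake static scores (centipawns) and a call counter.
scores = {"Nf3": 30, "d4": 25, "e4": 20, "a4": -40}
calls = [0]

def fake_search(moves: List[str]) -> Tuple[str, int]:
    calls[0] += 1
    best = max(moves, key=lambda m: scores[m])
    return best, scores[best]

lines = multipv_iteration(list(scores), 3, fake_search)
print(lines, calls[0])
```

With MultiPV=3 the root is searched three times per iteration, once per reported line.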

With A/B you also know how much the moves differ in MultiPV (not for all moves, but for the best ones).

Suppose you run MultiPV with 5 moves and get the scores 0.3, 0.2, 0.1, -2.3 and -2.4 at depth 19. Then it is clear that the 4th and 5th moves are evaluated as clearly worse, and you can decide to give them significantly less time once you have exact scores for the first 3 moves at depth 20.
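Uri's proposal could be expressed as a simple depth-allocation policy. This is a hypothetical sketch: the `margin` (in pawns) and `reduction` (in plies) values are illustrative and not taken from any engine.

```python
from typing import List

def plan_next_depths(scores: List[float], depth: int,
                     margin: float = 1.0, reduction: int = 3) -> List[int]:
    """Moves within `margin` pawns of the best MultiPV score at `depth` are
    searched at depth + 1; clearly worse moves get a reduced depth."""
    best = max(scores)
    next_depth = depth + 1
    return [
        next_depth if best - s <= margin else max(1, next_depth - reduction)
        for s in scores
    ]

depths = plan_next_depths([0.3, 0.2, 0.1, -2.3, -2.4], depth=19)
print(depths)  # [20, 20, 20, 17, 17]
```

Unlike MCTS, this still needs an explicit threshold; choosing `margin` well is the hard part.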

Post by **Cornfed** » Sat Oct 02, 2021 11:25 pm

dkappe wrote: ↑Sat Oct 02, 2021 9:03 pm
Cornfed wrote: ↑Sat Oct 02, 2021 5:41 pm
But, let's be honest here, pv7 is really overkill in most positions. pv7 could force some online interfaces to be redesigned to cope with the additional screen clutter of irrelevant information. This IS, after all, 'human chess' with human commentators... I know my eyes are on the main screen with (right now) Peter and Simon and the game they are commentating on (I think without the help of engines?)... and sometimes I look at the space-efficient 'eval bar'. I personally never even look at the pv3 at the bottom of the chess24 website I am on.

I think one could argue that pv7 (or 6, or 5, etc.) for a move with no context does nothing at all to enhance the experience for really good, good, poor or really poor chess-playing viewers... it is just a string of possible moves with a number attached to it. For example, why is 34. Qe2 +2.43 while 34. f5 is +4.45?
I recall a discussion with GM Hammer about lc0 a few years ago. His complaint was that with multipv=4, the quality past the first move was often pretty questionable. If you’re familiar with UCT, that stands to reason. You get multipv “for free,” but it isn’t very good. Don’t know if they’ve done anything to fix that.

At any rate, we may not think that multipv isn’t useful for the majority of chess players, but what we think doesn’t really matter much.

Not 100% sure how to take the double negative "we may not think that multipv isn't useful...". But, the sheer numbers (and comments) of the vast majority of players watching those broadcasts would lead me to believe that the general chess IQ is pretty darn low...and that multi-pv of 3 or 7 lines is largely irrelevant.

Now, for analysis of games, opening prep and such for individuals, MultiPV as Dragon does it is a pretty sweet idea!

Post by **carldaman** » Sun Oct 03, 2021 12:28 am

You could convert it to 'we may not think that multipv is useless'..

Post by **dkappe** » Sun Oct 03, 2021 2:04 am

Sorry, should have been

At any rate, we may not think that multipv is useful for the majority of chess players, but what we think doesn’t really matter much

Post by **Cornfed** » Sun Oct 03, 2021 5:21 am

dkappe wrote: ↑Sun Oct 03, 2021 2:04 am Sorry, should have been

At any rate, we may not think that multipv is useful for the majority of chess players, but what we think doesn’t really matter much

Right, that's what I figured.
'Eye candy' does have its appeal to those who may not be capable of understanding what is going on during a game. How many times have I heard someone in chat exclaim "Blunder!" when the move chosen by a GM gives, say, +2.75 while the engine eval for some move not played was +3.35?

Post by **pohl4711** » Wed Oct 06, 2021 12:49 pm

lkaufman wrote: ↑Fri Oct 01, 2021 5:04 pm
pohl4711 wrote: ↑Fri Oct 01, 2021 3:19 pm
lkaufman wrote: ↑Fri Sep 24, 2021 4:30 pm KomodoChess has just released Dragon 2.5 at KomodoChess.com.
Huge experimental round-robin tournament (10500 games, 3min+1sec) with 3 engines (Stockfish 14, KomodoDragon 2.5 and KomodoDragon 2.5 MCTS), each with 5 different MultiPV settings (1, 2, 3, 5 and 7, where 1 is the normal, default playing mode). Goal: measure how much Elo is lost by calculating more than one PV line, and whether Dragon 2.5 MCTS loses less Elo than the AlphaBeta engines when MultiPV is 3 or higher...
I think the results are pretty interesting, especially when you use engines for analyzing human games in MultiPV mode.

https://www.sp-cc.de/experiments.htm
Thanks for running this tournament! Although you skipped MPV=4, by interpolation it would appear that this is the magic number where Dragon 2.5 MCTS becomes stronger than SF14. With Dragon MCTS, MultiPV merely displays information already calculated, so there should be no measurable Elo cost to it, and indeed your results showed all MCTS settings as equal within the margin of error. I believe that MPV=3 is the most popular setting for this parameter, so it is a worthwhile and achievable goal for Dragon MCTS to surpass SF14 with both set to MPV=3 in the near future.
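To make the "merely displaying information already calculated" point concrete: in an MCTS engine the root's children already carry visit counts and value sums, so MultiPV output is just a sort over existing statistics. A minimal sketch; the `RootChild` structure and ranking by visit count are assumptions, not Dragon's internals.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RootChild:
    move: str
    visits: int
    value_sum: float  # sum of backed-up values (expected scores) for this move

def multipv_from_tree(children: List[RootChild], k: int) -> List[Tuple[str, int, float]]:
    """Return up to k lines as (move, visits, average value), most-visited first.
    No search happens here: the numbers were gathered by the normal search."""
    visited = [ch for ch in children if ch.visits > 0]
    ranked = sorted(visited, key=lambda ch: ch.visits, reverse=True)
    return [(ch.move, ch.visits, ch.value_sum / ch.visits) for ch in ranked[:k]]

children = [
    RootChild("e4", 500, 300.0),
    RootChild("d4", 400, 236.0),
    RootChild("c4", 80, 40.0),
    RootChild("a3", 20, 6.0),
]
top = multipv_from_tree(children, 3)
print(top)
```

Asking for 7 or 100 lines only changes `k`, which is why the MultiPV setting should not cost an MCTS searcher any playing strength.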

I restarted the experimental MultiPV round-robin under the same conditions, except that Stockfish 14 is replaced by Stockfish 211006, because until now Stockfish had a bad time-management bug in MultiPV mode. It is now fixed.
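The fix quoted below comes down to one rule: only a change of the best move in the first PV line should count as search instability. A counter of this kind feeds the time manager, so overcounting makes the engine think longer, as the commit message describes. A hypothetical sketch (the names and the `(pv_index, move)` update format are illustrative, not Stockfish's), comparing the old and fixed behaviour:

```python
from typing import Dict, Iterable, Tuple

def count_best_move_changes(updates: Iterable[Tuple[int, str]],
                            first_line_only: bool = True) -> int:
    """Count best-move changes from (pv_index, move) reports.
    first_line_only=True mimics the fixed rule; False mimics the old behaviour,
    where changes in later PV lines were also counted as instability."""
    last: Dict[int, str] = {}
    changes = 0
    for pv_index, move in updates:
        previous = last.get(pv_index)
        if previous is not None and previous != move:
            if pv_index == 0 or not first_line_only:
                changes += 1
        last[pv_index] = move
    return changes

# Hypothetical reports from one iteration with MultiPV=3.
updates = [(0, "e4"), (0, "d4"), (1, "e4"), (1, "c4"), (1, "Nf3"),
           (2, "g3"), (2, "b3")]
print(count_best_move_changes(updates))                         # 1
print(count_best_move_changes(updates, first_line_only=False))  # 4
```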

Author: J. Oster
Date: Wed Oct 6 11:53:33 2021 +0200
Timestamp: 1633514013

Time-management fix in MultiPV mode.

When playing games in MultiPV mode we must take care to only track the
best move changing for the first PV line. Otherwise, SF will spend most
of its time for the initial moves after the book exit.

This has been observed and reported on Discord, but can also be seen in
games played in Stefan Pohl's MultiPV experiment.

Tested with MultiPV=4.

STC:
https://tests.stockfishchess.org/tests/ ... 38a969b990
LLR: 2.95 (-2.94,2.94) <-0.50,2.50>
Total: 1744 W: 694 L: 447 D: 603 Elo +49.54
Ptnml(0-2): 32, 125, 358, 278, 79

LTC:
https://tests.stockfishchess.org/tests/ ... 38a969b993
LLR: 2.94 (-2.94,2.94) <0.50,3.50>
Total: 2048 W: 723 L: 525 D: 800 Elo +33.70
Ptnml(0-2): 10, 158, 511, 314, 31
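The Elo figures in these Fishtest summaries appear to be the standard logistic conversion of the W/L/D totals, and recomputing them is a quick sanity check: the sketch below reproduces +49.54 and +33.70. It does not attempt the LLR/SPRT numbers.

```python
import math

def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Logistic Elo difference: score = (W + D/2) / N, Elo = -400 * log10(1/score - 1)."""
    n = wins + losses + draws
    score = (wins + draws / 2) / n
    return -400 * math.log10(1 / score - 1)

stc = elo_from_wld(694, 447, 603)   # STC run, 1744 games
ltc = elo_from_wld(723, 525, 800)   # LTC run, 2048 games
print(f"STC {stc:+.2f}  LTC {ltc:+.2f}")

# Ptnml counts game pairs (each opening played with colours reversed),
# so each pentanomial row should sum to half the number of games.
assert sum([32, 125, 358, 278, 79]) * 2 == 694 + 447 + 603
assert sum([10, 158, 511, 314, 31]) * 2 == 723 + 525 + 800
```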

Post by **pohl4711** » Sat Oct 09, 2021 8:25 am

I repeated the experimental MultiPV round-robin tournament, because Stockfish 14, which played in the first test run, has buggy time management in MultiPV mode. This bug is now fixed in Stockfish 211006, so the tournament was replayed with 3 engines (Stockfish 211006, KomodoDragon 2.5 and KomodoDragon 2.5 MCTS), each with 5 different MultiPV settings (1, 2, 3, 5 and 7, where 1 is the normal, default playing mode). Goal: measure how much Elo is lost by calculating more than one PV line, and whether Dragon 2.5 MCTS loses less Elo than the AlphaBeta engines when MultiPV is 3 or higher...

The new (and old) results can be seen on my website, where the games of both the old and the new test run (plus statistics) can be downloaded too:

https://www.sp-cc.de/experiments.htm
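A quick check on the scale of this design, assuming a full round robin in which every engine/MultiPV combination plays every other one equally often: 15 participants give 105 pairings, so the 10500 games of the first run work out to 100 games per pairing.

```python
from itertools import combinations

engines = ["Stockfish", "KomodoDragon 2.5", "KomodoDragon 2.5 MCTS"]
multipv_settings = [1, 2, 3, 5, 7]

# Every engine/MultiPV combination is one tournament participant.
players = [(engine, pv) for engine in engines for pv in multipv_settings]
pairings = list(combinations(players, 2))   # every pair of participants
games_per_pairing = 10_500 / len(pairings)

print(len(players), len(pairings), games_per_pairing)  # 15 105 100.0
```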

Post by **Joerg Oster** » Sat Oct 09, 2021 11:22 am

pohl4711 wrote: ↑Sat Oct 09, 2021 8:25 am I repeated the experimental MultiPV round-robin tournament, because Stockfish 14, which played in the first test run, has buggy time management in MultiPV mode. This bug is now fixed in Stockfish 211006, so the tournament was replayed with 3 engines (Stockfish 211006, KomodoDragon 2.5 and KomodoDragon 2.5 MCTS), each with 5 different MultiPV settings (1, 2, 3, 5 and 7, where 1 is the normal, default playing mode). Goal: measure how much Elo is lost by calculating more than one PV line, and whether Dragon 2.5 MCTS loses less Elo than the AlphaBeta engines when MultiPV is 3 or higher...

The new (and old) results can be seen on my website, where the games of both the old and the new test run (plus statistics) can be downloaded too:

https://www.sp-cc.de/experiments.htm

Just 2 notes from me.

1. Time management in Stockfish and MultiPV mode is very likely still not optimal.
2. There is no difference for Dragon MCTS (or Lc0 or any other MCTS-like searcher in that regard) with 2, 3, 5 or 7, or even 100 PVs. None!
(All you can measure here is the overhead for sending more than 1 PV line to the GUI!)

Anyway, thank you for this test, especially for repeating it after the fix!

Post by **pohl4711** » Sat Oct 09, 2021 1:15 pm

Joerg Oster wrote: ↑Sat Oct 09, 2021 11:22 am
2. There is no difference for Dragon MCTS (or Lc0 or any other MCTS-like searcher in that regard) with 2, 3, 5 or 7, or even 100 PVs. None!
(All you can measure here is the overhead for sending more than 1 PV line to the GUI!)

I know this, of course. Including Komodo MCTS in the tournament was meant to prove that MCTS has no Elo loss with pv set higher than 1, in contrast to the classical AlphaBeta engines, which show a clear Elo loss with pv higher than 1.
