Komodo Dragon 2.5 released.

Cornfed · Post by **Cornfed** » Mon Sep 27, 2021 2:46 pm

"1000 games head-to-head vs. Stockfish 14 is impressive, too: 1000 (+ 27,=872,-101) = 46.3 % (means: -26 Elo performance)"

Let's remember that chess is fundamentally 'drawish' at the top levels. So...
Being able to even draw is important. But I can't get beyond the importance of wins vs. losses. A loss-to-win ratio of roughly 4:1 is important to note.

Still, for the number lovers, a current Dragon pulling within about 50 Elo of the current Stockfish is notable.
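
For anyone who wants to check the numbers in the quote: the score percentage counts each draw as half a point, and the Elo performance follows from the usual logistic Elo formula. A minimal Python sketch (the function name is just for illustration, not taken from any rating tool):

```python
import math

def score_and_elo(wins: int, draws: int, losses: int) -> tuple[float, float]:
    """Return (score fraction, Elo difference) for a head-to-head result."""
    games = wins + draws + losses
    # A draw counts as half a point.
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is undefined for a 0% or 100% score")
    # Standard logistic Elo model: score = 1 / (1 + 10 ** (-elo / 400)).
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return score, elo

score, elo = score_and_elo(wins=27, draws=872, losses=101)
print(f"score = {score:.1%}, Elo = {elo:+.0f}")
```

With +27 =872 -101 this gives 46.3 % and about -26 Elo, matching the quote. The loss-to-win ratio is 101/27, or about 3.7:1.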

CMCanavessi · Post by **CMCanavessi** » Mon Sep 27, 2021 11:45 pm

dkappe wrote: ↑Mon Sep 27, 2021 9:55 am
To answer your question, we trained primarily against Stockfish for this version, but only trained on Komodo's moves and scores, not Stockfish moves or scores. We did not train against Lc0 yet, but you can expect us to do so.
So no training data generated by SF was used. The idea is that the network learns how to beat an opponent through this method of adversarial training.

Some experiments suggest that adversarial training is especially effective against mcts/nn engines like lc0. Time will tell.

How does this work though? Let's say the game is in position X and it's Komodo's time to move (last move by SF), so I assume it won't train on that position. Then Komodo makes a move and position Y is reached, which is used for training, as it was a Komodo move. Is that how it goes? But then, position Y already has SF's moves "inside"... I mean, it was reached by having stockfish make half of the moves. Maybe by using only Komodo, it would have never reached that position... How do you separate that? It's impossible.

dkappe · Post by **dkappe** » Tue Sep 28, 2021 12:11 am

CMCanavessi wrote: ↑Mon Sep 27, 2021 11:45 pm How does this work though? Let's say the game is in position X and it's Komodo's time to move (last move by SF), so I assume it won't train on that position. Then Komodo makes a move and position Y is reached, which is used for training, as it was a Komodo move. Is that how it goes? But then, position Y already has SF's moves "inside"... I mean, it was reached by having stockfish make half of the moves. Maybe by using only Komodo, it would have never reached that position... How do you separate that? It's impossible.

That is the point with adversarial data generation. We want to emphasize those evals where Dragon is able to beat SF and de-emphasize those where it loses to SF. Same with lc0 when we add it to the mix.

Note that this is very different than using leela training data directly, as SF does. We’re not making use of the adversarial engine’s scores, which is essentially what calls the tune in terms of network training.

While this is interesting from a training standpoint, the Komodo specific inputs, network logic, and the RL specific model loss have a great deal more impact.
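
To make the idea concrete, here is a rough Python sketch of the filtering step as described in this thread: from a Komodo vs. Stockfish game, only the positions where Komodo moved are kept, with Komodo's own score. The thread does not say how the emphasis on won games is implemented, so the per-result weights below are purely an illustrative assumption, as are all the class and field names.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Ply:
    fen: str        # position before the move
    mover: str      # "komodo" or "stockfish"
    score_cp: int   # the mover's own search score, in centipawns

@dataclass
class Sample:
    fen: str
    score_cp: int
    weight: float

# Hypothetical weights: emphasize games Komodo won, de-emphasize losses.
RESULT_WEIGHT = {"win": 1.5, "draw": 1.0, "loss": 0.5}

def komodo_samples(plies: List[Ply], komodo_result: str) -> List[Sample]:
    """Keep only Komodo's positions and scores; drop the opponent's entirely."""
    weight = RESULT_WEIGHT[komodo_result]
    return [
        Sample(p.fen, p.score_cp, weight)
        for p in plies
        if p.mover == "komodo"   # the opponent's scores are never used
    ]
```

Since the two engines alternate moves, roughly half of the positions in each game are discarded by this filter.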

Cornfed · Post by **Cornfed** » Thu Sep 30, 2021 1:54 am

Something I've been curious about...and given the increase in strength in both Dragon 2.5 and slightly more in its MCTS brother: In a match between the two - where sides alternate the same openings, would MCTS ever win a game (black or white) that Dragon would not win?

dkappe · Post by **dkappe** » Thu Sep 30, 2021 3:20 am

Cornfed wrote: ↑Thu Sep 30, 2021 1:54 am Something I've been curious about...and given the increase in strength in both Dragon 2.5 and slightly more in its MCTS brother: In a match between the two - where sides alternate the same openings, would MCTS ever win a game (black or white) that Dragon would not win?

Without going into too much detail, alpha-beta (ab) engines at higher depths have lots of noise in their scores. All attempts at training with deep searches have failed miserably. (Night Nurse/Bad Gyal 800 node data is about as strong as SF11 d13.5, but is less noisy, and Leela data is likely even better.) That’s why most data generation by ab engines is done at d9 to d11. Because of this, Dragon MCTS would have to be adapted for training; we’re planning to do that, just not yet.

Incidentally, it's worth pointing out that adversarial data gen is twice as expensive as regular data gen, as you’re throwing out half the positions.

lkaufman · Post by **lkaufman** » Thu Sep 30, 2021 5:01 am

Cornfed wrote: ↑Thu Sep 30, 2021 1:54 am Something I've been curious about...and given the increase in strength in both Dragon 2.5 and slightly more in its MCTS brother: In a match between the two - where sides alternate the same openings, would MCTS ever win a game (black or white) that Dragon would not win?

I think the answer is pretty much the same as if you asked whether SF 12 would ever win a game from SF 14 without SF 14 winning with the opposite color. It won't happen often, but it will certainly happen if you play a lot of games. Many games will simply go different ways after a few moves, and the weaker player will sometimes win one, while the other game will just be a totally different game. Regular Dragon usually plays better moves, but sometimes MCTS mode will choose the better move. The searches are very different, so the programs are different.

pohl4711 · Post by **pohl4711** » Fri Oct 01, 2021 3:19 pm

lkaufman wrote: ↑Fri Sep 24, 2021 4:30 pm KomodoChess has just released Dragon 2.5 at KomodoChess.com.

Huge experimental round-robin tournament (10500 games, 3min+1sec) with 3 engines (Stockfish 14, KomodoDragon 2.5 and KomodoDragon 2.5 MCTS), each with 5 different MultiPV settings (1, 2, 3, 5 and 7, where 1 is the normal, default playing mode). Goal: measure how much Elo is lost by calculating more than one PV line, and whether Dragon 2.5 MCTS loses less Elo than the alpha-beta engines when MultiPV is 3 or higher...
I think the results are pretty interesting, especially when you use engines for analyzing human games in MultiPV mode.

https://www.sp-cc.de/experiments.htm
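
As a quick sanity check on the tournament size, the sketch below counts the pairings, assuming every engine/MultiPV combination is a separate participant and that all participants play each other (including the same engine at different MultiPV settings). That reading is an assumption, not something stated in the post.

```python
from itertools import combinations, product

engines = ["Stockfish 14", "KomodoDragon 2.5", "KomodoDragon 2.5 MCTS"]
multipv_settings = [1, 2, 3, 5, 7]

# Each (engine, MultiPV) pair is treated as its own participant.
participants = list(product(engines, multipv_settings))
# Full round robin: every participant meets every other one.
pairings = list(combinations(participants, 2))

total_games = 10500  # figure given in the post
games_per_pairing = total_games / len(pairings)
print(len(participants), len(pairings), games_per_pairing)
```

Under that assumption there are 15 participants and 105 pairings, which works out to 100 games per pairing.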

lkaufman · Post by **lkaufman** » Fri Oct 01, 2021 5:04 pm

pohl4711 wrote: ↑Fri Oct 01, 2021 3:19 pm
lkaufman wrote: ↑Fri Sep 24, 2021 4:30 pm KomodoChess has just released Dragon 2.5 at KomodoChess.com.
Huge experimental round-robin tournament (10500 games, 3min+1sec) with 3 engines (Stockfish 14, KomodoDragon 2.5 and KomodoDragon 2.5 MCTS), each with 5 different MultiPV settings (1, 2, 3, 5 and 7, where 1 is the normal, default playing mode). Goal: measure how much Elo is lost by calculating more than one PV line, and whether Dragon 2.5 MCTS loses less Elo than the alpha-beta engines when MultiPV is 3 or higher...
I think the results are pretty interesting, especially when you use engines for analyzing human games in MultiPV mode.

https://www.sp-cc.de/experiments.htm

Thanks for running this tournament! Although you skipped over MPV=4, by interpolation it would appear that this is the magic number where Dragon 2.5 MCTS becomes stronger than SF14. With Dragon MCTS, MultiPV is merely displaying information already calculated, so there should be no measurable Elo cost to it, and indeed your results showed all settings for MCTS as equal within the margin of error. I believe that MPV=3 is the most popular setting for this parameter, so it is a worthwhile and achievable goal for Dragon MCTS to surpass SF14 with both set to MPV=3 in the near future.

Joerg Oster · Post by **Joerg Oster** » Fri Oct 01, 2021 7:33 pm

pohl4711 wrote: ↑Fri Oct 01, 2021 3:19 pm
lkaufman wrote: ↑Fri Sep 24, 2021 4:30 pm KomodoChess has just released Dragon 2.5 at KomodoChess.com.
Huge experimental round-robin tournament (10500 games, 3min+1sec) with 3 engines (Stockfish 14, KomodoDragon 2.5 and KomodoDragon 2.5 MCTS), each with 5 different MultiPV settings (1, 2, 3, 5 and 7, where 1 is the normal, default playing mode). Goal: measure how much Elo is lost by calculating more than one PV line, and whether Dragon 2.5 MCTS loses less Elo than the alpha-beta engines when MultiPV is 3 or higher...
I think the results are pretty interesting, especially when you use engines for analyzing human games in MultiPV mode.

https://www.sp-cc.de/experiments.htm

Well, MultiPV mode in Stockfish is unnecessarily expensive.
With MCTS you always have full MultiPV info available by default!

If you're interested I can provide a SF version with a small change which
should give noticeably better results.

Uri Blass · Post by **Uri Blass** » Fri Oct 01, 2021 8:30 pm

I do not think this is a correct way to test multi-PV quality.
You can easily get a better result with multi-PV if you reduce the depth of all the moves except the best one by a lot of plies and do a re-search only if you find that they are better than the best move.
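
A toy Python sketch of that idea, to show the control flow only. It assumes the best move and its full-depth score are already known from a normal single-PV search, and that `search(move, depth)` is some callable returning a score for a root move. Both are hypothetical stand-ins, not any engine's actual API.

```python
from typing import Callable, List, Tuple

def cheap_multipv(
    best_move: str,
    best_score: int,
    other_moves: List[str],
    search: Callable[[str, int], int],
    depth: int,
    reduction: int,
    num_lines: int,
) -> List[Tuple[str, int]]:
    """Return the top `num_lines` root moves, searching non-best moves shallower."""
    lines = [(best_move, best_score)]
    for move in other_moves:
        # Search every non-best move at a heavily reduced depth first.
        score = search(move, depth - reduction)
        # Re-search at full depth only if it looks better than the best move.
        if score > best_score:
            score = search(move, depth)
        lines.append((move, score))
    lines.sort(key=lambda line: line[1], reverse=True)
    return lines[:num_lines]
```

The secondary lines are then reported from a shallower search, which is exactly the trade-off: less time spent on them, at the price of less reliable scores for everything except the best move.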
