Komodo Dragon 2.5 running for the GRL

Rebel
Posts: 7522
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Post by Rebel »

Komodo Dragon 2.5 running for the GRL.

First run: 1000 games against the current top 5.

http://rebel13.nl/b/grl.htm

A second run of 1000 games will follow later.

Match snapshot after 141 games:

Code:

Gambit Rating List
Running      : Gauntlet Komodo Dragon 2.5 [first run] 
Time Control : 40/120
Games        : 1000

Results from file gauntlet-dragon.pgn:

No. Name                 Win Draw Loss Unf.  Score Games       %
----------------------------------------------------------------
  1 Komodo-Dragon 2.5    +64  =73   -4   *0  100.5   141   71.3%
  2 Stockfish 14          +3  =24   -2   *0   15.0    29   51.7%
  3 Ethereal 13.25-NNUE   +1  =12  -15   *0    7.0    28   25.0%
  4 SlowChess 2.7         +0  =14  -14   *0    7.0    28   25.0%
  5 RubiChess 2.2         +0  =12  -16   *0    6.0    28   21.4%
  6 Koivisto 6.16         +0  =11  -17   *0    5.5    28   19.6%

Total Games:     141
White Wins:       31 (22.0%)
Black Wins:       37 (26.2%)
Draws:            73 (51.8%)
Unfinished:        0 (0.0%)

Estimated elo gain for Komodo-Dragon_2.5
Elo pool : 3491
Komodo Dragon 2 : 3590.0
Komodo-Dragon_2.5 : 3632.3
Difference : 42.3
Dragon closing in on SF14 :wink:
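
For anyone who wants to sanity-check figures like these, here is a minimal sketch of the textbook logistic conversion from a score fraction to an elo difference against the pool average. It is not the GRL's or ORDO's actual calculation, and the function names are made up for illustration, so it need not reproduce the estimate in the table.

```python
import math

def elo_diff_from_score(score_fraction):
    """Elo difference implied by a score fraction under the logistic model."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)

def performance_rating(pool_average, points, games):
    """Hypothetical helper: pool average plus the implied elo difference."""
    return pool_average + elo_diff_from_score(points / games)

# Figures from the snapshot: 100.5 points from 141 games, elo pool at 3491.
print(round(performance_rating(3491, 100.5, 141), 1))
```

Treating the whole pool as one opponent at the average rating is a simplification; a rating tool that fits every engine separately can land on a different number.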
90% of coding is debugging, the other 10% is writing bugs.
Rebel
Posts: 7522
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Komodo Dragon 2.5 running for the GRL

Post by Rebel »

Komodo Dragon 2.5 +34

Code:

Gambit Rating List
Running      : Gauntlet Komodo Dragon 2.5 [first run] 
Time Control : 40/120
Games        : 1000

Results from file gauntlet-dragon.pgn:

No. Name                 Win Draw Loss Unf.  Score Games       %
----------------------------------------------------------------
  1 Komodo-Dragon 2.5   +464 =476  -60   *0  702.0  1000   70.2%
  2 Stockfish 14         +45 =142  -13   *0  116.0   200   58.0%
  3 SlowChess 2.7         +4  =88 -108   *0   48.0   200   24.0%
  4 Ethereal 13.25-NNUE   +6  =82 -112   *0   47.0   200   23.5%
  5 RubiChess 2.2         +2  =84 -114   *0   44.0   200   22.0%
  6 Koivisto 6.16         +3  =80 -117   *0   43.0   200   21.5%

Total Games:    1000
White Wins:      272 (27.2%)
Black Wins:      252 (25.2%)
Draws:           476 (47.6%)
Unfinished:        0 (0.0%)

Estimated elo gain for Komodo-Dragon_2.5
Elo pool : 3491
Komodo Dragon 2 : 3590.0
Komodo-Dragon_2.5 : 3624.5
Difference : 34.5
Second run of 1000 games - http://rebel13.nl/b/grl.htm

Maybe Dragon 2.5 can pass the revolutionary SF12.
Rebel
Posts: 7522
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Komodo Dragon 2.5 running for the GRL

Post by Rebel »

Second cycle of 1000 games.

Komodo-Dragon 2.5 +20

Code:

Gambit Rating List
Running      : Gauntlet Komodo Dragon 2.5 [second run] 
Time Control : 40/120
Games        : 1000

Results from file gauntlet-dragon.pgn:

No. Name               Win Draw Loss Unf.  Score Games       %
--------------------------------------------------------------
  1 Komodo-Dragon 2.5 +522 =434  -44   *0  739.0  1000   73.9%
  2 Stockfish 12       +29 =131  -40   *0   94.5   200   47.2%
  3 Koivisto 6.0-sse2   +5  =76 -119   *0   43.0   200   21.5%
  4 RubiChess 2.1       +4  =77 -119   *0   42.5   200   21.2%
  5 Ethereal 13-NNUE    +3  =77 -120   *0   41.5   200   20.8%
  6 SlowChess 2.6       +3  =73 -124   *0   39.5   200   19.8%

Total Games:    1000
White Wins:      286 (28.6%)
Black Wins:      280 (28.0%)
Draws:           434 (43.4%)
Unfinished:        0 (0.0%)

Estimated elo gain for Komodo-Dragon_2.5
Elo pool : 3451
Komodo Dragon 2 : 3590.0
Komodo-Dragon_2.5 : 3610.4
Difference : 20.4
Perhaps not the hoped-for result, but on the bright side Dragon 2.5 beat SF12 convincingly.

Better luck on CCRL, CEGT, FGRL and SPCC!
lkaufman
Posts: 6297
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Komodo Dragon 2.5 running for the GRL

Post by lkaufman »

Rebel wrote: Sat Sep 25, 2021 9:42 pm Second cycle of 1000 games.

Perhaps not the wished result but on the bright side Dragon 2.5 beat SF12 convincingly.

Better luck on CCRL, CEGT, FGRL and SPCC!
Thanks. In standard chess Dragon 2.5 is closely matched with SF13, which is consistent with its clear victory over SF12 in your test. Only in FRC can we win long matches from SF14. Remind me which hardware you use; it seems we get better results vs SF on newer hardware. Also, we never test at repeating time controls, only with increment (like CCRL blitz does now, and FGRL and SPCC), but that shouldn't affect the elo gain over Dragon 2, since we haven't worked on time management recently. It is possible that our newer net is not much better than the old one against weaker engines, only against near-equal opponents, since it is trained that way. Early CEGT results are more in line with my expectations than yours are, but the sample size is small so far.
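
On sample size, here is a rough sketch of how an error margin on a gauntlet score can be estimated from the win/draw/loss counts alone. It assumes independent games and the standard logistic elo model, and it ignores any uncertainty in the opponents' own ratings. The helper names are hypothetical and this is not what any of the rating lists actually run.

```python
import math

def elo_diff(score_fraction):
    """Elo difference implied by a score fraction (logistic model)."""
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)

def elo_interval(wins, draws, losses, z=1.96):
    """Return (low, mid, high) elo differences for a W/D/L record.

    z=1.96 corresponds to roughly 95% under a normal approximation.
    Only meaningful while mean +/- margin stays strictly inside (0, 1).
    """
    games = wins + draws + losses
    mean = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0 points per game).
    variance = (wins * (1.0 - mean) ** 2
                + draws * (0.5 - mean) ** 2
                + losses * mean ** 2) / games
    margin = z * math.sqrt(variance / games)
    return elo_diff(mean - margin), elo_diff(mean), elo_diff(mean + margin)

# Second-run totals for Dragon 2.5 from Rebel's table: +522 =434 -44.
low, mid, high = elo_interval(522, 434, 44)
print(f"{mid:+.1f} elo vs pool, interval {low:+.1f} to {high:+.1f}")
```

Even at 1000 games the interval is more than a few elo wide in each direction, so small differences between two runs are hard to tell apart from noise.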
Komodo rules!
Modern Times
Posts: 3832
Joined: Thu Jun 07, 2012 11:02 pm

Re: Komodo Dragon 2.5 running for the GRL

Post by Modern Times »

lkaufman wrote: Sat Sep 25, 2021 10:51 pm Only in FRC can we win long matches from SF14.
It is certainly a huge improvement at FRC. My own tests are running, but it will take 4 or 5 days to complete.
Rebel
Posts: 7522
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Komodo Dragon 2.5 running for the GRL

Post by Rebel »

Estimated elo rating run1 - Komodo Dragon 2.5 +34 (see above)
Estimated elo rating run2 - Komodo Dragon 2.5 +20 (see above)

Yet on the rating list, calculated from the whole database, Komodo Dragon 2.5 earns +59 elo.

http://rebel13.nl/a/yanchor.txt

That deserves an explanation.

The live estimated elo progress calculation during gauntlet matches normally works reasonably well: when the final rating list is created from the full database, an engine's final rating differs from the live estimate by about -10 / +10 elo on average. So the question arises why that is not the case with Dragon 2.5; the difference is bizarre.

Explanation: apparently it depends on the composition of the elo pool, that is, on the engines in it and their elo ratings. Elo pools are normally created with engines whose ratings differ by no more than about -30 / +30 elo, and then the estimated elo (using ORDO) works quite well as an indication of progress. However, the elo pools for testing Stockfish / Komodo are nowhere near -30 / +30. The elo gap with the other engines is 150 elo or more, and it becomes impossible for ORDO to produce accurate elo estimates.

The cure: stop using the live estimated elo progress calculation during gauntlet matches when an elo pool exceeds -50 / +50, which currently is only the case with new versions of Stockfish and Komodo. Instead, produce a full rating list afterwards.
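
One possible contributing factor can be sketched under the standard logistic elo model. The assumption is mine: ORDO's internals are not shown in this thread, so this is an illustration rather than a confirmed explanation. The further the expected score moves away from 50%, the more elo each percentage point of score represents, so the same amount of noise in the results shifts the estimate more.

```python
import math

def expected_score(elo_gap):
    """Expected score of the stronger side at a given elo gap (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))

def elo_per_score_point(elo_gap):
    """Elo change corresponding to one percentage point of score at this gap."""
    p = expected_score(elo_gap)
    # Derivative of the inverse logistic: d(elo)/dp = 400 / (ln 10 * p * (1 - p)),
    # divided by 100 to express it per percentage point.
    return 400.0 / (math.log(10.0) * p * (1.0 - p)) / 100.0

# Gaps chosen to mirror the -30/+30, -50/+50 and 150+ figures in the post.
for gap in (0, 30, 50, 150, 300):
    print(gap, round(expected_score(gap), 3), round(elo_per_score_point(gap), 2))
```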