Komodo MCTS scales worse with TC than Komodo A/B?

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Komodo MCTS scales worse with TC than Komodo A/B?

Post by Laskos »

First, I checked that K12.2 AB and K12.2 MCTS behave very similarly in all sorts of variants and phases of the game, so Leela's weird behavior in variants and endgames (weaker than its regular chess performance) is due mostly to its eval and not to its MCTS search. K MCTS is simply about 100-150 Elo points weaker than K AB in most situations (say at a draw rate of 50%), at a 1s/move time control on 1 thread.

Then I wanted to see the scaling. I did not choose regular chess, because the draw rate between them gets high at longer TC, and in 100-game matches I would have weak win and loss statistics. I chose the playable 3-Queens Chess:
[d]rnqqkqnr/pppppppp/8/8/8/8/PPPPPPPP/RNQQKQNR w KQkq - 0 1
It has a lower draw rate, and the draw rate increases only mildly with time control. I built a small 2-mover EPD opening book for it.
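As a quick sanity check on the starting position above, the piece-placement field of the FEN can be expanded rank by rank. This is only a minimal sketch in plain Python: the helper name `expand_fen_board` is made up for illustration and does not come from any chess library, and it looks only at the first FEN field without checking legality.

```python
from collections import Counter

# Starting position of 3-Queens Chess, copied from the post.
FEN = "rnqqkqnr/pppppppp/8/8/8/8/PPPPPPPP/RNQQKQNR w KQkq - 0 1"

def expand_fen_board(fen):
    """Return the 8 ranks as 8-character strings ('.' marks an empty square)."""
    placement = fen.split()[0]
    ranks = []
    for rank in placement.split("/"):
        # A digit in FEN stands for that many consecutive empty squares.
        row = "".join("." * int(ch) if ch.isdigit() else ch for ch in rank)
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")
        ranks.append(row)
    if len(ranks) != 8:
        raise ValueError("expected 8 ranks")
    return ranks

board = expand_fen_board(FEN)
counts = Counter("".join(board))
print("\n".join(board))
print("white queens:", counts["Q"], "black queens:", counts["q"])
print("white bishops:", counts["B"], "black bishops:", counts["b"])
```

Compared with the normal starting position, each side's two bishops are replaced by queens.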

The result at 1s/move in 100 games is (engines on 1 i7 thread):

Games Completed = 100 of 100 (Avg game length = 138.082 sec)
Settings = Gauntlet/64MB/1000ms per move/M 1500cp for 3 moves, D 140 moves/EPD:C:\LittleBlitzer\3QEPD_Trim.epd(112)
Time = 3521 sec elapsed, 0 sec remaining
 1.  Komodo                   	67.5/100	44-9-47  	(L: m=9 t=0 i=0 a=0)	(D: r=26 i=19 f=0 s=0 a=2)	(tpm=972.5 d=24.10 nps=1914802)
 2.  Komodo MCTS              	32.5/100	9-44-47  	(L: m=43 t=0 i=0 a=1)	(D: r=26 i=19 f=0 s=0 a=2)	(tpm=990.3 d=9.22 nps=2812)
The result at 10s/move in 100 games is:

Games Completed = 100 of 100 (Avg game length = 1397.608 sec)
Settings = Gauntlet/64MB/10000ms per move/M 1500cp for 3 moves, D 140 moves/EPD:C:\LittleBlitzer\3QEPD_Trim.epd(112)
Time = 36779 sec elapsed, 0 sec remaining
 1.  Komodo                   	71.0/100	44-2-54  	(L: m=2 t=0 i=0 a=0)	(D: r=13 i=39 f=0 s=0 a=2)	(tpm=9672.4 d=31.39 nps=1925693)
 2.  Komodo MCTS              	29.0/100	2-44-54  	(L: m=41 t=0 i=0 a=3)	(D: r=13 i=39 f=0 s=0 a=2)	(tpm=9832.4 d=14.26 nps=2941)
Elo-wise it's hard to distinguish between the two, with just a bit worse scaling for K MCTS. But we know that, because the draw rate increases with time control, the better indicator for scaling is the "Normalized Elo" of Michel Van den Bergh: http://hardy.uhasselt.be/Toga/normalized_elo.pdf
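For reference, the plain Elo differences implied by the two match scores (67.5% and 71.0% for Komodo A/B) can be reproduced with the usual logistic formula. A minimal sketch; the function name is illustrative and no error margin is computed here.

```python
import math

def elo_from_score(score):
    """Elo difference implied by a score fraction under the logistic Elo model."""
    return 400.0 * math.log10(score / (1.0 - score))

# Score fractions for Komodo A/B taken from the two result tables.
for label, score in (("1s/move", 0.675), ("10s/move", 0.710)):
    print(f"{label}: {elo_from_score(score):+.1f} Elo for Komodo A/B")
```

This gives roughly 127 Elo at 1s/move and roughly 155 Elo at 10s/move.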

And here are the results for Normalized Elo:

1s/move: 0.548 sigma=0.1
10s/move: 0.789 sigma=0.1
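The two values above can be reproduced from the W-L-D counts in the result tables, taking Normalized Elo to be the score surplus over 50% divided by the per-game standard deviation of the score. This is a sketch under that reading of the linked paper; the 1/sqrt(N) error is a leading-order approximation that I am assuming, and it matches the sigma=0.1 quoted for 100 games.

```python
import math

def normalized_elo(wins, losses, draws):
    """Return (normalized Elo, approximate 1-sigma error) from trinomial counts."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n             # average score per game
    second_moment = (wins + 0.25 * draws) / n   # average of score squared
    sigma_game = math.sqrt(second_moment - mean * mean)
    value = (mean - 0.5) / sigma_game
    sigma_value = 1.0 / math.sqrt(n)            # leading-order approximation
    return value, sigma_value

# W-L-D for Komodo A/B, taken from the two result tables.
for label, wld in (("1s/move", (44, 9, 47)), ("10s/move", (44, 2, 54))):
    value, sigma = normalized_elo(*wld)
    print(f"{label}: {value:.3f} sigma={sigma:.1f}")
```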

The difference is 0.241 and 1 sigma of the difference is 0.141. The likelihood that K MCTS scales worse than K AB is about 95%.
Note that this result is for this particular game with 3 Queens, but I saw no differences in behavior between K AB and K MCTS across different variants of chess or chess board games.
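The arithmetic behind the difference, its sigma, and the likelihood can be sketched as follows, assuming the two estimates are independent and roughly normally distributed (my assumption; the post does not spell out the method). The errors are combined in quadrature and the one-sided probability comes from the normal CDF.

```python
import math

def prob_positive_difference(a, sigma_a, b, sigma_b):
    """Return (b - a, sigma of the difference, P(true difference > 0))."""
    diff = b - a
    sigma_diff = math.hypot(sigma_a, sigma_b)   # errors added in quadrature
    z = diff / sigma_diff
    prob = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return diff, sigma_diff, prob

diff, sigma_diff, prob = prob_positive_difference(0.548, 0.1, 0.789, 0.1)
print(f"difference = {diff:.3f}, sigma = {sigma_diff:.3f}, P = {prob:.3f}")
```

Under these assumptions the probability comes out between 95% and 96%, in line with the figure quoted above.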
lkaufman
Posts: 6023
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by lkaufman »

A couple of points worth mentioning. If you want to eliminate the possible distortion of one engine simply being much stronger than the other, I suggest you test Komodo 12.2 MCTS (or wait for the bugfix 12.2.1) vs. Komodo 9 (or 9.02 or 9.1 if you prefer), which is our best free version and is very evenly matched with Komodo 12.2 MCTS in my tests. But most likely you will find that normal Komodo scales better from 1 to 10 seconds on one thread. The reason is that at one second per move, Komodo MCTS doesn't have enough time to really "do its thing" and is more or less a crippled normal Komodo. But at ten seconds per move (or even five) the MCTS aspect is in full effect. So my main point is that how Komodo MCTS scales from 1 to 10 seconds on one thread is not predictive of how it would scale from 5" to 50". My data is inconclusive on this point, but I think the scaling is pretty similar. We'll know when CCRL has ratings for both 40/4 and 40/40 for Komodo MCTS, or CEGT for 40/4 and 40/20, or fastgm for 10' and 60', which can be compared with Komodo 9.
I think your three queens solution to the draw problem is interesting, but perhaps not so predictive of normal chess. My preferred solution to the draw problem is to start with positions evaluated around 0.7 or so by Komodo, counting draws as wins for the bad side. With alternating colors, this gives no draws at all, equal chances, and reasonably normal chess.
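The scoring rule described above (a draw counts as a win for the side that started with the worse position) can be sketched as follows. The record layout and the function name are hypothetical, and the sample results are invented purely to exercise the rule.

```python
def draw_as_win_points(games):
    """Count points for one engine when draws are credited to the worse side.

    games: iterable of (had_better_side, result), where result is 'win',
    'draw' or 'loss' from this engine's point of view.
    """
    points = 0
    for had_better_side, result in games:
        if result not in ("win", "draw", "loss"):
            raise ValueError(f"unknown result {result!r}")
        if result == "win":
            points += 1
        elif result == "draw" and not had_better_side:
            points += 1  # holding the worse side to a draw counts as a win
    return points

# Invented sample: two openings, each played with both colors.
sample = [(True, "win"), (False, "draw"), (True, "draw"), (False, "loss")]
print(draw_as_win_points(sample), "points out of", len(sample))
```

Because every game then ends in a full point for one side, the win/loss statistics no longer thin out as the draw rate rises.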
Komodo rules!
shrapnel
Posts: 1339
Joined: Fri Nov 02, 2012 9:43 am
Location: New Delhi, India

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by shrapnel »

Laskos wrote: Fri Nov 09, 2018 12:02 am First, I checked that K12.2 AB and K12.2 MCTS behave very similarly in all sorts of variants and phases of the game, so Leela weird behavior in variants and endgames (weaker than regular chess performance) is due mostly to its eval and not to its MCTS search. K MCTS is simply about 100-150 Elo points weaker than K AB in most situations (say at a draw rate of 50%). At 1s/move time control on 1 thread.

I should have thought it was obvious, no testing required to know beforehand that Komodo MCTS sucks big time. As was stated much earlier, Komodo Team was simply trying to cash in on the interest generated at the time when AlphaZero wiped the floor with the alpha-beta Stockfish.
MCTS without NN is an exercise in futility and Komodo MCTS was released simply to trap the gullible.
The moment the Komodo team admitted that Komodo MCTS couldn't use the power of the GPU, I knew it was just a fraud.
i7 5960X @ 4.1 Ghz, 64 GB G.Skill RipJaws RAM, Twin Asus ROG Strix OC 11 GB Geforce 2080 Tis
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by Laskos »

lkaufman wrote: Fri Nov 09, 2018 5:42 am A couple points worth mentioning. If you want to eliminate the possible distortion of one engine simply being much stronger than the other, I suggest you test Komodo 12.2 mcts (or wait for the bugfix 12.2.1) vs. Komodo 9 (or 9.02 or 9.1 if you prefer), which is our best free version and is very evenly matched with Komodo 12.2 MCTS in my tests. But most likely you will find that normal Komodo scales better from 1 to 10 seconds on one thread. The reason is that at one second per move, Komodo MCTS doesn't have enough time to really "do its thing" and is more or less a crippled normal Komodo. But at ten seconds per move (or even five) the MCTS aspect is in full effect. So my main point is that how Komodo MCTS scales from 1 to 10 seconds on one thread is not predictive of how it would scale from 5" to 50". My data is inconclusive on this point, I think the scaling is pretty similar. We'll know when CCRL has ratings for both 40/4 and 40/40 for Komodo MCTS, or CEGT for 40/4 and 40/20, or fastgm for 10' and 60', which can be compared with Komodo 9.
I think your three queens solution to the draw problem is interesting, but perhaps not so predictive of normal chess. My preferred solution to the draw problem is to start with positions evaluated around 0.7 or so by Komodo, counting draws as wins for the bad side. With alternating colors, no draws at all, equal chances, and reasonably normal chess.
Ok, soon I will leave for a small weekend vacation, but I will leave a 60''/move test running (it will take the full weekend) from unbalanced positions of regular chess (70-90cp imbalance): Komodo 12.2 MCTS against Komodo 9.x. Then I will perform a much faster 6''/move test. I will count points as you suggest. Or maybe I will use Normalized Elo and pentanomial variance, as the pairs of games are strongly correlated in outcomes.

100-game matches on 1 thread. I will report maybe on Monday or so.
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by mjlef »

shrapnel wrote: Fri Nov 09, 2018 6:25 am
I should have thought it was obvious, no testing required to know beforehand that Komodo MCTS sucks big time. As was stated much earlier, Komodo Team was simply trying to cash in on the interest generated at the time when AlphaZero wiped the floor with the alpha-beta Stockfish.
MCTS without NN is an exercise in futility and Komodo MCTS was released simply to trap the gullible.
The moment the Komodo team admitted that Komodo MCTS couldn't use the power of the GPU, I knew it was just a fraud.
Monte Carlo Tree Search does not require a neural network, so I think you might be confusing the two. It is a different way to generate a search tree with some interesting features, like being able to exploit opponent weaknesses more easily. We have been clear about what we are doing all along, and have never claimed it uses a neural network.
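To illustrate that the tree search itself needs no neural network, here is a minimal UCT sketch with purely random playouts on a toy take-away game (take 1 to 3 stones, whoever takes the last stone wins). It is not how Komodo MCTS works, since the thread gives no implementation details; the game, the names, and the exploration constant are all illustrative assumptions.

```python
import math
import random

class Node:
    def __init__(self, pile, player_just_moved, parent=None, move=None):
        self.pile = pile
        self.player_just_moved = player_just_moved  # 0 or 1
        self.parent = parent
        self.move = move
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= pile]
        self.visits = 0
        self.wins = 0.0  # wins for player_just_moved

def uct_child(node, c=1.4):
    """Pick the child maximizing win rate plus an exploration bonus (UCB1)."""
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )

def rollout(pile, player_just_moved, rng):
    """Play random moves to the end; return the player who took the last stone."""
    while pile > 0:
        pile -= rng.randint(1, min(3, pile))
        player_just_moved = 1 - player_just_moved
    return player_just_moved

def mcts_best_move(pile, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(pile, player_just_moved=1)  # player 0 is to move at the root
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded and not terminal.
        while not node.untried and node.children:
            node = uct_child(node)
        # Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.pile - move, 1 - node.player_just_moved, node, move)
            node.children.append(child)
            node = child
        # Simulation: random playout, no evaluation function or network.
        winner = rollout(node.pile, node.player_just_moved, rng)
        # Backpropagation: credit each node whose mover ended up winning.
        while node is not None:
            node.visits += 1
            if winner == node.player_just_moved:
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

print("suggested move from a pile of 5:", mcts_best_move(5))
```

In this game, leaving the opponent a multiple of 4 stones is the winning strategy, so with enough iterations the visit counts should concentrate on taking 1 stone from a pile of 5.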

As for neural networks, we have actually been doing neural network training and design since early summer, and it is continuing. Should it prove better than what we are already doing, we will switch or more likely add it as an option. But we do not want to prevent the thousands of people who do not have or cannot afford a fast GPU from benefiting from Komodo MCTS.

As for "trap the gullible" and "fraud", you are simply wrong. How much extra did we charge people for the MCTS mode in Komodo? Nothing. There is no fraud because it does what we said it does, with a 200 Elo gain just since the last release in June. If this is "fraud" then it is clear we do not know how to do it very well.

Please stop making false claims.

Mark
Joerg Oster
Posts: 950
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by Joerg Oster »

Laskos wrote: Fri Nov 09, 2018 12:02 am First, I checked that K12.2 AB and K12.2 MCTS behave very similarly in all sorts of variants and phases of the game, so Leela weird behavior in variants and endgames (weaker than regular chess performance) is due mostly to its eval and not to its MCTS search. K MCTS is simply about 100-150 Elo points weaker than K AB in most situations (say at a draw rate of 50%). At 1s/move time control on 1 thread.

When running Komodo MCTS over a testsuite with 10 seconds per position,
it noticeably helped to increase the 'MCTS Explore' setting to 12.

Maybe it is worth a try.
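For anyone who wants to try this from a script, a UCI option is set with a `setoption` command sent to the engine before the search starts. The sketch below only builds the command lines. The option name 'MCTS Explore' is taken from this post and I have not checked the exact spelling the engine reports in its `uci` output, and `position startpos` is a placeholder for the test-suite position.

```python
def uci_setoption(name, value):
    """Format a UCI 'setoption' command line."""
    return f"setoption name {name} value {value}"

# Command sequence for one 10-second search with the changed setting.
commands = [
    "uci",
    uci_setoption("MCTS Explore", 12),  # option name as quoted in the post
    "isready",
    "position startpos",                # placeholder for a test position
    "go movetime 10000",                # 10 seconds, in milliseconds
]
print("\n".join(commands))
```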
Jörg Oster
lkaufman
Posts: 6023
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by lkaufman »

More exploration should indeed help on solving problems, but we optimize for actual play. It is quite possible that our default value can be improved upon when using more threads or more time; we can't optimize for every possibility. We encourage others to report good results with non-default values as you have done. Thank you.
Komodo rules!
chrisw
Posts: 4477
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by chrisw »

mjlef wrote: Fri Nov 09, 2018 9:10 pm
shrapnel wrote: Fri Nov 09, 2018 6:25 am
I should have thought it was obvious, no testing required to know beforehand that Komodo MCTS sucks big time. As was stated much earlier, Komodo Team was simply trying to cash in on the interest generated at the time when AlphaZero wiped the floor with the alpha-beta Stockfish.
MCTS without NN is an exercise in futility and Komodo MCTS was released simply to trap the gullible.
The moment the Komodo team admitted that Komodo MCTS couldn't use the power of the GPU, I knew it was just a fraud.
Monte Carlo Tree Search does not require a neural network, so I think you might be confusing the two. It is a different way to generate a search tree with some interesting features, like being able to exploit opponent weaknesses easier. We have been clear about what we are doing all along, and have never claimed it uses a neural network.

As for neural networks, we have actually been doing neural network training and design since early summer, and it is continuing. Should it prove better than what we are already doing, we will switch or more likely add it as an option. But we do not want to leave the thousands of people who do not have or who cannot afford a fast GPU from being able to benefit from Komodo MCTS.

As for "trap the gullible" and "fraud", you are simply wrong. How much extra did we charge people for the MCTS mode in Komodo? Nothing. There is no fraud because it does what we said it does. With a 200 elo gain just since the last release in June. If this is "fraud" then it is clear we do not know how to do it very well.

Please stop making false claims.

Mark
Past history aside, what you’ve done is impressive. The figures are not clear, but getting within a couple of hundred Elo or so of the AB version and still on a roll is no mean achievement.
I only really got back into studying computer chess from the time of AZ, and having got over my surprise at just how much pruning SF was doing way back in the tree, obviously potentially missing lines (as AZ showed and as looking at the search code showed), it became intuitively clear that if SF could be so strong with many lines missed, then why shouldn’t MCTS be strong even with the lines it inherently misses? Then, intuitively, why shouldn’t strong, inherently pruned MCTS and strongly pruned AB converge on the same kind of tree? It’s all down to candidate move choice in the end, for both algorithms.
So, well done for going with the MCTS, it’s always tougher to be “first” because one never knows if the effort will work. Following on a trodden, proven path is a lot easier. I guess you can expect to have opened the field up now.
Uri Blass
Posts: 10486
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by Uri Blass »

shrapnel wrote: Fri Nov 09, 2018 6:25 am
Laskos wrote: Fri Nov 09, 2018 12:02 am First, I checked that K12.2 AB and K12.2 MCTS behave very similarly in all sorts of variants and phases of the game, so Leela weird behavior in variants and endgames (weaker than regular chess performance) is due mostly to its eval and not to its MCTS search. K MCTS is simply about 100-150 Elo points weaker than K AB in most situations (say at a draw rate of 50%). At 1s/move time control on 1 thread.

I should have thought it was obvious, no testing required to know beforehand that Komodo MCTS sucks big time. As was stated much earlier, Komodo Team was simply trying to cash in on the interest generated at the time when AlphaZero wiped the floor with the alpha-beta Stockfish.
MCTS without NN is an exercise in futility and Komodo MCTS was released simply to trap the gullible.
The moment the Komodo team admitted that Komodo MCTS couldn't use the power of the GPU, I knew it was just a fraud.
You always need testing in order to know, so I totally disagree with you.
I have no problem with people who do testing and share the results.
I have a problem with people who say it is obvious and no testing is needed.

You cannot know that Komodo is stronger than Komodo MCTS without testing.
Today it is the case.
It is not obvious that it is going to be the case forever.

I do not agree that Komodo MCTS was a fraud.
I do not see what wrong information they gave.
I do not see why it matters to most people whether it is MCTS or NN or both.
I do not see a reason for people to buy a chess engine without knowing its results (including the hardware used, assuming they care about playing strength).

Uri
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Komodo MCTS scales worse with TC than Komodo A/B?

Post by Laskos »

lkaufman wrote: Fri Nov 09, 2018 5:42 am A couple points worth mentioning. If you want to eliminate the possible distortion of one engine simply being much stronger than the other, I suggest you test Komodo 12.2 mcts (or wait for the bugfix 12.2.1) vs. Komodo 9 (or 9.02 or 9.1 if you prefer), which is our best free version and is very evenly matched with Komodo 12.2 MCTS in my tests. But most likely you will find that normal Komodo scales better from 1 to 10 seconds on one thread. The reason is that at one second per move, Komodo MCTS doesn't have enough time to really "do its thing" and is more or less a crippled normal Komodo. But at ten seconds per move (or even five) the MCTS aspect is in full effect. So my main point is that how Komodo MCTS scales from 1 to 10 seconds on one thread is not predictive of how it would scale from 5" to 50". My data is inconclusive on this point, I think the scaling is pretty similar. We'll know when CCRL has ratings for both 40/4 and 40/40 for Komodo MCTS, or CEGT for 40/4 and 40/20, or fastgm for 10' and 60', which can be compared with Komodo 9.
I think your three queens solution to the draw problem is interesting, but perhaps not so predictive of normal chess. My preferred solution to the draw problem is to start with positions evaluated around 0.7 or so by Komodo, counting draws as wins for the bad side. With alternating colors, no draws at all, equal chances, and reasonably normal chess.
In 4 days I managed to perform the test you proposed, and then I interpreted the result using a mathematically sound pentanomial variance (error margin) for paired (side-reversed) games, developed and derived by Michel Van den Bergh and me and described briefly here: https://www.chessprogramming.org/Match_Statistics. My openings are markedly unbalanced (80cp-100cp advantage for White), are played with sides reversed, and the draw rate is kept pretty low. The correct pentanomial error margins in this case are 1.8-2.2 times smaller than the naive trinomial error margins usually shown in rating tools, because the outcomes in paired games are highly correlated.
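The idea behind the pentanomial error margin can be sketched by taking the variance over side-reversed pairs instead of over single games. This is a simplified illustration and not the exact derivation referenced above; the pair results are invented (this thread reports only W-L-D totals, not pair outcomes), and the function name is hypothetical.

```python
import math

def score_error_margins(pairs):
    """Return (mean score, trinomial 1-sigma, pentanomial 1-sigma).

    pairs: list of (score_game_1, score_game_2) for one engine, where each
    pair is the same opening played with colors reversed.
    """
    games = [s for pair in pairs for s in pair]
    mean = sum(games) / len(games)
    # Trinomial: treat all games as independent draws from {0, 0.5, 1}.
    var_game = sum((s - mean) ** 2 for s in games) / len(games)
    se_trinomial = math.sqrt(var_game / len(games))
    # Pentanomial: treat each pair as one observation in {0, .25, .5, .75, 1}.
    pair_means = [(a + b) / 2.0 for a, b in pairs]
    var_pair = sum((x - mean) ** 2 for x in pair_means) / len(pairs)
    se_pentanomial = math.sqrt(var_pair / len(pairs))
    return mean, se_trinomial, se_pentanomial

# Invented data: with unbalanced openings, most pairs split 1-0 / 0-1.
pairs = ([(1.0, 0.0)] * 30 + [(1.0, 0.5)] * 8
         + [(0.5, 0.0)] * 7 + [(0.5, 0.5)] * 5)
mean, se_tri, se_penta = score_error_margins(pairs)
print(f"score = {mean:.3f}")
print(f"trinomial sigma = {se_tri:.4f}, pentanomial sigma = {se_penta:.4f}")
```

When the better side of each opening usually wins, the pair totals cluster around 1 point, so the per-pair variance is small even though the per-game variance is large. That is why the paired error margin shrinks.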

The tests are at 6'' per move and 60'' per move, on 1 i7 3.8 GHz thread (4 concurrent games running on 4 cores). I set hash to 512MB in both cases. The results are:

6'' per move:

Games Completed = 100 of 100 (Avg game length = 804.127 sec)
Settings = Gauntlet/512MB/6000ms per move/M 1500cp for 5 moves, D 160 moves/EPD:C:\LittleBlitzer\OP_08_10_W_Trim.epd(5840)
Time = 20394 sec elapsed, 0 sec remaining
 1.  Komodo 9.1               	49.0/100	37-39-24  	(L: m=39 t=0 i=0 a=0)	(D: r=4 i=18 f=0 s=1 a=1)	(tpm=5736.5 d=27.38 nps=2090592)
 2.  Komodo 12.2 MCTS         	51.0/100	39-37-24  	(L: m=37 t=0 i=0 a=0)	(D: r=4 i=18 f=0 s=1 a=1)	(tpm=5934.5 d=12.90 nps=2064)
-6.9 Elo points for Komodo 9.1, with a 1 sigma pentanomial error margin of 15.9 Elo points.


60'' per move

Games Completed = 100 of 100 (Avg game length = 9338.734 sec)
Settings = Gauntlet/512MB/60000ms per move/M 1500cp for 5 moves, D 160 moves/EPD:C:\LittleBlitzer\OP_08_10_W_Trim.epd(5840)
Time = 241335 sec elapsed, 0 sec remaining
 1.  Komodo 9.1               	55.5/100	40-29-31  	(L: m=29 t=0 i=0 a=0)	(D: r=8 i=17 f=1 s=0 a=5)	(tpm=57467.1 d=33.96 nps=2194737)
 2.  Komodo 12.2 MCTS         	44.5/100	29-40-31  	(L: m=39 t=0 i=0 a=1)	(D: r=8 i=17 f=1 s=0 a=5)	(tpm=59428.3 d=17.66 nps=1658)
+38.4 Elo points for Komodo 9.1, with a 1 sigma pentanomial error margin of 15.2 Elo points.

============================================================


  • Difference: 45.3 Elo points.
    1 sigma (pentanomial) for the difference: 22.0 Elo points.

That gives a 98.0% likelihood that Komodo 12.2 MCTS scales worse than Komodo 9.1 A/B, which is already a fairly significant result.