Made In Heaven class Time Control Comparison

Vinvin
Posts: 5296
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Made In Heaven class Time Control Comparison

Post by Vinvin »

Modern Times wrote: Brilliant work, Aser; the graph is very enlightening.

Question is, at what point does Komodo level off...
+1
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Made In Heaven class Time Control Comparison

Post by Milos »

Aser Huerga wrote: At Shaun Brewer's suggestion, I decided to run my games at different time controls to see how the top engines' strength changes as time increases. Here are the results:

Five i7-3930K CPUs at 4.25 GHz
1 core for all engines
Ponder off
1024 MB hash
3-4-5-piece EGTBs (when available) on SSDs

3'+1" Time Control

   # PLAYER          : RATING  ERROR   POINTS  PLAYED    (%)
   1 Houdini 4       :   32.1   13.3    340.0     600   56.7%
   2 Komodo TCEC     :  -14.4   13.4    282.0     600   47.0%
   3 Stockfish DD    :  -17.6   13.8    278.0     600   46.3%


9'+3" Time Control

   # PLAYER          : RATING  ERROR   POINTS  PLAYED    (%)
   1 Stockfish DD    :   16.6   14.1    320.5     600   53.4%
   2 Houdini 4       :    8.1   13.5    310.0     600   51.7%
   3 Komodo TCEC     :  -24.7   13.4    269.5     600   44.9%

27'+9" Time Control

   # PLAYER          : RATING  ERROR   POINTS  PLAYED    (%)
   1 Stockfish DD    :   22.6   14.1    328.0     600   54.7%
   2 Houdini 4       :    5.7   13.4    307.0     600   51.2%
   3 Komodo TCEC     :  -28.3   13.4    265.0     600   44.2%

54'+18" Time Control

   # PLAYER          : RATING  ERROR   POINTS  PLAYED    (%)
   1 Stockfish DD    :   11.4   14.2    314.0     600   52.3%
   2 Houdini 4       :    0.4   13.7    300.5     600   50.1%
   3 Komodo TCEC     :  -11.8   13.6    285.5     600   47.6%

90'+30" Time Control

   # PLAYER          : RATING  ERROR   POINTS  PLAYED    (%)
   1 Stockfish DD    :   10.5   12.9    313.0     600   52.2%
   2 Komodo TCEC     :    0.8   13.2    301.0     600   50.2%
   3 Houdini 4       :  -11.3   13.1    286.0     600   47.7%

[Image: graph of the results]

I want to thank Adam Hair for his help with the presentation of the graph and results.

(All the games can be downloaded here: TTC_All_Games)
One small problem: the 2SD Elo margins are 30 Elo, so all the results you presented are pretty much meaningless.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Made In Heaven class Time Control Comparison

Post by Laskos »

Milos wrote:
One small problem: the 2SD Elo margins are 30 Elo, so all the results you presented are pretty much meaningless.
Here 2SD is ~20 Elo points; 1SD is 10 Elo points, i.e. 84% confidence on one tail. Besides that, the points can be grouped 2 by 2, giving a 1SD of 7 Elo points. So the curves are fairly relevant.
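
A rough check of these figures, as a sketch rather than anyone's actual method in this thread: the standard error of an Elo difference can be approximated from the score fraction, the number of games and the draw ratio. The 50% draw ratio is an assumption (the thread does not state it), and the function names are made up for illustration.

```python
import math
from statistics import NormalDist

def elo_from_score(score):
    """Logistic Elo difference implied by a score fraction (0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_standard_error(score, games, draw_ratio):
    """Approximate 1 SD of the Elo estimate (delta method)."""
    win_ratio = score - draw_ratio / 2.0
    # Per-game variance of a result worth 1, 0.5 or 0 points.
    variance = win_ratio + draw_ratio / 4.0 - score ** 2
    se_score = math.sqrt(variance / games)
    # Slope of the Elo curve at this score converts score error to Elo error.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return slope * se_score

# Assumed inputs: an even score and a 50% draw ratio (not given in the thread).
sd_600 = elo_standard_error(0.5, 600, 0.5)    # one time control
sd_1200 = elo_standard_error(0.5, 1200, 0.5)  # two time controls grouped
one_tail = NormalDist().cdf(1.0)              # P(true value within +1 SD, one tail)

print(f"1 SD, 600 games:  {sd_600:.1f} Elo")
print(f"1 SD, 1200 games: {sd_1200:.1f} Elo")
print(f"one-tailed confidence at 1 SD: {one_tail:.1%}")
```

Under these assumptions 600 games give roughly 10 Elo for 1 SD and 1200 games roughly 7 Elo, and 1 SD corresponds to about 84% on one tail, which is in line with the numbers quoted above. A different draw ratio would shift the figures somewhat.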
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Made In Heaven class Time Control Comparison

Post by michiguel »

Laskos wrote:
Milos wrote:
One small problem: the 2SD Elo margins are 30 Elo, so all the results you presented are pretty much meaningless.
Here 2SD is ~20 Elo points; 1SD is 10 Elo points, i.e. 84% confidence on one tail. Besides that, the points can be grouped 2 by 2, giving a 1SD of 7 Elo points. So the curves are fairly relevant.
The analysis is a bit more straightforward if the ratings are computed not "against the average" (those errors are misleading, particularly with a low number of participants) but against Houdini as the reference.

For instance:
ordo -q -p TCC31.pgn -a0 -A"Houdini 4" -W -s10000 -F90

(quiet, TCC31.pgn as input, center to 0, reference is Houdini, calculate white advantage, calculate errors with 10k simulations, confidence 90%)

This gives:

3'+1" Time Control

   # PLAYER          : RATING  ERROR   POINTS  PLAYED    (%)
   1 Houdini 4       :    0.0   ----    340.0     600   56.7%
   2 Komodo TCEC     :  -46.5   19.5    282.0     600   47.0%
   3 Stockfish DD    :  -49.7   19.5    278.0     600   46.3%

9'+3" Time Control

   # PLAYER          : RATING  ERROR   POINTS  PLAYED    (%)
   1 Stockfish DD    :    8.5   19.6    320.5     600   53.4%
   2 Houdini 4       :    0.0   ----    310.0     600   51.7%
   3 Komodo TCEC     :  -32.8   19.6    269.5     600   44.9%

27'+9" Time Control

   # PLAYER          : RATING  ERROR   POINTS  PLAYED    (%)
   1 Stockfish DD    :   17.0   19.6    328.0     600   54.7%
   2 Houdini 4       :    0.0   ----    307.0     600   51.2%
   3 Komodo TCEC     :  -33.9   19.7    265.0     600   44.2%

54'+18" Time Control

   # PLAYER          : RATING  ERROR   POINTS  PLAYED    (%)
   1 Stockfish DD    :   11.0   19.7    314.0     600   52.3%
   2 Houdini 4       :    0.0   ----    300.5     600   50.1%
   3 Komodo TCEC     :  -12.2   19.6    285.5     600   47.6%

90'+30" Time Control

   # PLAYER          : RATING  ERROR   POINTS  PLAYED    (%)
   1 Stockfish DD    :   21.7   19.4    313.0     600   52.2%
   2 Komodo TCEC     :   12.1   19.8    301.0     600   50.2%
   3 Houdini 4       :    0.0   ----    286.0     600   47.7%
Comparing the two extremes, there is no doubt, at 90% confidence, that SF crosses Houdini. But you are right, Kai: you can combine the intermediate TC data and the confidence will increase. The trend is not meaningless.
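
As a side note on what changing the reference does: the ratings themselves only shift by a constant, and it is the error column that changes. A minimal sketch of that shift, using the "against the average" ratings from the first post (the dictionary layout and the name anchor_to are just for illustration):

```python
# Ratings "against the average" as posted, keyed by time control
# (minutes + seconds of increment).
AVERAGE_CENTERED = {
    "3+1":   {"Houdini 4": 32.1,  "Komodo TCEC": -14.4, "Stockfish DD": -17.6},
    "9+3":   {"Houdini 4": 8.1,   "Komodo TCEC": -24.7, "Stockfish DD": 16.6},
    "27+9":  {"Houdini 4": 5.7,   "Komodo TCEC": -28.3, "Stockfish DD": 22.6},
    "54+18": {"Houdini 4": 0.4,   "Komodo TCEC": -11.8, "Stockfish DD": 11.4},
    "90+30": {"Houdini 4": -11.3, "Komodo TCEC": 0.8,   "Stockfish DD": 10.5},
}

def anchor_to(ratings, reference):
    """Shift all ratings so that `reference` sits at 0.0."""
    offset = ratings[reference]
    return {name: round(value - offset, 1) for name, value in ratings.items()}

ANCHORED = {tc: anchor_to(r, "Houdini 4") for tc, r in AVERAGE_CENTERED.items()}

for tc, ratings in ANCHORED.items():
    print(tc, ratings)
```

The shifted values agree with the Ordo output to within 0.1 Elo (the posted inputs are already rounded). The errors cannot be obtained this way, since they depend on which engine is held fixed.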

Miguel
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Made In Heaven class Time Control Comparison

Post by Milos »

Laskos wrote:
Milos wrote:
One small problem: the 2SD Elo margins are 30 Elo, so all the results you presented are pretty much meaningless.
Here 2SD is ~20 Elo points; 1SD is 10 Elo points, i.e. 84% confidence on one tail. Besides that, the points can be grouped 2 by 2, giving a 1SD of 7 Elo points. So the curves are fairly relevant.
Well, you are wrong as usual. Even if you group them 2 by 2, 1SD is 10 Elo, and quoting smaller numbers doesn't help, since the comparison is of 3 engines, not 2, so 2SD is simply 30 Elo, or, if it suits you, 28 Elo ;).
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Made In Heaven class Time Control Comparison

Post by Laskos »

Milos wrote:
Laskos wrote:
Milos wrote:
One small problem: the 2SD Elo margins are 30 Elo, so all the results you presented are pretty much meaningless.
Here 2SD is ~20 Elo points; 1SD is 10 Elo points, i.e. 84% confidence on one tail. Besides that, the points can be grouped 2 by 2, giving a 1SD of 7 Elo points. So the curves are fairly relevant.
Well, you are wrong as usual. Even if you group them 2 by 2, 1SD is 10 Elo, and quoting smaller numbers doesn't help, since the comparison is of 3 engines, not 2, so 2SD is simply 30 Elo, or, if it suits you, 28 Elo ;).
As usual, you cover your crass mistake with misleading statements. You were talking about the errors in the engines' ratings as being 30 Elo points (2SD) against the average. EloStat simply gives my figure of 20 points (2SD) against the average, and BayesElo gives 15 points (2SD). Miguel fixed Houdini's rating and got 20 points, almost 2SD, in simulations, which can be used directly for comparison. So, Milos, shut up when you are wrong (most of the time), and don't mislead people here with incorrect statements that end in "plot is meaningless".
ouachita
Posts: 454
Joined: Tue Jan 15, 2013 4:33 pm
Location: Ritz-Carlton, NYC
Full name: Bobby Johnson

Re: Made In Heaven class Time Control Comparison

Post by ouachita »

The specific testing methods and data points can and will be debated, but the basic trend lines should make sense to anyone who has been paying even casual attention to recent match and testing results.
SIM, PhD, MBA, PE
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Made In Heaven class Time Control Comparison

Post by Milos »

Laskos wrote: Miguel fixed Houdini's rating and got 20 points, almost 2SD, in simulations, which can be used directly for comparison. So, Milos, shut up when you are wrong (most of the time), and don't mislead people here with incorrect statements that end in "plot is meaningless".
Ordo is crap as usual, and a 90% confidence level is nowhere near 2SD. Maybe you should check basic statistics before you post stupid stuff.

And if anyone is misleading people, it is the couple of you here who are fans of Komodo...
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Made In Heaven class Time Control Comparison

Post by michiguel »

Milos wrote:
Laskos wrote: Miguel fixed Houdini's rating and got 20 points, almost 2SD, in simulations, which can be used directly for comparison. So, Milos, shut up when you are wrong (most of the time), and don't mislead people here with incorrect statements that end in "plot is meaningless".
Ordo is crap as usual, and a 90% confidence level is nowhere near 2SD. Maybe you should check basic statistics before you post stupid stuff.
90% confidence is 90% confidence, regardless of how many SD or even what type of distribution we are talking about. The point is, 90% is not meaningless.
Again, this is just considering the extreme. But there are three other conditions above 9'+3" in which SF performs better than Houdini (2400 games total). That will easily take it above 95% confidence.

"Ordo is crap" = Why do you keep trolling every time you see Ordo in a post? The fact is that it is irrelevant how you calculate this. You can do it with BayesELO, calculate LOS, and get similar results (LOS, not necessarily the errors in BE, which could be misleading if you do not know which options to choose). If you remove the draws included in BE, the numbers are identical to Ordo's.
Milos wrote:
And if anyone is misleading people, it is the couple of you here who are fans of Komodo...
We are talking about SF. The fact is that the statement that SF scales better than Houdini is not a joke, which makes us believe that what we saw in TCEC was not a fluke.
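
To put numbers on the 90% versus 2SD point argued earlier in this post, here is a small sketch under a normal approximation. The normality and the reading of the Ordo error as a two-sided 90% half-width are both assumptions (Ordo derives its errors from simulations), and the helper names are made up for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def two_sided_z(confidence):
    """Number of SDs whose +/- band covers `confidence` of a normal curve."""
    return STANDARD_NORMAL.inv_cdf(0.5 + confidence / 2.0)

def two_sided_coverage(z):
    """Fraction of a normal curve lying within +/- z SDs."""
    return 2.0 * STANDARD_NORMAL.cdf(z) - 1.0

z90 = two_sided_z(0.90)            # SDs in a two-sided 90% band
cover_2sd = two_sided_coverage(2)  # coverage of a +/- 2 SD band

# Assumed reading of the Ordo error column: 19.5 Elo at 90%, two-sided.
ordo_error_90 = 19.5
implied_1sd = ordo_error_90 / z90
implied_2sd = 2.0 * implied_1sd

print(f"90% two-sided = +/- {z90:.2f} SD")
print(f"+/- 2 SD covers {cover_2sd:.1%}")
print(f"implied 1 SD: {implied_1sd:.1f} Elo, 2 SD: {implied_2sd:.1f} Elo")
```

On this reading a 90% band is about 1.64 SD wide on each side, while 2 SD covers about 95.4%, so a 19.5 Elo error at 90% would correspond to roughly 24 Elo at 2 SD. If the simulated distribution is noticeably non-normal, the conversion is only indicative.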

Miguel
PCM72
Posts: 8
Joined: Mon Dec 23, 2013 3:23 pm

Re: Made In Heaven class Time Control Comparison

Post by PCM72 »

Hi.
What book/test set was used?
How important is the quality of neutral starting positions, in your opinion?