+1
Modern Times wrote:
Brilliant work Aser, the graph is very enlightening.
Question is, at what point does Komodo level off...

One small problem, the 2SD Elo margins are 30 Elo, so all the results you presented are pretty much meaningless.
Aser Huerga wrote:
As Shaun Brewer suggested, I decided to run my games at different time controls to see how the top engines' strength changes as time increases. Here are the results:
Five i7-3930K CPUs 4.25 GHz
1 core for all engines
Ponder off
1024 Hash
3-4-5 EGTBs (when available) in SSDs
Code: Select all
3'+1" Time Control
   # PLAYER          : RATING  ERROR  POINTS  PLAYED    (%)
   1 Houdini 4       :   32.1   13.3   340.0     600  56.7%
   2 Komodo TCEC     :  -14.4   13.4   282.0     600  47.0%
   3 Stockfish DD    :  -17.6   13.8   278.0     600  46.3%

9'+3" Time Control
   # PLAYER          : RATING  ERROR  POINTS  PLAYED    (%)
   1 Stockfish DD    :   16.6   14.1   320.5     600  53.4%
   2 Houdini 4       :    8.1   13.5   310.0     600  51.7%
   3 Komodo TCEC     :  -24.7   13.4   269.5     600  44.9%

27'+9" Time Control
   # PLAYER          : RATING  ERROR  POINTS  PLAYED    (%)
   1 Stockfish DD    :   22.6   14.1   328.0     600  54.7%
   2 Houdini 4       :    5.7   13.4   307.0     600  51.2%
   3 Komodo TCEC     :  -28.3   13.4   265.0     600  44.2%

54'+18" Time Control
   # PLAYER          : RATING  ERROR  POINTS  PLAYED    (%)
   1 Stockfish DD    :   11.4   14.2   314.0     600  52.3%
   2 Houdini 4       :    0.4   13.7   300.5     600  50.1%
   3 Komodo TCEC     :  -11.8   13.6   285.5     600  47.6%

90'+30" Time Control
   # PLAYER          : RATING  ERROR  POINTS  PLAYED    (%)
   1 Stockfish DD    :   10.5   12.9   313.0     600  52.2%
   2 Komodo TCEC     :    0.8   13.2   301.0     600  50.2%
   3 Houdini 4       :  -11.3   13.1   286.0     600  47.7%
I want to thank Adam Hair for his help with the presentation of the graph and results.
( All the games can be downloaded here: TTC_All_Games )
2SD here is ~20 Elo points, 1SD is 10 Elo points, with 84% confidence on one tail. Besides that, the points can be grouped 2 by 2, with 1SD of 7 Elo points. So the curves are fairly relevant.
Milos wrote:
One small problem, the 2SD Elo margins are 30 Elo, so all the results you presented are pretty much meaningless.
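For anyone who wants to check the order of magnitude of these margins, here is a rough sketch that converts a score into an Elo difference and estimates 1SD with the delta method. It assumes the logistic Elo model and independent games, and the draw rates used are purely hypothetical (the thread does not state them). The function names are made up for this illustration, and this is not how EloStat, BayesElo or Ordo compute their error bars.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_standard_error(score: float, draw_rate: float, games: int) -> float:
    """Approximate 1SD of the Elo estimate, assuming independent games.

    draw_rate is an assumption supplied by the caller; it is not taken
    from the results posted in this thread.
    """
    win_rate = score - draw_rate / 2.0
    # Per-game variance of a result worth 1, 0.5 or 0 points.
    variance = win_rate + draw_rate / 4.0 - score ** 2
    se_score = math.sqrt(variance / games)
    # Derivative of the Elo formula with respect to the score (delta method).
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return slope * se_score


if __name__ == "__main__":
    # Houdini 4 at 3'+1": 340 points out of 600 games.
    print(f"Elo vs. the field: {elo_from_score(340.0 / 600):.1f}")
    for assumed_draw_rate in (0.0, 0.5):  # hypothetical draw rates
        sd = elo_standard_error(0.5, assumed_draw_rate, 600)
        print(f"draw rate {assumed_draw_rate:.0%}: 1SD ~ {sd:.1f} Elo")
```

With 600 games and a score near 50%, the estimate moves from roughly 14 Elo with no draws to roughly 10 Elo with half the games drawn, so the size of the margin depends a lot on the draw rate you assume.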
It is a bit more straightforward to analyze if the ratings are computed not "against the average" (the errors are misleading, particularly for a low number of participants) but against Houdini as the reference.
Laskos wrote:
2SD here is ~20 Elo points, 1SD is 10 Elo points, with 84% confidence on one tail. Besides that, the points can be grouped 2 by 2, with 1SD of 7 Elo points. So the curves are fairly relevant.
Milos wrote:
One small problem, the 2SD Elo margins are 30 Elo, so all the results you presented are pretty much meaningless.
Code: Select all
3'+1" Time Control
   # PLAYER          : RATING  ERROR  POINTS  PLAYED    (%)
   1 Houdini 4       :    0.0   ----   340.0     600  56.7%
   2 Komodo TCEC     :  -46.5   19.5   282.0     600  47.0%
   3 Stockfish DD    :  -49.7   19.5   278.0     600  46.3%

9'+3" Time Control
   # PLAYER          : RATING  ERROR  POINTS  PLAYED    (%)
   1 Stockfish DD    :    8.5   19.6   320.5     600  53.4%
   2 Houdini 4       :    0.0   ----   310.0     600  51.7%
   3 Komodo TCEC     :  -32.8   19.6   269.5     600  44.9%

27'+9" Time Control
   # PLAYER          : RATING  ERROR  POINTS  PLAYED    (%)
   1 Stockfish DD    :   17.0   19.6   328.0     600  54.7%
   2 Houdini 4       :    0.0   ----   307.0     600  51.2%
   3 Komodo TCEC     :  -33.9   19.7   265.0     600  44.2%

54'+18" Time Control
   # PLAYER          : RATING  ERROR  POINTS  PLAYED    (%)
   1 Stockfish DD    :   11.0   19.7   314.0     600  52.3%
   2 Houdini 4       :    0.0   ----   300.5     600  50.1%
   3 Komodo TCEC     :  -12.2   19.6   285.5     600  47.6%

90'+30" Time Control
   # PLAYER          : RATING  ERROR  POINTS  PLAYED    (%)
   1 Stockfish DD    :   21.7   19.4   313.0     600  52.2%
   2 Komodo TCEC     :   12.1   19.8   301.0     600  50.2%
   3 Houdini 4       :    0.0   ----   286.0     600  47.7%
Well, you are wrong as usual. Even if you group it 2 by 2, 1SD is 10 Elo, and quoting smaller numbers doesn't help since the comparison is of 3 engines, not 2, so 2SD is simply 30 Elo or, if it suits you, 28 Elo.
Laskos wrote:
2SD here is ~20 Elo points, 1SD is 10 Elo points, with 84% confidence on one tail. Besides that, the points can be grouped 2 by 2, with 1SD of 7 Elo points. So the curves are fairly relevant.
Milos wrote:
One small problem, the 2SD Elo margins are 30 Elo, so all the results you presented are pretty much meaningless.
As is so usual with you, you cover your crass mistake with misleading statements. You were talking about the errors in the engines' ratings as being 30 Elo points 2SD against the average. EloStat simply gives my 20 points 2SD against the average, and BayesElo 15 points 2SD. Miguel fixed Houdini's rating and got 20 points, almost 2SD, in simulations, which can be used directly for comparison. So, Milos, shut up when you are wrong (most of the time), and don't mislead people here with your incorrect statements resulting in "plot is meaningless".
Milos wrote:
Well, you are wrong as usual. Even if you group it 2 by 2, 1SD is 10 Elo, and quoting smaller numbers doesn't help since the comparison is of 3 engines, not 2, so 2SD is simply 30 Elo or, if it suits you, 28 Elo.
Laskos wrote:
2SD here is ~20 Elo points, 1SD is 10 Elo points, with 84% confidence on one tail. Besides that, the points can be grouped 2 by 2, with 1SD of 7 Elo points. So the curves are fairly relevant.
Milos wrote:
One small problem, the 2SD Elo margins are 30 Elo, so all the results you presented are pretty much meaningless.
Ordo is crap as usual and a 90% confidence rate is nowhere near 2SD. Maybe you should check basic statistics before you post stupid stuff.
Laskos wrote:
Miguel fixed Houdini's rating and got 20 points, almost 2SD, in simulations, which can be used directly for comparison. So, Milos, shut up when you are wrong (most of the time), and don't mislead people here with your incorrect statements resulting in "plot is meaningless".
90% confidence is 90% confidence, regardless of how many SD or even what type of distribution we talk about. The point is, 90% is not meaningless.
Milos wrote:
Ordo is crap as usual and a 90% confidence rate is nowhere near 2SD. Maybe you should check basic statistics before you post stupid stuff.
Laskos wrote:
Miguel fixed Houdini's rating and got 20 points, almost 2SD, in simulations, which can be used directly for comparison. So, Milos, shut up when you are wrong (most of the time), and don't mislead people here with your incorrect statements resulting in "plot is meaningless".
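The figures being thrown around here (84% on one tail for 1SD, 2SD, 90% confidence) can be related to each other if you assume the rating error is normally distributed. The thread does not establish that assumption, and the helper names are invented for this sketch, which uses only Python's standard library:

```python
from statistics import NormalDist

# Standard normal distribution; normality of the rating error is an assumption.
STD_NORMAL = NormalDist()


def one_sided_confidence(sd_multiple: float) -> float:
    """Probability that the true value lies below estimate + k standard deviations."""
    return STD_NORMAL.cdf(sd_multiple)


def two_sided_confidence(sd_multiple: float) -> float:
    """Probability that the true value lies within +/- k standard deviations."""
    return 2.0 * STD_NORMAL.cdf(sd_multiple) - 1.0


def sd_multiple_for_two_sided(confidence: float) -> float:
    """Number of standard deviations needed for a given two-sided confidence."""
    return STD_NORMAL.inv_cdf(0.5 + confidence / 2.0)


if __name__ == "__main__":
    print(f"1SD, one tail : {one_sided_confidence(1.0):.1%}")
    print(f"2SD, two tails: {two_sided_confidence(2.0):.1%}")
    print(f"90% two-sided : {sd_multiple_for_two_sided(0.90):.2f} SD")
```

Under that assumption, 1SD on one tail is about 84%, 2SD two-sided is about 95%, and a two-sided 90% interval is about 1.64 SD wide on each side.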
We are talking about SF. The fact is that the statement that SF scales better than Houdini is not a joke, which makes us believe that what we saw in TCEC was not a fluke.
And if anyone is misleading, it is a couple of you here, fans of Komodo...