nTCEC simulation

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: nTCEC simulation

Post by Laskos »

Milos wrote:
Laskos wrote:
Look again silly, if you cannot do in mind, take a pencil, 1SD error bars are about 30 points in 48 games with 70% draws. Komodo is expected to meet in the Superfinal an opponent which is 20-30 points weaker on average than itself. That is 2/3 to 1 SD, meaning 70% to 84% to win, and here comes your 76%. Got consistency? Now, shut up and stop posting in this thread your absurdities.
Finally.
1SD is indeed around 30Elo and this is exactly the difference between Komodo and Houdini/SF in the final (for other engines LoS is above 95% i.e. Komodo is 60, 100 and more Elo stronger) according to your predictions. 72% of winning chances for Komodo against Houdini/SF (when you remove other engines) is 72% LoS which means slightly less than 4 more wins in 48 games match. With 70% draws this is 9+/34=/5- or 29Elo.
Now let me cite you:
I too take a lot of informed guesses, by the way, the same 10 ELO points at this TC and hardware for Komodo above Houdini.
So now somehow 10 Elo magically became 30 :lol: :lol:
Did you take the pencil? After I gave you simulations and explanations, you still insist on your dumb remarks? For SF I took 20 ELO behind Komodo, and if you were smarter, you would have observed that the combined chances of still weaker engines (by 50 or more points) to qualify for the Superfinal are non-negligible, and that brings the average to 20-30, or, to please you, 20-25 points, close to 0.8 SD, hence your 76% (which you mistakenly assumed to be a 180 ELO points difference in 48 games). Are you going to hijack this thread with your silly remarks? Observe how you started by claiming 100 points uncertainties, and now you pick on a 5 points "discrepancy" with your wrong assumptions.
Last edited by Laskos on Mon Oct 28, 2013 8:38 pm, edited 2 times in total.
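
The error-bar arithmetic in this exchange can be checked with a short sketch. It assumes two near-equal engines, independent games, the standard logistic Elo curve, and a normal approximation; the function names are illustrative and not taken from anyone's simulation code.

```python
import math

def score_sd_in_elo(games, draw_rate):
    """Approximate 1-SD error of a match result, in Elo, near a 50% score."""
    win = (1.0 - draw_rate) / 2.0  # assumption: wins and losses equally likely
    # Per-game variance of the score (win=1, draw=0.5, loss=0) around the mean 0.5.
    var = win * 1.0 + draw_rate * 0.25 - 0.25
    sd_score = math.sqrt(var / games)
    # Slope of the logistic Elo curve at a 50% score: d(Elo)/d(score).
    elo_per_score = 400.0 / (math.log(10) * 0.25)
    return sd_score * elo_per_score

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sd = score_sd_in_elo(games=48, draw_rate=0.70)
print(f"1 SD is about {sd:.1f} Elo")
for gap in (20, 25, 30):
    # Chance that the stronger side finishes ahead, normal approximation.
    print(f"{gap} Elo gap: {normal_cdf(gap / sd):.2f}")
```

Under these assumptions the 1 SD error comes out a little under 30 Elo, in line with the "about 30 points" quoted above.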
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: nTCEC simulation

Post by Don »

Here is an update after the 48th round Bouquet 1.8b vs Gull 2.3 draw:

Name         Win Odds     Stage 4
---------  ----------  ----------
Komodo         52.627      99.038
Houdini        29.747      96.995
Bouquet         7.994      89.065
Critter         3.986      75.050
Rybka           2.705      69.402
Hiarcs          1.378      51.050
Gull            0.786      51.962
Stockfish       0.663      48.579
Naum            0.114      18.317
Junior          0.000       0.542
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: nTCEC simulation.

Post by Milos »

Ajedrecista wrote:I wanted to say that in this case the stronger engine has a probability of score 52.9% of the points of one game.
I don't understand this sentence. Between 2 engines there is only win/draw/loss probability for a single game and this is trinomial distribution as you correctly noted. What is probability of score in % of points in one game is beyond understanding for me.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: nTCEC simulation

Post by Don »

Laskos wrote:
Milos wrote:
Laskos wrote:
Look again silly, if you cannot do in mind, take a pencil, 1SD error bars are about 30 points in 48 games with 70% draws. Komodo is expected to meet in the Superfinal an opponent which is 20-30 points weaker on average than itself. That is 2/3 to 1 SD, meaning 70% to 84% to win, and here comes your 76%. Got consistency? Now, shut up and stop posting in this thread your absurdities.
Finally.
1SD is indeed around 30Elo and this is exactly the difference between Komodo and Houdini/SF in the final (for other engines LoS is above 95% i.e. Komodo is 60, 100 and more Elo stronger) according to your predictions. 72% of winning chances for Komodo against Houdini/SF (when you remove other engines) is 72% LoS which means slightly less than 4 more wins in 48 games match. With 70% draws this is 9+/34=/5- or 29Elo.
Now let me cite you:
I too take a lot of informed guesses, by the way, the same 10 ELO points at this TC and hardware for Komodo above Houdini.
So now somehow 10 Elo magically became 30 :lol: :lol:
Did you take the pencil? After I gave you simulations and explanations, you still insist on your dumb remarks? For SF I took 20 ELO behind Komodo, and if you were smarter, you would have observed that the combined chances of still weaker engines to qualify for the Superfinal are non-negligible, and that brings the average to 20-30, or, to please you, 20-25 points, close to 0.8 SD, hence your 76% (which you mistakenly assumed to be a 180 ELO points difference in 48 games). Are you going to hijack this thread with your silly remarks? Observe how you started by claiming 100 points uncertainties, and now you pick on a 5 points "discrepancy" with your wrong assumptions.
In my case I tried to make the simulation as accurate as I possibly could, based on the information we have. I reduced Komodo's ELO as an engineering decision from the values given to me by Miguel and Adam's calculation: since it is the version I am most interested in, I wanted the result to be conservative. See this post for part of the reason I feel that Komodo is indeed at least slightly superior:

http://talkchess.com/forum/viewtopic.php?t=49829

Note that the version of Komodo playing in TCEC is NOT Komodo 6 but an improved Komodo. Its impressive performance in this season's TCEC (even beating Houdini and Stockfish) is not the primary factor here, since it is based on only a handful of games.

The rating compression I applied (80%) is an attempt to make my simulation more accurate and reflect the reality of super long time control games: the relative difference between programs generally shrinks with time. I even gave a 40 ELO advantage to white to reflect the fact that white has a much easier go at it.

I also tried to accurately measure the high draw ratios of long time control games and the fact that as the ELO difference goes up, the chances of a draw decrease.

How good is the simulation? I have no idea. A lot of this was guesswork and supposition but I did it for fun and I think it probably has a lot of relevance, at least in the big picture.
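
As a rough illustration of the ingredients described in this post (80% rating compression, a 40 ELO edge for white, and a draw rate that falls as the gap grows), here is a hypothetical per-game model. The actual simulation code is not shown in this thread, so the order of compression and white advantage, the shape of the draw curve, and the names `game_probs`, `base_draw` and `draw_scale` are all assumptions.

```python
import math

def game_probs(elo_white, elo_black, compression=0.80, white_adv=40.0,
               base_draw=0.70, draw_scale=400.0):
    """Return (white win, draw, black win) probabilities for one game."""
    # Assumption: compress the raw rating gap first, then add white's edge.
    diff = compression * (elo_white - elo_black) + white_adv
    # Standard logistic Elo expectation for white.
    score = 1.0 / (1.0 + 10.0 ** (-diff / 400.0))
    # Hypothetical draw model: draws become rarer as the gap grows.
    draw = base_draw * math.exp(-(diff / draw_scale) ** 2)
    # Cap the draw rate so neither win probability goes negative.
    draw = min(draw, 2.0 * min(score, 1.0 - score))
    win = score - draw / 2.0
    loss = 1.0 - score - draw / 2.0
    return win, draw, loss

# Two equally rated engines: white is favoured only by the assumed 40 ELO edge.
print(game_probs(3000, 3000))
# A 200-point raw gap: the draw share shrinks under this model.
print(game_probs(3200, 3000))
```

A tournament simulation would then sample each remaining game from these three probabilities and repeat many times to estimate qualification and win odds.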
Ajedrecista
Posts: 2214
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: nTCEC simulation.

Post by Ajedrecista »

Hello again:
Milos wrote:
Ajedrecista wrote:I wanted to say that in this case the stronger engine has a probability of score 52.9% of the points of one game.
I don't understand this sentence. Between 2 engines there is only win/draw/loss probability for a single game and this is trinomial distribution as you correctly noted. What is probability of score in % of points in one game is beyond understanding for me.
I did not write in the correct terms. What I really mean is that the stronger engine is expected to score circa 52.9% (plus/minus uncertainties) of the points of the match.

Sadly, this thread has degenerated very quickly from the intention of the original post, which was to provide indicative probabilities of certain events. With luck this paragraph will not be quoted, although I expect just the opposite. I do not want to spend more time answering quotes when there is no intention of bringing solutions, only obstacles.

Moderation team, feel free to delete this post if you think it is the best thing to maintain peace in this forum.

Regards from Spain.

Ajedrecista.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: nTCEC simulation.

Post by Adam Hair »

Milos wrote:
Ajedrecista wrote:I wanted to say that in this case the stronger engine has a probability of score 52.9% of the points of one game.
I don't understand this sentence. Between 2 engines there is only win/draw/loss probability for a single game and this is trinomial distribution as you correctly noted. What is probability of score in % of points in one game is beyond understanding for me.
But you do understand that an engine with a 20 Elo advantage will have a greater than 52.9% chance of winning a 48 game match, right? And that an engine that has a 76% chance of winning a 48 game match need not be 180 Elo, on average :wink:, stronger than the other engines, correct?
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: nTCEC simulation.

Post by Milos »

Adam Hair wrote:But you do understand that an engine with a 20 Elo advantage will have a greater than 52.9% chance of winning a 48 game match, right? And that an engine that has a 76% chance of winning a 48 game match need not be 180 Elo, on average :wink:, stronger than the other engines, correct?
If the rating uncertainty is 100 Elo, the difference between the engines is 20 Elo, and 48 games are played with a 70% draw rate, what is the probability, with 95% certainty, that the engine with the higher Elo will win the match?
If you answer me this question (without help from Miguel) we can talk further.
However, I'm pretty sure that you have no clue what I'm talking about, and can't answer this simple question :lol:.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: nTCEC simulation

Post by Laskos »

Don wrote:
Laskos wrote:
Milos wrote:
Laskos wrote:
Look again silly, if you cannot do in mind, take a pencil, 1SD error bars are about 30 points in 48 games with 70% draws. Komodo is expected to meet in the Superfinal an opponent which is 20-30 points weaker on average than itself. That is 2/3 to 1 SD, meaning 70% to 84% to win, and here comes your 76%. Got consistency? Now, shut up and stop posting in this thread your absurdities.
Finally.
1SD is indeed around 30Elo and this is exactly the difference between Komodo and Houdini/SF in the final (for other engines LoS is above 95% i.e. Komodo is 60, 100 and more Elo stronger) according to your predictions. 72% of winning chances for Komodo against Houdini/SF (when you remove other engines) is 72% LoS which means slightly less than 4 more wins in 48 games match. With 70% draws this is 9+/34=/5- or 29Elo.
Now let me cite you:
I too take a lot of informed guesses, by the way, the same 10 ELO points at this TC and hardware for Komodo above Houdini.
So now somehow 10 Elo magically became 30 :lol: :lol:
Did you take the pencil? After I gave you simulations and explanations, you still insist on your dumb remarks? For SF I took 20 ELO behind Komodo, and if you were smarter, you would have observed that the combined chances of still weaker engines to qualify for the Superfinal are non-negligible, and that brings the average to 20-30, or, to please you, 20-25 points, close to 0.8 SD, hence your 76% (which you mistakenly assumed to be a 180 ELO points difference in 48 games). Are you going to hijack this thread with your silly remarks? Observe how you started by claiming 100 points uncertainties, and now you pick on a 5 points "discrepancy" with your wrong assumptions.
In my case I tried to make the simulation as accurate as I possibly could, based on the information we have. I reduced Komodo's ELO as an engineering decision from the values given to me by Miguel and Adam's calculation: since it is the version I am most interested in, I wanted the result to be conservative. See this post for part of the reason I feel that Komodo is indeed at least slightly superior:

http://talkchess.com/forum/viewtopic.php?t=49829

Note that the version of Komodo playing in TCEC is NOT Komodo 6 but an improved Komodo. Its impressive performance in this season's TCEC (even beating Houdini and Stockfish) is not the primary factor here, since it is based on only a handful of games.

The rating compression I applied (80%) is an attempt to make my simulation more accurate and reflect the reality of super long time control games: the relative difference between programs generally shrinks with time. I even gave a 40 ELO advantage to white to reflect the fact that white has a much easier go at it.

I also tried to accurately measure the high draw ratios of long time control games and the fact that as the ELO difference goes up, the chances of a draw decrease.

How good is the simulation? I have no idea. A lot of this was guesswork and supposition but I did it for fun and I think it probably has a lot of relevance, at least in the big picture.
I did similar things: rating compression of 75% compared to CCRL 40/40, and I assumed a bit better scaling of Komodo and SF vs. Houdini, of Rybka vs. Critter and Bouquet, and of Hiarcs vs. Naum and Junior. I also gathered all the info from this forum, including the link you gave me. I don't have a draw model or white advantage, so my estimates are rougher, but it seems that our simulations agree pretty well, coming from completely unrelated approaches. The 3rd Stage is only about 18 games for each engine, and we see that luck is still important; even SF has a sizable chance of not qualifying. The 4th Stage is 30 games each and the Superfinal 48 games, so we will see how our assumptions work. It's fun, because we really don't know what will happen; in the past two seasons we had a clear favourite.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: nTCEC simulation.

Post by Adam Hair »

Milos wrote:
Adam Hair wrote:But you do understand that an engine with a 20 Elo advantage will have a greater than 52.9% chance of winning a 48 game match, right? And that an engine that has a 76% chance of winning a 48 game match need not be 180 Elo, on average :wink:, stronger than the other engines, correct?
If the rating uncertainty is 100 Elo, the difference between the engines is 20 Elo, and 48 games are played with a 70% draw rate, what is the probability, with 95% certainty, that the engine with the higher Elo will win the match?
If you answer me this question (without help from Miguel) we can talk further.
However, I'm pretty sure that you have no clue what I'm talking about, and can't answer this simple question :lol:.
Who do you think you and I are, Johann and Jacob Bernoulli? :lol:

To be honest, Milos, I do not know how to compute the probability that the higher rated player wins the match when the ratings difference is subject to uncertainty. Feel free to enlighten me. Given your reticence to show actual computations and the errors that I have seen you make in the past (such as a mistake in a simple application of Bayes' theorem), I have some doubt about whether you know how to.

However, if the ratings difference is truly 20 Elo, then the probability that the higher rated player will win is 72% to 73%.
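
A figure of this kind can be reproduced with an exact calculation over the win/draw/loss distribution. The sketch below assumes a fixed 20 Elo difference with no rating uncertainty, a constant 70% draw rate, independent games, and the standard logistic Elo curve; the function names are illustrative only.

```python
def expected_score(elo_diff):
    """Expected score per game for the stronger side (logistic Elo curve)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def match_outcome_probs(elo_diff, games, draw_rate):
    """Exact (win, tie, loss) probabilities for the match as a whole."""
    s = expected_score(elo_diff)
    win = s - draw_rate / 2.0          # per-game win probability
    loss = 1.0 - s - draw_rate / 2.0   # per-game loss probability
    # dist[k] is the probability that (wins - losses) equals k - games.
    dist = [0.0] * (2 * games + 1)
    dist[games] = 1.0
    for _ in range(games):
        new = [0.0] * (2 * games + 1)
        for i, p in enumerate(dist):
            if p == 0.0:
                continue  # unreachable margins stay at zero
            new[i] += p * draw_rate
            new[i + 1] += p * win
            new[i - 1] += p * loss
        dist = new
    return sum(dist[games + 1:]), dist[games], sum(dist[:games])

print(f"expected score per game: {expected_score(20):.3f}")
p_win, p_tie, p_lose = match_outcome_probs(elo_diff=20, games=48, draw_rate=0.70)
print(f"win {p_win:.3f}  tie {p_tie:.3f}  lose {p_lose:.3f}")
```

The per-game expected score for a 20 Elo edge is about 52.9%, while the chance of winning the 48-game match outright should come out near the 72% to 73% quoted above, with a tied match taking up several more percent.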
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: nTCEC simulation

Post by Don »

Here is the sim after Houdini beats Rybka in round 50:

Name         Win Odds     Stage 4
---------  ----------  ----------
Komodo         44.791      99.113
Houdini        44.269      99.284
Bouquet         6.320      90.212
Critter         2.001      74.127
Hiarcs          1.160      60.497
Gull            0.551      54.432
Stockfish       0.518      51.339
Rybka           0.290      50.616
Naum            0.099      19.771
Junior          0.000       0.609