What happened to TCEC?

Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: What happened to TCEC?

Post by Dann Corbit »

hgm wrote:
Dann Corbit wrote:Well then, the Stockfish testing people can rejoice because now instead of running thousands of tests, they can stop at 7.
Stupid, stupid, stupid... The Stockfish testing people want to reject 3-Elo regressions with confidence, not 300-Elo regressions. Error bars a hundred times smaller require 10,000 times as many games, because the error bars shrink as the inverse square root of the number of games.
You seem to misunderstand the difference between an observation and the probability of that observation. But that is OK with me.
Just because it is unlikely does not mean you won't see it.
I really must advise you to educate yourself in the most elementary aspects of statistics, as anything you say here makes you look more and more ignorant....
Yes, you rely on 7 games to make your determination, while I have 2783 games to make my determination.

Maybe you are right. Maybe since TCEC 9 Jonny has not only picked up hundreds of Elo, with the others like Stockfish and Komodo not gaining any in the same time frame so that Jonny has caught up, but also Jonny has overturned Amdahl's law too.

I genuflect to your brilliance, sir. Those seven games fill you with such confidence that I know you must really know what you are talking about, and that 7 is plenty, even with the 200 Elo error bars, to know for certain that you are right.

No wonder you have so much confidence.

I guess maybe you are having a guilt complex over your role in the ICGA mess, but I think you should forgive yourself and just forget it.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
hgm
Posts: 27817
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: What happened to TCEC?

Post by hgm »

Dann Corbit wrote:Yes, you rely on 7 games to make your determination, while I have 2783 games to make my determination.
You played the WCCC version of Jonny 2783 times on 2400 cores against Komodo? I don't think so. You played 2783 games to determine something different.
Maybe you are right. Maybe since TCEC 9 Jonny has not only picked up hundreds of Elo, with the others like Stockfish and Komodo not gaining any in the same time frame so that Jonny has caught up, but also Jonny has overturned Amdahl's law too.
Note that this is not my claim. I just point out that the factual WCCC result rules out the 300-Elo hypothesis with > 95% confidence. As you so obsessively pointed out, flukes do happen, and this could be a fluke. But the likelihood of that is <5% (100% - confidence). That is an objective mathematical fact, not an opinion.

As to Amdahl's law, I already pointed out why it doesn't apply to this case. Did you not read that, not agree with it, or just not understand it?
I genuflect to your brilliance, sir. Those seven games fill you with such confidence that I know you must really know what you are talking about, and that 7 is plenty, even with the 200 Elo error bars, to know for certain that you are right.
Is it so hard to understand for you that 200-Elo error bars deny a 300-Elo difference, that you think it requires brilliance?
I guess maybe you are having a guilt complex over your role in the ICGA mess, but I think you should forgive yourself and just forget it.
So what exactly was "my role in the ICGA mess"?

You'd better come up with facts to substantiate that remark, or apologize for it. At least, I don't suppose you would like to cultivate a reputation as a vicious liar who throws around fabricated accusations to discredit people he happens to disagree with...
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: What happened to TCEC?

Post by Dann Corbit »

hgm wrote:
Dann Corbit wrote:Yes, you rely on 7 games to make your determination, while I have 2783 games to make my determination.
You played the WCCC version of Jonny 2783 times on 2400 cores against Komodo? I don't think so. You played 2783 games to determine something different.
Maybe you are right. Maybe since TCEC 9 Jonny has not only picked up hundreds of Elo, with the others like Stockfish and Komodo not gaining any in the same time frame so that Jonny has caught up, but also Jonny has overturned Amdahl's law too.
Note that this is not my claim. I just point out that the factual WCCC result rules out the 300-Elo hypothesis with > 95% confidence. As you so obsessively pointed out, flukes do happen, and this could be a fluke. But the likelihood of that is <5% (100% - confidence). That is an objective mathematical fact, not an opinion.

As to Amdahl's law, I already pointed out why it doesn't apply to this case. Did you not read that, not agree with it, or just not understand it?
I genuflect to your brilliance, sir. Those seven games fill you with such confidence that I know you must really know what you are talking about, and that 7 is plenty, even with the 200 Elo error bars, to know for certain that you are right.
Is it so hard to understand for you that 200-Elo error bars deny a 300-Elo difference, that you think it requires brilliance?
I guess maybe you are having a guilt complex over your role in the ICGA mess, but I think you should forgive yourself and just forget it.
So what exactly was "my role in the ICGA mess"?

You'd better come up with facts to substantiate that remark, or apologize for it. At least, I don't suppose you would like to cultivate a reputation as a vicious liar who throws around fabricated accusations to discredit people he happens to disagree with...
Are you a member of the ICGA?
Have you defended the actions of the ICGA?
If I were a member of the Mafia, I would feel shame.
If I were a member of the Crips or the Bloods I would feel shame.

I have heard that psychopaths do not feel shame.

You have a total of 7 games evidence for your assumptions about strength.

If I threw a penny 7 times and got 7 heads or 7 tails, I would not assume that the coin was biased. I would assume that I need a lot more trials to know the answer with any degree of certainty.

P.S.
You are a moderator. Feel free to ban me at any time.
hgm
Posts: 27817
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: What happened to TCEC?

Post by hgm »

Yes, I am a member of the ICGA, to participate in the Olympiad and play Chinese Chess and Shogi against my Taiwanese and Japanese friends. I see no shame in that. To maintain that this "involves me in the ICGA mess" is just insane.

If you want to equate people to criminals just because they are a member, then your earlier remark about ICGA was indeed libelous.

I have declared on several occasions that I think the ICGA handled the Rybka affair in a stupid way, not at all like I would have done it. My way would not have involved any decompilation, but would likely have ended just as badly for Vas, as he obviously was not willing to cooperate in any investigation.

You keep twisting the truth. I made no assumption about strength at all. I just pointed out that the facts contradict your assumption (on things you have no first-hand knowledge of) with high confidence.
If I threw a penny 7 times and got 7 heads or 7 tails, I would not assume that the coin was biased. I would assume that I need a lot more trials to know the answer with any degree of certainty.
And that would be a mistake, as hard math shows... You would have a better chance of surviving a jump in front of a train than of winning a bet on the outcome of a flip with that coin.
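
For reference, the arithmetic behind the coin example can be sketched as follows. This is only an illustration, assuming a perfectly fair coin and independent flips; the function name `prob_all_same` is made up for this sketch and does not come from the thread.

```python
from fractions import Fraction

def prob_all_same(flips: int) -> Fraction:
    """Chance that a fair coin shows the same face on every one of `flips` tosses."""
    # Two favourable sequences (all heads, all tails) out of 2**flips equally likely ones.
    return Fraction(2, 2 ** flips)

p = prob_all_same(7)
print(p, float(p))  # 1/64 0.015625
```

Under those assumptions a fair coin produces seven identical results about 1.6% of the time. This says nothing yet about how likely a biased coin was to begin with.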
Henk
Posts: 7220
Joined: Mon May 27, 2013 10:31 am

Re: What happened to TCEC?

Post by Henk »

I still think that people publishing open source code are communists. So I don't want to be associated with that group.

I also think that the ICGA should stay strict and only let original engines play.
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: What happened to TCEC?

Post by Dann Corbit »

hgm wrote:Yes, I am a member of the ICGA, to participate in the Olympiad and play Chinese Chess and Shogi against my Taiwanese and Japanese friends. I see no shame in that. To maintain that this "involves me in the ICGA mess" is just insane.

If you want to equate people to criminals just because they are a member, then your earlier remark about ICGA was indeed libelous.

I have declared on several occasions that I think the ICGA handled the Rybka affair in a stupid way, not at all like I would have done it. My way would not have involved any decompilation, but would likely have ended just as badly for Vas, as he obviously was not willing to cooperate in any investigation.

You keep twisting the truth. I made no assumption about strength at all. I just pointed out that the facts contradict your assumption (on things you have no first-hand knowledge of) with high confidence.
If I threw a penny 7 times and got 7 heads or 7 tails, I would not assume that the coin was biased. I would assume that I need a lot more trials to know the answer with any degree of certainty.
And that would be a mistake, as hard math shows... You would have a better chance of surviving a jump in front of a train than of winning a bet on the outcome of a flip with that coin.
I misremembered you as supporting the ICGA's findings.
Therefore, I do apologize for that.

Until such time as they should apologize for the Rybka affair, I would not join the ICGA.

Merely being a member and disapproving of their handling of the WCCC Rybka tangle does not render you guilty in the way that I had imagined.

I think we see data in a fundamentally different way. It is OK to see things differently.
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: What happened to TCEC?

Post by Dann Corbit »

Henk wrote:I still think that people publishing open source code are communists. So I don't want to be associated with that group.

I also think that the ICGA should stay strict and only let original engines play.
Maybe that explains the exponential drop in the number of entrants.
MonteCarlo
Posts: 188
Joined: Sun Dec 25, 2016 4:59 pm

Re: What happened to TCEC?

Post by MonteCarlo »

I may very well regret jumping in here, since things seem a bit tense, but here goes :D

First, the talk of probability is interesting, as it tends to be. To be fair, though, in the coin case we're leaving out some very important information: the prior.

If we knew that unbiased coins outnumbered biased coins 10000 to 1 (and in the real world, I'd wager most of the coins we've flipped have exhibited fair-ish behavior, so that intuition will be strong), then Dann's desire to run more tests even after 7 consecutive tails or heads is quite justified.

In such a case, it's more likely that you got an unusual result with a usual coin than a usual result with an unusual coin.

If we assume a uniform prior, then 7 consecutive heads/tails carries more weight, although I certainly wouldn't want to bet my life on the coin's being biased even then :)
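
A rough sketch of how the prior changes the picture, using Bayes' rule. Two simplifications here are assumptions of the sketch and not claims from the thread: the "biased" coin is taken to be the extreme kind that always lands on the same face, and the "uniform prior" is modelled as a 50/50 split between the two hypotheses. The function name is hypothetical.

```python
from fractions import Fraction

def posterior_biased(prior_biased: Fraction, flips: int) -> Fraction:
    """Probability the coin is biased after `flips` identical results."""
    like_biased = Fraction(1)            # assumed: a biased coin always shows the same face
    like_fair = Fraction(2, 2 ** flips)  # all heads or all tails from a fair coin
    weighted_biased = prior_biased * like_biased
    weighted_fair = (1 - prior_biased) * like_fair
    return weighted_biased / (weighted_biased + weighted_fair)

rare = posterior_biased(Fraction(1, 10001), 7)  # fair coins outnumber biased ones 10000 to 1
even = posterior_biased(Fraction(1, 2), 7)      # both hypotheses equally likely beforehand
print(float(rare), float(even))
```

With the 10000-to-1 prior the posterior probability of a biased coin comes out at 4/629, well under 1%, while the 50/50 prior gives 64/65, about 98.5%. The same seven flips support opposite conclusions depending on the prior.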

On the Jonny bit, I think things have gotten a bit exuberant.

The results from 60-core Komodo vs 2000+ core Jonny do suggest that the difference in rating between those two entities is less than 300 rating points.

Could it be an odd 7 games? Of course, and more games would be wonderful, but the games provide enough evidence that if I could place an over/under bet on a score of 85% for Komodo in a longer match between 60-core K and 2000+ core J, I would gladly bet the under :)
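
The 85% figure matches what the usual logistic Elo formula gives for a 300-point gap. A minimal sketch, assuming that standard formula applies to the ratings being discussed (the function name is made up for illustration):

```python
def expected_score(elo_diff: float) -> float:
    """Expected score of the higher-rated side under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

print(round(expected_score(300), 3))  # 0.849
```

So a true 300-Elo edge would predict Komodo scoring close to 6 points out of 7, which is why seven draws (3.5 out of 7) sit uncomfortably with that hypothesis.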

Does this mean that Jonny is a super engine stronger than K and SF? Of course not, and I don't think anyone's claimed that in the thread (to be fair, I haven't gone back all the way to page 1), although some have taken the time to refute this mythical claim, but I'm not sure why.

Does this provide good evidence that Jonny has found some magically scaling solution to using high core counts for chess?

Equally obviously a no, although some of the arguments/claims that there simply cannot by definition be any more than 5-10 rating points gained by the unknown techniques used by Jonny strike me as bizarre at best.

I would think that we would need to know the techniques used and run some tests to determine that (or are we back to the days of claims like "lazy SMP can't work, by definition!"? Actually, probably should leave that can of worms alone too :) ).

As long as those of us who haven't talked to the author remain in the dark on what's been implemented, I feel like some degree of suspension of judgment is warranted.

Maybe I'm weird, but I don't like making bold claims about things about which I have zero details :)

At any rate, as I said at the outset, I'm sure I'll regret jumping in, but hey, you only live once, eh? :)
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: What happened to TCEC?

Post by Dann Corbit »

I pretty much agree with everything you say.

I guess the strength difference for Jonny and Komodo is about:
(3402-3157)=245 +/- 25 Elo.

Closer to 250 than to 300.

The original claim was that SF, Komodo, and Houdini are much stronger than Jonny.

The refutation was that the evidence shows Jonny is on par (7 draws implies equality).

It's not impossible that Jonny is equally strong, but I do not think the 7 draws are enough evidence to believe that.

I do not think 2400 cores will add 250 Elo to Jonny's strength compared to Komodo or Stockfish at 60 cores. I do not think Jonny has advanced 245 Elo in the past few months, though this is not impossible. I would not be surprised if Jonny has advanced 50 Elo. I do think that Jonny is a wonderful engine written by a talented software engineer. I just don't think it is as strong as Komodo or Stockfish.

All of these sorts of arguments are peripheral arguments to the original point/premise:
The TCEC contest is a better contest than the WCCC contest.

I do not think anyone has enough data for absolute proof of their chosen position.

I do think that neither position can simply be dismissed.
I also think my position (K and SF are stronger) is the more likely.
I can also be wrong.
Dann Corbit
Posts: 12542
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: What happened to TCEC?

Post by Dann Corbit »

Here are the results for TCEC 9 {trimmed at Jonny 8}, computed as Elo with error bars. As you can see, there appear to be a couple of hundred Elo separating them.

    Program             Elo    +   -   Games   Score   Av.Op.  Draws 
  1 Komodo 9.42       : 3347  142 158    30    93.3 %   2889   13.3 %
  2 Stockfish 160716  : 3304   40  35   112    69.6 %   3160   60.7 %
  3 Komodo 10         : 3287   65  55    60    75.0 %   3096   50.0 %
  4 Stockfish 110616  : 3287   65  55    60    75.0 %   3096   50.0 %
  5 Houdini 200716    : 3252   37  35   166    72.3 %   3086   50.6 %
  6 Komodo 10.1       : 3250   38  35   112    61.6 %   3167   66.1 %
  7 Komodo 1692.19    : 3244  100  85    54    87.0 %   2913   25.9 %
  8 Stockfish 030916  : 3237   97  82    54    86.1 %   2920   27.8 %
  9 Jonny 7.30        : 3186   54  51    60    61.7 %   3103   63.3 %
 10 Andscacs 0.86189b : 3174   66  65    60    60.0 %   3104   46.7 %
 11 Fire 5            : 3147   26  26   256    60.9 %   3069   61.7 %
 12 Houdini 4         : 3141   60  59    90    64.4 %   3038   35.6 %
 13 Ginkgo 1.9h       : 3109   81  77    54    75.0 %   2918   35.2 %
 14 Stockfish 210516  : 3108  110  88    30    80.0 %   2867   40.0 %
 15 Jonny 8           : 3103   36  36   166    50.6 %   3099   53.0 %
While there is some chance that a revolutionary pondering algorithm gives Jonny a big boost, we would expect to see it at 44 cores to some degree.

Here we see Jonny a couple hundred Elo below the top engines. About what is expected.

While it is true that Jonny will probably have improved since TCEC 9, this is equally true for Komodo and Stockfish (not sure how much effort RH puts into Houdini, so I reserve judgement on that one).

I would expect the CEGT results to be far more accurate, since there are 2700+ games for Jonny in that rating list.
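
How much sample size matters can be sketched with the square-root rule quoted earlier in the thread. This assumes the per-game variance (draw rate, score level) is the same in both samples, which will not hold exactly for real rating lists; the helper names are hypothetical.

```python
import math

def games_multiplier(shrink_factor: float) -> float:
    """How many times more games are needed to shrink error bars by `shrink_factor`."""
    return shrink_factor ** 2

def error_bar_ratio(games_small: int, games_large: int) -> float:
    """How much wider the error bars of the smaller sample are, all else being equal."""
    return math.sqrt(games_large / games_small)

print(games_multiplier(100))                # 10000
print(round(error_bar_ratio(7, 2783), 1))   # ratio for 7 games versus 2783 games
```

Under that assumption, error bars from 7 games are roughly 20 times wider than those from 2783 games. The two samples also measure different things (different hardware, versions, and opponents), which this scaling does not capture.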