Komodo 10.4 vs Houdini 5

Milos · Post by **Milos** » Mon Mar 20, 2017 12:22 am

mjlef wrote: BTW, the 4 elo lead you mention does not mean "luck". 4 elo is roughly 1 Standard Deviation of the error margin. You can see on the Ipon list a column marked "CFS(next)". The error margins and results are used to determine a confidence that one program is stronger than the one below it in the list. It shows 70% for Komodo 10.4 right now. This is not proof, just the likelihood that Komodo is stronger with these settings. More games will raise CFS. If you are interested in how this works, you can google it or study the Ordo source code.

Funny how you chose this condescending approach, suggesting that I google what "CFS" is.
That's like me suggesting that you go to the chessprogramming wiki to inform yourself about how to program a chess engine.
First, there is no such thing as CFS. The actual term is LOS, and LOS does not depend on error margins or whatnot, but only on the number of wins and losses (draws are irrelevant). Your mentioning error margins in the context of LOS is a clear indication to me that you have very limited knowledge of statistics in general and of the statistics used in chess.
In practice any LOS below 95-98% is insignificant. A 70% LOS for K vs H is meaningless (as I said, just random), and your mentioning it as some kind of indication is just one more illustration that you do not have much of a clue about it.
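
For reference, the LOS approximation usually quoted in computer chess uses only the win and loss counts, as the post says. A minimal Python sketch, assuming the normal approximation `LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L))))`; the function name `los` and the example counts are made up for illustration and are not taken from any rating list:

```python
import math

def los(wins: int, losses: int) -> float:
    """Likelihood of superiority under the normal approximation.

    Draws do not appear: only decisive games enter the formula.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))

# Hypothetical counts, chosen only to show the scale of the result.
print(f"{los(60, 40):.3f}")
print(f"{los(105, 95):.3f}")
```

With the same win-loss margin, adding draws changes nothing here, while adding decisive games in equal numbers pulls the value back toward 0.5.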

kranium · Post by **kranium** » Mon Mar 20, 2017 12:34 am

mjlef wrote: Basically, Ingo uses sound statistical methods.

Yes he loads the PGN file into Ordo...

Mark, basically IPON is a (private) Komodo beta tester with a public face.
It makes sense you would jump to defend every aspect of it, thus one has to take your statements with a grain of salt (and a shot of vodka).

My question:
What is the estimated value of all the betas, dev versions, commercial releases, etc. that you send to Ingo for free every year?

leavenfish · Post by **leavenfish** » Mon Mar 20, 2017 1:57 am

I've tried the same on 365 CPUs and....

...just kidding of course.

One can of course argue that multiple cores/CPUs can produce different effects, but really, almost everyone has just 1 CPU with multiple cores to use, so it's only how engines fare on such normal rigs that really matters.

David Xu · Post by **David Xu** » Mon Mar 20, 2017 1:58 am

In my experience, the best thing to do when faced with baseless accusations is to wait for the truth to come out. There's no need to validate the claims of random attention-seekers by engaging with them. Some words to live by:

“Never argue with an idiot. They will only bring you down to their level and beat you with experience.” -George Carlin

mjlef · Post by **mjlef** » Mon Mar 20, 2017 4:06 am

Sorry, but it is not meaningless. It means a 70% chance one program is stronger than the other. I do agree it is not proof, nor would any number of standard deviations be absolute proof. How many sigma is used varies based on opinion and the scientific field.

You seem to have a chip on your shoulder. This is a site to educate and inform. When someone tells you something you already know, it is not a put-down or condescending. But you keep saying inaccurate things, like calling this random. You reach conclusions without sufficient data. That is sloppy science.

mjlef · Post by **mjlef** » Mon Mar 20, 2017 4:16 am

kranium wrote:
mjlef wrote: Basically, Ingo uses sound statistical methods.
Yes he loads the PGN file into Ordo...

Mark, basically IPON is a (private) Komodo beta tester with a public face.
It makes sense you would jump to defend every aspect of it, thus one has to take your statements with a grain of salt (and a shot of vodka).

My question:
What is the estimated value of all the betas, dev versions, commercial releases, etc. that you send to Ingo for free every year?

Sorry, you are spouting nonsense.

I actually do not send him many versions of Komodo. He has limited resources and he never even tested Komodo 10.3 because of that.

Ingo is not a beta tester at all for Komodo. Sometimes he is able to work in a development version for testing, but it is normally a release candidate which we send to him a few days before a release in case he has time to run it. Ingo decides what he wants to test. We are happy when he can run Komodo games, and understand when he cannot.

The only feedback we have ever gotten from Ingo is his test run results, just as others get. And we send the same things to other test groups as well, although most want to wait for an official release to begin testing.

I have no need to defend Ingo. His reputation is immaculate.

As for "loads the PGN into Ordo", you do not seem to understand the need for testing against a wide range of opponents with a huge number of games. Ingo does, which is why his rating list is respected so much.

Please stop making up conspiracies which do not exist. They are just lies and things you make up. This should be a civil site where statements are made based on facts and not imaginings.

mjlef · Post by **mjlef** » Mon Mar 20, 2017 4:17 am

David Xu wrote: In my experience, the best thing to do when faced with baseless accusations is to wait for the truth to come out. There's no need to validate the claims of random attention-seekers by engaging with them. Some words to live by:

“Never argue with an idiot. They will only bring you down to their level and beat you with experience.” -George Carlin

You are right. I should know better. Thanks for the reminder.

lkaufman · Post by **lkaufman** » Mon Mar 20, 2017 4:20 am

kranium wrote:
mjlef wrote: Basically, Ingo uses sound statistical methods.
Yes he loads the PGN file into Ordo...

Mark, basically IPON is a (private) Komodo beta tester with a public face.
It makes sense you would jump to defend every aspect of it, thus one has to take your statements with a grain of salt (and a shot of vodka).

My question:
What is the estimated value of all the betas, dev versions, commercial releases, etc. that you send to Ingo for free every year?

We send free copies of all commercial releases to all the non-bullet rating lists (as far as I know), including CCRL, CEGT, IPON, FastGM, and, until it stopped, Frank's list. IPON doesn't even rate us all the time, skipping Komodo 10.3 for example. We do occasionally send dev versions to any or all of the above who express an interest in testing them, but IPON doesn't run many such tests for us per year. Of course they all get free SF dev versions whenever they want them, and I suppose Houdini gives them free copies at least of releases. I think we are doing the right thing.
There is a slight bias for Komodo in my testing because I test on the same hardware that we develop on, and use the same (huge) books and similar type time controls. That is probably why my results are usually a few elo better than the average of other people. The bias is pretty small but not zero. In particular, we have no results on AMD hardware.

lkaufman · Post by **lkaufman** » Mon Mar 20, 2017 4:51 am

lkaufman wrote:
kranium wrote:
mjlef wrote: Basically, Ingo uses sound statistical methods.
Yes he loads the PGN file into Ordo...

Mark, basically IPON is a (private) Komodo beta tester with a public face.
It makes sense you would jump to defend every aspect of it, thus one has to take your statements with a grain of salt (and a shot of vodka).

My question:
What is the estimated value of all the betas, dev versions, commercial releases, etc. that you send to Ingo for free every year?
We send free copies of all commercial releases to all the non-bullet rating lists (as far as I know), including CCRL, CEGT, IPON, FastGM, and, until it stopped, Frank's list. IPON doesn't even rate us all the time, skipping Komodo 10.3 for example. We do occasionally send dev versions to any or all of the above who express an interest in testing them, but IPON doesn't run many such tests for us per year. Of course they all get free SF dev versions whenever they want them, and I suppose Houdini gives them free copies at least of releases. I think we are doing the right thing.
There is a slight bias for Komodo in my testing because I test on the same hardware that we develop on, and use the same (huge) books and similar type time controls. That is probably why my results are usually a few elo better than the average of other people. The bias is pretty small but not zero. In particular, we have no results on AMD hardware.

I forgot to include "ipmanchess", another list we send free copies to, which I consider a totally honest list even though I disagree with some policies (lack of any increment, too many very similar versions of engines). Also I should not have included book as a source of bias, since on the Windows tests we use a different tester and books than we do when developing in Linux. I use "Little Blitzer" on Windows so at least our own tester cannot be blamed for any bias.

Milos · Post by **Milos** » Mon Mar 20, 2017 6:57 am

mjlef wrote: Sorry, but it is not meaningless. It means a 70% chance one program is stronger than the other. I do agree it is not proof, nor would any number of standard deviations be absolute proof. How many sigma is used varies based on opinion and the scientific field.

Ok, it is now obvious that you have no clue what you are talking about.
CFS, or whatever bullshit name Miguel gave it, doesn't really represent the "chance one program is stronger than the other". It is just a Gaussian CDF of rating_difference/sigma. It assumes the other program's rating is fixed and represents the probability that the first program's rating will not be lower than the second one's. And, not surprisingly, like some other things in Ordo, that one is also wrong (after all, Miguel is just a chemist and not really a statistician), because it assumes a one-tailed (directional) hypothesis, while in reality we have a case of a two-tailed (non-directional) hypothesis, which results in a smaller confidence value.
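
To make the "Gaussian CDF of rating_difference/sigma" description concrete, here is a small Python sketch. It only illustrates the formula as stated in this post and has not been checked against the Ordo source; the function names and the 4.0 / 10.0 inputs are hypothetical.

```python
from statistics import NormalDist

def confidence_one_sided(rating_diff: float, sigma: float) -> float:
    """One-tailed confidence: standard normal CDF of rating_diff / sigma."""
    return NormalDist().cdf(rating_diff / sigma)

def z_for_confidence(p: float) -> float:
    """Separation, in sigma units, that gives a one-tailed confidence of p."""
    return NormalDist().inv_cdf(p)

# Hypothetical inputs: a 4 Elo gap with sigma = 10 Elo, i.e. 0.4 sigma apart.
print(f"{confidence_one_sided(4.0, 10.0):.3f}")
# Sigma units needed to reach a 70% one-tailed confidence.
print(f"{z_for_confidence(0.70):.2f}")
```

Under this reading, a 70% figure corresponds to roughly half a sigma of separation, and a zero difference gives exactly 50%.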

And even without going into the CFS BS, your assuming that a 0.4 sigma difference means something just further demonstrates how little you understand. A 0.4 sigma difference is roughly equivalent to an Elo difference of 14 after only 40 games between 2 engines (with a 60% draw rate). Of course, if someone told you that the Elo difference between 2 engines is 14 after only 40 games, you'd immediately claim it's not sufficient to draw any conclusion.
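
The 40-game figure can be checked with a short calculation. The Python sketch below assumes the standard logistic Elo expectation, independent games, and a fixed draw rate; the function names are hypothetical.

```python
import math

def expected_score(elo_diff: float) -> float:
    """Expected score per game for the stronger side (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def sigma_units(elo_diff: float, games: int, draw_rate: float) -> float:
    """How many standard errors the expected score lies above 50%."""
    score = expected_score(elo_diff)
    win_rate = score - draw_rate / 2.0  # since score = wins + draws / 2
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = win_rate + 0.25 * draw_rate - score * score
    std_err = math.sqrt(variance / games)
    return (score - 0.5) / std_err

# The case from the post: 14 Elo, 40 games, 60% draws.
print(f"{sigma_units(14, 40, 0.60):.2f}")
```

With these inputs the result comes out close to 0.4 sigma, which matches the figure quoted in the post.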
