Top Three Engines Are Essentially Equal Strength!

Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Top Three Engines Are Essentially Equal Strength!

Post by Adam Hair »

Milos wrote:
Leo wrote:I appreciate this statement about more cores and powerful hardware. I highly suspected there would be a bigger Elo difference. Stockfish defeated Houdini in the TCEC by a score of 13-7 with 80 draws. That must be quite an Elo gain over Houdini in that match. I respect Dan Corbit's opinion and Mark's, but I no longer respect Milos. There was no need to call me mentally challenged. Some people here are gentlemen and some are not.
Sorry but you are only making wild claims which are in conflict with basic logic. When Larry does that it's quite obvious he's just trying to increase sales of his engine and earn more profit. In your case I don't know the motivation but I suspect it is just simple ignorance.
If engine A has k Elo more than engine B at time control T, it is absolutely impossible for the Elo difference between engines A and B at time control >>T to be larger than k. You will not find a single case ever.
And btw. 13-7 with 80 draws is exactly "a whopping" 21 Elo difference (with 29 Elo margins), which is absolutely trivial to calculate in one's head.
:lol: :lol: :lol:

Arrogant, unpleasant, and full of hot air. You just countered a "wild" claim with a claim that you can not support. Larry's claim may not be true. I have not seen any definitive proof. But I have definitely seen evidence that proves your statement false.
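
For anyone who wants to check the 13-7 with 80 draws figure quoted above, here is a minimal sketch. It assumes the usual logistic Elo formula and a normal approximation for the 95% margin; the function names are made up for illustration and are not from any rating tool. It gives roughly 21 Elo with a margin of about 30 Elo on either side, in line with the numbers quoted.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an expected score (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_with_margin(wins, losses, draws, z=1.96):
    """Return (elo, elo_low, elo_high) for a W-L-D match result.

    ASSUMPTION: games are independent and the normal approximation
    holds, so the interval is score +/- z standard errors.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - z * se),
            elo_from_score(score + z * se))

elo, lo, hi = elo_with_margin(wins=13, losses=7, draws=80)
print(f"Elo {elo:+.1f}, 95% interval [{lo:+.1f}, {hi:+.1f}]")
```

Note that the interval includes zero, so a 100-game match of this kind does not by itself establish which engine is stronger.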
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Top Three Engines Are Essentially Equal Strength!

Post by Milos »

Adam Hair wrote: :lol: :lol: :lol:

Arrogant, unpleasant, and full of hot air. You just countered a "wild" claim with a claim that you can not support. Larry's claim may not be true. I have not seen any definitive proof. But I have definitely seen evidence that proves your statement false.
Great. Btw, congratulations, you just demonstrated that you don't have the smallest clue about the scientific method. You just laughed at me because I cannot prove a negative. Well, mister smartass, a negative cannot be proven, only disproven.
So please quote me one case (and only one would ofc be enough) where a "change of sign" happens in a head-to-head match of 2 engines at different TCs and is not within error margins. Out of all engines and all testers, there must be one case.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Top Three Engines Are Essentially Equal Strength!

Post by Adam Hair »

Milos wrote:
Adam Hair wrote: :lol: :lol: :lol:

Arrogant, unpleasant, and full of hot air. You just countered a "wild" claim with a claim that you can not support. Larry's claim may not be true. I have not seen any definitive proof. But I have definitely seen evidence that proves your statement false.
Great. Btw, congratulations, you just demonstrated that you don't have the smallest clue about the scientific method. You just laughed at me because I cannot prove a negative. Well, mister smartass, a negative cannot be proven, only disproven.
I was laughing at your response, which was typical Milos. Your arrogance causes you to make statements that I think you are too smart to be making.

You stated that it was absolutely impossible for the Elo difference to increase when the time control substantially increases. That is a blatant disregard for the scientific method, for you dismiss the possibility of a counter example.
So please quote me one case (and only one would ofc be enough) where a "change of sign" happens in a head-to-head match of 2 engines at different TCs and is not within error margins. Out of all engines and all testers, there must be one case.
There are some engines, most prominently Zappa, that perform worse than expected at very fast time controls.

If you want to exclude hyper-bullet time controls from consideration, that is fine by me. I think that a less extreme version of your statement is generally true at longer time controls.
BrendanJNorman
Posts: 2559
Joined: Mon Feb 08, 2016 12:43 am
Full name: Brendan J Norman

Re: Top Three Engines Are Essentially Equal Strength!

Post by BrendanJNorman »

corres wrote:The smaller the differences in Elo, the greater the effect of the opening book on the result. An unbiased opening book is needed for a really fair competition.
But where is this opening book?
I think you've got a really good point, Robert.

I really think that the only way to test engines in a very clean way is to test via test sets which the engines play against all opponents with both colors an even number of times.

Perhaps a mixture of sharp and strategic test positions from the Sicilian Najdorf and Semi Slav (for sharp) and Ruy Lopez, French and perhaps Nimzo Indian for strategic.

All external and internal opening books switched off.

I think if you can't eliminate all variables besides the engine's raw code (which is doing ALL the heavy lifting), then it's difficult to say one engine is "stronger" than another which is say 30 points weaker on Graham's prestigious list :)

You guys know more than me about this anyway, I'm just a crazy tweaker who uses weak engines for training.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Top Three Engines Are Essentially Equal Strength!

Post by Laskos »

Adam Hair wrote:
Milos wrote:
Adam Hair wrote: :lol: :lol: :lol:

Arrogant, unpleasant, and full of hot air. You just countered a "wild" claim with a claim that you can not support. Larry's claim may not be true. I have not seen any definitive proof. But I have definitely seen evidence that proves your statement false.
Great. Btw, congratulations, you just demonstrated that you don't have the smallest clue about the scientific method. You just laughed at me because I cannot prove a negative. Well, mister smartass, a negative cannot be proven, only disproven.
I was laughing at your response, which was typical Milos. Your arrogance causes you to make statements that I think you are too smart to be making.

You stated that it was absolutely impossible for the Elo difference to increase when the time control substantially increases. That is a blatant disregard for the scientific method, for you dismiss the possibility of a counter example.
So please quote me one case (and only one would ofc be enough) where a "change of sign" happens in a head-to-head match of 2 engines at different TCs and is not within error margins. Out of all engines and all testers, there must be one case.
There are some engines, most prominently Zappa, that perform worse than expected at very fast time controls.

If you want to exclude hyper-bullet time controls from consideration, that is fine by me. I think that a less extreme version of your statement is generally true at longer time controls.
Well, I quickly picked up some old differently scaling engines to show such an effect at hyper-bullet. Stockfish DD against Houdini 1.5a:

50ms/move:

Games Completed = 2000 of 2000 (Avg game length = 6.056 sec)
Settings = Gauntlet/32MB/50ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 3244 sec elapsed, 0 sec remaining
 1.  Stockfish DD 64 SSE4.2   1011.5/2000	744-721-535  	(L: m=0 t=0 i=0 a=721)	(D: r=287 i=53 f=41 s=3 a=151)	(tpm=55.1 d=14.37 nps=2041982)
 2.  Houdini 1.5a x64          988.5/2000	721-744-535  	(L: m=1 t=0 i=0 a=743)	(D: r=287 i=53 f=41 s=3 a=151)	(tpm=45.2 d=11.42 nps=2948782)
Score of SF DD: 50.6% (4 Elo points difference).

200ms/move:

Games Completed = 2000 of 2000 (Avg game length = 26.064 sec)
Settings = Gauntlet/32MB/200ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 13258 sec elapsed, 0 sec remaining
 1.  Stockfish DD 64 SSE4.2   1089.5/2000	765-586-649  	(L: m=0 t=0 i=0 a=586)	(D: r=355 i=44 f=43 s=10 a=197)	(tpm=208.9 d=17.63 nps=1949095)
 2.  Houdini 1.5a x64          910.5/2000	586-765-649  	(L: m=2 t=0 i=0 a=763)	(D: r=355 i=44 f=43 s=10 a=197)	(tpm=198.1 d=13.71 nps=2715919)
Score of SF DD: 54.5% (31 Elo points difference)

The increase in Elo difference with time control is outside the 95% confidence interval. Moreover, Stockfish DD was beating even Houdini 4 at LTC, so it is at least 50 Elo points stronger than Houdini 1.5a at LTC, and the increase continues.

More generally, a handwaving argument: take the draw rate as the simplest increasing function of time control (so WinRate + LossRate is decreasing):
[Image: plot of draw rate versus time control]

And WinRate/LossRate as pretty constant (regular case) or increasing (well scaling):
[Image: plot of WinRate/LossRate versus time control]

Given W+L and W/L, with Score = W + (1-W-L)/2, we get the not-so-obvious result that the plot of the Score (Elo) as a function of time can be non-trivial:
[Image: plot of Score (Elo) versus time control]

A well-scaling engine can have a stretch of time controls over which its performance increases. Sure, with very, very long time controls, probably all will converge towards smaller Elo differences, as the draw rate becomes very high. And engines can have more complicated behaviors than the ones here, which are simplistically separated into "regular" and "well scaling".
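
The handwaving argument can be tried numerically. The sketch below is only an illustration: the linear draw-rate curve and both W/L curves are invented shapes chosen to match the description (draw rate increasing, W/L constant or increasing), and are not fitted to any real data. The only relations taken from the post are W + L = 1 - D and Score = W + (1-W-L)/2.

```python
import math

def elo_from_score(score):
    """Elo difference implied by an expected score (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def draw_rate(x):
    # ASSUMED shape: draw rate rises linearly with x = log10(time / base time).
    return 0.25 + 0.15 * x

def regular_ratio(x):
    # ASSUMED: W/L stays constant for the "regular" case.
    return 1.3

def well_scaling_ratio(x):
    # ASSUMED: W/L grows with time control for the "well scaling" case.
    return 1.05 + 0.3 * x

def score(x, ratio):
    d = draw_rate(x)
    r = ratio(x)
    w = (1.0 - d) * r / (1.0 + r)   # from W + L = 1 - D and W / L = r
    return w + d / 2.0              # Score = W + (1 - W - L) / 2

xs = [0, 1, 2, 3, 4]  # four decades of time control
regular = [elo_from_score(score(x, regular_ratio)) for x in xs]
well_scaling = [elo_from_score(score(x, well_scaling_ratio)) for x in xs]

for x, a, b in zip(xs, regular, well_scaling):
    print(f"x={x}: regular {a:5.1f} Elo, well scaling {b:5.1f} Elo")
```

With these made-up curves the regular case loses Elo at every step, while the well-scaling case first gains Elo and then loses it once the rising draw rate takes over.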
Leo
Posts: 1082
Joined: Fri Sep 16, 2016 6:55 pm
Location: USA/Minnesota
Full name: Leo Anger

Re: Top Three Engines Are Essentially Equal Strength!

Post by Leo »

Very nice effort. I don't understand some of it but I am sure others will.
Advanced Micro Devices fan.
yurikvelo
Posts: 710
Joined: Sat Dec 06, 2014 1:53 pm

Re: Top Three Engines Are Essentially Equal Strength!

Post by yurikvelo »

Leo wrote:I think on powerful hardware like that used in the TCEC the difference is greater.
Some claim that longer TC favours Komodo more than Stockfish.
Since at this point SF is ahead of Komodo, a longer TC decreases the distance (it would increase if Komodo were at this point stronger than SF under CCRL conditions).

Also, a longer TC increases the draw rate (under the same conditions).
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Top Three Engines Are Essentially Equal Strength!

Post by Milos »

Laskos wrote:Well, I quickly picked up some old differently scaling engines to show such an effect at hyper-bullet. Stockfish DD against Houdini 1.5a:
Nice analysis, especially the last graph. However, the first decade of it is meaningless, considering it uses 50ms-per-move data.
This is not even hyper-bullet; it's just a test of which engine has less overhead per move in its time manager and search init, and which engine is quicker at getting resources from the CPU. If it were a self-test, I'd say fine, but not for different engines.
The 95% confidence interval is certainly not 20 Elo, as you'd get from the basic equation.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Top Three Engines Are Essentially Equal Strength!

Post by Laskos »

Milos wrote:
Laskos wrote:Well, I quickly picked up some old differently scaling engines to show such an effect at hyper-bullet. Stockfish DD against Houdini 1.5a:
Nice analysis, especially the last graph. However, the first decade of it is meaningless, considering it uses 50ms-per-move data.
This is not even hyper-bullet; it's just a test of which engine has less overhead per move in its time manager and search init, and which engine is quicker at getting resources from the CPU. If it were a self-test, I'd say fine, but not for different engines.
The 95% confidence interval is certainly not 20 Elo, as you'd get from the basic equation.
Well, if you are not satisfied with ultra-bullet, Stockfish DD was even stronger relative to Houdini 1.5a at LTC than at 40/4', and that would be an example you could accept. Each individual 2SD error is 13 Elo points, so 18 Elo points for the difference. The difference itself is 27 Elo points, larger than that. So there is >97.5% confidence that SF DD improved relative to Houdini 1.5a.
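
The 13, 18 and 27 Elo figures can be reproduced from the W-L-D counts in the 50ms and 200ms LittleBlitzer results posted earlier in the thread. This is a rough sketch, assuming the two runs are independent and using a first-order (delta method) conversion from score error to Elo error; the helper name is made up for illustration.

```python
import math

def elo_and_se(wins, losses, draws):
    """Return (elo, standard error in Elo) for a W-L-D result.

    ASSUMPTION: independent games, and a first-order (delta method)
    conversion of the score error into an Elo error.
    """
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - s) ** 2
           + draws * (0.5 - s) ** 2
           + losses * s ** 2) / n
    se_score = math.sqrt(var / n)
    elo = 400.0 * math.log10(s / (1.0 - s))
    # dElo/ds = 400 / (ln(10) * s * (1 - s))
    se_elo = se_score * 400.0 / (math.log(10) * s * (1.0 - s))
    return elo, se_elo

# Stockfish DD vs Houdini 1.5a, counts taken from the results above.
elo_50, se_50 = elo_and_se(wins=744, losses=721, draws=535)    # 50ms/move
elo_200, se_200 = elo_and_se(wins=765, losses=586, draws=649)  # 200ms/move

diff = elo_200 - elo_50
# Errors of independent runs add in quadrature.
se_diff = math.sqrt(se_50 ** 2 + se_200 ** 2)

print(f"50ms:  {elo_50:5.1f} Elo, 2SD = {2 * se_50:4.1f}")
print(f"200ms: {elo_200:5.1f} Elo, 2SD = {2 * se_200:4.1f}")
print(f"difference: {diff:4.1f} Elo, 2SD = {2 * se_diff:4.1f}")
```

The difference comes out larger than its 2SD error, which is the basis for the >97.5% (one-sided) confidence claim.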
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Top Three Engines Are Essentially Equal Strength!

Post by Milos »

Laskos wrote:Well, if you are not satisfied with ultra-bullet, Stockfish DD was even stronger relative to Houdini 1.5a at LTC than at 40/4', and that would be an example you could accept. Each individual 2SD error is 13 Elo points, so 18 Elo points for the difference. The difference itself is 27 Elo points, larger than that. So there is >97.5% confidence that SF DD improved relative to Houdini 1.5a.
Could you please post the bolded part?
Or at least the Elo difference at something like 2s per move (or longer) since at 200ms per move we have the difference to be 31Elo.