Adam Hair wrote:
Milos wrote:
Great. By the way, congratulations: you just demonstrated that you don't have the slightest clue about the scientific method. You just laughed at me because I cannot prove a negative. Well, mister smartass, a negative cannot be proven, only disproven.
I was laughing at your response, which was typical Milos. Your arrogance causes you to make statements that I think you are too smart to be making.
You stated that it was absolutely impossible for the Elo difference to increase when the time control substantially increases. That is a blatant disregard for the scientific method, for you dismiss the possibility of a counterexample.
So please quote me one case (and one would of course be enough) where a "change of sign" happens in a head-to-head match of 2 engines at different TCs and is not within error margins. Out of all engines and all testers, there must be one case.
There are some engines, most prominently Zappa, that perform worse than expected at very fast time controls.
If you want to exclude hyper-bullet time controls from consideration, that is fine by me. I think that a less extreme version of your statement is generally true at longer time controls.
Well, I quickly picked two old engines that scale differently to show such an effect at hyper-bullet. Stockfish DD against Houdini 1.5a:
50ms/move:
Games Completed = 2000 of 2000 (Avg game length = 6.056 sec)
Settings = Gauntlet/32MB/50ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 3244 sec elapsed, 0 sec remaining
1. Stockfish DD 64 SSE4.2 1011.5/2000 744-721-535 (L: m=0 t=0 i=0 a=721) (D: r=287 i=53 f=41 s=3 a=151) (tpm=55.1 d=14.37 nps=2041982)
2. Houdini 1.5a x64 988.5/2000 721-744-535 (L: m=1 t=0 i=0 a=743) (D: r=287 i=53 f=41 s=3 a=151) (tpm=45.2 d=11.42 nps=2948782)
Score of SF DD: 50.6% (4 Elo points difference).
200ms/move:
Games Completed = 2000 of 2000 (Avg game length = 26.064 sec)
Settings = Gauntlet/32MB/200ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 13258 sec elapsed, 0 sec remaining
1. Stockfish DD 64 SSE4.2 1089.5/2000 765-586-649 (L: m=0 t=0 i=0 a=586) (D: r=355 i=44 f=43 s=10 a=197) (tpm=208.9 d=17.63 nps=1949095)
2. Houdini 1.5a x64 910.5/2000 586-765-649 (L: m=2 t=0 i=0 a=763) (D: r=355 i=44 f=43 s=10 a=197) (tpm=198.1 d=13.71 nps=2715919)
Score of SF DD: 54.5% (31 Elo points difference).
The increase in Elo difference with time control lies outside the 95% confidence interval. Moreover, Stockfish DD was beating even Houdini 4 at LTC, so it is at least 50 Elo points stronger than Houdini 1.5a at LTC, and the increase continues.
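For anyone who wants to check the numbers, here is a minimal Python sketch of how the Elo differences and their error margins can be obtained from the win/loss/draw counts in the two logs above. It assumes the usual logistic Elo formula and a normal approximation with independent games. The function name elo_from_wld is only illustrative and is not part of LittleBlitzer.

```python
import math

def elo_from_wld(wins, losses, draws):
    """Return (score, elo, standard error of elo) for one match.

    Assumptions (not from the logs): logistic Elo model, games are
    independent, and the error is propagated with the delta method.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + losses * (0.0 - score) ** 2
           + draws * (0.5 - score) ** 2) / n
    se_score = math.sqrt(var / n)
    elo = 400.0 * math.log10(score / (1.0 - score))
    # d(elo)/d(score) = 400 / (ln(10) * score * (1 - score))
    se_elo = se_score * 400.0 / (math.log(10) * score * (1.0 - score))
    return score, elo, se_elo

# Win/loss/draw counts of Stockfish DD taken from the two logs above.
s_fast, elo_fast, se_fast = elo_from_wld(744, 721, 535)   # 50ms/move
s_slow, elo_slow, se_slow = elo_from_wld(765, 586, 649)   # 200ms/move

# z-score of the change in Elo difference between the two time controls.
z = (elo_slow - elo_fast) / math.sqrt(se_fast ** 2 + se_slow ** 2)

print(f"50ms/move:  {100 * s_fast:.1f}%  {elo_fast:+.1f} Elo (+/- {1.96 * se_fast:.1f})")
print(f"200ms/move: {100 * s_slow:.1f}%  {elo_slow:+.1f} Elo (+/- {1.96 * se_slow:.1f})")
print(f"z for the change: {z:.2f}")
```

Under these assumptions the two matches give about 4 and 31 Elo, and the z-score of the change comes out above 1.96, which is what "outside the 95% confidence interval" means here.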
Generally, a hand-waving argument goes as follows. Take the draw rate as the simplest increasing function of time control (so WinRate + LossRate is decreasing).
Take WinRate/LossRate as roughly constant (the regular case) or increasing (the well-scaling case).
What is not obvious is that, given W+L and W/L, with Score = W + (1-W-L)/2, the plot of the Score (Elo) as a function of time can be non-trivial.
A well-scaling engine can have a stretch where its performance increases with time control. Sure, at very, very long time controls, probably all Elo differences will shrink, as the draw rate becomes very high. And engines can have more complicated behaviors than the simplistic separation into "regular" and "well scaling" used here.
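The hand-waving argument can be tried out numerically. The sketch below is only a toy model: the functional forms decisive_rate(x) = 1/(1+x) for W+L, a constant W/L of 1.5 for the "regular" case and W/L = 1+x for the "well scaling" case are my own illustrative assumptions, with x standing for time control in arbitrary units. They are not fitted to any data.

```python
import math

def score(decisive, ratio):
    """Score = W + (1 - W - L)/2, given W + L = decisive and W/L = ratio."""
    wins = decisive * ratio / (1.0 + ratio)
    losses = decisive / (1.0 + ratio)
    return wins + (1.0 - wins - losses) / 2.0

def elo(s):
    """Logistic Elo difference corresponding to an expected score s."""
    return 400.0 * math.log10(s / (1.0 - s))

# Hypothetical shapes, chosen only for illustration.
def decisive_rate(x):
    return 1.0 / (1.0 + x)      # W + L falls as the draw rate rises

def ratio_regular(x):
    return 1.5                  # W/L stays constant

def ratio_well_scaling(x):
    return 1.0 + x              # W/L grows with time control

xs = [0.1, 0.5, 1.0, 1.5, 3.0, 10.0, 100.0]   # time control, arbitrary units
regular = [elo(score(decisive_rate(x), ratio_regular(x))) for x in xs]
well = [elo(score(decisive_rate(x), ratio_well_scaling(x))) for x in xs]

for x, r, w in zip(xs, regular, well):
    print(f"x={x:6.1f}  regular={r:6.1f} Elo  well scaling={w:6.1f} Elo")
```

With these particular shapes the "regular" Elo difference only shrinks as x grows, while the "well scaling" one first rises, peaks at an intermediate x, and then shrinks as draws take over. Other choices of functions would move the peak or remove it, so this shows that such a stretch is possible, not that any given engine pair behaves this way.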