Top Three Engines Are Essentially Equal Strength!


Leo
Posts: 1080
Joined: Fri Sep 16, 2016 6:55 pm
Location: USA/Minnesota
Full name: Leo Anger

Re: Top Three Engines Are Essentially Equal Strength!

Post by Leo »

yurikvelo wrote:
Leo wrote:I think on powerful hardware like that used in the TCEC the difference is greater.
Some claim that a longer TC favours Komodo more than Stockfish.
Since SF is currently ahead of Komodo, a longer TC decreases the gap (it would increase it if Komodo were currently stronger than SF under CCRL conditions).

Also, a longer TC increases the draw rate (under the same conditions).
I am thinking about the 44 cores allowing one engine to search and prune deeper than another, a lot more than longer time controls alone would.
Advanced Micro Devices fan.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Top Three Engines Are Essentially Equal Strength!

Post by Laskos »

Milos wrote:
Laskos wrote:Well, if you are not satisfied with ultra-bullet, Stockfish DD was even stronger at LTC compared to Houdini 1.5a than at 40/4', and that would be an example satisfactory to you. Each individual 2SD error is 13 Elo points, so for the difference it is 18 Elo points. The difference itself is 27 Elo points, larger than that. So there is >97.5% confidence that SF DD improved relative to Houdini 1.5a.
Could you please post the bolded part?
Or at least the Elo difference at something like 2s per move (or longer), since at 200ms per move we have a difference of 31 Elo.
No, just from memory, SF DD at LTC (or super LTC like TCEC), was consistently beating Houdini 4 (maybe 30-40 Elo points difference), and Houdini 4 itself was 50 Elo points stronger at LTC than Houdini 1.5a. But at 40/4' the difference between SF DD and Houdini 1.5a was 50 or so Elo points. Ok, this is no proof of anything, but I remember some pretty sharp differences in scaling of some engines (Critter, Zappa, etc. come to mind, from opposite sides).

PS By the way, it seems an inversion is pretty obvious: Houdini 4 is stronger than Stockfish DD at blitz, weaker at LTC (TCEC). The LTC error margins are large, though.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Top Three Engines Are Essentially Equal Strength!

Post by Milos »

Laskos wrote:No, just from memory, SF DD at LTC (or super LTC like TCEC), was consistently beating Houdini 4 (maybe 30-40 Elo points difference), and Houdini 4 itself was 50 Elo points stronger at LTC than Houdini 1.5a. But at 40/4' the difference between SF DD and Houdini 1.5a was 50 or so Elo points. Ok, this is no proof of anything, but I remember some pretty sharp differences in scaling of some engines (Critter, Zappa, etc. come to mind, from opposite sides).

PS By the way, it seems an inversion is pretty obvious: Houdini 4 is stronger than Stockfish DD at blitz, weaker at LTC (TCEC). The LTC error margins are large, though.
Critter and Zappa are "known" to scale badly, but often that is a byproduct of very different SMP scaling: really bad for Critter, really good for Zappa.
I can see how some aggressive pruning can make EBF and tree shape very different at depth 15 and depth 25. Some engines used to have very inconsistent EBF at lower depths. Still, once depth increases (beyond 25-30), search usually stabilizes, and then due to diminishing returns (more draws) any Elo difference between two engines should start shrinking. I believe claims like "engine A has better eval than engine B, therefore it scales better" are total BS.
Back to H4 and Stockfish DD. TCEC is a bad measure because it is known that until Houdart implemented LazySMP, H scaled really badly on many cores, so most of its lost strength in TCEC, and eventually the inversion you are talking about, is a consequence of bad SMP scaling.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Top Three Engines Are Essentially Equal Strength!

Post by corres »

There are only a few engines with a built-in opening book, e.g. newer Strelka. Stockfish certainly has no such book, and it is very probable that Houdini and Komodo do not have one either.
The real problem is that there is no opening book without bias.
A commercial engine maker expects something different from a book than the author of an open source engine or a hobbyist does.
Moreover, users' viewpoints differ widely.
Those who like to watch engine matches want to see "exciting" games; those who use engines for analysis want to know the weaker and stronger sides of those engines.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Top Three Engines Are Essentially Equal Strength!

Post by Laskos »

Milos wrote:
Laskos wrote:No, just from memory, SF DD at LTC (or super LTC like TCEC), was consistently beating Houdini 4 (maybe 30-40 Elo points difference), and Houdini 4 itself was 50 Elo points stronger at LTC than Houdini 1.5a. But at 40/4' the difference between SF DD and Houdini 1.5a was 50 or so Elo points. Ok, this is no proof of anything, but I remember some pretty sharp differences in scaling of some engines (Critter, Zappa, etc. come to mind, from opposite sides).

PS By the way, it seems an inversion is pretty obvious: Houdini 4 is stronger than Stockfish DD at blitz, weaker at LTC (TCEC). The LTC error margins are large, though.
Critter and Zappa are "known" to scale badly, but often that is a byproduct of very different SMP scaling: really bad for Critter, really good for Zappa.
I can see how some aggressive pruning can make EBF and tree shape very different at depth 15 and depth 25. Some engines used to have very inconsistent EBF at lower depths. Still, once depth increases (beyond 25-30), search usually stabilizes, and then due to diminishing returns (more draws) any Elo difference between two engines should start shrinking. I believe claims like "engine A has better eval than engine B, therefore it scales better" are total BS.
Back to H4 and Stockfish DD. TCEC is a bad measure because it is known that until Houdart implemented LazySMP, H scaled really badly on many cores, so most of its lost strength in TCEC, and eventually the inversion you are talking about, is a consequence of bad SMP scaling.
I have a conclusive result (outside 2SD error margins) from ultra-bullet to bullet for Stockfish DD against Houdini 1.5a

200ms/move:

Games Completed = 2000 of 2000 (Avg game length = 26.064 sec) 
Settings = Gauntlet/32MB/200ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000) 
Time = 13258 sec elapsed, 0 sec remaining 
 1.  Stockfish DD 64 SSE4.2   1089.5/2000   765-586-649     (L: m=0 t=0 i=0 a=586)   (D: r=355 i=44 f=43 s=10 a=197)   (tpm=208.9 d=17.63 nps=1949095) 
 2.  Houdini 1.5a x64          910.5/2000   586-765-649     (L: m=2 t=0 i=0 a=763)   (D: r=355 i=44 f=43 s=10 a=197)   (tpm=198.1 d=13.71 nps=2715919)
Score of SF DD: 54.5% (31 Elo points difference)

1000ms/move:

Games Completed = 800 of 800 (Avg game length = 134.166 sec)
Settings = Gauntlet/32MB/1000ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 27052 sec elapsed, 0 sec remaining
 1.  Stockfish DD 64 SSE4.2   	483.0/800	350-184-266  	(L: m=0 t=0 i=0 a=184)	(D: r=139 i=15 f=15 s=2 a=95)	(tpm=1002.8 d=22.18 nps=1938539)
 2.  Houdini 1.5a x64         	317.0/800	184-350-266  	(L: m=0 t=0 i=0 a=350)	(D: r=139 i=15 f=15 s=2 a=95)	(tpm=996.1 d=16.45 nps=2629141)
Score of SF DD: 60.4% (72 Elo points difference)
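
For anyone who wants to re-check the two results above, here is a minimal sketch in Python. It assumes the usual logistic Elo formula and a per-game variance estimated from the win/loss/draw counts; the function names are made up for illustration and are not part of LittleBlitzer or any rating tool. The W-L-D counts are copied from the two logs. Depending on rounding and the exact formula used, the Elo figures may differ by a point or so from the ones posted.

```python
import math

def elo_from_score(score):
    # Logistic Elo model: expected score s corresponds to 400*log10(s/(1-s)).
    return 400.0 * math.log10(score / (1.0 - score))

def match_stats(wins, losses, draws):
    # Returns (score, elo, two_sd_margin_in_elo) from the leader's point of view.
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + losses * (0.0 - score) ** 2
           + draws * (0.5 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    elo = elo_from_score(score)
    # Half-width of the 2SD interval, mapped from score to Elo.
    margin = (elo_from_score(score + 2 * se) - elo_from_score(score - 2 * se)) / 2
    return score, elo, margin

# W-L-D of Stockfish DD against Houdini 1.5a, taken from the logs above.
score_200, elo_200, margin_200 = match_stats(765, 586, 649)    # 200ms/move
score_1000, elo_1000, margin_1000 = match_stats(350, 184, 266)  # 1000ms/move

# Independent errors add in quadrature when comparing two matches.
gap = elo_1000 - elo_200
combined_margin = math.sqrt(margin_200 ** 2 + margin_1000 ** 2)

print(f"200ms:  {100 * score_200:.1f}%  {elo_200:.0f} +/- {margin_200:.0f} Elo")
print(f"1000ms: {100 * score_1000:.1f}%  {elo_1000:.0f} +/- {margin_1000:.0f} Elo")
print(f"gap:    {gap:.0f} Elo, combined 2SD margin {combined_margin:.0f} Elo")
print("outside 2SD" if gap > combined_margin else "within 2SD")
```

The quadrature step is the same one that turns two individual 2SD errors of 13 Elo points into about 18 for their difference.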

Also, from LTC (40 moves in 2 hours) rating list of CEGT
http://www.cegt.net/40120new/40_120_rat ... liste.html
we have Stockfish DD rated 104 Elo points above Houdini 1.5a at LTC, a further increase in the Elo difference. All in all, SF DD does scale better, and its Elo gap over Houdini 1.5a grows as the time control increases. This probably continues up to very long time controls, after which the Elo difference diminishes.

It could be that Komodo scales better than Stockfish with time control on 1 core, but scales a bit worse with many cores. That is my impression, anyway.