Leo wrote: I think on powerful hardware like that used in the TCEC the difference is greater.
yurikvelo wrote: Some claim that longer TC favours Komodo more than Stockfish.
Since at this point SF is ahead of Komodo, a longer TC decreases the distance (it would increase if Komodo were at this point stronger than SF under CCRL conditions).
Also, a longer TC increases the draw rate (under the same conditions).
I am thinking about the 44 cores allowing the engine to search and prune deeper than another engine, a lot more than just longer time controls.
Top Three Engines Are Essentially Equal Strength!
-
- Posts: 1080
- Joined: Fri Sep 16, 2016 6:55 pm
- Location: USA/Minnesota
- Full name: Leo Anger
Re: Top Three Engines Are Essentially Equal Strength!
Advanced Micro Devices fan.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Top Three Engines Are Essentially Equal Strength!
Laskos wrote: Well, if you are not satisfied with ultra-bullet, Stockfish DD was even stronger at LTC compared to Houdini 1.5a than at 40/4', and that would be a satisfactory example for you. The 2SD error of each individual result is 13 Elo points, and 18 Elo points for the difference. The difference itself is 27 Elo points, larger than that. So there is >97.5% confidence that SF DD improved relative to Houdini 1.5a.
Milos wrote: Could you please post the bolded part?
Or at least the Elo difference at something like 2s per move (or longer), since at 200ms per move we have the difference to be 31 Elo.
No, just from memory, SF DD at LTC (or super LTC like TCEC) was consistently beating Houdini 4 (maybe 30-40 Elo points difference), and Houdini 4 itself was 50 Elo points stronger at LTC than Houdini 1.5a. But at 40/4' the difference between SF DD and Houdini 1.5a was 50 or so Elo points. OK, this is no proof of anything, but I remember some pretty sharp differences in scaling of some engines (Critter, Zappa, etc. come to mind, from opposite sides).
PS By the way, it seems an inversion is pretty obvious: Houdini 4 is stronger than Stockfish DD at blitz, weaker at LTC (TCEC). The LTC error margins are large, though.
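For what it's worth, the error arithmetic quoted in this post (13 Elo per result, about 18 Elo for the difference, against an observed difference of 27) can be checked in a few lines. This is only a sketch: it assumes the two results are independent and the errors roughly normal, and the function names are made up for illustration.

```python
import math

def combined_margin(margin_a, margin_b):
    """Error margin of a difference of two independent estimates (added in quadrature)."""
    return math.hypot(margin_a, margin_b)

def one_sided_confidence(difference, margin_2sd):
    """Probability that the true difference is above zero, assuming normal errors.

    margin_2sd is a 2SD margin, so one standard deviation is margin_2sd / 2.
    """
    z = difference / (margin_2sd / 2.0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

margin = combined_margin(13, 13)            # 2SD margin of each result, in Elo
confidence = one_sided_confidence(27, margin)
print(f"2SD margin of the difference: {margin:.1f} Elo")
print(f"difference exceeds margin: {27 > margin}")
print(f"one-sided confidence: {confidence:.4f}")
```

Under those assumptions, a difference larger than its own 2SD margin corresponds to a one-sided confidence above roughly 97.7%, which is in line with the ">97.5%" figure in the quote.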
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: Top Three Engines Are Essentially Equal Strength!
Laskos wrote: No, just from memory, SF DD at LTC (or super LTC like TCEC) was consistently beating Houdini 4 (maybe 30-40 Elo points difference), and Houdini 4 itself was 50 Elo points stronger at LTC than Houdini 1.5a. But at 40/4' the difference between SF DD and Houdini 1.5a was 50 or so Elo points. OK, this is no proof of anything, but I remember some pretty sharp differences in scaling of some engines (Critter, Zappa, etc. come to mind, from opposite sides).
PS By the way, it seems an inversion is pretty obvious: Houdini 4 is stronger than Stockfish DD at blitz, weaker at LTC (TCEC). The LTC error margins are large, though.
Critter and Zappa are "known" to scale badly, but often that is a byproduct of very different SMP scaling: a really bad one for Critter, a really good one for Zappa.
I can see how some aggressive pruning can make the EBF and tree shape very different at depth 15 and depth 25. Some engines used to have a very inconsistent EBF at lower depths. Still, once depth increases (beyond 25-30), the search usually stabilizes, and then, due to diminishing returns (more draws), any Elo difference between two engines should start shrinking. I believe claims like "engine A has a better eval than engine B, therefore it scales better" are total BS.
Back to H4 and Stockfish DD. TCEC is a bad measure, because it is known that until Houdart implemented LazySMP, H scaled really badly on many cores. So most of its lost strength in TCEC, and eventually the inversion that you are talking about, is a consequence of bad SMP scaling.
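To make the EBF point a bit more concrete: one common way to estimate the effective branching factor is the ratio of nodes searched at successive iteration depths. A minimal sketch follows. The node counts are invented purely for illustration (they do not come from any engine), and the function name is hypothetical.

```python
def ebf_per_depth(nodes_by_depth):
    """Estimate EBF at each depth as nodes(depth) / nodes(depth - 1).

    nodes_by_depth maps depth -> nodes searched to complete that depth.
    Depths whose predecessor is missing are skipped.
    """
    return {
        depth: nodes_by_depth[depth] / nodes_by_depth[depth - 1]
        for depth in sorted(nodes_by_depth)
        if depth - 1 in nodes_by_depth
    }

# Invented numbers: erratic growth at low depth, steadier growth at high depth.
nodes = {
    10: 40_000, 11: 110_000, 12: 150_000, 13: 420_000,
    24: 90_000_000, 25: 160_000_000, 26: 285_000_000,
}

for depth, ebf in ebf_per_depth(nodes).items():
    print(f"depth {depth}: EBF ~ {ebf:.2f}")
```

With real engine logs one would feed in the node counts reported at the end of each iteration; a ratio that jumps around at low depth and settles at higher depth is the pattern described above.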
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: Top Three Engines Are Essentially Equal Strength!
There are only a few engines with a built-in opening book, e.g. newer Strelka versions. Stockfish certainly has no such book, and it is very probable that Houdini and Komodo do not have one either.
The real problem is that there is no opening book without bias.
A commercial engine maker has different expectations of a book than the author of an open-source engine or a hobbyist.
Moreover, the viewpoints of users are very different.
Those who like to watch engine matches want to see "exciting" games; those who use engines for analysis want to know the weaker and stronger sides of those engines.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Top Three Engines Are Essentially Equal Strength!
Laskos wrote: No, just from memory, SF DD at LTC (or super LTC like TCEC) was consistently beating Houdini 4 (maybe 30-40 Elo points difference), and Houdini 4 itself was 50 Elo points stronger at LTC than Houdini 1.5a. But at 40/4' the difference between SF DD and Houdini 1.5a was 50 or so Elo points. OK, this is no proof of anything, but I remember some pretty sharp differences in scaling of some engines (Critter, Zappa, etc. come to mind, from opposite sides).
PS By the way, it seems an inversion is pretty obvious: Houdini 4 is stronger than Stockfish DD at blitz, weaker at LTC (TCEC). The LTC error margins are large, though.
Milos wrote: Critter and Zappa are "known" to scale badly, but often that is a byproduct of very different SMP scaling: a really bad one for Critter, a really good one for Zappa.
I can see how some aggressive pruning can make the EBF and tree shape very different at depth 15 and depth 25. Some engines used to have a very inconsistent EBF at lower depths. Still, once depth increases (beyond 25-30), the search usually stabilizes, and then, due to diminishing returns (more draws), any Elo difference between two engines should start shrinking. I believe claims like "engine A has a better eval than engine B, therefore it scales better" are total BS.
Back to H4 and Stockfish DD. TCEC is a bad measure, because it is known that until Houdart implemented LazySMP, H scaled really badly on many cores. So most of its lost strength in TCEC, and eventually the inversion that you are talking about, is a consequence of bad SMP scaling.
I have a conclusive result (outside 2SD error margins) from ultra-bullet to bullet for Stockfish DD against Houdini 1.5a.
200ms/move:
Code:
Games Completed = 2000 of 2000 (Avg game length = 26.064 sec)
Settings = Gauntlet/32MB/200ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 13258 sec elapsed, 0 sec remaining
1. Stockfish DD 64 SSE4.2 1089.5/2000 765-586-649 (L: m=0 t=0 i=0 a=586) (D: r=355 i=44 f=43 s=10 a=197) (tpm=208.9 d=17.63 nps=1949095)
2. Houdini 1.5a x64 910.5/2000 586-765-649 (L: m=2 t=0 i=0 a=763) (D: r=355 i=44 f=43 s=10 a=197) (tpm=198.1 d=13.71 nps=2715919)
1000ms/move:
Code:
Games Completed = 800 of 800 (Avg game length = 134.166 sec)
Settings = Gauntlet/32MB/1000ms per move/M 600cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_v1.epd(32000)
Time = 27052 sec elapsed, 0 sec remaining
1. Stockfish DD 64 SSE4.2 483.0/800 350-184-266 (L: m=0 t=0 i=0 a=184) (D: r=139 i=15 f=15 s=2 a=95) (tpm=1002.8 d=22.18 nps=1938539)
2. Houdini 1.5a x64 317.0/800 184-350-266 (L: m=0 t=0 i=0 a=350) (D: r=139 i=15 f=15 s=2 a=95) (tpm=996.1 d=16.45 nps=2629141)
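For anyone who wants to turn the win-loss-draw counts above into Elo figures, here is a rough sketch. It assumes the usual logistic Elo model and treats games as independent when estimating the 2SD margin (a simplification). The function name is made up for illustration and this is not how LittleBlitzer itself reports results.

```python
import math

def elo_from_wld(wins, losses, draws):
    """Return (elo_difference, two_sd_margin) from the first engine's point of view."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Logistic Elo model: expected score = 1 / (1 + 10 ** (-elo / 400)).
    elo = 400.0 * math.log10(score / (1.0 - score))
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * score ** 2) / games
    sd_score = math.sqrt(variance / games)
    # Delta method: convert the score error to Elo via the slope d(elo)/d(score).
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return elo, 2.0 * sd_score * slope

# Win-loss-draw counts for Stockfish DD taken from the two result blocks above.
for label, wld in [("200ms/move", (765, 586, 649)),
                   ("1000ms/move", (350, 184, 266))]:
    elo, margin = elo_from_wld(*wld)
    print(f"{label}: {elo:+.1f} Elo, 2SD margin {margin:.1f}")
```

The 200ms/move figure should come out close to the 31 Elo mentioned earlier in the thread, with the 1000ms/move gap clearly larger.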
Also, from the LTC (40 moves in 2 hours) rating list of CEGT
http://www.cegt.net/40120new/40_120_rat ... liste.html
we have Stockfish DD rated 104 Elo points above Houdini 1.5a at LTC, a further widening of the Elo difference. All in all, SF DD does scale better, and increases its Elo gap over Houdini 1.5a as the time control increases. Probably this continues up to very, very long time controls, after which the Elo difference diminishes.
It could be that Komodo scales better than Stockfish with time control on 1 core, but scales a bit worse with many cores. That is my impression, anyway.