Thanks. I took the 1' and 4' lists, culled the bottom three engines as too far below the others, and divided the remainder into Ippo-cousins and other (other being Rybka, Komodo, and Stockfish). The "Other" group had the same average rating on both lists, 3046.7. The Ippo group had a drop of 11.5 elo (to 3050.5) at 4'. So based on this, the effect is real but smaller than I thought. It suggests that your list only overrates the Ippo family by about ten elo or so, if we consider normal blitz to be the gold standard.
pohl4711 wrote:
A good idea. But before doing so, you should check out the excellent test work of Andreas Strangmüller: http://www.fastgm.de. Perhaps you will find all the answers there?
lkaufman wrote:
Now that I have the hardware, I'm planning to get an answer once and for all to the question of whether bullet chess (like the LS list) correlates well with blitz lists (like IPON and now the 5' + 3" CEGT list). I'm running a gauntlet for the new Komodo (against five top engines) at 2' + 1" (HT off, same book as LS uses, 36 cores running on it, so 36 games at once). When I'm done, I'll cut the time in half and repeat, and if time permits I'll do 4' + 2". I'll have enough games to be able to say once and for all how valid bullet testing is, if the goal is to predict results at 5' + 3" or so. Although I've often said that I think bullet testing favors Ippo-related engines, I'm open-minded; if the results show otherwise I won't hesitate to admit I was wrong. Actually it would be very good news for the computer chess community if I am wrong, because it would mean that we can get much more reliable sample sizes just by playing faster games.
So far my result (for TCEC stage 3 version) against Houdini 3 is 47.1% out of 1900 games, about 20 elo down. If there really is no difference in relative strength of engines at different levels, I would expect something like 48% at 4'+2" and 46% at 1' + 30". The percentage should asymptotically approach 50% at super long time controls. But I claim that there is some reasonable level where Komodo actually will score over 50% in a long match. Maybe this will shed some light on the question. I may actually just run a fairly slow match on my quad to see if I get a plus score.
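For anyone who wants to check the arithmetic, the step from a match score to an elo difference can be sketched with the usual logistic elo formula. This is only a minimal sketch; the function names are made up for illustration and are not from any rating tool mentioned in this thread.

```python
import math


def score_to_elo(score):
    """Elo difference implied by a score fraction (0 < score < 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_to_score(elo_diff):
    """Expected score fraction for a given elo difference (inverse of the above)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# 47.1% comes out at roughly -20 elo, i.e. "about 20 elo down"
print(round(score_to_elo(0.471), 1))
```

The same function can be applied to the 48% and 46% figures to see what elo gaps those expectations correspond to.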
What we see there is that with longer thinking times, the gap between the first and the last position of a rating list gets smaller. That happens because the draw ratio increases with longer thinking time, so head-to-head results can get closer to 50%.
But we also see that in all 3 rating lists Houdini 3 is number 1 and Komodo CCT is number 2. (I ignore the list with 3.75''+0.0375'' because its thinking time is too short: at such short times, Windows system operations or engine initialization can distort the results.) Only Stockfish climbs a little bit with more time:
http://www.fastgm.de/15+0.15.html / http://www.fastgm.de/60+0.60.html / http://www.fastgm.de/240+2.40.html
Stefan
44 elo swing depending on hardware!
Moderator: Ras
-
lkaufman
- Posts: 6284
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: 44 elo swing depending on hardware!
-
ThatsIt
- Posts: 992
- Joined: Thu Mar 09, 2006 2:11 pm
Re: 44 elo swing depending on hardware!
OmG!
kranium wrote: [...snip...]
instead of being unfairly criticized by the outdated and entrenched establishment
[...snip...]
Decemb(e)rist, Comrade, Minstrel with Propaganda ...
-EoD-
-
lkaufman
Re: 44 elo swing depending on hardware!
Some new conclusions, now that I have had time to test using my new 20 core machine with hyperthreading off. On the dual-processor machines I run two matches of 8 or 10 threads each, so the total number of threads running matches the number of cores.
It seems that in direct match play there is still a substantial difference in the relative speeds of Houdini to Komodo depending on whether I'm running on a machine with two physical processors or just one. The difference is not as large as with HT off, but it is still substantial; the speed ratio is 1.43 on a normal third generation i7 quad, but it is only 1.28 on the big machines. As a consequence, the results are dramatically different. On the quad, at 30" +.25" (probably about equivalent to LS list), current Komodo is down by 21 elo after about 500 games so far. But on the big machines, Komodo leads by 24 elo after 2700 games! I didn't see this big discrepancy when I ran round robins. It's all very strange, but it seems that we have caught up with Houdini on the big machines but not quite on the ordinary ones yet.
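To get a feel for how much weight 500 games can carry compared with 2700, here is a rough sketch of the statistical error on a match result. It assumes the logistic elo model and a normal approximation, and the draw ratio of 0.5 is purely an assumption for illustration, not a figure from these matches; the function names are hypothetical.

```python
import math


def elo_to_score(elo_diff):
    """Expected score fraction for a given elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def score_to_elo(score):
    """Elo difference implied by a score fraction (0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(elo_diff, games, draw_ratio=0.5, z=1.96):
    """Approximate 95% interval (low, high) in elo for a measured result.

    draw_ratio is an ASSUMED value, not taken from the matches above.
    """
    p = elo_to_score(elo_diff)
    # Per-game variance of a win/draw/loss result with mean p and draw share d
    variance = p * (1.0 - p) - draw_ratio / 4.0
    margin = z * math.sqrt(variance / games)
    return score_to_elo(p - margin), score_to_elo(p + margin)


for label, elo, games in [("quad", -21, 500), ("big machines", 24, 2700)]:
    low, high = elo_interval(elo, games)
    print(f"{label}: {elo:+d} elo, roughly {low:+.0f} to {high:+.0f}")
```

Since the margin shrinks with the square root of the number of games, the 500-game quad result carries an interval more than twice as wide as the 2700-game result, whatever draw ratio is assumed.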