lkaufman wrote:
Now that I have the hardware, I'm planning to get an answer once and for all to the question of whether bullet chess (like the LS list) correlates well with blitz lists (like IPON and now the 5' + 3" CEGT list). I'm running a gauntlet for the new Komodo (against five top engines) at 2' + 1" (HT off, same book as LS uses, 36 cores running on it, so 36 games at once). When I'm done, I'll cut the time in half and repeat, and if time permits I'll do 4' + 2". I'll have enough games to be able to say once and for all how valid bullet testing is, if the goal is to predict results at 5' + 3" or so. Although I've often said that I think bullet testing favors Ippo related engines, I'm open-minded; if the results show otherwise I won't hesitate to admit I was wrong. Actually it would be very good news for the computer chess community if I am wrong, because it means that we can get much more reliable sample sizes just by playing faster games.
kranium wrote:
the top engines' ratings in Stefan's Lightspeed list match fairly closely with CEGT, and other lists
ThatsIt wrote:
Modesty is_not your matter, isn't it ?
pohl4711 wrote:
[...snip...]
It's all about playing a lot, lot, lot of games! And with 5'+3'' most
people don't play a lot of games, but only 100 or 150 in the
head-to-head competition. And that's obviously not enough.
[...snip...]
Best - Stefan
My view:
better 50, 100 or 150 games with 5'+3" than thousands of ultra bullet scrap !
but Stefan's list has a couple big advantages:
it's unbiased and all-inclusive
and more importantly: he plays enough games to achieve a high level of accuracy
(an error margin of +-5 ELO compared to CEGT 40/20's +-15 ELO or more)
unlike CEGT, he does not play the games on different hardware and simply combine the results
(which may result in a big ELO swing, as Larry points out in this topic)
that said, i'm not surprised to see just how popular his site has become...
and it may begin to explain your animosity towards him
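For readers weighing the error-margin point above (+-5 ELO versus +-15 ELO or more), here is a rough sketch of how the margin shrinks with the number of games. It is not how any of the rating lists compute their figures; it assumes a single head-to-head match, the usual logistic Elo curve, and a linear (delta-method) approximation around the observed score. The function name `elo_margin` and its `draw_ratio` parameter are made up for illustration.

```python
import math

def elo_margin(games, score=0.5, draw_ratio=0.0, z=1.96):
    """Approximate 95% error margin, in Elo, for a match of `games` games.

    Assumptions (illustrative only): games are independent, results are
    scored win=1, draw=0.5, loss=0, and the logistic Elo curve is
    linearised around `score`.
    """
    # Win fraction implied by the overall score and the draw ratio.
    win = score - draw_ratio / 2
    # Per-game variance of the result: E[X^2] - E[X]^2.
    variance = win + 0.25 * draw_ratio - score ** 2
    stderr = math.sqrt(variance / games)
    # Slope of Elo = 400 * log10(s / (1 - s)) with respect to s.
    slope = 400 / (math.log(10) * score * (1 - score))
    return z * stderr * slope

for n in (150, 1000, 20000):
    print(n, round(elo_margin(n), 1))
```

Under these assumptions the margin falls with the square root of the game count, so four times as many games only halves it. Draws reduce the per-game variance, so passing a realistic `draw_ratio` gives a somewhat smaller margin than the no-draw default.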
So far my result (for TCEC stage 3 version) against Houdini 3 is 47.1% out of 1900 games, about 20 elo down. If there really is no difference in relative strength of engines at different levels, I would expect something like 48% at 4'+2" and 46% at 1' + 30". The percentage should asymptotically approach 50% at super long time controls. But I claim that there is some reasonable level where Komodo actually will score over 50% in a long match. Maybe this will shed some light on the question. I may actually just run a fairly slow match on my quad to see if I get a plus score.
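The "47.1% is about 20 elo down" figure can be checked with the standard logistic Elo formula. This is only a sketch of that conversion; the helper name `elo_diff` is made up here, and it says nothing about which of the competing predictions for other time controls is right.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction (0 < score < 1),
    using the standard logistic Elo model."""
    return 400 * math.log10(score / (1 - score))

# Scores quoted in the post: the measured result and the two
# percentages expected if relative strength did not depend on time control.
for label, score in [
    ("measured over 1900 games", 0.471),
    ("expected at 4'+2\"", 0.48),
    ("expected at 1'+30\"", 0.46),
]:
    print(f"{label}: {score:.1%} -> {elo_diff(score):+.1f} Elo")
```

A score of exactly 50% maps to a difference of zero, which is the asymptote mentioned for super long time controls.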
Larry,
fact is: CEGT has recently gotten your wholehearted thumbs up as they enthusiastically adopted your preferred TC...
but they use various hardware and LS provides a consistent hardware platform
so, i fail to understand why this/your topic: "44 elo swing depending on hardware!"
is now turning into a referendum on the validity of lightning speeds (i.e. -> the LS rating list)
(which have been used with great success by engine developers for many many years, Bob H., Vas R. especially)
it seems especially inappropriate after the severe LS put-downs by CEGT and IPON, (whom you have recently praised)
you're a great guy, and i believe you are fair (despite a commercial conflict of interest)...
lkaufman wrote:
Actually it would be very good news for the computer chess community if I am wrong, because it means that we can get much more reliable sample sizes just by playing faster games.
but with all due respect (and you deserve a lot), IMO, your individual tests != empirical truth for the CC community on this issue
Norm