What do you find so wrong about these data?

bob wrote: I don't accept that at all. That's why I suggested we run a test rather than using ratings that are very old and out of date. How many games has Fritz 5.32 played _recently_ on the rating lists? That makes a huge difference, and it might be better now since it is still going to beat the top programs on occasion, and with them so much higher its rating would likely drag up as well.

michiguel wrote: Then you have to accept that Fritz 5 is 622 Elo points below Rybka on current hardware. That is a bit more than the 600 points you estimate hardware provided in 10 years.

bob wrote: This is going around in circles. It is easy to quantify the hardware. I'd suggest taking the best of today, the Intel I7 (core-3), and the best of late 1998. Limit it to a single chip for simplicity, but no limit on how many cores per chip. I believe this is going to be about a 200:1 time handicap to emulate the difference between the 4-core core-3 from Intel and the best of 1998, which was the PII/300 processor.

Don wrote: So if Rybka loses with, say, a 32 to 1 handicap, you are saying that we should give her even less time to see if she still loses?

Dirt wrote: If the parallel search overhead means that the ratio should really be, say, 150:1, then I don't think Rybka losing really proves your point. Whether there should be such a reduction, and how large it should be, is the question I am asking.

Don wrote: None of this will matter unless it's really a close match, so I would be prepared to simply test single-processor Rybka vs. whatever and see what happens. If Rybka loses we have a "beta cut-off" and can stop; otherwise we must test something a little more fair and raise alpha.

Dirt wrote: Correct me if I'm wrong, but in moving to a time handicap you seem to be ignoring the parallel search inefficiency we were both just explaining to Louis Zulli. Shouldn't that be taken into account?

bob wrote: First let's settle on a 10 year hardware period.
The Q6600 is two years old. If you want to use that as a basis, we need to return to early 1997 to choose the older hardware. The Pentium 2 (Klamath) came out around the middle of 1997, which probably means the best was the Pentium Pro 200. I suspect we are _still_ talking about 200:1.
This is not about simple clock frequency improvements; more modern architectures are faster for other reasons, such as better speculative execution, more pipelines, register renaming, etc.
For comparison, Crafty on a quad-core I7 runs at 20M nodes per second, while on the single-CPU PII/300 it was running at not quite 100K nodes per second. That is a clean and simple factor of 200x faster hardware over that period. (And again, those quoting Moore's law are quoting it incorrectly: it does _not_ say processor speed doubles every 2 years, it says _density_ doubles every 2 years, which is a different thing entirely. Clock speeds have gone steadily upward, but internal processor design has improved even more. Just compare a 2.0GHz Core 2 CPU against a 4.0GHz older processor to see what I mean.)
So that fixes the speed differential over the past ten years with high accuracy. Forget the discussions about 50:1 or the stuff about 200:1 being too high. As Bill Clinton would say, "It is what it is." And what it is is 200x.
That is almost 8 doublings, which is in the range of +600 Elo. That is going to be a great "equalizer" in this comparison. 200x is a daunting advantage to overcome. And if someone really thinks software has produced that kind of improvement, we need to test it and put it to rest once and for all...
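The arithmetic behind "almost 8 doublings" can be checked with a short sketch. The node rates are the round figures quoted above (20M and roughly 100K nodes per second); the Elo-per-doubling value is not a measured constant, just the figure implied by dividing the +600 estimate by the number of doublings, and the function names are made up for illustration.

```python
import math

def doublings(speed_ratio):
    """Number of speed doublings contained in a given speed ratio."""
    return math.log2(speed_ratio)

def elo_from_speedup(speed_ratio, elo_per_doubling):
    """Elo gain under the assumption of a constant gain per doubling."""
    return doublings(speed_ratio) * elo_per_doubling

# Round figures quoted in the post (the 1998 number is "not quite 100K").
nps_1998 = 100_000        # Crafty on a single-CPU PII/300
nps_2009 = 20_000_000     # Crafty on a quad-core I7

ratio = nps_2009 / nps_1998          # 200.0
d = doublings(ratio)                 # a bit under 8
implied_per_doubling = 600 / d       # what "+600 Elo" implies per doubling

print(f"speed ratio:          {ratio:.0f}x")
print(f"doublings:            {d:.2f}")
print(f"implied Elo/doubling: {implied_per_doubling:.1f}")
```

With these inputs the ratio is 200x, which is about 7.6 doublings, so +600 Elo corresponds to a bit under 80 Elo per doubling. Since the 1998 figure was "not quite 100K", the true ratio is slightly above 200. Whether the gain per doubling really stays constant across that whole range is exactly the kind of thing the proposed test would show.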
I will accept that a program today running on 4 cores will see some overhead due to the parallel search. But I don't think it is worth arguing about whether we should scale back the speed because of the overhead. That is simply a software issue as well, as it is theoretically possible to have very little overhead. If the software can't quite use the computing power available, that is a software problem, not a hardware limit.
Miguel
CCRL 40/4 Rating List - Custom engine selection
388092 games played by 744 programs, run by 12 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 4 minutes on Athlon 64 X2 4600+ (2.4 GHz)
Computed on January 10, 2009 with Bayeselo based on 388'092 games
Tested by CCRL team, 2005-2009, http://computerchess.org.uk/ccrl/404/
Rank  Engine       ELO    +    -   Score   AvOp   Games
   1  Fritz 5.32   2642  +13  -13  53.2%  -24.5    2132
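To put the 622-point gap in terms of game results, here is a minimal sketch using the standard logistic Elo expectation formula. This is an assumption for illustration only: Bayeselo, which produced the list above, uses its own model with explicit draw handling, so the numbers are a rough guide rather than what the rating list would predict.

```python
def expected_score(elo_diff):
    """Expected score (0..1) for the side rated elo_diff points higher,
    using the standard logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

gap = 622  # Rybka minus Fritz 5 on equal hardware, as quoted in the thread

print(f"stronger side: {expected_score(gap):.3f}")
print(f"weaker side:   {expected_score(-gap):.3f}")
```

Under this formula a 622-point gap means the weaker side scores a little under 3% of the points, so a hardware handicap would have to close essentially all of that gap to make the match even.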
So let's run the test rather than speculating...
I have some Crafty versions that should be right for that time frame. Crafty 15.0 was the first parallel search version. I suspect something in the 16.x versions or possibly 17.x versions was used at the end of 1998. Crafty ran on a quad Pentium Pro early in 1998 when version 15.0 was done...