Don wrote:Bob,
Here is the big problem with what you say about Crafty: You go back in time with an ancient log file and compare it with a modern log file. Then you draw conclusions about hardware advances.
Yes. Because I am comparing something that has been almost uniformly constant over the 15 years of Crafty development: "what is a node?" That has not changed significantly. At times eval might be a bit slower, at other times it might be a bit faster. But in general, evaluation has not changed dramatically in terms of computational cost, at least for Crafty. It is done quite differently in quite a few places, though. Move generation has not changed appreciably since rotated bitboards in version 6.0. Magic bitboards are no faster, but do offer additional flexibility that I take advantage of in a few software improvements. The basic structure of search/quiesce has not changed much. Reductions don't add a lot of code, so they have little effect on NPS. Ditto for extensions and pruning.
I can't think of any better measure than to take a good program from 1995 on hardware from 1995, and compare that to a good program in 2010 on good hardware in 2010. And about the only thing that can be compared reasonably is NPS, since "node" has not changed significantly.
Another answer is to get a 1995 box, with a 1995 program, and play against a wide gauntlet to get a good rating. Then take that program to 2010 and use 2010 hardware and repeat. Significant error is included, because a lot of hardware advances will not help such a program until it is modified to take better advantage of them.
For me, the NPS is certainly the optimal way to measure hardware improvement. The 1995 program on 1995 hardware was a pretty well-optimized "system" (speaking of my program specifically, here). Ditto for the 2010 program on 2010 hardware. Hardware speed is really all about NPS and nothing more, from my perspective. Yes, we might elect to do something that drops NPS in return for more accuracy somewhere else, but that's a programming issue, not a hardware issue. And it gives the "software" component a bit of an unfair boost, to be sure. But I don't see how to make it perfect unless we write two new programs, one for 1995 and one for 2010, and somehow agree on what to leave out of the 1995 program to make it representative...
I don't believe my simple NPS comparison is either (a) inaccurate; (b) exaggerated; (c) fictional; or anything else other than an accurate idea of what one pretty good programmer could do in 1995 vs 2010, in terms of NPS.
That is perfect from your point of view, because you claim there have been only minor software advances and it's been almost all hardware. So from your point of view this is the perfect test. Any software changes since then should have almost no impact on the nodes per second, or the strength or anything else.
You are including _way_ more than I said. I am suggesting using the NPS difference from 1995 for Crafty, compared to 2010 Crafty, as a benchmark number to give us the hardware improvement over that 15 year period of time.
To compare software, we need a 1995 program and 1995 hardware, and then the same 1995 program on 2010 hardware. But it doesn't need to be a lousy example of a 1995 program. Parallel search was old news in 1995. Genius never had that. So run it today and you immediately throw away 5/6 of the hardware advantage of today. Not reasonable, IMHO.
My claim is that software has had roughly the same impact as hardware and such a test from my point of view would not reflect any software advances.
By the same token, omitting one of the _major_ hardware advances of the past few years (multiple cores) is also flawed. Badly flawed.
So take that NPS measure, which for Crafty _is_ clearly 1000x faster today on a single-chip box compared to the P5/133. I'll even give you the extra 10K nps and just stick at 30K when it was closer to 20K on the P90. But I don't have any data since I never ran on one myself (I went from a Sparc to a P5/133 as my first Intel platform for ICC, which was ICS at the time). If we used a P90 speed, it becomes more like 1500x, which is only worse.
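The 1000x and 1500x figures are simply ratios of node rates. A minimal sketch in C, using only the numbers quoted above (30K nps on the P5/133, roughly 20K on a P90, and the 30M nps that a 1000x ratio implies for today); the function names are made up for illustration:

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of two node rates, both in nodes per second. */
double nps_ratio(double new_nps, double old_nps) {
    assert(old_nps > 0.0);
    return new_nps / old_nps;
}

/* Prints the two hardware ratios discussed in the post. */
void print_hardware_ratios(void) {
    printf("vs P5/133 (30K nps): %.0fx\n", nps_ratio(30e6, 30e3));
    printf("vs P90 (~20K nps):   %.0fx\n", nps_ratio(30e6, 20e3));
}
```

Calling print_hardware_ratios() from main reports 1000x and 1500x for the two baselines.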
And I am not willing to take your word for it that nothing has changed that would have any impact on this. You are almost certainly going to respond by saying that you have always done things the same way or that any changes did not affect the basic speed. But even if you THINK that is the case, it's probably not and I'll tell you why in the next paragraph.
Recently I discovered that over a period of just a few months my program has changed more than I would have believed. I have re-written Komodo using the old Komodo as a template - in effect I'm cloning my own program.
When the new program failed to play as well by a substantial margin I started to look for any major differences, ignoring the small things that I didn't think made any difference. Wow, how wrong I was. I came to appreciate that Komodo was packed with minor speedups and minor ELO things that by themselves hardly made any difference. The things I kept discovering seemed like a bottomless pit. In my mind I had greatly simplified the entire progress I had made over the last 3 years or so and had boiled it down to perhaps half a dozen big things and a few things of little consequence, but that was grossly in error.
When you are talking about a well developed program, you just cannot compare them over time - so I reject your premise that Crafty has not improved in the software or that it's still the same basic program.
Reject what you want. But you are rejecting something _I_ have not said. I said that a "node" in Crafty has not changed significantly in 15 years. I'll go even further and say that a "node" in my chess programs dating back to 1978 where we first used brute-force has not changed significantly. Prior to that nodes were drastically different, because the search was drastically different, etc. But the same basic negamax search framework is quite similar.
I'd be happy to send you the late 1995 evaluation from Crafty so that you can compare it to today. You could run 'em in parallel to see what the difference in speed is. I doubt it is significant. You can compare move generation the same way. Or Swap() (SEE). Or Make/Unmake. Those just haven't changed very much at all. I spent a ton of time to get rid of black/white duplication. Made it no faster, just easier to change later.
In short, a node for Crafty today is comparable to a node in Crafty in 1995. If I can get this damned thing to quit seg faulting because of confusion about what "long" means on 64 bit hardware (it was always 32 bits on 32 bit boxes, but is breaking things here and there on 64 bits), I will be able to answer this particular question exactly. What if the NPS for 1995 Crafty turns out to be roughly 30M on the same hardware? Or, to avoid SMP strangeness, suppose I run 1995 and 2010 Crafty on a single CPU and measure NPS and they are fairly close. Will that be convincing that NPS is comparable and useful as a measure of hardware speed?
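The "long" trouble described above is the familiar LP64 issue: on most 64-bit Unix systems "long" is 64 bits wide, where on the 32-bit boxes it was 32. Below is a minimal sketch of one way such a bug can arise and the usual portable fix with the C99 fixed-width types from <stdint.h>. The names (hash32, bitboard, fold_key) are hypothetical and not taken from the Crafty source:

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t hash32;    /* always 32 bits, whatever "long" is */
typedef uint64_t bitboard;  /* always 64 bits, one bit per square */

/* Fold a 64-bit key down to 32 bits. If the return type were
   "unsigned long", an LP64 system would silently keep all 64 bits. */
hash32 fold_key(bitboard key) {
    return (hash32)(key ^ (key >> 32));
}

/* Report how wide "long" is on the machine this is compiled for. */
void report_long_width(void) {
    printf("long is %u bits here\n", (unsigned)(sizeof(long) * CHAR_BIT));
}
```

Code that spells out the width it needs behaves the same on both the old and the new hardware, which is what a like-for-like NPS comparison requires.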
The single-cpu test would be easier, because I did not release an SMP version of Crafty until late 1996, a year later. I had one much earlier as we ran Crafty on the Crays for a bit since they were a good 64 bit box which I needed.
I'll at least post the numbers for p5/133 and the E5345 boxes, and an equivalent run of current crafty on the E5345 for comparison. It is going to take a bit more time to get the 1995 version running. Any guesses on the NPS today? 1995 p5/133 was 30K. Today, on a single-cpu E5345, current Crafty hits about 3.5M-4.0M nps. Take a guess at what 1995 Crafty is going to do. I'm expecting something within a factor of two at worst, probably closer. Might even be faster if the old eval and stuff is simpler, might be slower if the new version is better tuned to the new hardware. I suspect pretty close, but might be wrong.
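For reference, the single-CPU figures quoted above already pin down a range for the hardware ratio, and the "within a factor of two" prediction is easy to state as a check. A small sketch in C, using only the numbers from the post (30K nps in 1995, 3.5M-4.0M nps today); the function names are illustrative:

```c
#include <assert.h>
#include <stdio.h>

/* Speedup of one node rate over another (both in nodes per second). */
double speedup(double new_nps, double old_nps) {
    assert(new_nps > 0.0 && old_nps > 0.0);
    return new_nps / old_nps;
}

/* True if a and b differ by no more than `factor` in either direction. */
int within_factor(double a, double b, double factor) {
    assert(a > 0.0 && b > 0.0 && factor >= 1.0);
    double ratio = (a > b) ? a / b : b / a;
    return ratio <= factor;
}

/* Prints the single-CPU range implied by the figures quoted above. */
void print_single_cpu_range(void) {
    printf("single-CPU speedup: %.0fx to %.0fx\n",
           speedup(3.5e6, 30e3), speedup(4.0e6, 30e3));
}
```

With these inputs the single-CPU ratio comes out at roughly 117x to 133x. Once the 1995 version runs, within_factor(measured_nps, 3.5e6, 2.0) is the test of the guess, where measured_nps stands for the not yet available result.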
I will continue working and post the 1995 on new hardware numbers, assuming I can eventually stop the crashes and make sure it is working correctly without any ugly unseen bugs...
More later, as surely you'd have to agree that the above comparison would be "dead on". It is not going to be 1000x since I will initially not use SMP, but I am going to test that as well once I get past the current issues.
If you compare a 15 year old Crafty to a modern Crafty you cannot possibly call this a pure hardware comparison. Even if you THINK you can, it just means that you forgot a lot of the little things because none of them by itself made a substantial difference. And don't say this doesn't apply to nodes per second - if you say that I'm not allowed to improve the nodes per second either (except by hardware only) then I throw my hands in the air.
What about my proposal above? 1995 crafty data (30K nps average) against _same_ program on 2006 hardware (my E5345 box). No SMP to start with although I clearly had parallel search in 1995, or 1985, or even 1978. I'm expecting something in the range of 160x, since it will be giving up 7/8 of the hardware. Will need, at some point, to get some good 6-core i7 numbers since the E5345 is a 2006 processor, some 4 years old now.
So it's my belief that the Rebel time comparison, as FLAWED as it probably is, AT LEAST has the merit of keeping one thing constant. When I'm developing Komodo I try to follow the same principle - I don't change 10 unrelated things at once and then test - if I get a large improvement I have no clue which subset of those 10 things made a difference.
I think a reasonable way to do the hardware comparison test is to find a version of Crafty that is BETWEEN the 16 year span - perhaps something that is 8 years old and will run on both platforms but was written for neither. Remove any special assembly language optimizations and then test it on an ACTUAL P90 as well as an i7. Then we have something that is about as fair as we can make it. It's certainly more fair than taking an old program and comparing it to a completely different program from a different era.
I _believe_ I can do better. I have a real 1995 version that I am hell-bent on making work on my cluster hardware. It does have some ASM, but then again, so does the current version (to get to bsf/bsr/etc.) I'd love to just "steal" my current inline code to solve part of this, but it won't work. I renumbered the bits a few years ago, which makes this impossible.
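A rough illustration of why renumbering the bits rules out dropping in the newer inline bit-scan code: the same physical bit gets a different index under the two conventions, so every table and shift built on the old numbering would disagree with the new scan results. The sketch uses a portable loop in place of the bsf instruction, and the two numbering schemes shown (bit 0 as least significant versus bit 0 as most significant) are assumptions for illustration, not a statement of what either Crafty version actually does:

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit, counting from the least significant
   end (the value bsf would return). Portable loop, not the instruction. */
int first_one_lsb0(uint64_t b) {
    assert(b != 0);
    int i = 0;
    while ((b & 1u) == 0) {
        b >>= 1;
        i++;
    }
    return i;
}

/* The same physical bit, numbered from the most significant end. */
int to_msb0(int lsb0_index) {
    assert(lsb0_index >= 0 && lsb0_index < 64);
    return 63 - lsb0_index;
}
```

Under these assumed conventions, a bit that one scheme calls 0 is 63 in the other, so code written for one numbering cannot be pasted into a program built around the other without converting at every call site.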
By the way, please tell me why we have to pretend parallel programs did not exist in the mid 90's?
Don. Get a grip and relax... _YOU_ have been saying we can't use SMP today. I have been saying we should. _YOU_ have been saying that the SMP of today is a lot of software. I have been saying it is not, it already existed prior to our baseline year of 1995. Only problem is, in 1995 the Pentium was a one-core processor. Today's i7 has up to 6. Possibly 8 by now although I have only run on a prototype, and am not sure they are ever going to release one for unknown reasons. So 1-core P5 in 1995. 6-core i7 in 2010. Makes Genius/Rebel moot since they won't use 2010 hardware effectively. That has been my issue with both (a) counting parallel search as a software improvement since 1995, which it is not (DTS was running in 1988, the year I finished my dissertation), and (b) using a non-SMP 1995 program as the measure, which is a poor one. Yes, it will give a good (possibly) single-cpu speed comparison, but it will be off by 6x.
If we do that can I also pretend that commercial programs didn't exist? In both cases we are talking about more cutting edge stuff. Neither is going to make much of a difference because quad MP is only worth about 100 ELO and Crafty was probably close in ELO to the best commercial programs back then.
quad-mp should be worth at _least_ 100. I'd figure a speedup of something in the 3.1-3.4 range based on lots of testing. And we are talking about 6-way at least. Something in the 4.5x range. 2+ plies. Very significant. Very definitely not software.
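The step from a 4.5x speedup to "2+ plies" depends on the effective branching factor (EBF), which the post does not state. As a rough sketch, assuming time to reach a given depth grows as EBF^depth and taking an EBF of about 2 purely as an illustrative assumption, each factor of EBF in speed buys one more ply. The function name is made up for the example:

```c
#include <assert.h>
#include <stdio.h>

/* Whole plies gained from a speedup, assuming time to depth grows as
   ebf^depth. ebf is an ASSUMED effective branching factor, not a
   figure taken from the post. */
int whole_plies_gained(double speedup, double ebf) {
    assert(speedup > 0.0 && ebf > 1.0);
    int plies = 0;
    while (speedup >= ebf) {
        speedup /= ebf;   /* one more ply costs a factor of ebf */
        plies++;
    }
    return plies;
}
```

With an assumed EBF of 2, a 4.5x speedup gives 2 whole plies with some left over, while the 3.1-3.4 quad range gives 1. A larger assumed EBF shrinks the gain, so the "2+ plies" figure only holds for a program whose EBF is close to 2.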