hardware vs software advances

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: hardware vs software advances

Post by Dann Corbit »

I remember when the DEC Alpha first came out, it smoked the Intel 80x86 and Mac 68000 architectures. I compiled a 64 bit program on it, and no other machine at my disposal even came close. As I recall, Intel bought or copied or in some other way came up with the same technology that DEC had to get the same throughput shortly afterwards, but for a while, the Alpha chips had a real edge.

It seems to me that technology often advances in fits and starts rather than in a smooth transition, but when you plot the advances on a log-scale chart, they always seem to track pretty smoothly because of the log nature of the y-axis.
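Dann's observation about log-scale smoothing can be sketched numerically. All the throughput figures below are invented for illustration (only the shape matters): growth arriving in irregular jumps still hugs a straight line once you take logs.

```python
import math

# Hypothetical throughput figures (nodes per second): growth arrives in
# irregular jumps rather than along a smooth curve.
years = [1995, 1997, 1999, 2001, 2003, 2005]
speed = [30_000, 75_000, 300_000, 500_000, 2_000_000, 4_000_000]

# On a linear axis the jumps look erratic; on a log axis the same data
# hugs a straight line. Fit log2(speed) against year by least squares.
xs = [y - years[0] for y in years]
ys = [math.log2(s) for s in speed]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
    (x - xbar) ** 2 for x in xs
)
intercept = ybar - slope * xbar

# Worst deviation from the fitted line, measured in doublings.
max_dev = max(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))
print(f"~{slope:.2f} doublings/year, worst deviation {max_dev:.2f} doublings")
```

With these made-up numbers the fit comes out near 0.7 doublings per year and every point sits well under one doubling from the trend line, even though the raw jumps range from 1.7x to 4x.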
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: hardware vs software advances

Post by bob »

Dann Corbit wrote:I remember when the DEC Alpha first came out, it smoked the Intel 80x86 and Mac 68000 architectures. I compiled a 64 bit program on it, and no other machine at my disposal even came close. As I recall, Intel bought or copied or in some other way came up with the same technology that DEC had to get the same throughput shortly afterwards, but for a while, the Alpha chips had a real edge.

It seems to me that technology often advances in fits and starts rather than in a smooth transition, but when you plot the advances on a log-scale chart, they always seem to track pretty smoothly because of the log nature of the y-axis.
It would likely still smoke them had it been continued. It was a well-done architecture that was superior in every way to X86...
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: hardware vs software advances

Post by Don »

bob wrote: First, I have seen several doubles. With Crafty, a P5/133 hit 30K nps. The P6/200 went to 75K. More than 2x faster. I've seen similar things comparing core2 to the original crappy core intel (their first 64 bit stuff was just ugly).

I did manage to locate some P5/133 printed output. Not much, but at least a reference, and as I had thought, we were hitting 30K nodes per second on this box in 1995. I just did some testing on my 3-year-old dual quad intel (2.33ghz E5345 xeon) and hit 30M on quite a few positions, 22-24 was the average.

For me, that is a clean factor of 1,000 (30K to 30M) if you restrict things to just one physical CPU (my i7 6-core numbers were over 30M).
These numbers have no meaning unless they are put in some kind of context. Saying that you "hit 30M on quite a few positions" means nothing. Also, I don't see how you can possibly be doing 30 million nodes per second on a 3-year-old computer on 1 core; that is just not believable. Are you talking about some kind of simple endgame or something?

Your numbers do not even begin to reconcile against the benchmarks that have been published for Rebel. Also, the 55 to 1 number I published is going from the P90 to a slower Core 2 Duo notebook computer, but I doubled that number because I am comparing to an i7-980x, which is currently the fastest-clocked Intel and best chip too. The difference between those is 1.95 to 1. So my rounded number going from the P90 to the fast i7-980x is 100 to 1, considering a single core. I think you are either missing a zero or adding a zero, because if they are this far in disagreement it is pretty hard to explain. I vote that your figures are the most suspect because I'm comparing apples to apples (same exact program, same exact test) while you are looking at 15-year-old log files of a significantly different version of Crafty and estimating the nodes per second based on a spot check of those log files. In other words, it's really difficult to know what you are actually seeing, plus you have a reputation for big-time exaggeration (I think you are honest, but you have a gift for hyperbole and exaggeration).


I don't think you can ignore the 6-core issue since that is clearly a hardware advance over the old processors that had 1 core and no cache on the chip back in the case of the P5. Some of that speed gain is 64 bit stuff, some is multiple-core stuff, some is improved cache/OOE stuff, but 30K to 30M can't be ignored.
I think you are trying to claim some of the software engineering gains by waving your hands and saying that improvements in MP implementation do not count.

I believe a big part of the gain in chess software IS engineering improvements which are not of a pure chess nature. In another post you said, "alpha beta works for chess, checkers, othello, go etc." and that is your justification for saying that the software aspects of MP do not contribute to the quality of a chess program. That is total nonsense, I'm sorry. If alpha beta is the justification for throwing it completely out of the software side of the equation, then what about search enhancements that would apply to any game? Are you going to argue that those are just techniques to better utilize the hardware?

Clearly, any search enhancement such as LMR and null-move pruning should not be considered a software improvement because it is not chess specific? I don't see how hard work on the MP implementation is not part of what makes some programs better than others. I always thought Crafty did this better than most other programs; are you saying that is totally irrelevant?

And that means if we have this same discussion in 10 years and people have figured out how to get a lot more out of the MP implementation, we are not allowed to consider it a software improvement? I think you are being pretty unreasonable here.

If it's just a matter of us thinking about it differently, then it was just a misunderstanding. Let me clarify my belief to set it apart from what you are trying to dilute it to mean. I believe that the enormous hard work of software engineering by the program authors (including you and me) in every aspect of an actual chess program (including SEARCH or OPTIMIZATIONS designed to get more out of the CPU or compiler) contributes enormously to the success of the programs. If you take this to mean something else, then we are talking about totally different things.

So there is no possible way I can win this "argument" if you systematically remove all aspects of software engineering from the equation except for the "evaluation function" which even you would probably have a difficult time claiming is not chess specific.

I don't think it is sensible to compare a 1995 program running on modern hardware to get the hardware speedup. A program is always optimized to whatever it runs on, as opposed to what it might run on in 20 years.
I have already conceded this point, but it is not worth an order of magnitude difference. And some of this should be considered software advances. If the compilers are better, is that a software or hardware thing? You want the software side to take a hit for this too, don't you? If I hand-optimize a piece of my code, should that be thrown out too because the old program didn't have it? It's no wonder we disagree if you consider most actual software engineering aspects of this irrelevant.

In principle however I agree with you on this point - running a 16 year old program on new hardware is a handicap, just as it would be a handicap if it were done in the other direction. I know you probably think this is worth a few doublings but it isn't.

The 64-bit versus 32-bit thing is almost completely irrelevant. Some of our very best programs are still 32-bit programs. This is an example of something that you probably think we should give an extra "double" of handicap for, but again that is nonsense. If I had written Komodo for 32-bit hardware specifically, I would have done it differently and it MIGHT be a few percent slower. Even if you run a full-blown 64-bit program on 32-bit hardware, it does not take a 2 to 1 hit or even close to it. (Komodo takes a pretty big hit because I never gave it even passing consideration, but Stockfish paid attention and doesn't take much of a hit.)

By the way, that is one of the things that probably distorted your 1000 to 1 figure, but it only explains perhaps 30 percent (I don't know how optimal the 32-bit Crafty stuff is). Of course, when you are this far off, 30% is a lot.

Crafty was an active project in 1995 and is still an active project today, so in 1995 it fit what was available pretty well, and today it fits what is available pretty well. That gives a pretty good idea of what hardware has done, although obviously we are overlooking some types of hardware and are primarily talking about the microprocessor(s) from Intel. But they are the most popular thing going, of course.

numbers like 55x are simply wrong.
The number is 100x, not 55x, as I explained earlier. I really needed to use my notebook machine for doing this test, but I interpolated the result; the notebook is almost exactly 1.95 to 1 slower than the i7 (when considering 1-core chess).

I actually believe that 200x is probably closer to reality, simply for the reasons you stated and I agreed with. Rebel was not designed to run on a modern computer and I would estimate that impact to be approximately 2 to 1. But we have no way of knowing for sure.

This whole business is inherently slippery and open to interpretation. I want to stick with the 100 to 1 value simply because it's the only value that I actually have hard numbers on. For example if we try to guess at a bunch of credits and debits to give for "this, that and the other thing" we are going to end up with an extremely contrived estimate.

So since we know exactly what we have, I can run a few games. If the results are about even I will stop. If Robbo wins too much, I will double the handicap again. What's important is the data and the fact that we understand how it was arrived at. We can always try to figure out what it means later. It will give us something to argue about :-)

And if we go back 20 years rather than just 15, it would be much more significant. I only mentioned 1995 because I had some data from that period.
I went back to 1994 with Genius, only because it was what I could get working. One thing I learned is how fast software becomes obsolete. This is true even with Crafty where you have access to the sources! I think if I still had all the source code to Rexchess I would not be able to get it working.

Prior to that, my machines were a bit "bigger" if you know what I mean. :) In 1994 I broke 7M nps with Cray Blitz when we ran on the T932 (32 Cray CPUs). No PC numbers back then. The first time I got a Fortran compiler for a PC was around 1994, and on a PC Cray Blitz searched a blazing 100 nodes per second, because the vector stuff we were doing was just dog-slow on a PC platform with no memory bandwidth or vectorizing.
I never had this kind of hardware to play with, but I did have quads to play with in the mid-'90s; they were DEC Alphas and, as Dann Corbit points out, they blew away the Pentiums and PCs.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: hardware vs software advances

Post by Don »

Dann Corbit wrote:I remember when the DEC Alpha first came out, it smoked the Intel 80x86 and Mac 68000 architectures. I compiled a 64 bit program on it, and no other machine at my disposal even came close. As I recall, Intel bought or copied or in some other way came up with the same technology that DEC had to get the same throughput shortly afterwards, but for a while, the Alpha chips had a real edge.

It seems to me that technology often advances in fits and starts rather than in a smooth transition, but when you plot the advances on a log-scale chart, they always seem to track pretty smoothly because of the log nature of the y-axis.
I worked on the Alphas with Cilkchess and going back to my Sparc Desktop or my Pentium thinkpad was like going from a Lamborghini to a Ford Falcon!
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: hardware vs software advances

Post by bob »

Don wrote:
bob wrote: First, I have seen several doubles. With Crafty, a P5/133 hit 30K nps. The P6/200 went to 75K. More than 2x faster. I've seen similar things comparing core2 to the original crappy core intel (their first 64 bit stuff was just ugly).

I did manage to locate some P5/133 printed output. Not much, but at least a reference, and as I had thought, we were hitting 30K nodes per second on this box in 1995. I just did some testing on my 3-year-old dual quad intel (2.33ghz E5345 xeon) and hit 30M on quite a few positions, 22-24 was the average.

For me, that is a clean factor of 1,000 (30K to 30M) if you restrict things to just one physical CPU (my i7 6-core numbers were over 30M).
These numbers have no meaning unless they are put in some kind of context. Saying that you "hit 30M on quite a few positions" means nothing. Also, I don't see how you can possibly be doing 30 million nodes per second on a 3-year-old computer on 1 core; that is just not believable. Are you talking about some kind of simple endgame or something?
I just did some testing on my 3-year-old dual quad intel
Notice the "dual quad". Why would I want to use 1/8th of a fairly current machine to compare to 100% of what was available in 1995???


Your numbers do not even begin to reconcile against the benchmarks that have been published for Rebel.
Because your test is flawed. You are running a program optimized for old hardware. On new hardware. What would you expect, magic? I can't count the number of things I have changed from 1995 to 2010 purely because of hardware changes.
Also, the 55 to 1 number I published is going from the P90 to a slower Core 2 Duo notebook computer, but I doubled that number because I am comparing to an i7-980x, which is currently the fastest-clocked Intel and best chip too. The difference between those is 1.95 to 1. So my rounded number going from the P90 to the fast i7-980x is 100 to 1, considering a single core. I think you are either missing a zero or adding a zero, because if they are this far in disagreement it is pretty hard to explain.
The math is simple. 30K on a P5/133, which is actually a bit faster than the hardware we are talking about. Here are some searches from the 8-core 3-year-old box I am running right now:

time=18.65 mat=-3 n=469233374 fh=99% nps=25.2M
time=15.35 mat=-3 n=440407794 fh=99% nps=28.7M
time=9.33 mat=6 n=234446744 fh=92% nps=25.1M
time=9.34 mat=6 n=247113766 fh=98% nps=26.5M
time=1.72 mat=6 n=50326785 fh=99% nps=29.3M

Average is somewhere in the 27M range (those are 5 consecutive moves from a random game played on ICC today). How hard is it to correctly divide 27M by 30K? Real close to 1,000? Or just 900x if you want to be real accurate...
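The division bob describes can be reproduced directly from the five log lines quoted above (nodes and seconds copied verbatim from the post); the snippet just averages and divides.

```python
# (nodes, seconds) pairs taken from the five log lines quoted above.
samples = [
    (469_233_374, 18.65),
    (440_407_794, 15.35),
    (234_446_744, 9.33),
    (247_113_766, 9.34),
    (50_326_785, 1.72),
]

total_nodes = sum(n for n, _ in samples)
total_time = sum(t for _, t in samples)
avg_nps = total_nodes / total_time      # time-weighted average NPS
ratio = avg_nps / 30_000                # versus the 1995 P5/133 figure
print(f"average {avg_nps / 1e6:.1f}M nps, {ratio:.0f}x over the P5/133")
```

The time-weighted average lands near 26.5M nps, giving a ratio in the high 800s, which matches the "900x if you want to be real accurate" reading.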

Those are not imaginary numbers. And my NPS today is comparable to the NPS over the past 15 years, in terms of how it is computed. Search depths are not comparable, and I am not certain how the searches compare. I know that depth N today would be absolutely crushed by depth N from 1995, but the modern program has one huge time advantage in reaching that depth, since the 1995 program was way less selective...

I vote that your figures are the most suspect because I'm comparing apples to apples (same exact program, same exact test) while you are looking at 15-year-old log files of a significantly different version of Crafty and estimating the nodes per second based on a spot check of those log files.
I am not "estimating" anything. I am taking actual NPS numbers from the P5/133mhz box I used in 1995, and comparing them to the NPS numbers from today's Crafty running on 3 year old 2.33ghz dual quad-core xeon box. The nodes are comparable. Both were doing 64 bit arithmetic, but on 32 bit hardware back then. Similar in most regards, move generation, evaluation, structure of search/quiesce/etc. Yes there are differences, but a lot of similarities. Most of the differences don't affect nps, but do affect total nodes for a specific depth, drastically.

In other words, it's really difficult to know what you are actually seeing, plus you have a reputation for big-time exaggeration (I think you are honest, but you have a gift for hyperbole and exaggeration.)
Strange comment, since most of my "hyperbole" typically gets backed up by raw data rather than speculation. This 1000x is certainly true for Crafty; nothing to discuss or argue about. I've given the numbers. I suspect someone can independently find some performance numbers for a P5/133 version of Crafty. I have a pretty good recall of NPS milestones for Crafty, most particularly the 30K to 75K jump from P5/133 to P6/200. Today's numbers anyone can verify, and many have previously posted numbers that exceed mine, since the machine I am quoting is 3 years old (the processor was released 11/06, so it is almost 4 years old, actually).



I don't think you can ignore the 6-core issue since that is clearly a hardware advance over the old processors that had 1 core and no cache on the chip back in the case of the P5. Some of that speed gain is 64 bit stuff, some is multiple-core stuff, some is improved cache/OOE stuff, but 30K to 30M can't be ignored.
I think you are trying to claim some of the software engineering gains by waving your hands and saying that improvements in MP implementation do not count.

I believe a big part of the gain in chess software IS engineering improvements which are not of a pure chess nature. In another post you said, "alpha beta works for chess, checkers, othello, go etc." and that is your justification for saying that the software aspects of MP do not contribute to the quality of a chess program. That is total nonsense, I'm sorry. If alpha beta is the justification for throwing it completely out of the software side of the equation, then what about search enhancements that would apply to any game? Are you going to argue that those are just techniques to better utilize the hardware?

I am claiming _exactly_ that. Why? Because in 1995 I had a _better_ parallel search implementation than I do today. In fact, in 1989 I had that _same_ "better implementation." So how does that get counted as something new between 1995 and 2010? That's not hand-waving. That is logic. Or are we going to compare 1995-2010 for hardware, but 1950-2010 for software? If we cut it off at 1995, parallel search was not new. Certainly not to me. My first parallel search ran on a dual-CPU Univac box in 1978. Won the world championship using 2 CPUs in 1983. Won it again using 4 CPUs in 1986. By 1989 DTS was completed, as I finished my dissertation during the previous summer. So no, I don't see why we dredge this up as something new since 1995. By 1995 it was simply state of the art...

Hard to understand my position now???

We _are_ talking about 1995 to 2010, correct? To set the record straight and keep the discussion on a rational basis...


Clearly, any search enhancement such as LMR and null-move pruning should not be considered a software improvement because it is not chess specific? I don't see how hard work on the MP implementation is not part of what makes some programs better than others. I always thought Crafty did this better than most other programs; are you saying that is totally irrelevant?
In this context, _totally_ irrelevant. This parallel search already existed prior to 1995... So why is it a software enhancement produced _since_ 1995? Shoot, it would not be included in 1985, as it was already done then. We had something about as good as what I do today in 1984, in fact.

And that means if we have this same discussion in 10 years and people have figured out how to get a lot more out of the MP implementation, we are not allowed to consider it a software improvement? I think you are being pretty unreasonable here
Not me. You. But not unreasonable either. Just not thinking this through. 1995-2010. Any hardware improvements in that window count. Any software improvements in that window count. Anything prior to 1995 is irrelevant for this discussion. And parallel search (at least in my case) is certainly outside that window.

If it's just a matter of us thinking about it differently, then it was just a misunderstanding. Let me clarify my belief to set it apart from what you are trying to dilute it to mean. I believe that the enormous hard work of software engineering by the program authors (including you and me) in every aspect of an actual chess program (including SEARCH or OPTIMIZATIONS designed to get more out of the CPU or compiler) contributes enormously to the success of the programs. If you take this to mean something else, then we are talking about totally different things.

I don't disagree. It could be argued either way. BUT. Parallel search is not part of the discussion. That was old hat in 1995, already. So how can we call that something that was improved?

Other things? Rotated bitboards: Crafty 6.0, early summer 1995. So it is on the boundary. We are in September 2010; that would make the cutoff September 1995. So even rotated bitboards won't count as an improvement.

For your other point, that is sticky. Although I still say multi-cores are in, SMP search is out, because it was old news in 1995. But when you factor in software and hardware together, how do you separate them? For example, 64-byte cache fills make the 4-bucket hash idea pretty efficient; without them, that idea is less effective. I think it is a bit disingenuous to try to count this as a software improvement. It is more of a "software porting" issue, where you are trying to make the program fit the new hardware to take advantage of the faster hardware stuff, and then claiming a lot of that gain for the software. That is illogical, IMHO. If we went back further in time and tried to make the argument that bitboards were designed to take advantage of 64-bit stuff, that is not a port, that's a re-design, and I would not argue the point, although the software clearly could not be credited with the entire gain. But bitboards are out. They were already used in the early 70's.

So we end up here. Suppose we say that when hardware improves, we give the CPU 66% of the credit, and the programmer 33% for having to modify the program somewhat to take advantage of the new hardware. But when the programmer adds something new that he could not afford to do until new hardware came along, we give the hardware 33% of _that_ credit, and the programmer gets 66%.

Messy, eh?
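The 66/33 proposal can be made concrete by splitting the measured doublings multiplicatively. The `split_credit` function below is only a sketch of that idea; the 2/3 share and the 1000x figure come from the posts, everything else is mine.

```python
import math

def split_credit(speedup: float, hw_share: float) -> tuple[float, float]:
    """Split a measured speedup multiplicatively: hw_share of the
    doublings are credited to hardware, the rest to software."""
    doublings = math.log2(speedup)
    hw = 2 ** (doublings * hw_share)
    sw = 2 ** (doublings * (1 - hw_share))
    return hw, sw

# Crediting hardware with 2/3 of a 1000x gain; the two factors always
# multiply back to the original speedup.
hw, sw = split_credit(1000, 2 / 3)
print(f"hardware ~{hw:.0f}x, software ~{sw:.0f}x")
```

With a 1000x total, this assigns hardware about 100x and software about 10x. The point of the sketch is that the split is purely a modeling choice (change `hw_share` and the answer changes), which is exactly the "messy" problem being described.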

You are wanting to take this into never-never-land. And we are not going to get anywhere.

My take is this:

(1) software includes improved chess-related algorithms. Search ideas. Evaluation ideas. Selectivity ideas. Endgame tablebases, bitbases, etc. Everything that is related to chess. These ideas are really machine-independent, although of course some might rely on a faster speed to keep from being too speculative or dangerous (try null-move R=3 with 1990 5 ply searches, for example).

(2) hardware includes anything related to the hardware improvements. If a programmer makes changes to take advantage of new hardware, he is making the changes because of the hardware, and that's just a hardware thing he is incorporating into his program. These program changes are solely made to use new hardware to go faster. The only qualitative change to the program is the increased depth/speed due to using some new hardware feature, as opposed to making the program better across many platforms.

To do it any other way makes this meaningless.



So there is no possible way I can win this "argument" if you systematically remove all aspects of software engineering from the equation except for the "evaluation function" which even you would probably have a difficult time claiming is not chess specific.

See above. This is not black and white. Unless we agree to use my definitions. Otherwise we can argue forever about what percent of a gain is from new hardware, and what percent goes to the programmer for having to change something to take advantage of that new hardware. Your way gives every advantage to the programmer, and very little to the hardware, since almost no hardware advance is uniformly faster without any software changes.

Your approach would do well for the 700MHz to 1.4GHz clock-speed increase, but if you look at the Hennessy/Patterson architecture book, there's a neat graph that shows how much of the hardware improvement has come from raw clock-speed increases and how much has come from actual design changes (OOE, multiple pipes, speculative execution, branch prediction, new instructions, etc.). Many of those latter things require some programming changes to use them optimally. Just doubling the clock speed requires nothing.

If we are going to use that metric, this is over before it starts. P5 at 133mhz, vs 4ghz today? Factor of 30 at best? Not even close to reality.
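For scale, the clock-only estimate can be checked in two lines of arithmetic; the 133 MHz, ~4 GHz, and 1000x figures all come from the discussion, and the code is purely illustrative.

```python
# Clock speed alone explains only a ~30x factor of the quoted gain.
old_clock_mhz = 133
new_clock_mhz = 4000
clock_ratio = new_clock_mhz / old_clock_mhz      # raw clock speedup

measured_speedup = 1000                          # the 30K -> 30M nps claim
other_gain = measured_speedup / clock_ratio      # cores, IPC, 64-bit, caches...
print(f"clock: {clock_ratio:.0f}x, everything else: {other_gain:.0f}x")
```

So roughly a 30x factor is raw clock, leaving another factor of about 33x that has to come from architectural changes and extra cores rather than MHz.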

I don't think it is sensible to compare a 1995 program running on modern hardware to get the hardware speedup. A program is always optimized to whatever it runs on, as opposed to what it might run on in 20 years.
I have already conceded this point, but it is not worth an order of magnitude difference. And some of this should be considered software advances. If the compilers are better, is that a software or hardware thing? You want the software side to take a hit for this too, don't you? If I hand-optimize a piece of my code, should that be thrown out too because the old program didn't have it? It's no wonder we disagree if you consider most actual software engineering aspects of this irrelevant.

In principle however I agree with you on this point - running a 16 year old program on new hardware is a handicap, just as it would be a handicap if it were done in the other direction. I know you probably think this is worth a few doublings but it isn't.

The 64-bit versus 32-bit thing is almost completely irrelevant. Some of our very best programs are still 32-bit programs. This is an example of something that you probably think we should give an extra "double" of handicap for, but again that is nonsense. If I had written Komodo for 32-bit hardware specifically, I would have done it differently and it MIGHT be a few percent slower. Even if you run a full-blown 64-bit program on 32-bit hardware, it does not take a 2 to 1 hit or even close to it. (Komodo takes a pretty big hit because I never gave it even passing consideration, but Stockfish paid attention and doesn't take much of a hit.)

My point is that I am not trying to _estimate_. The best idea I could come up with is to take a representative program from 1995, which Crafty certainly was, and take the NPS from a P5/133 since I have real numbers from that. We could extrapolate to the P5/90 since that was purely a clock speed change, no internal design change at all. And I have a not-so-representative program today that is significantly behind the best. But I have accurate NPS measurements on recent hardware, and we can get someone to provide absolutely current numbers for the fastest i7 we can buy.

If we take that as at least one measurement, 1000x is the magic number. In 1995 the P5/133 was the fastest thing available, regardless of how much money you had to spend. Today you can certainly buy a 24-core box from several vendors, including some roll-it-yourself options if you prefer. If you want to keep it to a single "chip", that's OK; a 6-core i7 will still do 30M with Crafty. So we are _still_ at 1000:1. Using just one core throws away 83% of the hardware and drops me to 5M nps or so and a factor of 167x. That number is simply wrong. Yet it is reasonably close to the Genius benchmark, since it has no clue about the 5 extra cores...


By the way, that is one of the things that probably distorted your 1000 to 1 figure, but it only explains perhaps 30 percent (I don't know how optimal the 32-bit Crafty stuff is). Of course, when you are this far off, 30% is a lot.
There is no "distortion" when you take two actual measurements and divide the big number by the small number to see how many times bigger the big number is, is there? I can't begin to guess what part of that 1000x is due to 64 bits, what part is due to good bsf/bsr, what part is due to SMP, what part is due to new instructions (since P5) like CMOV and such.


Crafty was an active project in 1995 and is still an active project today, so in 1995 it fit what was available pretty well, and today it fits what is available pretty well. That gives a pretty good idea of what hardware has done, although obviously we are overlooking some types of hardware and are primarily talking about the microprocessor(s) from Intel. But they are the most popular thing going, of course.

numbers like 55x are simply wrong.
The number is 100x, not 55x, as I explained earlier. I really needed to use my notebook machine for doing this test, but I interpolated the result; the notebook is almost exactly 1.95 to 1 slower than the i7 (when considering 1-core chess).

I actually believe that 200x is probably closer to reality, simply for the reasons you stated and I agreed with. Rebel was not designed to run on a modern computer and I would estimate that impact to be approximately 2 to 1. But we have no way of knowing for sure.
Suddenly, my "hyperbole" is becoming "truth", it seems. By my measurement, I got 166x if you only use one CPU. But that is a flawed way of measuring hardware speed. Why throw out 5/6 of the recent hardware gains???


This whole business is inherently slippery and open to interpretation. I want to stick with the 100 to 1 value simply because it's the only value that I actually have hard numbers on. For example if we try to guess at a bunch of credits and debits to give for "this, that and the other thing" we are going to end up with an extremely contrived estimate.
I want a real number, but I want the full 6-core i7 numbers, not a single-core number. That is simply a flawed hardware measurement. And, as I explained, it is not a software issue at all, as SMP search was already well known and implemented by at least a few of us (dual-CPU machines were not common at the PC level, but they were around even on a 386 platform; for reference, I ran on a 30-CPU Sequent using the 386 chip in the '80s).


So since we know exactly what we have, I can run a few games. If the results are about even I will stop. If Robbo wins too much, I will double the handicap again. What's important is the data and the fact that we understand how it was arrived at. We can always try to figure out what it means later. It will give us something to argue about :-)

And if we go back 20 years rather than just 15, it would be much more significant. I only mentioned 1995 because I had some data from that period.
I went back to 1994 with Genius, only because it was what I could get working. One thing I learned is how fast software becomes obsolete. This is true even with Crafty where you have access to the sources! I think if I still had all the source code to Rexchess I would not be able to get it working.

Prior to that, my machines were a bit "bigger" if you know what I mean. :) In 1994 I broke 7M nps with Cray Blitz when we ran on the T932 (32 Cray CPUs). No PC numbers back then. The first time I got a Fortran compiler for a PC was around 1994, and on a PC Cray Blitz searched a blazing 100 nodes per second, because the vector stuff we were doing was just dog-slow on a PC platform with no memory bandwidth or vectorizing.
I never had this kind of hardware to play with, but I did have quads to play with in the mid-'90s; they were DEC Alphas and, as Dann Corbit points out, they blew away the Pentiums and PCs.
CRoberson
Posts: 2056
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Re: hardware vs software advances

Post by CRoberson »

Don wrote:
uaf wrote:I'm pretty sure the Pentium 90 was the fastest processor at that time. Intel introduced the Pentium Pro family (to which the P200 belongs) in late 1995.
Yes, I'll assume that circa 1994 CG3 was running on the P90.

Any idea of the speed? I don't suppose anybody has a Pentium 90 lying around, do they?

Don

I have an operational Pentium 90.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: hardware vs software advances

Post by rbarreira »

Regarding the comparisons with the DEC Alpha, I think that even if it had continued, x86 would look much better in comparison today. Back then CISC had a big penalty in terms of hardware: the complex instruction decoding took up a big portion of the chip.

These days it's a small portion of the chip, so the overhead of CISC is much smaller and it also has performance benefits such as smaller code size.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: hardware vs software advances

Post by Don »

CRoberson wrote:
Don wrote:
uaf wrote:I'm pretty sure the Pentium 90 was the fastest processor at that time. Intel introduced the Pentium Pro family (to which the P200 belongs) in late 1995.
Yes, I'll assume that circa 1994 CG3 was running on the P90.

Any idea of the speed? I don't suppose anybody has a Pentium 90 lying around, do they?

Don

I have an operational Pentium 90.
Good. What we need is a comparison on 2 different platforms, the P90 and the newest thing you have.

Are you willing to try the following experiment?

Find the oldest Crafty you can that runs on both platforms and run some kind of time test on both machines. Then do the same with the newest Crafty you can find. I don't know which old versions of Crafty you can still get, but even if it's not very old it would still be useful. The challenge will be to find a program that still runs on both platforms.

I know that is probably a bit of work - but if you are willing to do it I would like to see the results.
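If the test does get run, the four timings it produces (two programs on two machines) can be combined to expose exactly the confound being debated. A sketch with PLACEHOLDER numbers (the dictionary values are invented for illustration, not real measurements):

```python
# Seconds for each (program, machine) pair to finish the same fixed test.
# These values are placeholders standing in for the real measurements.
timings = {
    ("old_crafty", "p90"): 600.0,
    ("old_crafty", "i7"):    6.0,
    ("new_crafty", "p90"): 900.0,
    ("new_crafty", "i7"):    5.0,
}

def hardware_speedup(program):
    """How much faster the new machine runs a given program."""
    return timings[(program, "p90")] / timings[(program, "i7")]

# With these made-up numbers the "hardware factor" depends on which
# program you measure it with -- the interaction effect in a nutshell.
print(hardware_speedup("old_crafty"))  # 100.0
print(hardware_speedup("new_crafty"))  # 180.0
```

If the two estimates agree, the hardware factor is well defined; if they disagree, a "pure hardware" number quoted from a single program's log files is misleading.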
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: hardware vs software advances

Post by bob »

rbarreira wrote:Regarding the comparisons with the DEC Alpha, I think that even if it had continued, x86 would look much better in comparison today. Back then CISC had a big penalty in terms of hardware: the complex instruction decoding took up a big portion of the chip.

These days it's a small portion of the chip, so the overhead of CISC is much smaller and it also has performance benefits such as smaller code size.
The Alpha was not exactly RISC in that sort of comparison. Its advantage was that it actually had a decent number of registers and didn't have instructions that varied from 1 to N bytes in length. Of course, things like a 256-bit bus didn't hurt either.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: hardware vs software advances

Post by Don »

Bob,

Here is the big problem with what you say about Crafty: You go back in time with an ancient log file and compare it with a modern log file. Then you draw conclusions about hardware advances.

That is perfect from your point of view, because you claim there have been only minor software advances and it's been almost all hardware. So from your point of view this is the perfect test: any software changes since then should have almost no impact on the nodes per second, or the strength, or anything else.

My claim is that software has had roughly the same impact as hardware and such a test from my point of view would not reflect any software advances.

And I am not willing to take your word for it that nothing has changed that would have any impact on this. You are almost certainly going to respond by saying that you have always done things the same way or that any changes did not affect the basic speed. But even if you THINK that is the case, it's probably not and I'll tell you why in the next paragraph.

Recently I discovered that over a period of just a few months my program has changed more than I would have believed. I have re-written Komodo using the old Komodo as a template - in effect I'm cloning my own program.

When the new program failed to play as well by a substantial margin I started to look for any major differences, ignoring the small things that I didn't think made any difference. Wow, how wrong I was. I came to appreciate that Komodo was packed with minor speedups and minor ELO things that by themselves hardly made any difference. The things I kept discovering seemed like a bottomless pit. In my mind I had greatly simplified the entire progress I had made over the last 3 years or so and had boiled it down to perhaps half a dozen big things and few things of little consequence, but that was grossly in error.
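The arithmetic behind that "bottomless pit" is simple compounding: many gains too small to notice individually multiply into a large one. An illustrative sketch (the 1% figure and the count of thirty are made-up numbers, not measurements from Komodo):

```python
# Thirty independent tweaks, each worth a mere 1% speedup, compound to
# roughly a 35% overall speedup -- enough to matter, yet each one looks
# like noise in isolation.
n_tweaks, gain_each = 30, 1.01
total = gain_each ** n_tweaks
print(round(total, 2))  # -> 1.35
```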

When you are talking about a well developed program, you just cannot compare them over time - so I reject your premise that Crafty has not improved in the software or that it's still the same basic program. If you compare a 15 year old Crafty to a modern Crafty you cannot possibly call this a pure hardware comparison. Even if you THINK you can, it just means that you forgot a lot of the little things because none of them by itself made a substantial difference. And don't say this doesn't apply to nodes per second - if you say that I'm not allowed to improve the nodes per second either (except by hardware only) then I throw my hands in the air.

So it's my belief that the Rebel time comparison, as FLAWED as it probably is, AT LEAST has the merit of keeping one thing constant. When I'm developing Komodo I try to follow the same principle - I don't change 10 unrelated things at once and then test - if I get a large improvement I have no clue which subset of those 10 things made a difference.

I think a reasonable way to do the hardware comparison test is to find a version of Crafty from WITHIN the 16-year span - perhaps something that is 8 years old and will run on both platforms but was written for neither. Remove any special assembly language optimizations and then test it on an ACTUAL P90 as well as an i7. Then we have something that is about as fair as we can make it. It's certainly more fair than taking an old program and comparing it to a completely different program from a different era.

By the way, please tell me why we have to pretend parallel programs did not exist in the mid 90's? If we do that can I also pretend that commercial programs didn't exist? In both cases we are talking about more cutting edge stuff. Neither is going to make much of a difference because quad MP is only worth about 100 ELO and Crafty was probably close in ELO to the best commercial programs back then.
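Figures like "quad MP is only worth about 100 ELO" follow from an Elo-per-doubling assumption. A hedged sketch of that conversion: the 70 Elo per doubling used below is an assumption (estimates from that era ranged from roughly 50 to 100), and the 3.2x effective speedup for a quad is likewise illustrative:

```python
import math

def elo_from_speedup(speedup, elo_per_doubling=70.0):
    """Elo gained from a raw speed multiplier, assuming a fixed
    Elo-per-doubling figure (the default of 70 is an assumption)."""
    return elo_per_doubling * math.log2(speedup)

# A quad with ~3.2x effective speedup (parallel overhead included)
# lands in the neighborhood of the ~100 Elo figure quoted above.
print(round(elo_from_speedup(3.2)))  # -> 117
```

Varying the Elo-per-doubling figure across the 50-100 range moves the quad estimate between roughly 85 and 170 Elo, which shows how soft any single "worth about 100 ELO" number really is.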