hardware vs software advances

mhull · Post by **mhull** » Fri Sep 10, 2010 6:32 pm

Don wrote:Bob,

Here is the big problem with what you say about Crafty: You go back in time with an ancient log file and compare it with a modern log file. Then you draw conclusions about hardware advances.

That is perfect from your point of view, because you claim there has been only minor software advances and it's been almost all hardware. So from your point of view this is the perfect test. Any software changes since then should have almost no impact on the nodes per second, or the strength or anything else.

My claim is that software has had roughly the same impact as hardware and such a test from my point of view would not reflect any software advances.

And I am not willing to take your word for it that nothing has changed that would have any impact on this.

He said he's trying to get a 1995 version to compile on new hardware, so he could run cluster tests with it. Did you miss that?

http://www.talkchess.com/forum/viewtopi ... 493#369493

mhull · Post by **mhull** » Fri Sep 10, 2010 6:36 pm

Don wrote:By the way, please tell me why we have to pretend parallel programs did not exist in the mid 90's?

Are we reading the same posts? Bob was accusing you of treating parallel search as something new since 1995. Now you are accusing him of the same thing?

Don · Post by **Don** » Fri Sep 10, 2010 7:02 pm

mhull wrote:
Don wrote:Bob,

Here is the big problem with what you say about Crafty: You go back in time with an ancient log file and compare it with a modern log file. Then you draw conclusions about hardware advances.

That is perfect from your point of view, because you claim there has been only minor software advances and it's been almost all hardware. So from your point of view this is the perfect test. Any software changes since then should have almost no impact on the nodes per second, or the strength or anything else.

My claim is that software has had roughly the same impact as hardware and such a test from my point of view would not reflect any software advances.

And I am not willing to take your word for it that nothing has changed that would have any impact on this.
He said he's trying to get a 1995 version to compile on new hardware, so he could run cluster tests with it. Did you miss that?

http://www.talkchess.com/forum/viewtopi ... 493#369493

I'm addressing the point where he has already drawn a conclusion based on 15-year-old log entries of a different program. Did you miss that?

Having the actual 15-year-old program would be a good thing; then we could measure directly how much it has improved. That would still not be completely relevant, because Crafty is over 300 ELO weaker than our best programs of today, so it's still not representative.

I'm trying to avoid the "rounding" effect. If we just assume that Crafty represents the state of the art in both eras then we are probably already off by a factor of 2 to 1 (in the direction Bob wants it to be). Bob wants to round everything in his direction. For instance, if his quad does 4x more nodes per second, he sees that as responsible for a 4 to 1 hardware improvement when that is not true. A 4x faster computer gives you a lot more ELO gain than four 1-processor computers. The issue is how much did hardware affect CHESS, so if we run handicap games we should not give an additional 4 to 1 time odds and call it "close enough."

If you keep rounding and saying it's "close enough" you are being a sloppy engineer.

That's why I'm being stubborn to go with only what we know for sure. We can do number fudging LATER but we shouldn't take wild guesses.

If Bob can get the old program working and we can run it on the old and new hardware then we can use whatever numbers that generates as our hardware estimate. I'm happy to give him the best of the Rebel estimate or the Crafty estimate (provided it's the SAME Crafty) and we can pretend that the best one is the "true" one.

Don · Post by **Don** » Fri Sep 10, 2010 7:02 pm

mhull wrote:
Don wrote:By the way, please tell me why we have to pretend parallel programs did not exist in the mid 90's?
Are we reading the same posts? Bob was accusing you of treating parallel search as something new since 1995. Now you are accusing him of the same thing?

Show me where I said that.

mhull · Post by **mhull** » Fri Sep 10, 2010 7:12 pm

Don wrote:
mhull wrote:
Don wrote:By the way, please tell me why we have to pretend parallel programs did not exist in the mid 90's?
Are we reading the same posts? Bob was accusing you of treating parallel search as something new since 1995. Now you are accusing him of the same thing?
Show me where I said that.

Quoted above: "please tell me why we have to pretend parallel programs did not exist in the mid 90's?"

He went to great pains to say exactly the opposite.

mhull · Post by **mhull** » Fri Sep 10, 2010 7:18 pm

Don wrote:
mhull wrote:
Don wrote:Bob,

Here is the big problem with what you say about Crafty: You go back in time with an ancient log file and compare it with a modern log file. Then you draw conclusions about hardware advances.

That is perfect from your point of view, because you claim there has been only minor software advances and it's been almost all hardware. So from your point of view this is the perfect test. Any software changes since then should have almost no impact on the nodes per second, or the strength or anything else.

My claim is that software has had roughly the same impact as hardware and such a test from my point of view would not reflect any software advances.

And I am not willing to take your word for it that nothing has changed that would have any impact on this.
He said he's trying to get a 1995 version to compile on new hardware, so he could run cluster tests with it. Did you miss that?

http://www.talkchess.com/forum/viewtopi ... 493#369493
I'm addressing the point where he has already drawn a conclusion based on 15-year-old log entries of a different program. Did you miss that?

I saw that, but I didn't take that as his definitive answer, just as a ballpark estimate. He would rather run the cluster test on the old program for a more definitive answer. But it seems you are quibbling over statements taken out of the larger context.

bob · Post by **bob** » Fri Sep 10, 2010 8:37 pm

Don wrote:Bob,

Here is the big problem with what you say about Crafty: You go back in time with an ancient log file and compare it with a modern log file. Then you draw conclusions about hardware advances.

Yes. Because I am comparing something that is almost uniformly constant over the 15 years of Crafty development, in "what is a node?" That has not changed significantly. At times eval might be a bit slower, at other times it might be a bit faster. But in general, evaluation has not changed dramatically in terms of computational cost, at least for Crafty. It is done quite differently, in quite a few places. Move generation has not changed appreciably since rotated bitboards in version 6.0. Magic is no faster, but does offer additional flexibility that I take advantage of in a few software improvements. The basic structure of search/quiesce has not changed much. Reductions don't add a lot of code, so they have little effect on NPS. Ditto for extensions and pruning.

I can't think of any better measure than to take a good program from 1995 on hardware from 1995, and compare that to a good program in 2010 on good hardware in 2010. And about the only thing that can be compared reasonably is NPS, since "node" has not changed significantly.

Another answer is to get a 1995 box, with a 1995 program, and play against a wide gauntlet to get a good rating. Then take that program to 2010 and use 2010 hardware and repeat. Significant error is included, because a lot of hardware advances will not help such a program until it is modified to take better advantage of them.

For me, the NPS is certainly the optimal way to measure hardware improvement. The 1995 program on 1995 hardware was a pretty well-optimized "system" (speaking of my program specifically, here). Ditto for the 2010 program on 2010 hardware. Hardware speed is really all about NPS and nothing more, from my perspective. Yes, we might elect to do something that drops NPS in return for more accuracy somewhere else, but that's a programming issue, not a hardware issue. And it gives the "software" component a bit of an unfair boost, to be sure. But I don't see how to make it perfect unless we write two new programs, one for 1995 and one for 2010, and somehow agree on what to leave out of the 1995 program to make it representative...

I don't believe my simple NPS comparison is either (a) inaccurate; (b) exaggerated; (c) fictional; or anything else other than an accurate idea of what one pretty good programmer could do in 1995 vs 2010, in terms of NPS.

That is perfect from your point of view, because you claim there has been only minor software advances and it's been almost all hardware. So from your point of view this is the perfect test. Any software changes since then should have almost no impact on the nodes per second, or the strength or anything else.

You are including _way_ more than I said. I am suggesting using the NPS difference from 1995 for Crafty, compared to 2010 Crafty, as a benchmark number to give us the hardware improvement over that 15 year period of time.

To compare software, we need a 1995 program and 1995 hardware, and then the same 1995 program on 2010 hardware. But it doesn't need to be a lousy example of a 1995 program. Parallel search was old news in 1995. Genius never had that. So run it today and you immediately throw away 5/6 of the hardware advantage of today. Not reasonable, IMHO.

My claim is that software has had roughly the same impact as hardware and such a test from my point of view would not reflect any software advances.

By the same token, omitting one of the _major_ hardware advances of the past few years (multiple cores) is also flawed. Badly flawed.

So take that NPS measure, which for Crafty _is_ clearly 1000x faster today on a single-chip box compared to the P5/133. I'll even give you the extra 10K nps and just stick at 30K when it was closer to 20K on the P90. But I don't have any data, since I never ran on one myself (I went from a Sparc to a P5/133 as my first Intel platform for ICC, which was ICS at the time). If we used a P90 speed, it becomes more like 1500x, which is only worse.
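The two multipliers quoted here follow from simple division. A minimal sketch, assuming the roughly 30M nps figure for current hardware that the 1000x claim implies; the constant and function names are illustrative and do not come from Crafty:

```python
# NPS figures quoted in the thread. The 2010 figure is the ~30M nps
# implied by the 1000x claim, not a value measured here.
NPS_P5_133 = 30_000       # 1995 Crafty on a P5/133 (rounded up, per the post)
NPS_P90 = 20_000          # rough estimate for a P90, per the post
NPS_2010 = 30_000_000     # assumed: current Crafty on a single-chip SMP box


def hardware_multiplier(nps_new: float, nps_old: float) -> float:
    """Return how many times faster the new system is, measured in NPS."""
    return nps_new / nps_old


print(hardware_multiplier(NPS_2010, NPS_P5_133))  # 1000.0
print(hardware_multiplier(NPS_2010, NPS_P90))     # 1500.0
```

Using the lower P90 baseline only enlarges the ratio, which is the point being made about 1500x.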

And I am not willing to take your word for it that nothing has changed that would have any impact on this. You are almost certainly going to respond by saying that you have always done things the same way or that any changes did not affect the basic speed. But even if you THINK that is the case, it's probably not and I'll tell you why in the next paragraph.

Recently I discovered that over a period of just a few months my program has changed more than I would have believed. I have re-written Komodo using the old Komodo as a template - in effect I'm cloning my own program.

When the new program failed to play as well by a substantial margin I started to look for any major differences, ignoring the small things that I didn't think made any difference. Wow, how wrong I was. I came to appreciate that Komodo was packed with minor speedups and minor ELO things that by themselves hardly made any difference. The things I kept discovering seemed like a bottomless pit. In my mind I had greatly simplified the entire progress I had made over the last 3 years or so and had boiled it down to perhaps half a dozen big things and few things of little consequence, but that was grossly in error.

When you are talking about a well developed program, you just cannot compare them over time - so I reject your premise that Crafty has not improved in the software or that it's still the same basic program.

Reject what you want. But you are rejecting something _I_ have not said. I said that a "node" in Crafty has not changed significantly in 15 years. I'll go even further and say that a "node" in my chess programs dating back to 1978 where we first used brute-force has not changed significantly. Prior to that nodes were drastically different, because the search was drastically different, etc. But the same basic negamax search framework is quite similar.

I'd be happy to send you the late 1995 evaluation from Crafty so that you can compare it to today. You could run 'em in parallel to see what the difference in speed is. I doubt it is significant. You can compare move generation the same way. Or Swap() (SEE). Or Make/Unmake. Those just haven't changed very much at all. I spent a ton of time to get rid of black/white duplication. Made it no faster, just easier to change later.

In short, a node for Crafty today is comparable to a node in Crafty in 1995. If I can get this damned thing to quit seg faulting because of confusion about what "long" means on 64 bit hardware (was always 32 on 32 bit boxes, but is breaking things here and there on 64 bits) I will be able to answer this particular question exactly. What if the NPS for 1995 Crafty turns out to be roughly 30M on the same hardware? Or, to avoid SMP strangeness, suppose I run 1995 and 2010 Crafty on a single CPU and measure NPS and they are fairly close. Will that be convincing that NPS is comparable and useful as a measure of hardware speed???
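The "long" confusion comes from C data models: on typical 32-bit systems `long` is 32 bits, while on 64-bit Linux and other LP64 systems it is 64 bits. The sketch below uses Python's standard `ctypes` and `struct` modules to report what the local platform does. It only illustrates the portability issue and is not a diagnosis of the actual crashes in the 1995 source:

```python
import ctypes
import struct

# Width of the platform's native C 'long', in bits. Typically 32 on
# 32-bit systems and 64-bit Windows, 64 on LP64 systems such as 64-bit Linux.
native_long_bits = ctypes.sizeof(ctypes.c_long) * 8

# Fixed-width types have the same size everywhere, which is why code that
# needs exactly 32 or 64 bits is safer using them than plain 'long'.
fixed_32_bits = ctypes.sizeof(ctypes.c_uint32) * 8
fixed_64_bits = ctypes.sizeof(ctypes.c_uint64) * 8

# struct's "standard" size for 'l' is always 4 bytes; the native size varies.
standard_long_bytes = struct.calcsize("=l")
native_long_bytes = struct.calcsize("@l")

print(f"native long: {native_long_bits} bits")
print(f"uint32: {fixed_32_bits} bits, uint64: {fixed_64_bits} bits")
print(f"struct 'l': standard={standard_long_bytes}, native={native_long_bytes}")
```

Code written when `long` was reliably 32 bits can break in exactly this way when the same declarations silently become 64 bits wide.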

The single-cpu test would be easier, because I did not release an SMP version of Crafty until late 1996, a year later. I had one much earlier, as we ran Crafty on the Crays for a bit since they were a good 64 bit box, which I needed.

I'll at least post the numbers for p5/133 and the E5345 boxes, and an equivalent run of current crafty on the E5345 for comparison. It is going to take a bit more time to get the 1995 version running. Any guesses on the NPS today? 1995 p5/133 was 30K. Today, on a single-cpu E5345, current Crafty hits about 3.5M-4.0M nps. Take a guess at what 1995 Crafty is going to do. I'm expecting something within a factor of two at worst, probably closer. Might even be faster if the old eval and stuff is simpler, might be slower if the new version is better tuned to the new hardware. I suspect pretty close, but might be wrong.
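For reference, the single-CPU figures above already give a hardware ratio before any SMP is counted. A small sketch using only the numbers in this post; reading "within a factor of two" as half to double the current figure is an assumption:

```python
NPS_1995_P5_133 = 30_000                    # 1995 Crafty on a P5/133
NPS_2010_ONE_CPU = (3_500_000, 4_000_000)   # current Crafty, one E5345 CPU

# Single-CPU speedup of current Crafty over the 1995 baseline.
low, high = (nps / NPS_1995_P5_133 for nps in NPS_2010_ONE_CPU)
print(f"single-CPU speedup: {low:.0f}x to {high:.0f}x")

# Assumed reading of "within a factor of two at worst" for 1995 Crafty
# running on the same E5345 CPU.
guess_low = NPS_2010_ONE_CPU[0] / 2
guess_high = NPS_2010_ONE_CPU[1] * 2
print(f"predicted 1995 Crafty range: {guess_low:,.0f} to {guess_high:,.0f} nps")
```

If the 1995 version lands inside that range on the E5345, the per-node cost of the two versions is comparable and NPS works as a hardware yardstick.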

I will continue working and post the 1995 on new hardware numbers, assuming I can eventually stop the crashes and make sure it is working correctly without any ugly unseen bugs...

More later, as surely you'd have to agree that the above comparison would be "dead on". It is not going to be 1000x since I will initially not use SMP, but I am going to test that as well once I get past the current issues.

If you compare a 15 year old Crafty to a modern Crafty you cannot possibly call this a pure hardware comparison. Even if you THINK you can, it just means that you forgot a lot of the little things because none of them by itself made a substantial difference. And don't say this doesn't apply to nodes per second - if you say that I'm not allowed to improve the nodes per second either (except by hardware only) then I throw my hands in the air.

What about my proposal above? 1995 crafty data (30K nps average) against _same_ program on 2006 hardware (my E5345 box). No SMP to start with although I clearly had parallel search in 1995, or 1985, or even 1978. I'm expecting something in the range of 160x, since it will be giving up 7/8 of the hardware. Will need, at some point, to get some good 6-core i7 numbers since the E5345 is a 2006 processor, some 4 years old now.

So it's my belief that the Rebel time comparison, as FLAWED as it probably is, AT LEAST has the merit of keeping one thing constant. When I'm developing Komodo I try to follow the same principle - I don't change 10 unrelated things at once and then test - if I get a large improvement I have no clue which subset of those 10 things made a difference.

I think a reasonable way to do the hardware comparison test is to find a version of Crafty that is BETWEEN the 16 year span - perhaps something that is 8 years old and will run on both platforms but was written for neither. Remove any special assembly language optimizations and then test it on an ACTUAL P90 as well as an i7. Then we have something that is about as fair as we can make it. It's certainly more fair than taking an old program and comparing it a completely different program from a different era.

I _believe_ I can do better. I have a real 1995 version that I am hell-bent on making work on my cluster hardware. It does have some ASM, but then again, so does the current version (to get to bsf/bsr/etc.) I'd love to just "steal" my current inline code to solve part of this, but it won't work. I renumbered the bits a few years ago, which makes this impossible.

By the way, please tell me why we have to pretend parallel programs did not exist in the mid 90's?

Don. Get a grip and relax... _YOU_ have been saying we can't use SMP today. I have been saying we should. _YOU_ have been saying that the SMP of today is a lot of software. I have been saying it is not; it already existed prior to our baseline year of 1995. The only problem is that in 1995 the Pentium was a one-core processor. Today's i7 has up to 6 cores. Possibly 8 by now, although I have only run on a prototype and am not sure they are ever going to release one, for unknown reasons. So 1-core P5 in 1995, 6-core i7 in 2010. That makes Genius/Rebel moot, since they won't use 2010 hardware effectively. That has been my issue with both (a) counting parallel search as a software improvement since 1995, which it is not, since DTS was running in 1988, the year I finished my dissertation; and (b) using a non-SMP 1995 program, which is a poor measure. Yes, it will give a good (possibly) single-cpu speed comparison, but it will be off by 6x.

If we do that can I also pretend that commercial programs didn't exist? In both cases we are talking about more cutting edge stuff. Neither is going to make much of a difference because quad MP is only worth about 100 ELO and Crafty was probably close in ELO to the best commercial programs back then.

quad-mp should be worth at _least_ 100. I'd figure a speedup of something in the 3.1-3.4 range based on lots of testing. And we are talking about 6-way at least. Something in the 4.5x range. 2+ plies. Very significant. Very definitely not software.
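One way to read the "2+" figure is as the number of speed doublings in the quoted speedup, that is, its base-2 logarithm. That reading is an assumption, and the sketch below only does the arithmetic for the speedups mentioned in this post:

```python
import math


def doublings(speedup: float) -> float:
    """Number of speed doublings represented by a given speedup factor."""
    return math.log2(speedup)


# Speedup figures quoted in the post: 3.1-3.4x on four cores, ~4.5x on six.
for label, speedup in [("quad, low", 3.1), ("quad, high", 3.4), ("6-way", 4.5)]:
    print(f"{label}: {speedup}x = {doublings(speedup):.2f} doublings")
```

Under that reading a 4.5x speedup is a little over two doublings, while the quad-core range stays below two.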

bob · Post by **bob** » Fri Sep 10, 2010 9:06 pm

Don wrote:
mhull wrote:
Don wrote:By the way, please tell me why we have to pretend parallel programs did not exist in the mid 90's?
Are we reading the same posts? Bob was accusing you of treating parallel search as something new since 1995. Now you are accusing him of the same thing?
Show me where I said that.

Don wrote: I think you are trying to claim some of the software engineering gains by waving your hands and saying that improvements in MP implementation do not count.

I believe a big part of the gain in chess software IS engineering improvements which are not of a pure chess nature. In another post you said, "alpha beta works for chess, checkers, othello, go etc." and that is your justification for saying that the software aspects of MP do not contribute to the quality of a chess program. That is total nonsense, I'm sorry. If alpha beta is the justification for throwing it completely out of the software side of the equation then what about a search enhancement that would apply to any game? Are you going to argue that this is just a technique to better utilize the hardware?

I suspect that is at least part of what he was talking about. You certainly are implying that the "software aspects of MP" are significant. And they are. But they are _pre-1995_ and don't count. We could go back to pre-alpha/beta and the software would get a huge boost from adding alpha/beta. But in 1995 we already had that too. Yes, I am saying that post-1995 MP improvements don't amount to much. In fact, my MP implementation today seems to be as good as any SMP-approach around, and yet it is worse than my 1988 DTS implementation. I just refused to rewrite my search into an iterated (loop) approach and dump the recursion, because I like the simplicity of the recursive negamax approach.

So, assume that in 1995 everyone was using "state-of-the-art" software paradigms. Of course, not everybody was doing parallel search. But it was available, and had been used for 20 years already at that point. Ditto for null-move. Singular extensions. Some were experimenting with reductions but I don't recall any explicit information so we can leave that out. Forward pruning? Genius certainly did. Crafty had futility pruning in that time frame somewhere. In 1995 Crafty appeared to be pretty state-of-the-art, not having any program that it could not compete with. In the 1996 WMCCC event two copies were entered and finished 3rd and 4th, showing it was quite strong at the time. So that gives us at least a starting point. And I intend to measure the hardware benefit exactly by making that SOB run on my cluster hardware. Then I can run that program vs current Crafty and see if I can find some way to measure both Elos accurately. Yes, current Crafty is 200-300 below Rybka. Can't do anything about that. Don't have a 1995 version of Rybka. In fact, there is no 1995 version of _any_ current program, except for Crafty. So that is about the only comparison that makes any sense, IMHO. Clearly we can take 2010 Crafty and compare to 1995 Crafty and get +X Elo for 2010 Crafty. And quite reasonably say software improvements are X+300, since Rybka's advantage is purely software. And it would be quite difficult to argue with that.

I see two problems to pull this off. (a) getting 1995 SOB to run. (b) calculating an Elo since it is going to be way below my current gauntlet opponents. Too far below to get accurate Elo measurements. So I need to expand my current gauntlet significantly to fill in the gap between 1995 and 2010 Crafty, which will be non-trivial in itself.

But I am going to make this happen. The only issue left will be that I will know the hardware gain from 1995 to 2010 (I still believe 1000x is going to be accurate, we will see...). I will know the software gain from 1995 to 2010 since I will run both on the same hardware - any difference will be software-only. If that turns out to be +600, I suspect 1000x hardware gives more. I'm not quite sure how to measure that, yet.

I think this is going to be interesting. In 1995 Crafty was around 2400-2500 on ICC. It played in a few human tournaments and was in that same range there. It is not 4000 today, so hardware + software can't be an enormous jump each. So the final numbers might be revealing. I can do everything above, until I get to the 1995 hardware problem. Got to figure out how to slow it down to 30K nodes per second. Hard to do 1000x time handicaps at the fairly fast speeds I use, so this part will take a bit of thought... I probably have to go the other way, which is a can of worms. 1000x longer time controls to simulate today's hardware is a pain too.
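To see why both directions are awkward, it helps to scale a time control by the hardware ratio. A minimal sketch; the base time controls are hypothetical values chosen only for illustration, and `handicapped_time` is not part of any existing test harness:

```python
from datetime import timedelta

SPEED_RATIO = 1000  # hardware multiplier under discussion


def handicapped_time(base: timedelta, ratio: float = SPEED_RATIO) -> timedelta:
    """Scale a time control by the hardware ratio (time-odds handicap)."""
    return base * ratio


# Hypothetical fast time controls, not the ones actually used on the cluster.
for seconds in (10, 60):
    base = timedelta(seconds=seconds)
    print(f"{base} per game -> {handicapped_time(base)} with {SPEED_RATIO}x odds")
```

Even a 10-second game stretches to nearly three hours at 1000x, and dividing a fast time control by 1000 instead leaves too little time to search at all.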

So off to get the first part of this done. The software gain will be easy to measure once I have an old version to compare to new, both using equal (new) hardware. The hardware gain is easy to measure as a multiplier, more difficult to translate to +elo.

We could end up at something like +400 for software and 1000x faster hardware. Is 1000x faster hardware worth more than +400? Logic says heck yes. 10 doublings? But first I need the software improvement...
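The "10 doublings" figure and the +400 example can be put side by side without assuming any particular Elo value per doubling. The sketch below computes the break-even point instead; the +400 is only the illustrative number from this post, not a measurement:

```python
import math

HARDWARE_SPEEDUP = 1000   # Bob's estimate for 1995 -> 2010
SOFTWARE_GAIN_ELO = 400   # the "something like +400" example, not a result


def break_even_elo_per_doubling(speedup: float, software_elo: float) -> float:
    """Elo per doubling at which the hardware gain equals the software gain."""
    return software_elo / math.log2(speedup)


n = math.log2(HARDWARE_SPEEDUP)
break_even = break_even_elo_per_doubling(HARDWARE_SPEEDUP, SOFTWARE_GAIN_ELO)
print(f"{HARDWARE_SPEEDUP}x = {n:.2f} doublings")
print(f"break-even: {break_even:.1f} Elo per doubling")
```

With these inputs, hardware outweighs a +400 software gain whenever a doubling of speed is worth more than about 40 Elo on average across the whole range.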

more to follow...

Uri Blass · Post by **Uri Blass** » Fri Sep 10, 2010 10:55 pm

bob wrote:
To compare software, we need a 1995 program and 1995 hardware, and then the same 1995 program on 2010 hardware. But it doesn't need to be a lousy example of a 1995 program. Parallel search was old news in 1995. Genius never had that. So run it today and you immediately throw away 5/6 of the hardware advantage of today. Not reasonable, IMHO.

I think that it is not reasonable to take best of both worlds.

The combination of Genius and parallel search does not exist as software, so if you take parallel search as software from 1995 it means that you cannot take Genius3 and you need to find a program from 1995 whose programmer already used parallel search.

It means that you can use Crafty of 1995 that is weaker than Genius of 1995 if you want to give the old program time advantage of 1000:1

bob · Post by **bob** » Sat Sep 11, 2010 2:52 am

Uri Blass wrote:
bob wrote:
To compare software, we need a 1995 program and 1995 hardware, and then the same 1995 program on 2010 hardware. But it doesn't need to be a lousy example of a 1995 program. Parallel search was old news in 1995. Genius never had that. So run it today and you immediately throw away 5/6 of the hardware advantage of today. Not reasonable, IMHO.

I think that it is not reasonable to take best of both worlds.

The combination of Genius and parallel search does not exist as software, so if you take parallel search as software from 1995 it means that you cannot take Genius3 and you need to find a program from 1995 whose programmer already used parallel search.

that's exactly what I have been saying...

It means that you can use Crafty of 1995 that is weaker than Genius of 1995 if you want to give the old program time advantage of 1000:1

Actually Crafty was not really weaker than genius in 1995. If you back up to the very early 90's (before Crafty) genius was a significant threat. But by 1995 I don't remember Genius really being an issue on chess servers or in tournaments...
