Hardware vs Software

bob · Post by **bob** » Tue Jan 13, 2009 6:18 pm

Uri Blass wrote:
Don wrote:
Uri Blass wrote:
Don wrote:
Dirt wrote:
bob wrote:Rybka is NFG for this test as it has not been around long enough, and its roots have never been clearly defined...
Comparing a top program from the past, like Fritz 5.32, with a current program that is well down the list, like the current Fritz, is completely useless. That wouldn't show how much software has advanced, just Fritz. The only correct comparison is with Fritz or Hiarcs from then with Rybka 3.

While I see no reason to do it, if you want to use the same program it should be equally far down both the old and new rating lists. A candidate would be Shredder 2.0 from 1999 with the current Shredder, although I think it's much too highly rated in the 1999 list to be a fair test.

January 1999 SSDF rating list
It's ok with me to use Rybka as I believe this is good enough to make the point without any dispute.

I don't really remember what was the most common cutting edge hardware. I believe we should use the best hardware of the time that would have been used to evaluate the program on the rating lists. I guess today that would be 4 core machines but I don't know what it was back then. It may be that someone actually has that hardware still and we could rig some test.

- Don
It seems that rybka is using significant amount of memory for its internal arrays(something like 72 mbytes) so rybka needs at least computer with 128 mbytes to run.

Fortunately 128 mbytes were available even 10 years ago but it means that rybka probably cannot use more than 32 mbytes hash(it is not very important because the difference between 32 mbytes hash and bigger hash under 128 mbytes is not a big difference).

10 years ago the ssdf used P200 and they upgraded it to K6-450 in the middle of 1999.

I can agree to ssdf hardware so it means P200 against quad.
Now the problem is that the commercial versions of Fritz or Hiarcs of the beginning of 1999 did not support SMP.

Uri
I remember several programs running on dual core machines at the tournament in Paderborn - I forget which year. Are you sure nothing supported this 10 years ago?

If not, that is a problem. When it comes to software/hardware synergy you can make the argument in either direction. I could say that it's not fair because hardware isn't being utilized (and I claim most of the advance is hardware based) but you could claim that writing a parallel program is a software advance. That argument is wrong of course, because parallel programs have been around a very long time.

So this might come down to an odds match - can an older program that is crippled beat a modern uncommonly strong (and I believe not very representative) program? I think with this kind of unfair handicap this could be a relatively close match.

It would be real easy to run this match remotely via a shell, but Rybka is a Windows program and windows is not as flexible about this kind of stuff, so it would have to be conducted on a local setup somehow.
I also remember programs that used smp in tournaments but
I am afraid that not the commercial programs of that time.

It is possible to give time handicap to emulate the smp factor and in this case we need to use ponder off.

how much time advantage we have of Q6600 relative to P200?

SSDF had the following hardwares

P200
K6-450
A1200
Q6600

Uri

Q6600 is old news and there are _much_ better processors available today. Who cares what they were using, they were always 2-3 years behind leading edge...

Uri Blass · Post by **Uri Blass** » Tue Jan 13, 2009 6:28 pm

Don wrote:
Uri Blass wrote:
Don wrote:
Uri Blass wrote:
Don wrote:
Dirt wrote:
bob wrote:Rybka is NFG for this test as it has not been around long enough, and its roots have never been clearly defined...
Comparing a top program from the past, like Fritz 5.32, with a current program that is well down the list, like the current Fritz, is completely useless. That wouldn't show how much software has advanced, just Fritz. The only correct comparison is with Fritz or Hiarcs from then with Rybka 3.

While I see no reason to do it, if you want to use the same program it should be equally far down both the old and new rating lists. A candidate would be Shredder 2.0 from 1999 with the current Shredder, although I think it's much too highly rated in the 1999 list to be a fair test.

January 1999 SSDF rating list
It's ok with me to use Rybka as I believe this is good enough to make the point without any dispute.

I don't really remember what was the most common cutting edge hardware. I believe we should use the best hardware of the time that would have been used to evaluate the program on the rating lists. I guess today that would be 4 core machines but I don't know what it was back then. It may be that someone actually has that hardware still and we could rig some test.

- Don
It seems that rybka is using significant amount of memory for its internal arrays(something like 72 mbytes) so rybka needs at least computer with 128 mbytes to run.

Fortunately 128 mbytes were available even 10 years ago but it means that rybka probably cannot use more than 32 mbytes hash(it is not very important because the difference between 32 mbytes hash and bigger hash under 128 mbytes is not a big difference).

10 years ago the ssdf used P200 and they upgraded it to K6-450 in the middle of 1999.

I can agree to ssdf hardware so it means P200 against quad.
Now the problem is that the commercial versions of Fritz or Hiarcs of the beginning of 1999 did not support SMP.

Uri
I remember several programs running on dual core machines at the tournament in Paderborn - I forget which year. Are you sure nothing supported this 10 years ago?

If not, that is a problem. When it comes to software/hardware synergy you can make the argument in either direction. I could say that it's not fair because hardware isn't being utilized (and I claim most of the advance is hardware based) but you could claim that writing a parallel program is a software advance. That argument is wrong of course, because parallel programs have been around a very long time.

So this might come down to an odds match - can an older program that is crippled beat a modern uncommonly strong (and I believe not very representative) program? I think with this kind of unfair handicap this could be a relatively close match.

It would be real easy to run this match remotely via a shell, but Rybka is a Windows program and windows is not as flexible about this kind of stuff, so it would have to be conducted on a local setup somehow.
I also remember programs that used smp in tournaments but
I am afraid that not the commercial programs of that time.

It is possible to give time handicap to emulate the smp factor and in this case we need to use ponder off.

how much time advantage we have of Q6600 relative to P200?

SSDF had the following hardwares

P200
K6-450
A1200
Q6600

Uri
For that matter, we could turn ponder off and I can see just how much handicap Rybka can tolerate - maybe the result would be obvious enough to give us an answer and we could extrapolate to estimate what might happen in an actual match.

Even though I have linux, I have my own tester which is pretty flexible and allows handicaps at any time control any depth, whatever and I have the 64 bit Rybka that runs on linux with the hack published in the Rybka forum. I'm not sure I have a 10 year old program however and we would need to agree on how much hash to give Rybka. Of course anyone else would be free to try to duplicate this test. Is there a 10 year public domain program that represents the very best of 10 years ago?

What I would do is just run matches - when one program gets ahead of another I would modify the time control slightly until I reach some kind of equilibrium. I think anyone could independently verify this result on most reasonable hardware and we would have an "H factor" - a multiplier that approximately gives us the handicap required in time.

I test with about 8000 shallow depth openings from a large games collection - each player plays both sides of each opening for a given opponent and we don't go any deeper than a few ply. And the openings are not tuned to any program. The opening had to occur a few times to make it into my shallow book.

- Don
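The equilibrium procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_match_score` is a made-up stand-in for actually playing a match (it is not Don's tester and it has no noise), and the names `find_h_factor`, `step` and `tolerance` are hypothetical. The loop raises the time odds while the stronger program is ahead, lowers them when it falls behind, and shrinks the step after each overshoot.

```python
import math


def toy_match_score(time_odds, true_h):
    """Made-up model of a match result (assumption, not real engine data).

    Returns the score fraction of the stronger program when its opponent
    gets `time_odds` times more thinking time. By construction the score
    is exactly 0.5 when time_odds == true_h.
    """
    return true_h / (true_h + time_odds)


def find_h_factor(match_score, start=1.0, step=1.5, tolerance=0.01, max_rounds=200):
    """Adjust the time odds until the match score is close to 50%."""
    odds = start
    last_direction = 0
    for _ in range(max_rounds):
        score = match_score(odds)
        if abs(score - 0.5) <= tolerance:
            break  # close enough to equilibrium
        direction = 1 if score > 0.5 else -1
        if last_direction and direction != last_direction:
            step = math.sqrt(step)  # we overshot, so take smaller steps
        odds = odds * step if direction > 0 else odds / step
        last_direction = direction
    return odds


# Pretend the real handicap is 32:1 and see whether the loop recovers it.
h = find_h_factor(lambda odds: toy_match_score(odds, 32.0))
print(f"estimated H factor: {h:.1f}")
```

With real engines each call to `match_score` would be a full match, so the result carries statistical noise and the tolerance has to be chosen with the number of games in mind.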

For your question we have the CCRL list

http://computerchess.org.uk/ccrl/404/ra ... t_all.html

Fritz 5.32's rating there is 2642 and it seems to be 500 Elo below Rybka.
If you want free programs at a similar level then you can choose
Jonny 2.83 32-bit or Anaconda 2.0.1.

If you want a program that uses parallel search then
Crafty 21.5 PS 64-bit seems good enough to represent a level similar to that of the top programs of the Fritz 5.32 era.

Uri
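To give a feel for what a 500 Elo gap means in a match, here is a minimal sketch that assumes the standard logistic Elo expectation formula. Rating lists may fit their numbers with a slightly different model, so treat the output as a rough reading only; the function name is hypothetical.

```python
def expected_score(elo_diff):
    """Expected score for a player rated `elo_diff` points above the opponent.

    Standard logistic Elo curve (assumed here, not taken from the rating list).
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# The program rated 500 Elo lower is expected to score a little over 5%.
print(f"{expected_score(-500):.3f}")
```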

Uri Blass · Post by **Uri Blass** » Tue Jan 13, 2009 6:31 pm

bob wrote:
Uri Blass wrote:
Don wrote:
Uri Blass wrote:
Don wrote:
Dirt wrote:
bob wrote:Rybka is NFG for this test as it has not been around long enough, and its roots have never been clearly defined...
Comparing a top program from the past, like Fritz 5.32, with a current program that is well down the list, like the current Fritz, is completely useless. That wouldn't show how much software has advanced, just Fritz. The only correct comparison is with Fritz or Hiarcs from then with Rybka 3.

While I see no reason to do it, if you want to use the same program it should be equally far down both the old and new rating lists. A candidate would be Shredder 2.0 from 1999 with the current Shredder, although I think it's much too highly rated in the 1999 list to be a fair test.

January 1999 SSDF rating list
It's ok with me to use Rybka as I believe this is good enough to make the point without any dispute.

I don't really remember what was the most common cutting edge hardware. I believe we should use the best hardware of the time that would have been used to evaluate the program on the rating lists. I guess today that would be 4 core machines but I don't know what it was back then. It may be that someone actually has that hardware still and we could rig some test.

- Don
It seems that rybka is using significant amount of memory for its internal arrays(something like 72 mbytes) so rybka needs at least computer with 128 mbytes to run.

Fortunately 128 mbytes were available even 10 years ago but it means that rybka probably cannot use more than 32 mbytes hash(it is not very important because the difference between 32 mbytes hash and bigger hash under 128 mbytes is not a big difference).

10 years ago the ssdf used P200 and they upgraded it to K6-450 in the middle of 1999.

I can agree to ssdf hardware so it means P200 against quad.
Now the problem is that the commercial versions of Fritz or Hiarcs of the beginning of 1999 did not support SMP.

Uri
I remember several programs running on dual core machines at the tournament in Paderborn - I forget which year. Are you sure nothing supported this 10 years ago?

If not, that is a problem. When it comes to software/hardware synergy you can make the argument in either direction. I could say that it's not fair because hardware isn't being utilized (and I claim most of the advance is hardware based) but you could claim that writing a parallel program is a software advance. That argument is wrong of course, because parallel programs have been around a very long time.

So this might come down to an odds match - can an older program that is crippled beat a modern uncommonly strong (and I believe not very representative) program? I think with this kind of unfair handicap this could be a relatively close match.

It would be real easy to run this match remotely via a shell, but Rybka is a Windows program and windows is not as flexible about this kind of stuff, so it would have to be conducted on a local setup somehow.
I also remember programs that used smp in tournaments but
I am afraid that not the commercial programs of that time.

It is possible to give time handicap to emulate the smp factor and in this case we need to use ponder off.

how much time advantage we have of Q6600 relative to P200?

SSDF had the following hardwares

P200
K6-450
A1200
Q6600

Uri
Q6600 is old news and there are _much_ better processors available today. Who cares what they were using, they were always 2-3 years behind leading edge...

P200 was also not the top hardware of 1999.
If you are interested in top hardware of 1999 against top hardware of today then I guess that rybka3 can also use more than one processor
so I expect even better results for rybka3.

Uri

Don · Post by **Don** » Tue Jan 13, 2009 6:32 pm

bob wrote: Old on new hardware will be the most interesting as I would not be surprised at all to see the old program beat R3 with R3 on old hardware and old program on new hardware...

That's what I expect to see - but there is a big question mark about whether the old program can actually utilize the new hardware. Your test would help sort that out and I think it is a good test to do.

Uri Blass · Post by **Uri Blass** » Tue Jan 13, 2009 6:48 pm

Don wrote:
bob wrote: Old on new hardware will be the most interesting as I would not be surprised at all to see the old program beat R3 with R3 on old hardware and old program on new hardware...
That's what I expect to see - but there is a big question mark about whether the old program can actually utilize the new hardware. Your test would help sort that out and I think it is a good test to do.

I expect Rybka to win at 120/40 time control even if we replace the old programs with Crafty 21.5, which has a similar level to the top programs of 1999 (assuming that you give Rybka the best hardware of 1999, which is a quad, and give Crafty 21.5 the best octal of today; I assume that programs of 1999 could not efficiently use anything better than an octal).

Note that I suggest using contempt=0 for Rybka (the default contempt=15 is better against significantly weaker opponents, but when the target is to win a match it is better to use contempt=0, based on the results that I read).

Uri

Don · Post by **Don** » Tue Jan 13, 2009 6:58 pm

Uri Blass wrote: P200 was also not the top hardware of 1999.
If you are interested in top hardware of 1999 against top hardware of today then I guess that rybka3 can also use more than one processor
so I expect even better results for rybka3.
Uri

I'm basically looking for a 32 to 1 advantage based on Moore's law because we are talking about 10 years and a doubling every 2 years. So even if we run all the testing on a single processor this is the value I would expect to be fair.

The raw CPU speed hasn't fully kept up with Moore's law, but we must also consider huge increase in memory and multi-processors. And this discussion did not start off being about the last 10 years but much further back. So however this test is constructed it should give the 10 year old top program a 32 to 1 (or equivalent) advantage, otherwise we are arguing about something else.

Even though the memory will be crippled, I don't think it will be much of a disadvantage for Rybka. It would not NEED the same amount of memory running 32 times slower.

I think this test is going to be pretty unfair anyway because there are intangibles that we cannot easily take into consideration. Each program was optimized for the hardware it was designed to run on with the appropriate compiler optimizations, etc. Some benchmarking might help to resolve this however. Maybe it cancels out for each program.

Anyway, I would like to just gather the numbers for now and we can argue about what they mean later - sort it out then.
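The 32 to 1 figure is five doublings in ten years. A minimal sketch of that arithmetic, assuming a clean doubling every two years (the function name is hypothetical):

```python
def moore_speedup(years, doubling_period_years=2.0):
    """Speed-up factor if performance doubles once per doubling period."""
    return 2.0 ** (years / doubling_period_years)


print(moore_speedup(10))  # 10 years at one doubling per 2 years
```

Changing `doubling_period_years` shows how sensitive the handicap is to that assumption.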

bob · Post by **bob** » Tue Jan 13, 2009 7:37 pm

Uri Blass wrote:
Michael Sherwin wrote:
Tord Romstad wrote:
bob wrote:
Uri Blass wrote:It may be interesting to do the same comparison for Glaurung to see how much rating Glaurung earns from LMR and null move.
if you have the time, go for it. I don't have the time to study the source to see what needs to be commented out.
I don't have time to run any tests, but if either of you (or somebody else) wants, I can make a special version where LMR, null move and the tapered super-qsearch (which I suspect is also worth a significant number of Elo points) can easily be switched on and off. Alternatively, if it makes testing easier, I can add some compile-time switches which makes it easy to disable the features you want.

By the way, it's amusing to see that LMR is now generally accepted as effective. Back when I started advocating it, the technique was largely abandoned since many years, and those few programmers I managed to convince to give it a try mostly reported that it didn't work for them.

Tord
This is just a guess, but:

Most tactics are very shallow (within say, 12 ply) and on slower hardware LMR may hurt, because too many tactics are missed. On faster hardware the vast majority of shallow tactics is seen anyway despite LMR and the extra plies gives sight of rare deep tactics as well as better 'strategical planning'.
I do not agree.
You assume that LMR causes the program to be slower in finding simple tactics, and I think that in most cases programs can see shallow tactics faster with LMR.

Uri

I disagree. The farther from the root an error occurs, the less damage it does to the overall result. Because you have more chances to recover. As the depth decreases due to hardware speed, the errors do, by definition, move closer to the root. And the closer to the root they happen, the less likely it is that you can find an alternative that will save you.

This was a null-move issue early on and why nobody used R=2 or R=3 when the typical micro depth was 5-6 plies at 2 minutes...

bob · Post by **bob** » Tue Jan 13, 2009 7:49 pm

Uri Blass wrote:
Don wrote:
bob wrote: Old on new hardware will be the most interesting as I would not be surprised at all to see the old program beat R3 with R3 on old hardware and old program on new hardware...
That's what I expect to see - but there is a big question mark about whether the old program can actually utilize the new hardware. Your test would help sort that out and I think it is a good test to do.
I expect Rybka to win at 120/40 time control even if we replace the old programs with Crafty 21.5, which has a similar level to the top programs of 1999 (assuming that you give Rybka the best hardware of 1999, which is a quad, and give Crafty 21.5 the best octal of today; I assume that programs of 1999 could not efficiently use anything better than an octal).

Note that I suggest using contempt=0 for Rybka (the default contempt=15 is better against significantly weaker opponents, but when the target is to win a match it is better to use contempt=0, based on the results that I read).

Uri

I'd go for that test in a heartbeat because Crafty of 1999 could use a 16-way box just fine and those are easy to find today. Even 8-way.

In 1998 I ran on a Pentium II xeon at 300mhz. That's probably a good estimate of what was generally available in late 1998 which is about ten years ago since we just barely started 2009 and should probably consider 2008 as "the year". I ran on an i7 late in 2008 for comparison.

To keep this simple, I would suggest a single-chip machine, 300mhz P2 for 1998, single-chip quad-core I7 at nearly 3ghz for today's hardware. Yes you can do a dual I7 but you could also do a dual P2, and you could do quads in both as well, so a single chip test would be representative...

That is probably about a factor of 200x in computing. Crafty on the P2/300 was under 100K nodes per second. On the quad i7 it is hitting around 20M.

So we could use some platform of today and play 200:1 time odds and see what happens. I believe I know what will happen but it will be interesting to try. I'd be more than happy to try this but do not have any commercial programs to test against and only run linux on everything I have...

Here's a test I can run trivially:

First, let's choose glaurung 2 rather than Rybka. You can decide how much better Rybka is in terms of Elo and we will always use Glaurung + Delta-x where Delta-x is the difference between R3 and G2.

Now I can run any sort of handicap match you want, at any time control, and we don't need tens of thousands of games which will help.

So for 1998 hardware we run either program at a 200:1 time handicap. For 2008 hardware we run that program straight up with no handicap.

Interested???

I can take current Crafty and Glaurung back to 1998 hardware. I can run them on current hardware (simulated via time). Only thing I can't do, which makes the experiment less interesting, is I can't take a 1998 program and its 2008 counterpart to test which would show what happened over that span with software, where the time handicap would show what happened over that span with hardware. But we could discover some interesting information about hardware improvements anyway.
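The 200:1 figure and the resulting time handicap can be worked out directly from the node counts quoted above. This is a rough sketch: the node rates are the approximate ones from the post ("under 100K" and "around 20M"), the function names are hypothetical, and the example assumes that 120/40 means 120 minutes for 40 moves.

```python
def time_odds(old_nps, new_nps):
    """Speed ratio between new and old hardware, from nodes per second."""
    return new_nps / old_nps


def handicapped_seconds(base_seconds, odds):
    """Clock time for the side that simulates the old hardware."""
    return base_seconds / odds


odds = time_odds(100_000, 20_000_000)          # approximate figures from the post
old_clock = handicapped_seconds(120 * 60, odds)  # 120 minutes for 40 moves
print(odds, old_clock)
```

Under those assumptions the side standing in for 1998 hardware gets 36 seconds for its 40 moves while the 2008 side keeps the full 120 minutes.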

bob · Post by **bob** » Tue Jan 13, 2009 7:52 pm

Uri Blass wrote:
bob wrote:
Uri Blass wrote:
Don wrote:
Uri Blass wrote:
Don wrote:
Dirt wrote:
bob wrote:Rybka is NFG for this test as it has not been around long enough, and its roots have never been clearly defined...
Comparing a top program from the past, like Fritz 5.32, with a current program that is well down the list, like the current Fritz, is completely useless. That wouldn't show how much software has advanced, just Fritz. The only correct comparison is with Fritz or Hiarcs from then with Rybka 3.

While I see no reason to do it, if you want to use the same program it should be equally far down both the old and new rating lists. A candidate would be Shredder 2.0 from 1999 with the current Shredder, although I think it's much too highly rated in the 1999 list to be a fair test.

January 1999 SSDF rating list
It's ok with me to use Rybka as I believe this is good enough to make the point without any dispute.

I don't really remember what was the most common cutting edge hardware. I believe we should use the best hardware of the time that would have been used to evaluate the program on the rating lists. I guess today that would be 4 core machines but I don't know what it was back then. It may be that someone actually has that hardware still and we could rig some test.

- Don
It seems that rybka is using significant amount of memory for its internal arrays(something like 72 mbytes) so rybka needs at least computer with 128 mbytes to run.

Fortunately 128 mbytes were available even 10 years ago but it means that rybka probably cannot use more than 32 mbytes hash(it is not very important because the difference between 32 mbytes hash and bigger hash under 128 mbytes is not a big difference).

10 years ago the ssdf used P200 and they upgraded it to K6-450 in the middle of 1999.

I can agree to ssdf hardware so it means P200 against quad.
Now the problem is that the commercial versions of Fritz or Hiarcs of the beginning of 1999 did not support SMP.

Uri
I remember several programs running on dual core machines at the tournament in Paderborn - I forget which year. Are you sure nothing supported this 10 years ago?

If not, that is a problem. When it comes to software/hardware synergy you can make the argument in either direction. I could say that it's not fair because hardware isn't being utilized (and I claim most of the advance is hardware based) but you could claim that writing a parallel program is a software advance. That argument is wrong of course, because parallel programs have been around a very long time.

So this might come down to an odds match - can an older program that is crippled beat a modern uncommonly strong (and I believe not very representative) program? I think with this kind of unfair handicap this could be a relatively close match.

It would be real easy to run this match remotely via a shell, but Rybka is a Windows program and windows is not as flexible about this kind of stuff, so it would have to be conducted on a local setup somehow.
I also remember programs that used smp in tournaments but
I am afraid that not the commercial programs of that time.

It is possible to give time handicap to emulate the smp factor and in this case we need to use ponder off.

how much time advantage we have of Q6600 relative to P200?

SSDF had the following hardwares

P200
K6-450
A1200
Q6600

Uri
Q6600 is old news and there are _much_ better processors available today. Who cares what they were using, they were always 2-3 years behind leading edge...
P200 was also not the top hardware of 1999.
If you are interested in top hardware of 1999 against top hardware of today then I guess that rybka3 can also use more than one processor
so I expect even better results for rybka3.

Uri


The speed improvement from 1998 to today is about 200:1. I'll happily take those odds. 2:1 makes Crafty somewhat better than Glaurung 2 (most recent). What will another factor of 100 do? That is roughly 7 plies added on. Or using my recent rating test, about 490 Elo improvement. Or about 560 Elo improvement over 1998 hardware. From current games on ICC, I do not believe Rybka is _anywhere_ near 560 Elo better than current Crafty. Rybka is definitely better. But not _that_ much better...
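The Elo figures above are consistent with counting one ply per doubling of speed and about 70 Elo per ply. That per-ply value is inferred here from 490 / 7 and 560 - 490; it is not stated in the post, and rounding the number of doublings up is likewise an assumption. A small sketch of the arithmetic, with a hypothetical function name:

```python
import math


def elo_gain(speedup, elo_per_doubling=70.0):
    """Elo gain from a speed-up, counting whole doublings (rounded up).

    Both the 70 Elo per doubling and the rounding are assumptions
    reconstructed from the numbers in the post.
    """
    doublings = math.ceil(math.log2(speedup))
    return doublings * elo_per_doubling


print(math.log2(100))   # a bit under 7 doublings for a factor of 100
print(elo_gain(100))    # the "about 490 Elo" figure
print(elo_gain(200))    # the "about 560 Elo" figure for the full 200:1
```

Without the rounding, 200:1 is about 7.6 doublings, so the estimate would come out somewhat lower than 560.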

bob · Post by **bob** » Tue Jan 13, 2009 7:54 pm

Don wrote:
Uri Blass wrote: P200 was also not the top hardware of 1999.
If you are interested in top hardware of 1999 against top hardware of today then I guess that rybka3 can also use more than one processor
so I expect even better results for rybka3.
Uri
I'm basically looking for a 32 to 1 advantage based on Moore's law because we are talking about 10 years and a doubling every 2 years. So even if we run all the testing on a single processor this is the value I would expect to be fair.

The raw CPU speed hasn't fully kept up with Moore's law, but we must also consider huge increase in memory and multi-processors. And this discussion did not start off being about the last 10 years but much further back. So however this test is constructed it should give the 10 year old top program a 32 to 1 (or equivalent) advantage, otherwise we are arguing about something else.

Even though the memory will be crippled, I don't think it will be much of a disadvantage for Rybka. It would not NEED the same amount of memory running 32 times slower.

I think this test is going to be pretty unfair anyway because there are intangibles that we cannot easily take into consideration. Each program was optimized for the hardware it was designed to run on with the appropriate compiler optimizations, etc. Some benchmarking might help to resolve this however. Maybe it cancels out for each program.

Anyway, I would like to just gather the numbers for now and we can argue about what they mean later - sort it out then.

I believe it is better than that. Moore's law deals with chip density, not necessarily chip speed. In my old records, I found that in November 1998 I ran on a P2/300 box that was loaned to me for a month or so. Crafty was getting under 100K nodes per second on that box. Most recent testing on a 2.9?? ghz I7 was around 20M nodes per second using 4 cores, no hyperthreading. A factor of 200:1 roughly, which is wildly significant.
