Final results - Crafty - hardware vs software

bob · Post by **bob** » Mon Sep 13, 2010 4:58 am

Since the other thread's title is downright incorrect, I'm going to give some data that as far as I am concerned pretty well nails this coffin shut, for the case of Crafty.

==========================================================
Software gain. I am comparing Crafty 23.4, the highest-rated version of Crafty so far, to Crafty 10.18 which was a 1995 program. First, the data.

Code:

    Name                  Elo    +    - games score oppo. draws
    Crafty-23.4-2        2851    3    3 30000   66%  2728   22%
    Crafty-23.4-1        2848    3    3 30000   66%  2728   22%
    Crafty-10.18-1       2491    4    4 30000   22%  2728   14%
    Crafty-10.18-2       2490    4    4 30000   22%  2728   14%

If you subtract the two averages, you get +360 Elo from purely software changes. I ran both 23.4 and 10.18 on the same hardware, same set of opponents and positions, same time controls, so there's nothing to skew this data significantly.
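For anyone who wants to check the subtraction, here is a minimal Python sketch that averages the two runs of each version and takes the difference. The variable names are illustrative only and are not part of Crafty or the test setup.

```python
# Elo ratings copied from the table above (two runs per version).
crafty_23_4 = [2851, 2848]
crafty_10_18 = [2491, 2490]


def mean(values):
    """Plain arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Difference of the two averages = Elo attributed to software changes.
software_gain = mean(crafty_23_4) - mean(crafty_10_18)
print(f"software gain: {software_gain:+.1f} Elo")
```

The averages are 2849.5 and 2490.5, so the difference is 359, which is the "+360" figure after rounding.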

==========================================================
Hardware gain. This part is easy. On a P5/90, we searched 20K nodes per second. Today, on the best hardware platform, a 6-core i7, the same version of Crafty (10.18) runs at 30M nodes per second using SMP search. With PGO it gains another 10%. But for

First, compute the actual speedup, since the SMP search speed scales perfectly with respect to NPS, but not that well when you measure time to solution. For SMP efficiency, after measuring thousands of positions over the years, a good estimate for parallel speedup is:

speedup = 1 + (NCPUS - 1) * 0.7

For the 6 core box, we get 4.5x increase (rather than 6x) for the above. If you take the 30M number and divide by 6 cpus, you get 5M nodes per second per CPU. Multiplying by the 4.5x SMP speedup, we get 22.5M nodes per second. Divide this by the 1995 speed of 20K and we are a little over the 1000x faster number I have quoted repeatedly. For simplicity, call this 1024x.
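The arithmetic in the last few paragraphs can be reproduced with a short Python sketch. It only uses the numbers quoted above (20K NPS in 1995, 30M raw NPS on 6 cores, 0.7 per extra CPU); the function and variable names are made up for illustration.

```python
def smp_speedup(ncpus, per_extra_cpu=0.7):
    """Estimated time-to-solution speedup: 1 + (NCPUS - 1) * 0.7."""
    return 1 + (ncpus - 1) * per_extra_cpu


NPS_1995 = 20_000        # P5/90, single CPU
NPS_2010 = 30_000_000    # 6-core i7, raw nodes per second across all cores
NCPUS = 6

speedup = smp_speedup(NCPUS)               # effective speedup, not raw NPS scaling
per_cpu_nps = NPS_2010 / NCPUS             # raw NPS contributed by one core
effective_nps = per_cpu_nps * speedup      # NPS discounted for SMP search overhead
hardware_factor = effective_nps / NPS_1995

print(f"SMP speedup:     {speedup:.1f}x")
print(f"effective NPS:   {effective_nps / 1e6:.1f}M")
print(f"hardware factor: {hardware_factor:.0f}x")
```

This gives a 4.5x speedup, 22.5M effective nodes per second, and a factor of 1125x over 1995, which the post rounds down to 1024x to keep the number of doublings at exactly 10.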

Now, what does this do? Historically we have had several estimates for the Elo gain for each doubling of CPU speed. Way back, it was 100. More recently I have been seeing 50-70, so I thought I would try to measure this as easily as possible. I decided to take 10.18, and run it thru the gauntlet again, but this time gave it 1/2 the normal time, leaving the opponents using the normal time. I then ran this two more times, but used 1/4 of the normal time.

Here's the data:

Code:

    Name                  Elo    +    - games score oppo. draws
    Crafty-10.18-1       2491    4    4 30000   22%  2728   14%
    Crafty-10.18-2       2490    4    4 30000   22%  2728   14%
    Crafty-10.18-3       2415    4    4 30000   16%  2728   11%
    Crafty-10.18-4       2413    4    4 30000   15%  2728   11%
    Crafty-10.18-5       2325    5    5 30000   10%  2728    8%
    Crafty-10.18         2325    5    5 28840   10%  2728    8%

10.18-1 and -2 are the original 10.18 at normal speed. -3 and -4 are running at 1/2 speed. -5 and (eventually) -6 are running at 1/4 speed. The last run is not quite complete but the numbers are identical and I decided to go ahead and post the results.

If you look carefully, -3 and -4 are 76 Elo lower (average) than -1/-2. Then -5 and -6 are another 88 lower. I suspect this trend will continue, in that if there is a diminishing return for going faster and faster, then there must be a steadily increasing loss for going slower and slower. If we pick a pretty rough average of 80 for each doubling for 10.18, and compute log2(1024) = 10, that gives 10 doublings in hardware speed running Crafty since 1995. At 80 Elo per doubling, we get +800. I don't particularly trust the +360 or this +800, but I am reasonably certain that the ratio would not change by much.
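As a rough check of the per-doubling figures, the following Python sketch recomputes the drop at each halving from the table and then extrapolates over 10 doublings. The 80 Elo per doubling is the rough average chosen in the post, not something the code derives, and the names are illustrative.

```python
import math

# Elo from the table above, two runs per time setting.
normal_time = [2491, 2490]    # 10.18-1, 10.18-2
half_time = [2415, 2413]      # 10.18-3, 10.18-4
quarter_time = [2325, 2325]   # 10.18-5 and the incomplete last run


def mean(values):
    """Plain arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Elo lost at each halving of the time (equivalent to halving the speed).
first_halving_loss = mean(normal_time) - mean(half_time)
second_halving_loss = mean(half_time) - mean(quarter_time)

ELO_PER_DOUBLING = 80           # rough average picked in the post
doublings = math.log2(1024)     # 10 doublings since 1995
hardware_gain = ELO_PER_DOUBLING * doublings

print(f"first halving:  -{first_halving_loss:.1f} Elo")
print(f"second halving: -{second_halving_loss:.1f} Elo")
print(f"hardware gain:  +{hardware_gain:.0f} Elo over {doublings:.0f} doublings")
```

With the table values as posted, the two drops come out at 76.5 and 89, close to the 76 and 88 quoted, and the extrapolation gives +800.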

This is roughly 2-1, 2 parts hardware to 1 part software. I wasn't quite sure where this would end up, but this is certainly within the high/low bounds I would have guessed.
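The split can be expressed as shares of the total gain. This is only a sketch of the arithmetic using the +360 and +800 estimates above, with illustrative names; it inherits all the uncertainty in those two numbers.

```python
SOFTWARE_ELO = 360   # Crafty 23.4 vs 10.18 on the same hardware
HARDWARE_ELO = 800   # 10 doublings at roughly 80 Elo each

total_elo = SOFTWARE_ELO + HARDWARE_ELO
hardware_share = HARDWARE_ELO / total_elo
software_share = SOFTWARE_ELO / total_elo
hardware_to_software = HARDWARE_ELO / SOFTWARE_ELO

print(f"total gain:     +{total_elo} Elo")
print(f"hardware share: {hardware_share:.0%}")
print(f"software share: {software_share:.0%}")
print(f"ratio:          {hardware_to_software:.1f} to 1")
```

That works out to about 69% hardware and 31% software, or roughly 2.2 to 1.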

While it is possible to take the hardware speeds all the way back to 1995, it is not easy. I need to find some very weak opponents because when I was trying to test the new skill settings, I could not get below 1600 or so, because the opposition I use is too strong. That is not an easy thing to do, and to be honest, it is a lot of effort for absolutely zero return. I've decided that this rough (but pretty accurate) assessment is close enough for the time being. Anyone else can test whatever program they want. More data would be interesting. But it is a lot of work for no return.

The rather ridiculous title "Crafty tests show that Software has advanced more." is totally misleading, and completely wrong. My results suggest a 1/3-2/3 split, 1/3 for software, 2/3 for hardware. I know that the arguments will continue. I've provided the best data I can. Hopefully we will see more real data and less speculation and hot air as this continues...

rbarreira · Post by **rbarreira** » Mon Sep 13, 2010 10:12 am

Thank you for posting the results. But the original question was not what contributed more to a particular program, it was what contributed more to chess strength in general.

I realize that the comparison isn't as easy to make when going from one program to another, but it shouldn't be that hard to approximate using the data you provided and some additional knowledge that you and others probably have. The additional needed data is, how close was Crafty to the state of the art back in 1995, and how close is it today?

If it was about as close to the best program of the era both in 1995 and in 2010, the 360 elo improvement from software should be about right, and that settles the matter. On the other extreme, if it was the state of the art back then and about 200-300 points weaker than the best current programs (going from rating lists), then the total software improvement would be around 600 since 1995.

You are very good at optimizing software, so I don't think the current best programs get their strength from better use of hardware. So I think the above should be a good way to get a good approximation to answer the original question.

bob · Post by **bob** » Mon Sep 13, 2010 3:36 pm

rbarreira wrote:Thank you for posting the results. But the original question was not what contributed more to a particular program, it was what contributed more to chess strength in general.

I don't think you can answer that without testing several programs and computing some sort of average.

rbarreira wrote:I realize that the comparison isn't as easy to make when going from one program to another, but it shouldn't be that hard to approximate using the data you provided and some additional knowledge that you and others probably have. The additional needed data is, how close was Crafty to the state of the art back in 1995, and how close is it today?

I can give a reasonably accurate answer for both. In 1996 two versions of Crafty played in the WMCCC event that year and finished in 3rd and 4th places (don't ask why there were two versions, I only entered one, it's a long story.) It was competitive with anything. Today it is probably 300 or so behind Rybka, on the rating lists. I suspect it is more like 200 or so if you factor out books and use my cluster-testing approach of no book, reasonable starting positions, positions chosen randomly.

rbarreira wrote:If it was about as close to the best program of the era both in 1995 and in 2010, the 360 elo improvement from software should be about right, and that settles the matter. On the other extreme, if it was the state of the art back then and about 200-300 points weaker than the best current programs (going from rating lists), then the total software improvement would be around 600 since 1995.

"could be" as opposed to "would be". My testing is really measuring engine strength only. No opening book, which can easily be a +100 to -100 change depending on whether it is a good or bad book. And there is the architecture issue as well. Another program might get a bigger boost from today's architecture than I do, which is why it is important to test a program carefully, if you want accurate numbers. If you just add another 200-300 to my numbers, hardware still has an edge, although it is not nearly as significant as what I found by testing Crafty carefully.

rbarreira wrote:You are very good at optimizing software, so I don't think the current best programs get their strength from better use of hardware. So I think the above should be a good way to get a good approximation to answer the original question.
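The remark above that hardware still has an edge after adding another 200-300 to the software number can be checked with a short Python sketch. It assumes, purely for illustration, that the whole rating-list gap to the best current program counts as software gain, which is exactly the "could be" caveat raised in the reply. The names are hypothetical.

```python
HARDWARE_ELO = 800          # estimate from the Crafty 10.18 time-handicap runs
CRAFTY_SOFTWARE_ELO = 360   # Crafty 23.4 vs 10.18 on the same hardware


def hardware_edge(gap_to_best):
    """Hardware gain minus software gain, if the gap to the best
    current program is credited entirely to software (an assumption)."""
    software_elo = CRAFTY_SOFTWARE_ELO + gap_to_best
    return HARDWARE_ELO - software_elo


for gap in (200, 300):
    software_elo = CRAFTY_SOFTWARE_ELO + gap
    print(f"gap {gap}: software +{software_elo}, "
          f"hardware +{HARDWARE_ELO}, edge {hardware_edge(gap):+d}")
```

Under that assumption the software total is 560 to 660 Elo, leaving hardware ahead by 240 to 140.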

John Major · Post by **John Major** » Mon Sep 13, 2010 9:07 pm

How could Crafty beat GM's in '95 on ICC if it was 1100+ ELO weaker than today? According to some old rgcc posts it had a rating up to 2800.

mhull · Post by **mhull** » Mon Sep 13, 2010 9:13 pm

John Major wrote:How could Crafty beat GM's in '95 on ICC if it was 1100+ ELO weaker than today? According to some old rgcc posts it had a rating up to 2800.

Most programs of that vintage could not maintain a plus score at standard time controls. They fared better at blitz, I believe.

Dann Corbit · Post by **Dann Corbit** » Mon Sep 13, 2010 9:21 pm

mhull wrote:
John Major wrote:How could Crafty beat GM's in '95 on ICC if it was 1100+ ELO weaker than today? According to some old rgcc posts it had a rating up to 2800.
Most programs of that vintage could not maintain a plus score at standard time controls. They fared better at blitz, I believe.

The same general trend is true today.
The slower the time control, the more competitive humans will be.
At game in one second, humans simply cannot respond fast enough, so it is not even a fair comparison.
At game in 1 minute, humans have very little chance.
At 40moves/2hrs, computers have the upper hand (clearly).
At correspondence chess levels it's about even (just an opinion).
Eventually, due to both hardware and software improvements, the correspondence chess players will be at the same disadvantage as the game in one minute players are now.
It appears that *both* software and hardware have offered exponential improvement.

bob · Post by **bob** » Tue Sep 14, 2010 3:46 am

John Major wrote:How could Crafty beat GM's in '95 on ICC if it was 1100+ ELO weaker than today? According to some old rgcc posts it had a rating up to 2800.

Blitz games only. Cray Blitz was beating GMs in the early 80's. But at blitz.

ICC ratings were almost random numbers back then as you could play a 3 0 game, or a 2 12 game, and call it blitz. 3 0 is much harder for a human.

Also, as I mentioned, it is likely that in reality those numbers would be compressed somewhat, since I thought the range might be too broad. But hardware and software would most likely be compressed equally, keeping the same ratio.
