harware vs software advances

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Antonio Torrecillas
Posts: 90
Joined: Sun Nov 02, 2008 4:43 pm
Location: Barcelona

Re: harware vs software advances

Post by Antonio Torrecillas »

OK here is the data:

first run:
12:13 g1f3 d7d5 d2d4 g8f6 c1f4 c8g4 h2h3 g4f5 e2e3

2nd run:
12:59 g1f3 g8f6 d2d4 d7d5 c1f4 c8f5 e2e3 e7e6 b1c3 score 0.12

3rd run:
16:46 g1f3 d7d5 d2d4 c8g4 b1c3 e7e6 f3e5 g4h5 score 0.12

I close and reopen after each run. Hash 384 Kb.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: harware vs software advances

Post by bob »

Antonio Torrecillas wrote:OK here is the data:

first run:
12:13 g1f3 d7d5 d2d4 g8f6 c1f4 c8g4 h2h3 g4f5 e2e3

2º run:
12:59 g1f3 g8f6 d2d4 d7d5 c1f4 c8f5 e2e3 e7e6 b1c3 score 0.12

3º run:
16:46 g1f3 d7d5 d2d4 c8g4 b1c3 e7e6 f3e5 g4h5 score 0.12

I close and reopen after each run. Hash 384 Kb.
Something is wrong. The program should be 100% deterministic with respect to searching the same tree to the same fixed depth using just one processor. Your scores are the same, but the PVs are different, which suggests that something is fishy... I don't see how three identical searches (with regard to depth) would produce anything but the same node count, same score, and same exact PV. That is an absolute requirement when I test changes to Crafty where all I change is something performance-related. The shape/size of the tree can't change just because the move generator is sped up, unless I change the order in which the moves are generated. Of course if I use a second thread, all this goes to hell in a handbasket, but Rebel, Genius, etc. are not SMP engines, which means something else is wrong. Could be a programming bug, of course, that uses uninitialized memory for something, but you'd think a commercial program would not have an error that glaring in it.
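
As a rough illustration of the determinism point, here is a toy sketch (not Crafty or Genius code). It runs a fixed-depth alpha-beta search over a synthetic tree whose leaf values come from a hash of the move path. The tree, the branching factor, and the names `leaf_score`, `search` and `run` are all assumptions made up for this example. With one thread, a fixed move order and no uninitialized state, two runs should report the same score, PV and node count.

```python
import hashlib

def leaf_score(path):
    """Hypothetical evaluation: hash the move path into a score in [-100, 100]."""
    digest = hashlib.sha256(bytes(path)).digest()
    return int.from_bytes(digest[:2], "big") % 201 - 100

def search(path, depth, alpha, beta, stats, branching=4):
    """Fixed-depth negamax with alpha-beta pruning on a synthetic tree.

    Returns (score, pv) from the point of view of the side to move.
    """
    stats["nodes"] += 1
    if depth == 0:
        return leaf_score(path), []
    best_pv = []
    for move in range(branching):  # the move order is fixed, so the tree shape is too
        score, pv = search(path + (move,), depth - 1, -beta, -alpha, stats, branching)
        score = -score
        if score > alpha:
            alpha, best_pv = score, [move] + pv
            if alpha >= beta:
                break  # beta cutoff
    return alpha, best_pv

def run(depth=6):
    """Search from the root with a fresh node counter."""
    stats = {"nodes": 0}
    score, pv = search((), depth, -1000, 1000, stats)
    return score, pv, stats["nodes"]

first = run()
second = run()
print(first)
print(first == second)  # identical score, PV and node count on every run
```

If a real engine gives different PVs or times for the same fixed-depth search, some input must differ between runs, for example a random seed, a clock-dependent decision, or leftover memory contents.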
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: harware vs software advances

Post by Don »

rbarreira wrote:
Don wrote:
rbarreira wrote:Interesting!

I just got MS-DOS 6.22 running from a CD, and I can access a USB pen drive as C:\ (thanks to an option in the BIOS). But when I tried running Rebel 10 it gave me an "insufficient memory" error. (4 GB not enough for you, eh? :D )

Your speedup factors make sense considering it's only using one core and that it's running in 32-bit mode. If you multiply your speedup of 80 by 6 and then by 2 for those two factors, that gives 960 which is pretty much spot-on with Moore's law!
I'm not going to adjust for cores as it makes the match almost impossible to conduct - I cannot give a super heavy time handicap and hope to play lots of games. So we can basically interpret the results later and anyone who wants to is free to interpolate.

I don't see how 32-bit mode is a factor. Some of the very best programs today still run in 32 bits, and even 64-bit programs do not deserve a doubling for this.

If this test is actually a contest where each side negotiates every possible advantage, I want no part of it. I want to see how well modern programs would run on older hardware and vice versa. I'm not trying to prove a point and even though I have been going round and round with Bob about this I don't care if I end up being wrong. Doubling the numbers for 64 bit and multiplying by 6 for a 6 core machine is hardly going to give a realistic picture unless you just want to make up numbers to make the test come out a certain way, and that's not what I am doing this for (if I actually do it, that is :-)

I plan on using a 100 to 1 factor when testing Genius because the benchmark for a single core shows right around 100 to 1. We could argue over all kinds of ways to adjust up and down to compensate for this and that, but we probably would not agree on it anyway. This will be a test under known conditions that people can feel free to interpret any way they choose, but at least they will know the exact conditions of the test.

This test is going to be kind of like the computer language benchmarks - it won't prove much - it will just show how things worked out given a very specific test setup and will serve as a jumping off point for other tests if anyone has the patience.

I thought of another interesting test setup. Suppose you actually had a Pentium 90 and could run Rybka or Robbolito on it (in 32-bit mode no doubt). You probably COULD if you installed Linux on one of these old machines and used Robbolito, Stockfish or something really good.

Then what you do is play Rybka on the old hardware in the best configuration you can find for it and Chess Genius on the new hardware using the best configuration you can find. At even time controls you have something that is (at least in principle) a fair match. Can Rybka's superior software overcome Genius on superior hardware?
My point was not that you need to adjust your testing for these factors. I was just saying that the speedup makes sense according to Moore's law. If it was 1000x faster on a single core that would be unbelievable (in the other thread I didn't realize you were talking about single core comparisons).
From my own personal experience I have never seen a real doubling in even 2 years. On paper we got a doubling in performance about every 18 months to 2 years - but I upgrade every 2 or 3 years and have NEVER seen a full doubling in performance until my last purchase - and that was only because I went with 6 cores. Each time I upgrade I've been disappointed because the hype does not quite equal the reality. Of course each upgrade was still a nice step forward.

That's why, when I did the numbers for the last 16 years assuming a doubling every 18 months, I got something that is way out of line with reality.

So regardless of how you slice it I think it's pretty clear that you cannot say that hardware is responsible for the "majority" of the advancement in computer chess. The only question left is which is more responsible and it could be a close call.
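
For reference, the "on paper" numbers discussed in this post can be reproduced with a few lines. This is only a sketch of the arithmetic; `moore_factor` is a made-up helper name, and the inputs (16 years, 18 or 24 months per doubling, the 80 x 6 x 2 adjustment) are simply the figures quoted in this thread.

```python
def moore_factor(years, months_per_doubling):
    """Speedup predicted by a fixed doubling period (hypothetical helper)."""
    return 2 ** (years * 12 / months_per_doubling)

paper_18_months = moore_factor(16, 18)  # roughly 1625x
paper_24_months = moore_factor(16, 24)  # exactly 256x
adjusted_measurement = 80 * 6 * 2       # 960: measured 80x, times 6 cores, times 2 for 64 bit

print(round(paper_18_months), round(paper_24_months), adjusted_measurement)
```

The measured single-core figures in this thread (about 80x to 100x) sit well below both paper values, which is the gap being argued about.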
Antonio Torrecillas
Posts: 90
Joined: Sun Nov 02, 2008 4:43 pm
Location: Barcelona

Re: harware vs software advances

Post by Antonio Torrecillas »

4th run: 12:27, score 0.18
g1f3 g8f6 d2d4 d7d5 c1f4 c8g4 h2h3 g4f5 e2e3

5th run: 12:51, score 0.15
g1f3 g8f6 d2d4 d7d5 c1f4 c8g4 h2h3

With a hardware reset before each test.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: harware vs software advances

Post by Don »

bob wrote:
Antonio Torrecillas wrote:OK here is the data:

first run:
12:13 g1f3 d7d5 d2d4 g8f6 c1f4 c8g4 h2h3 g4f5 e2e3

2º run:
12:59 g1f3 g8f6 d2d4 d7d5 c1f4 c8f5 e2e3 e7e6 b1c3 score 0.12

3º run:
16:46 g1f3 d7d5 d2d4 c8g4 b1c3 e7e6 f3e5 g4h5 score 0.12

I close and reopen after each run. Hash 384 Kb.
Something is wrong. The program should be 100% deterministic with respect to searching the same tree to the same fixed depth using just one processor. Your scores are the same, but the PV's are different, which suggests that something is fishy... I don't see how three identical searches (with regard to depth) would produce anything but the same identical node count, same score, same exact PV... That is an absolute requirement during my testing changes to Crafty where all I change is something performance-related. The shape/size of the tree can't change just because the move generator is speeded up, unless I change the order the moves are generated. Of course if I use a second thread, all this goes to hell in a handbasket, but rebel/genius/etc are not SMP engines, which means something else is wrong. Could be a programming bug, of course, that uses uninitialized memory for something, but you'd think a commercial program would not have an error that glaring in it.
I think this was a characteristic of Genius. I got the same thing on my own machine - I restarted everything from scratch and did things in the same exact order and came up with different numbers.
Uri Blass
Posts: 10297
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: harware vs software advances

Post by Uri Blass »

Antonio Torrecillas wrote:OK here is the data:

first run:
12:13 g1f3 d7d5 d2d4 g8f6 c1f4 c8g4 h2h3 g4f5 e2e3

2º run:
12:59 g1f3 g8f6 d2d4 d7d5 c1f4 c8f5 e2e3 e7e6 b1c3 score 0.12

3º run:
16:46 g1f3 d7d5 d2d4 c8g4 b1c3 e7e6 f3e5 g4h5 score 0.12

I close and reopen after each run. Hash 384 Kb.
I remember that Genius3 could use more than 384 kb hash and if you use only 384 kb hash for it you make it weaker.

Note that even with more hash I fully expect Genius to lose with a 100:1 time advantage, unless we are talking about very fast time controls (and I fully expect 5 minutes of Rybka to be better than 500 minutes of Genius on the same single core of today).

Uri
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: harware vs software advances

Post by rbarreira »

Maybe Genius has some randomization.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: harware vs software advances

Post by Don »

Uri Blass wrote:
Antonio Torrecillas wrote:OK here is the data:

first run:
12:13 g1f3 d7d5 d2d4 g8f6 c1f4 c8g4 h2h3 g4f5 e2e3

2º run:
12:59 g1f3 g8f6 d2d4 d7d5 c1f4 c8f5 e2e3 e7e6 b1c3 score 0.12

3º run:
16:46 g1f3 d7d5 d2d4 c8g4 b1c3 e7e6 f3e5 g4h5 score 0.12

I close and reopen after each run. Hash 384 Kb.
I remember that Genius3 could use more than 384 kb hash and if you use only 384 kb hash for it you make it weaker.

Note that even with more hash I fully expect Genius to lose with 100:1 time advantage if we do not talk about very fast time control(and I fully expect 5 minutes of rybka to be better than 500 minutes of Genius on the same one core of today).

Uri
I think the faster the games, the better by far for the older programs. In the test I'm doing, Robbo is running at 40 moves in 57 seconds and Genius is running at 40 moves in 2 hours. I had to make an adjustment because I'm not actually running both programs on the same machine, but this still represents a 100 to 1 advantage.

I'm actually playing the first game now and it's not over, but robbo has a pretty big advantage - it will probably be a win for robbolito.

If Robbo continues to win I will make adjustments in the time control in order to get a rough picture of where the break even point is.
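
A quick sketch of the time-odds arithmetic for the two time controls mentioned above. The variable names are made up for illustration, and the "implied machine factor" is only an inference: it assumes the adjustment between the two machines is a simple multiplier on top of the intended 100 to 1 handicap.

```python
def seconds_per_move(moves, total_seconds):
    """Average thinking time per move for a 'moves in total_seconds' time control."""
    return total_seconds / moves

genius_spm = seconds_per_move(40, 2 * 60 * 60)  # 40 moves in 2 hours -> 180 s/move
robbo_spm = seconds_per_move(40, 57)            # 40 moves in 57 s -> 1.425 s/move

wall_clock_ratio = genius_spm / robbo_spm       # about 126 to 1 on the clock
implied_machine_factor = wall_clock_ratio / 100 # about 1.26 (assumption, see above)

print(round(wall_clock_ratio, 1), round(implied_machine_factor, 2))
```

So the clock ratio is a bit over 126 to 1, and the remainder beyond 100 to 1 would be the allowance for the slower of the two machines, if the adjustment works the way this sketch assumes.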
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: harware vs software advances

Post by Don »

The first game is winding down now. It looks like this could be a fairly close match, as at one point both programs were giving Genius a pretty big advantage - but now it looks like a very likely Robbo win. Both programs see an advantage of over 3 pawns for Robbo.

Don
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: harware vs software advances

Post by bob »

First, I have seen several doublings. With Crafty, a P5/133 hit 30K nps. The P6/200 went to 75K. More than 2x faster. I've seen similar things comparing Core 2 to the original crappy Intel Core (their first 64 bit stuff was just ugly).

I did manage to locate some P5/133 printed output. Not much, but at least a reference, and as I had thought, we were hitting 30K nodes per second on this box in 1995. I just did some testing on my 3-year-old dual quad-core Intel (2.33 GHz E5345 Xeon) and hit 30M on quite a few positions; 22-24M was the average.

For me, that is a clean factor of 1,000 (30K to 30M) if you restrict things to just one physical CPU (my i7 6-core numbers were over 30M).
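
To put rough numbers on the figures above, here is a back-of-the-envelope sketch. The helper names are made up for illustration, and the 15-year span is an assumption based on the 1995 date of the P5/133 output.

```python
import math

def speedup(old_nps, new_nps):
    """Ratio of two nodes-per-second figures."""
    return new_nps / old_nps

def doubling_months(factor, years):
    """Doubling period implied by a total speedup over a number of years."""
    return years * 12 / math.log2(factor)

p5_to_p6 = speedup(30_000, 75_000)        # 2.5x from P5/133 to P6/200
p5_to_xeon = speedup(30_000, 30_000_000)  # 1000x from P5/133 to the dual quad-core Xeon

print(p5_to_p6, p5_to_xeon)
print(round(doubling_months(p5_to_xeon, 15), 1))  # about 18 months per doubling
```

Under those assumptions, a 1,000x gain over 15 years works out to a doubling roughly every 18 months, but only when all cores are counted, as the peak 30M figure does.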

I don't think you can ignore the 6-core issue since that is clearly a hardware advance over the old processors that had 1 core and no cache on the chip back in the case of the P5. Some of that speed gain is 64 bit stuff, some is multiple-core stuff, some is improved cache/OOE stuff, but 30K to 30M can't be ignored. I don't think it is sensible to compare a 1995 program running on modern hardware to get the hardware speedup. A program is always optimized to whatever it runs on, as opposed to what it might run on in 20 years. Crafty was an active project in 1995, is still an active project today, so in 1995 it fit what was available pretty well, today it fits what is available pretty well. And gives a pretty good idea of what hardware has done, although obviously we are overlooking some types of hardware and are primarily talking about the microprocessor(s) from Intel. But it is the most popular thing going, of course.

Numbers like 55x are simply wrong. And if we go back 20 years rather than just 15, it would be much more significant. I only mentioned 1995 because I had some data from that period. Prior to that, my machines were a bit "bigger" if you know what I mean. :) In 1994 I broke 7M nps with Cray Blitz when we ran on the T932 (32 Cray CPUs). No PC numbers back then. The first time I got a Fortran compiler for a PC was around 1994, and Cray Blitz searched a blazing 100 nodes per second, because of the vector stuff we were doing that was just dog-slow on a PC platform with no memory bandwidth or vectorizing.