Deep Blue vs Rybka

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mhull
Posts: 13447
Joined: Wed Mar 08, 2006 9:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Re: Deep Blue vs Rybka

Post by mhull »

Don wrote:If we go back far enough in time, Bob would say that hash tables are not a software improvement.
This is the sock puppet fallacy: "In days of yore, Bob would say...<insert straw man argument>".

Hey, I'm just saying it's not fair to supply both sides of the dialog and then declare victory.
Matthew Hull
mhull
Posts: 13447
Joined: Wed Mar 08, 2006 9:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Re: Deep Blue vs Rybka

Post by mhull »

Milos wrote:
mhull wrote:Bob's tests reduce the error margin by playing more games with fewer unknown variables.

I'm not saying you're wrong by definition, I'm just saying yours is an opinion based on weaker data.
Bob's tens of thousands of games have nothing to do with accuracy. His testing methodology is simply faulty. He could play millions of games and his results would still be inaccurate.
I understand some people are easily impressed by this, but when different testing methodologies, opening books, etc. all agree to within 10-15 elo (with a 15-20 elo error margin) while Bob's results are off by almost 100 elo with his 4 elo margin, everything you say is just grasping at straws.
Bob has a systematic error in his testing methodology which he (and some other people) are not willing to admit.
I hope you remember the tale of The Emperor's New Clothes...
You say his method is flawed but ignore the reasons why the other lists are flawed, e.g. arbitrary opening book, ponder off, not enough games, etc. Therefore, your argument does not persuade.
Matthew Hull
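As an aside on the error-margin numbers traded above, the usual relationship between game count and the 95% Elo confidence margin can be sketched as follows (a rough model assuming independent games near a 50% score and a per-game score standard deviation of 0.4; the real margin depends on draw rate and opponent pool):

```python
import math

def elo_error_95(games: int, score_sigma: float = 0.4) -> float:
    """Approximate 95% confidence margin (in Elo) for a match of
    `games` independent games near a 50% score.
    400/ln(10) is the Elo-per-unit-score slope at equal strength."""
    elo_per_score = 400 / math.log(10)  # ~173.7
    return 1.96 * elo_per_score * score_sigma / math.sqrt(games)

print(round(elo_error_95(30000), 1))  # ~0.8 elo under these assumptions
print(round(elo_error_95(1000), 1))   # ~4.3 elo
```

Under these assumptions the margin shrinks as 1/sqrt(games), which is why both sides appeal to large game counts; a systematic bias, if one exists, would not shrink with more games.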
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Deep Blue vs Rybka

Post by Milos »

mhull wrote:You say his method is flawed but ignore the reasons why the other lists are flawed, e.g. arbitrary opening book, ponder off, not enough games, etc. Therefore, your argument does not persuade.
"If one man calls you a donkey, ignore him. If two men call you a donkey, think about it. If three men call you a donkey, buy a saddle!"
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Deep Blue vs Rybka

Post by bob »

rbarreira wrote:
bob wrote:But one could argue that since Crafty's NPS would actually be 6x or 12x faster on the above hardware, that is a software shortcoming, and the hardware should get full credit. If we can't use it effectively, is that the engineer's fault?

But for my summary of results, I certainly used the effective speedup number, which was about 1024, as opposed to the theoretical max, which was 1500x.
The hardware shouldn't get full credit, if the question being asked is "how much does hardware contribute to chess strength".
What about slight re-wording: "How much _could_ the hardware contribute to chess strength?" If we omit the potential theoretical gain, could we not apply that same logic to parallel search in general, since not everyone is doing it, and just limit it to one CPU since they get nothing from the extra cores? Nothing says that we can't one day develop a perfect parallel algorithm.
If the algorithms being used can't take full advantage of the hardware, that means the hardware is contributing less. What matters are the facts of what the hardware is contributing in reality, not some pipe dream of what could be.
So we average this over all chess engines, and the ones without any parallel search drag this way back from where the ones with parallel search are, and the average then says that hardware contributes a lot less than it really does? It is not exactly black and white. One can make a case for either, and one can punch holes in either argument just as easily.


If alpha-beta was impossible to parallelize (fortunately it isn't), parallel hardware wouldn't contribute to chess strength, so it wouldn't get credit. That's clear as day to me...
What if we went back to 1976 or so before there was any thought of parallelizing alpha/beta, even though we had dual and quad-cpu systems around? So that hardware advance would not count in the "did" but would count in the "could" and we'd be having the same discussion back then as well. For the actual numbers, I fudged my calculations to account for the lost performance due to smp overhead. But that "loss" is still there, it still bugs me, and I still take a whack every now and then to reduce it. I do not believe it is an impossible task, just a very difficult one. But today, we are far enough from optimal that there is a _lot_ of room left to improve. That last 5% might be very difficult, but the next 10% from now is not.


But as you said, your results use this way of calculating, so all is well.
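The 1024x effective versus 1500x theoretical figures above can be combined with the Elo-per-doubling rule of thumb used elsewhere in the thread (70-80 Elo per doubling of effective speed is an assumption, not a measured constant):

```python
import math

effective_speedup = 1024    # Bob's effective figure from the post
theoretical_speedup = 1500  # the theoretical max from the same post

# Parallel efficiency implied by the two numbers
efficiency = effective_speedup / theoretical_speedup
print(round(efficiency, 2))  # ~0.68

# Rough hardware Elo estimate at 70 and 80 Elo per doubling
doublings = math.log2(effective_speedup)  # exactly 10.0
for elo_per_doubling in (70, 80):
    print(doublings * elo_per_doubling)   # 700.0 and 800.0
```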
mhull
Posts: 13447
Joined: Wed Mar 08, 2006 9:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Re: Deep Blue vs Rybka

Post by mhull »

Milos wrote:
mhull wrote:You say his method is flawed but ignore the reasons why the other lists are flawed, e.g. arbitrary opening book, ponder off, not enough games, etc. Therefore, your argument does not persuade.
"If one man calls you a donkey, ignore him. If two men call you a donkey, think about it. If three men call you a donkey, buy a saddle!"
If three bolshevists call you a pig dog, ignore them. So there. :)

Or better yet, three guys with weak data think your view is wrong. Ignore them.
Matthew Hull
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Deep Blue vs Rybka

Post by bob »

Don wrote:
rbarreira wrote:
bob wrote:But one could argue that since Crafty's NPS would actually be 6x or 12x faster on the above hardware, that is a software shortcoming, and the hardware should get full credit. If we can't use it effectively, is that the engineer's fault?

But for my summary of results, I certainly used the effective speedup number, which was about 1024, as opposed to the theoretical max, which was 1500x.
The hardware shouldn't get full credit, if the question being asked is "how much does hardware contribute to chess strength". If the algorithms being used can't take full advantage of the hardware, that means the hardware is contributing less. What matters are the facts of what the hardware is contributing in reality, not some pipe dream of what could be.

If alpha-beta was impossible to parallelize (fortunately it isn't), parallel hardware wouldn't contribute to chess strength, so it wouldn't get credit. That's clear as day to me...

But as you said, your results use this way of calculating, so all is well.
I'll say "fool's errand" again. You hit the nail on the head here. There is this implicit assumption that it's not fair to bring an old program into the modern world without "reworking" it to be used on modern hardware.

But it's really quite impossible to separate hardware from software. From Bob's point of view everything is hardware, and one can take that point of view and in some sense be absolutely correct. After all, EVERYTHING you do to improve a chess program makes it work better on whatever hardware you are using.

It's quite impossible to make a clean separation. I am now of the opinion that we should just not try. We should just see how well an old program runs on new hardware. If the old program is not MP, then too bad.

We won't be answering the exact question we set out to answer, but it's a fool's errand to try, because in Bob's eyes the hardware is always going to be more important and be the thing that counts, even if the software was needed.

If we go back far enough in time, Bob would say that hash tables are not a software improvement. Hash tables require memory, and memory is hardware. I know that if it were 20 years ago, this is probably what we would be arguing about. The problem is that once something becomes ubiquitous, such as hash tables or the PC platform, it becomes the defining thing, but that doesn't address what MY original statement was all about that started this. In my view Bob just defined for himself what I meant and then set about to disprove it.
I am being realistic. In 1995 parallel search was old hat to some of us. Already been there, done that, got the T-shirt. So that is not a "since 1995" software improvement. I therefore see nothing wrong with using parallel search today as part of the measurement for hardware gain. I even tried to do some analysis to at least ball-park the Elo gain for hardware advances from 1995 to the present. There is no way to be dead accurate without finding a lot of weaker opponents to test against, and using real 1995 hardware, plus 2000 and 2005 hardware, so that we can connect all the opponents from 1995 to 2010. A lot of work for no gain at all, other than in terms of information.

Seems clear to me that hardware is at _least_ as important as software, using beyond-worst-case numbers for hardware. And more likely, hardware is ahead overall. How much can be debated, but certainly 100-200 Elo is in the ball-park of the right answer. No huge winner, but any win is good for the hardware guys...
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Deep Blue vs Rybka

Post by Milos »

mhull wrote:If three bolshevists call you a pig dog, ignore them. So there. :)

Or better yet, three guys with weak data think your view is wrong. Ignore them.
Ignorance is bliss. However, there is no point in further discussion with you, since all you do here is bootlicking and you provide no arguments...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Deep Blue vs Rybka

Post by bob »

Milos wrote:
bob wrote:If you knew anything about parallel programming you would know the answer. The concept of "speedup" applies to a position, not to a game. How do you compute the "speedup" for a game? I know how to compute the speedup for a position. And, just to keep this on a sane level, this is the way _everybody_ has been reporting speedup for 40 years now.

I do not run test positions for the results I show here, however. I run 3,000 starting positions, taken 12 moves into GM games, all duplicates removed, and play the complete _game_ out from there. So what are you talking about? Other than clearly something you don't have a clue about.

All the Elo results I show here are from nothing but games. All of my speedup data is from nothing but single positions, since the concept of speedup for a game is meaningless.

Jeez it is hard to discuss something when you are so far out in left field you are completely out of the stadium, and even beyond the parking lot.
No, the problem is that you don't (want to) understand a word of what I'm saying.
So let me summarize it for you so that you can understand it this time.
You claim the following: if you take Crafty and run it on a 32-node machine and then run it on a 64-node machine, you will get a 1.7x speedup, in the sense that the average time to reach a fixed depth over several hundred positions is 1.7 times shorter on the 64-node machine compared to the 32-node machine.
You further claim that this 1.7x speedup is equivalent to a log(1.7)/log(2)*70 = 54 elo difference in strength (I even took the more conservative 70 here instead of 80 elo per speedup doubling).

What I claim is that if you run Crafty on 32 nodes against your standard set of opponents in a 30k-game match at any reasonable TC, and then repeat the same for Crafty on 64 nodes, the difference in elo you obtain will never reach 54 elo. As a matter of fact it will be smaller by a large margin.
I claim that is simply wrong, because some are already reporting +100 for going from 1 to 4 cores. And that matches my linear approximation pretty nicely. What exactly would cause a program to run 1.7x faster in terms of time to reach a particular depth in a quiet position, in an endgame position, and in a wildly tactical position, and yet only see a +20 Elo improvement?

Your reasoning is flawed, and I can't begin to guess where it is coming from. But not from any accepted science, for sure.


This kind of test is very simple to run but you refuse to do it. My guess is that you already know the result but are too stubborn to admit it.
My guess is that you are too ignorant to realize that 32-core and 64-core machines are not exactly commodity items. It takes a while at 256 games at a time to compute results. On one machine, it would take months. If we had such a machine. I do have 8-core boxes, and I'd be more than happy to run the test for 1, 2, 4 and 8. But I have already done that, with no surprises at all. I don't run many 8-core tests because that machine has 70 nodes, and I can only run 70 games at a time, which is slow enough that I don't do it unless I am testing something new in the parallel search and want to see how it does.
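The arithmetic both posters rely on, converting a time-to-depth speedup into an estimated Elo gain, can be sketched as follows (the 70 Elo per doubling figure is the conservative rule of thumb quoted above, not a measured constant):

```python
import math

def elo_gain(speedup: float, elo_per_doubling: float = 70.0) -> float:
    """Estimated Elo gain from a time-to-depth speedup, assuming a
    fixed Elo value per doubling of effective speed."""
    return math.log2(speedup) * elo_per_doubling

print(round(elo_gain(1.7)))  # ~54, Milos's 32 -> 64 node example
print(round(elo_gain(2.0)))  # 70, one full doubling
```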
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Deep Blue vs Rybka

Post by rbarreira »

bob wrote:
rbarreira wrote:
bob wrote:But one could argue that since Crafty's NPS would actually be 6x or 12x faster on the above hardware, that is a software shortcoming, and the hardware should get full credit. If we can't use it effectively, is that the engineer's fault?

But for my summary of results, I certainly used the effective speedup number, which was about 1024, as opposed to the theoretical max, which was 1500x.
The hardware shouldn't get full credit, if the question being asked is "how much does hardware contribute to chess strength".
What about slight re-wording: "How much _could_ the hardware contribute to chess strength?" If we omit the potential theoretical gain, could we not apply that same logic to parallel search in general, since not everyone is doing it, and just limit it to one CPU since they get nothing from the extra cores? Nothing says that we can't one day develop a perfect parallel algorithm.
If the algorithms being used can't take full advantage of the hardware, that means the hardware is contributing less. What matters are the facts of what the hardware is contributing in reality, not some pipe dream of what could be.
So we average this over all chess engines, and the ones without any parallel search drag this way back from where the ones with parallel search are, and the average then says that hardware contributes a lot less than it really does? It is not exactly black and white. One can make a case for either, and one can punch holes in either argument just as easily.


If alpha-beta was impossible to parallelize (fortunately it isn't), parallel hardware wouldn't contribute to chess strength, so it wouldn't get credit. That's clear as day to me...
What if we went back to 1976 or so before there was any thought of parallelizing alpha/beta, even though we had dual and quad-cpu systems around? So that hardware advance would not count in the "did" but would count in the "could" and we'd be having the same discussion back then as well. For the actual numbers, I fudged my calculations to account for the lost performance due to smp overhead. But that "loss" is still there, it still bugs me, and I still take a whack every now and then to reduce it. I do not believe it is an impossible task, just a very difficult one. But today, we are far enough from optimal that there is a _lot_ of room left to improve. That last 5% might be very difficult, but the next 10% from now is not.


But as you said, your results use this way of calculating, so all is well.
If we're going to phrase the question as "how much could hardware contribute to chess strength", we're going to have to do some (potentially) pretty silly stuff, like considering the use of all the floating-point hardware, BCD instructions and all sorts of things that see little or no use in chess programs. I don't see the point of counting things that might or might not help.

And then we would also have to ask "how much could software contribute", which would be even harder to answer.

Punching any such holes in my argument doesn't really work, at least not in the way you put it because all I've been advocating is to take the top program (or let's say, the top open source program since that's the one we can study to see how it uses the hardware and how its software works).
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Deep Blue vs Rybka

Post by Milos »

bob wrote:I do have 8 core boxes, and I'd be more than happy to run the test for 1, 2, 4 and 8. But I have already done that, with no surprises at all. I don't run many 8-core tests because that machine has 70 nodes, and I can only run 70 games at a time, which is slow enough to make me not do it unless I am testing something new in the parallel search and want to see how it does.
Well, you repeat an enormous number of times that you've done something; however, you never present any new results beyond what you had in your Cray Blitz paper from the last millennium.
I repeat: I've never seen a grain of proof for what you are saying.

If what you claim is true, then taking even 4- and 8-core machines you would have (with your formula):
- on a 4-core machine: speedup_a = 1 + 3*0.7 = 3.1
- on an 8-core machine: speedup_b = 1 + 7*0.7 = 5.9

log(speedup_b/speedup_a)/log(2)*70 = 65 elo points

There is simply no way you can get a 65 elo improvement. And to cite you: been there, done that. You don't get even 40 elo.
Once you finally run this kind of test (using programs from this millennium) and realize you are wrong, then you might be able to start thinking about the reasons why...
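Milos's scaling model above, each core beyond the first adding 0.7 of a core's worth of speed, works out numerically as follows (a sketch of his arithmetic only; no claim about whether the linear model itself holds):

```python
import math

def smp_speedup(cores: int, per_core_gain: float = 0.7) -> float:
    """Milos's linear model: each core past the first adds 0.7 of a core."""
    return 1 + per_core_gain * (cores - 1)

speedup_4 = smp_speedup(4)  # 3.1
speedup_8 = smp_speedup(8)  # 5.9

# Predicted Elo difference between 8 and 4 cores at 70 Elo per doubling
elo_diff = math.log2(speedup_8 / speedup_4) * 70
print(round(elo_diff))  # ~65
```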