Deep Blue vs Rybka

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Deep Blue vs Rybka

Post by Milos »

bob wrote:Nothing in my discussion of this formula suggests that taking the speedup for 2 processors and comparing it to the speedup for 4 makes any sense. This is not a formula that is intended to do anything other than suggest a speedup for a given number of processors. What does "3.1 / 1.7" mean? What does "more overhead, higher speedup" mean? If you double the number of processors and don't double the overhead, of _course_ you can get a higher speedup. But you are trying to use numbers that are estimates to compute something that doesn't mean anything at all. So keep trying. I'm not going to argue something so stupid over and over. You want to treat the formula like it is accurate to 3 decimal places and then use predicted values to compute something that is even more accurate. Good luck with that.
I'm not talking about your formula, its approximation or anything like that. You are arguing with yourself on that topic.
I'm talking about your published speedup results, which claim that the speedup when going from 4 to 8 is higher than the speedup when going from 2 to 4, which is again higher than the speedup when going from 1 to 2.
This is absolutely impossible. There are no ifs and buts. It's impossible, period.
And it's a rather simple and fundamental thing. Your not being able to understand this is really shocking and sad.

You are saying others can verify your results. Nobody can. You intentionally don't want to provide sufficient material.
You don't provide your testing conditions (was it time to fixed depth, what was the fixed depth, which machines were used, what was the size of the hash, which version of Crafty you used, how many consecutive runs you ran per position, etc.). More importantly, you don't provide your results in any scientific form (position 1: time to depth 1, etc.); instead you only give final numbers. And, most important of all, you don't provide the test positions you used.

You do all this on purpose so that you could complain when others get results that are not in accordance with your own!
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Deep Blue vs Rybka

Post by Dann Corbit »

Milos wrote:
bob wrote:Nothing in my discussion of this formula suggests that taking the speedup for 2 processors and comparing it to the speedup for 4 makes any sense. This is not a formula that is intended to do anything other than suggest a speedup for a given number of processors. What does "3.1 / 1.7" mean? What does "more overhead, higher speedup" mean? If you double the number of processors and don't double the overhead, of _course_ you can get a higher speedup. But you are trying to use numbers that are estimates to compute something that doesn't mean anything at all. So keep trying. I'm not going to argue something so stupid over and over. You want to treat the formula like it is accurate to 3 decimal places and then use predicted values to compute something that is even more accurate. Good luck with that.
I'm not talking about your formula, its approximation or anything like that. You are arguing with yourself on that topic.
I'm talking about your published speedup results, which claim that the speedup when going from 4 to 8 is higher than the speedup when going from 2 to 4, which is again higher than the speedup when going from 1 to 2.
This is absolutely impossible. There are no ifs and buts. It's impossible, period.
And it's a rather simple and fundamental thing. Your not being able to understand this is really shocking and sad.

You are saying others can verify your results. Nobody can. You intentionally don't want to provide sufficient material.
You don't provide your testing conditions (was it time to fixed depth, what was the fixed depth, which machines were used, what was the size of the hash, which version of Crafty you used, how many consecutive runs you ran per position, etc.). More importantly, you don't provide your results in any scientific form (position 1: time to depth 1, etc.); instead you only give final numbers. And, most important of all, you don't provide the test positions you used.

You do all this on purpose so that you could complain when others get results that are not in accordance with your own!
So the whole thing was a grand scheme by Dr. Hyatt so that he could complain.

Nothing quite so enjoyable as complaining. Who can blame him?
mhull
Posts: 13447
Joined: Wed Mar 08, 2006 9:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Re: Deep Blue vs Rybka

Post by mhull »

Milos wrote:You are saying others can verify your results. Nobody can. You intentionally don't want to provide sufficient material.
Good point. He should follow your example and just talk with zero material.
Milos wrote:You don't provide your testing conditions (was it time to fixed depth, what was the fixed depth, which machines were used, what was the size of the hash, which version of Crafty you used, how many consecutive runs you ran per position, etc.). More importantly, you don't provide your results in any scientific form (position 1: time to depth 1, etc.); instead you only give final numbers. And, most important of all, you don't provide the test positions you used.
Ye have not because ye ask not.

Milos wrote:You do all this on purpose so that you could complain when others get results that are not in accordance with your own!
Right, like the time you posted your results. Good point, Milos.
Matthew Hull
frankp
Posts: 228
Joined: Sun Mar 12, 2006 3:11 pm

Re: Deep Blue vs Rybka

Post by frankp »

Dann Corbit wrote:
Milos wrote:
bob wrote:Nothing in my discussion of this formula suggests that taking the speedup for 2 processors and comparing it to the speedup for 4 makes any sense. This is not a formula that is intended to do anything other than suggest a speedup for a given number of processors. What does "3.1 / 1.7" mean? What does "more overhead, higher speedup" mean? If you double the number of processors and don't double the overhead, of _course_ you can get a higher speedup. But you are trying to use numbers that are estimates to compute something that doesn't mean anything at all. So keep trying. I'm not going to argue something so stupid over and over. You want to treat the formula like it is accurate to 3 decimal places and then use predicted values to compute something that is even more accurate. Good luck with that.
I'm not talking about your formula, its approximation or anything like that. You are arguing with yourself on that topic.
I'm talking about your published speedup results, which claim that the speedup when going from 4 to 8 is higher than the speedup when going from 2 to 4, which is again higher than the speedup when going from 1 to 2.
This is absolutely impossible. There are no ifs and buts. It's impossible, period.
And it's a rather simple and fundamental thing. Your not being able to understand this is really shocking and sad.

You are saying others can verify your results. Nobody can. You intentionally don't want to provide sufficient material.
You don't provide your testing conditions (was it time to fixed depth, what was the fixed depth, which machines were used, what was the size of the hash, which version of Crafty you used, how many consecutive runs you ran per position, etc.). More importantly, you don't provide your results in any scientific form (position 1: time to depth 1, etc.); instead you only give final numbers. And, most important of all, you don't provide the test positions you used.

You do all this on purpose so that you could complain when others get results that are not in accordance with your own!
So the whole thing was a grand scheme by Dr. Hyatt so that he could complain.

Nothing quite so enjoyable as complaining. Who can blame him?
For years I have watched Bob engage in these antics... sticking to the point, basing his arguments on facts and producing data to back his claims - or even testing hypotheses, for heaven's sake. Appalling behaviour designed to end arguments. He is famous for it :-)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Deep Blue vs Rybka

Post by bob »

ernest wrote:
bob wrote:Using my formula

Code: Select all

#cpus     speedup
      1             1.0
      2             1.7
      4             3.1
      8             5.9
Hi Bob,
I wouldn't dream of being rude (I'm always astonished by that Milos...), but as far as I know, the following numbers have been given for n-core efficiency (including by Vasik Rajlich):

Code: Select all

single core   x1
dual   core   x1.7
quad          x2.8
octal         x4.4
So I must say I am a little surprised by your x3.1 and x5.9

Is that something new, or are we talking of something else than "efficiency" (average time to solution)?
No idea what he means. About 4-5 years ago, AMD gave me access to a quad-dual-core box. I ran the Cray Blitz positions (Vincent wanted to see these specific positions to compare to DTS) a _bunch_ of times, using 1, 2, 4 and 8 processors. I then copied all the log files to my ftp box for everyone to look at. Martin F. (not sure about the spelling of his last name today) took them all (there was one log file per 40+ position run: a couple of runs with 2 processors, 4 runs with 4 processors, and I think 16 runs with 16 cpus). I still have the old data and can look to confirm this, if needed. Someone back then (Vincent, I believe) was saying my approximation was very optimistic. What Martin found, and published here, and others confirmed by running their own tests, was that for 2, 4 and 8 it was pessimistic. The main number I remember (and perhaps someone can dig up his old post) is that he found the average speedup for 4 cpus was 3.3 where my formula suggested 3.1. The 8-cpu number was also a bit higher than the formula.

Pretty well ended the debate.

I have no idea what Vas did for his parallel search. I can only report on mine, and there's no point in my making up numbers since everyone can go back and rerun any test I report on since I make the source available. Will today's program match those numbers? No idea, but I would expect it to. The more aggressive pruning, and the reduction stuff might even require some more tuning, since I have not tested nor tuned SMP search since the more aggressive stuff was added this past year or so.

The test was simply time to a specific depth, for the same program, same hardware, same positions. SMP speedup is always pretty "noisy" for chess, and the more cpus involved, the more noise there is. As a result, I always run each test multiple times and average the results. The more CPUs used, the more times I rerun the test to try to smooth out the noise.
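
The table quoted above matches a simple linear approximation, speedup = 1 + 0.7 * (ncpus - 1), although the exact formula is not restated in this post, and the measurement procedure described here (time to the same depth, repeated runs, averaged) can be sketched roughly as below. This is only an illustration: run_to_depth() is a made-up placeholder, not a Crafty entry point, and its fake timings simply follow that assumed formula.

Code: Select all

#include <stdio.h>

/* Placeholder for "search this position to 'depth' on 'cpus' processors and
 * return the wall-clock time".  NOT a Crafty function; it fakes timings that
 * follow the assumed 1 + 0.7*(n-1) approximation so the harness has data. */
static double run_to_depth(int position, int depth, int cpus)
{
    double serial_time = 10.0 + position;           /* made-up serial seconds */
    double approx_speedup = 1.0 + 0.7 * (cpus - 1);
    (void)depth;
    return serial_time / approx_speedup;
}

int main(void)
{
    const int npos = 40, depth = 12, cpus = 4, repeats = 4;
    double sum = 0.0;

    for (int p = 0; p < npos; p++) {
        double t1 = run_to_depth(p, depth, 1);       /* serial baseline       */
        double tn = 0.0;
        for (int r = 0; r < repeats; r++)            /* average out SMP noise */
            tn += run_to_depth(p, depth, cpus);
        tn /= repeats;
        sum += t1 / tn;                              /* per-position speedup  */
    }
    printf("average %d-cpu speedup over %d positions: %.2f\n",
           cpus, npos, sum / npos);
    return 0;
}

With real engine timings the per-position ratios scatter widely, which is why the repeated runs and averaging (and more repeats as the cpu count grows) matter.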
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Deep Blue vs Rybka

Post by bob »

Milos wrote:
bob wrote:Nothing in my discussion of this formula suggests that taking the speedup for 2 processors and comparing it to the speedup for 4 makes any sense. This is not a formula that is intended to do anything other than suggest a speedup for a given number of processors. What does "3.1 / 1.7" mean? What does "more overhead, higher speedup" mean? If you double the number of processors and don't double the overhead, of _course_ you can get a higher speedup. But you are trying to use numbers that are estimates to compute something that doesn't mean anything at all. So keep trying. I'm not going to argue something so stupid over and over. You want to treat the formula like it is accurate to 3 decimal places and then use predicted values to compute something that is even more accurate. Good luck with that.
I'm not talking about your formula, its approximation or anything like that. You are arguing with yourself on that topic.
I'm talking about your published speedup results, which claim that the speedup when going from 4 to 8 is higher than the speedup when going from 2 to 4, which is again higher than the speedup when going from 1 to 2.
This is absolutely impossible. There are no ifs and buts. It's impossible, period.
And it's a rather simple and fundamental thing. Your not being able to understand this is really shocking and sad.

You are saying others can verify your results. Nobody can. You intentionally don't want to provide sufficient material.
You don't provide your testing conditions (was it time to fixed depth, what was the fixed depth, which machines were used, what was the size of the hash, which version of Crafty you used, how many consecutive runs you ran per position, etc.). More importantly, you don't provide your results in any scientific form (position 1: time to depth 1, etc.); instead you only give final numbers. And, most important of all, you don't provide the test positions you used.

You do all this on purpose so that you could complain when others get results that are not in accordance with your own!
Please show me the speedup results that I have posted that show this. I have, of late, posted only the formula numbers. Which are _not_ speedup numbers. They are "approximate speedup numbers". If you don't get that, you are pretty dense. If you do get that, then you are just being an ass. Either way, I'm not going to continue this much longer.

As far as the test positions go, did you read the DTS paper? The positions were included. Oh, that's right, you don't read. Did you go back thru the archives to see this discussion about 4-5 years ago when we had access to an AMD 4x2 system for a couple of months? Right, again, you don't read. Otherwise you would have found that the positions were posted here, and put on my ftp box. The raw log files were put on my web site. In fact, it seems you know _absolutely_ nothing about this subject at all.

As far as "others" go, to date no one has gotten different results from what was in the log files, unless they just run 2-3 positions and stop. Takes some time to get a good speedup number when a program is non-deterministic to begin with (sorry for using big words, look 'em up to see what I mean) and parallel search has a lot of noise because of this. But I ran the tests and made the results public without even knowing what the numbers showed until Martin F. did the calculations...

BTW, if you want to take your ass suit off for a minute, I can easily post the positions here if you are too lazy to find them. They are not "top secret" and were published in the JICCA paper. Actually, I just looked, and rather than publishing each individual position in that game, I published the entire game and told where I started the analysis. I stopped once the game was effectively over, because for easy positions, parallel speedup is often super-linear and that distorts the results favorably. I wanted representative data, not overly optimistic results. Go to www.cis.uab.edu, click <about us>, then click <faculty>, then click <hyatt>, then scroll down to "online publications" and click that. Then scroll down to the DTS paper and click that. And lo and behold, there is the paper, with the game, and instructions as to which positions were used. The "secret positions" that I don't tell anyone about.

I looked thru _my_ archives, and I found 4 logs that Martin sent to me during the discussion. These were for Crafty version 19.10. Looking at that source code (also hidden in plain sight on my ftp box over the years), quite a few of those files were modified in 2004, so that gives a good indication that this test was run in the first half of 2004. The primary issue back then was that my formula predicted 3.1, and Vincent said "impossible" (sound familiar?). Martin took my mt=0 log, one mt=2 log, and 2 mt=4 logs, and did the speedup analysis by hand. He took the max depth the 1 cpu version reached, then found that depth in the 2/4 cpu log files, and computed the speedup exactly as any normal person would. He then averaged all the 4 cpu speedups together, and posted here in CCC that the final answer was 3.3... I still have those logs if you want to see them. The entire set used to be on the ftp box; somewhere as we moved from one platform to another, I removed things that were not being used. I can probably find the files, buried in a stack of DVDs a few hundred deep, if absolutely necessary.

So, are you up to the task of puttin' up or shuttin' up? You can have the hidden data and the hidden positions that others have seen for years, and you can do the speedup calculations for yourself. And I might take the time to re-run the tests to see if anything has changed since '04. I don't know, and am perfectly willing to admit it.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Deep Blue vs Rybka

Post by Milos »

bob wrote:Please show me the speedup results that I have posted that show this. I have, of late, posted only the formula numbers. Which are _not_ speedup numbers. They are "approximate speedup numbers".
You say:
I ran the test positions Vincent asked for, and posted the results on my ftp box. Several looked at them, Martin F. took the time to go thru several hundred positions to compute the speedup and found that my estimation formula was a bit off. For example, real was 3.3, predicted was 3.1.. for 8, real was 6.3 (I believe) and predicted was 5.9.
So unless the speedup from 1 to 2 miraculously became 2 (instead of 1.7), the speedups from 2 to 4 and especially from 4 to 8 are much larger than the gain from 1 to 2.
BTW, if you want to take your ass suit off for a minute, I can easily post the positions here if you are too lazy to find them. They are not "top secret" and were published in the JICCA paper. Actually I just looked and rather than publishing each individual position in that game, I published the entire game and told where I started the analysis. I stopped after the game was effectively over, because for easy positions, parallel speedup is often super-linear and that distorts the results favorably. I wanted representative data, not overly optimistic results. go to www.cis.uab.edu, click <about us>, then click <faculty>, then click <hyatt> then scroll down to "online publications" and click that. Then scroll down to the DTS paper and click that. And lo and behold, there is the paper, with the game, and instructions as to which positions were used. The "secret positions" that I don't tell anyone about.
Your enormously exaggerated sarcasm is actually funny :). But I don't mind.
On a more serious note, I really thought you went for a more representative sample than just 40 consecutive moves from an extremely weak chess game (probably not even FM level) played a couple of decades ago. OK, Vincent asked for that. Still, this is completely unserious.
A more serious test would include at least something like the STS positions.
Regarding the number of runs per position (for example, 4 for 4 cores), it is extremely small. I don't know if you have ever tried to measure the statistics of the randomness (or level of non-determinism) of time to fixed depth on modern SMP machines. The values differ across machines, programs and numbers of cores, but I can give you a figure of 30% relative sigma, which is quite large and would demand at least 20 runs per position. As for the number of positions, anything below 1000 is just ridiculously small.
On the other hand, you still haven't provided the really important number: what was your fixed depth?
The number is important because it determines how long the test would take.
I see no point, with today's hardware, in using anything over 15 plies, which should be reachable in under 5 seconds on average.
So 20 runs x 1000 positions x 4 SMP configurations x 5 seconds comes to 400,000 seconds, which is less than 5 days. That is not much for a serious test.
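
Both of these figures are easy to check: a 30% relative sigma gives a standard error of the mean of about 0.30/sqrt(runs) (roughly 15% at 4 runs, 6.7% at 20), and the proposed test totals 20 x 1000 x 4 x 5 = 400,000 seconds, about 4.6 days. A quick check, taking the 30% sigma at face value:

Code: Select all

#include <stdio.h>
#include <math.h>

int main(void)
{
    double rel_sigma = 0.30;           /* claimed relative sigma of time-to-depth */
    int runs[] = {4, 20};

    /* standard error of the mean falls off as sigma / sqrt(n) */
    for (int i = 0; i < 2; i++)
        printf("%2d runs -> relative SE of the mean ~ %.1f%%\n",
               runs[i], 100.0 * rel_sigma / sqrt((double)runs[i]));

    /* runs * positions * SMP configurations * seconds per search */
    double total_sec = 20.0 * 1000 * 4 * 5;
    printf("total test time: %.0f s = %.1f days\n", total_sec, total_sec / 86400.0);
    return 0;
}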
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Deep Blue vs Rybka

Post by bob »

Milos wrote:
bob wrote:Please show me the speedup results that I have posted that show this. I have, of late, posted only the formula numbers. Which are _not_ speedup numbers. They are "approximate speedup numbers".
You say:
I ran the test positions Vincent asked for, and posted the results on my ftp box. Several looked at them, Martin F. took the time to go thru several hundred positions to compute the speedup and found that my estimation formula was a bit off. For example, real was 3.3, predicted was 3.1.. for 8, real was 6.3 (I believe) and predicted was 5.9.
So unless the speedup from 1 to 2 miraculously became 2 (instead of 1.7), the speedups from 2 to 4 and especially from 4 to 8 are much larger than the gain from 1 to 2.

Just try to read. 1.8x for 2 sounds right, but I won't claim to have perfect recall from 6 years ago. 3.3 vs 3.1 was certainly correct, because that was the number that was being contested. Since you obviously won't be willing to expend the effort to look at the logs, I'll see if I can find Martin's email where he sent me the numbers extracted from the logs and post those. Then you can wave and squawk and make all the noise you want.

BTW, if you want to take your ass suit off for a minute, I can easily post the positions here if you are too lazy to find them. They are not "top secret" and were published in the JICCA paper. Actually I just looked and rather than publishing each individual position in that game, I published the entire game and told where I started the analysis. I stopped after the game was effectively over, because for easy positions, parallel speedup is often super-linear and that distorts the results favorably. I wanted representative data, not overly optimistic results. go to www.cis.uab.edu, click <about us>, then click <faculty>, then click <hyatt> then scroll down to "online publications" and click that. Then scroll down to the DTS paper and click that. And lo and behold, there is the paper, with the game, and instructions as to which positions were used. The "secret positions" that I don't tell anyone about.
Your enormously exaggerated sarcasm is actually funny :). But I don't mind.
On a more serious note, I really thought you went for a more representative sample than just 40 consecutive moves from an extremely weak chess game (probably not even FM level) played a couple of decades ago. OK, Vincent asked for that. Still, this is completely unserious.
A more serious test would include at least something like the STS positions.
Regarding the number of runs per position (for example, 4 for 4 cores), it is extremely small. I don't know if you have ever tried to measure the statistics of the randomness (or level of non-determinism) of time to fixed depth on modern SMP machines. The values differ across machines, programs and numbers of cores, but I can give you a figure of 30% relative sigma, which is quite large and would demand at least 20 runs per position. As for the number of positions, anything below 1000 is just ridiculously small.
On the other hand, you still haven't provided the really important number: what was your fixed depth?
Your ignorance never ceases to amaze me. There was no "fixed depth". Some positions can go to depth N, some fall 3-4 plies below that. Some go 3-4 plies beyond that. Have you ever actually run a program over a _set_ of positions?

The reason these positions were used is that before I wrote the DTS paper, as opposed to my dissertation, I had been asked "OK, we see what kind of speedup you get on random positions, but how does that translate to games, where you have the hash table carrying information from one position to the next, etc?" I didn't have an answer, so I undertook the challenge to answer that question. I can look thru any longish game Crafty plays on ICC, and find over the course of 10-15 moves search depths that vary by 10 plies.

There are two obvious ways to run a test like this...

(1) choose some fixed depth that is not ridiculous for a 1 cpu search, and then repeat with 2, 4 and 8. Works well

(2) choose some max time you want to use for a single move on the 1 cpu test, and run all the positions using that same time limit for 1, 2, 4 and 8. Then for each position, look at the 1 cpu test and extract the last useful time (either when the last full iteration was completed, or the last move change on the PV). Then look at the 2, 4 and 8 cpu runs and find that same point (same depth, same move chosen); a sketch of this matching step follows below.

Either works. (1) seems easier but is not. It takes several runs to get the depth right, because you can't use a constant depth across a set of positions; some will go too fast, some will take too long. Once you get the depths adjusted, you can run the test at will, until you change the search and do something that lets it search deeper than it used to.
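
Method (2) amounts to scanning the 1 cpu log for its last completed iteration (or last PV move change) and then finding the same depth and move in each parallel log; the speedup is the ratio of the two times at that matched point. A rough sketch, using hypothetical per-iteration records rather than Crafty's actual log format:

Code: Select all

#include <stdio.h>
#include <string.h>

/* Hypothetical per-iteration record; real logs are text and would need
 * parsing, but the matching logic is the same. */
struct iter { int depth; char best[8]; double time; };

/* Time at which the parallel run first reaches the serial run's final
 * depth with the same best move, or -1.0 if it never does. */
static double match_point(const struct iter *serial, int ns,
                          const struct iter *par, int np)
{
    const struct iter *last = &serial[ns - 1];    /* last useful serial iteration */
    for (int i = 0; i < np; i++)
        if (par[i].depth == last->depth && strcmp(par[i].best, last->best) == 0)
            return par[i].time;
    return -1.0;
}

int main(void)
{
    struct iter serial[] = { {12, "e4", 30.0}, {13, "e4", 75.0} };   /* toy data */
    struct iter par4[]   = { {12, "e4", 11.0}, {13, "e4", 24.0} };
    double t = match_point(serial, 2, par4, 2);
    if (t > 0.0)
        printf("4-cpu speedup at the matched point: %.2f\n", serial[1].time / t);
    return 0;
}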

As far as how many runs, I addressed that in my dissertation. I ran everything on a 30-processor Sequent, and did statistical analysis to determine how many repeats were needed for each number of processors to get the SD down to something acceptable. For 2 and 4 it doesn't take very many at all. 4 for 4 cpus is reasonable. For 8 processors, at least 8 for a _good_ average. And for 16, double again and the numbers are good. You can go to 32 or 64 runs but the accuracy doesn't improve very much... And the time required becomes unmanageable, unless you have a cluster to sic on this.

The number is important because it determines how long the test would take.
I see no point, with today's hardware, in using anything over 15 plies, which should be reachable in under 5 seconds on average.
You don't see any point at all, obviously. Had you read any previous research papers on parallel search, you would have known that deeper gives more accurate numbers. 15 ply searches are a fraction of a second. Time measurement noise becomes larger than the total search time. So get serious and study this a bit before making ridiculous statements.

So 20 runs x 1000 positions x 4 SMP configurations x 5 seconds comes to 400,000 seconds, which is less than 5 days. That is not much for a serious test.
And not very useful with some searches taking milliseconds...

Ideally one wants each position to take about as long as the other positions, so that the speedups, when averaged, don't weight some positions more heavily than others. But I suppose that is yet another thing you had no clue about?

Doing this is not new. I'm reminded of an old college teacher who kept giving us these tired old "Confucius said" things. But one was pretty good. Confucius says "a man cannot learn everything until he first realizes that he doesn't know everything." Think about the implications of that... There is apparently a ton about this subject that you know nothing about.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Deep Blue vs Rybka

Post by Milos »

bob wrote:The reason these positions were used is that before I wrote the DTS paper, as opposed to my dissertation, I had been asked "OK, we see what kind of speedup you get on random positions, but how does that translate to games, where you have the hash table carrying information from one position to the next, etc?" I didn't have an answer, so I undertook the challenge to answer that question. I can look thru any longish game Crafty plays on ICC, and find over the course of 10-15 moves search depths that vary by 10 plies.
Uh where to start... so much nonsense just in one post, it's hard.
Your biggest argument is to repeat to me like a parrot how ignorant I am, which actually says a lot about how well grounded your claims are.

Let's see: somebody told you that you should follow positions from a single chess game without resetting the hash and that that would be a representative sample. My comment is that you really had idiots on your thesis committee. More problematic is that you accepted their suggestions...

You don't see any point at all, obviously. Had you read any previous research papers on parallel search, you would have known that deeper gives more accurate numbers. 15 ply searches are a fraction of a second. Time measurement noise becomes larger than the total search time. So get serious and study this a bit before making ridiculous statements.

And not very useful with some searches taking milliseconds...
Let's see further: you laugh at a 15 ply search. How long does 15 plies (with a reset hash) take on 8 cores? Milliseconds?
You must be joking; it takes at least 0.1s for every one of your 40 positions. For most of the positions it takes about a second. I know you can't judge things correctly any more (it obviously comes with age), but you could at least try to run it yourself before writing ridiculous things like this.

In addition, in your DTS paper you hardly reached the amazing depth of an incredible 11 plies. How ridiculous must that be, then, when 15 is ridiculously small for you?

And then you talk about accurate numbers from running a total of 40 positions. A really good representative sample of a chess game. Oh dear...

There are two obvious ways to run a test like this...
.
.
.
Ideally one wants each position to take about as long as the other positions, so that the speedups, when averaged, don't weigh some more heavily than others. But I suppose that is yet another thing you had no clue about?
No, there are two of Bob's ways. Bob of course thinks these are the only two possible ways. This only speaks to Bob's arrogance.
But since Bob is not a god, and his writings are not the bible (even though most of his findings are as old as the latter), there is absolutely no reason to follow them.
There is an infinite number of ways, and none of them is more correct than the others (and specifically not because Bob says the opposite).
You could, for example, run up to a fixed number of nodes each time.
By running to a fixed depth you get differently shaped (and sized) trees, so you get more variety, and this is certainly not worse than always running for the same amount of time.
As far as how many runs, I addressed that in my dissertation. I ran everything on a 30-processor sequent, and did statistical analysis to determine how many repeats were needed for each number of processors to get the SD down to something acceptable. For 2 and 4 it doesn't take very many at all. 4 for 4 cpus is reasonable. For 8 processors, at least 8 for a _good_ average. And for 16, double again and the numbers are good.
Lol at some OS from the time before Linus Torvalds was born. Try something from this millennium; you might be surprised by the results :D.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Deep Blue vs Rybka

Post by bob »

Milos wrote:
bob wrote:The reason these positions were used is that before I wrote the DTS paper, as opposed to my dissertation, I had been asked "OK, we see what kind of speedup you get on random positions, but how does that translate to games, where you have the hash table carrying information from one position to the next, etc?" I didn't have an answer, so I undertook the challenge to answer that question. I can look thru any longish game Crafty plays on ICC, and find over the course of 10-15 moves search depths that vary by 10 plies.
Uh where to start... so much nonsense just in one post, it's hard.
Your biggest argument is to repeat to me like a parrot how ignorant I am, which actually says a lot about how well grounded your claims are.

Let's see: somebody told you that you should follow positions from a single chess game without resetting the hash and that that would be a representative sample. My comment is that you really had idiots on your thesis committee. More problematic is that you accepted their suggestions...

So, up until 1995 (or whenever that was published) no one had _ever_ considered the actual speedup produced over a real game. Just random positions chosen from here and there. Several had noted that "deeper is better" for parallel search, something still true today. And YOU think that trying to answer that question is not worthwhile? No point in knowing what the speedup is in a game? I mean we don't actually use these things to play games anyway, correct?

The only idiot here is arguing with me at the moment.


You don't see any point at all, obviously. Had you read any previous research papers on parallel search, you would have known that deeper gives more accurate numbers. 15 ply searches are a fraction of a second. Time measurement noise becomes larger than the total search time. So get serious and study this a bit before making ridiculous statements.

And not very useful with some searches taking milliseconds...
Let's see further: you laugh at a 15 ply search. How long does 15 plies (with a reset hash) take on 8 cores? Milliseconds?
You must be joking; it takes at least 0.1s for every one of your 40 positions. For most of the positions it takes about a second. I know you can't judge things correctly any more (it obviously comes with age), but you could at least try to run it yourself before writing ridiculous things like this.

??? You have already run these positions? I can show you endgame searches that reach 40 plies in < .01 seconds, the smallest amount of time I can measure inside Crafty. Do you know anything about parallel tree search? You do realize there is overhead (computational, not extra nodes) to simply kick off parallel searches? And of course there is the extra node overhead caused by incorrect move ordering. And in a 15 ply search, in simple positions, there is more overhead involved in getting the split done than there is in doing the split search?

This really is pointless when you have no idea whatsoever about how a parallel alpha/beta search works...

In addition, in your DTS paper you hardly reached the amazing depth of an incredible 11 plies. How ridiculous must that be, then, when 15 is ridiculously small for you?
On the C90, in 1994, we were doing 11 ply searches in 3 minutes. Today, on an 8-core, I see 24-26-28 depending on the position. 15 sounds pretty minuscule, eh? In 2004, when I ran the last set of tests using the 8-way opteron box, 15 plies was not exactly a long search either, although nowhere near as short as it is today. I just tried a 15 ply search on my 8-core intel box, starting position. Results:

Code: Select all

               15->   0.32   0.36   1. Nf3 Nc6 2. Nc3 Nf6 3. e4 e5 4. d4
                                    exd4 5. Nxd4 Bc5 6. Nxc6 bxc6 7. Bd3
                                    O-O 8. O-O
              time=0.33  mat=0  n=2128730  fh=91%  nps=6.5M
6.5M nps

Then run a little longer to let the parallel search have a chance to overcome the overhead it incurs and:

Code: Select all

               23    41.31   0.22   1. e4 e5 2. Nf3 Nc6 3. Nc3 Nf6 4. d4
                                    exd4 5. Nxd4 Bc5 6. Nxc6 bxc6 7. Bc4
                                    O-O 8. O-O d6 9. Bg5 Be6 10. Bxe6 fxe6
                                    11. Qf3 Bd4 12. Rfe1
               23->  48.53   0.22   1. e4 e5 2. Nf3 Nc6 3. Nc3 Nf6 4. d4
                                    exd4 5. Nxd4 Bc5 6. Nxc6 bxc6 7. Bc4
                                    O-O 8. O-O d6 9. Bg5 Be6 10. Bxe6 fxe6
                                    11. Qf3 Bd4 12. Rfe1
              24    48.53   1/20*  1. e4      (14.9Mnps)
              time=59.88  mat=0  n=912806093  fh=91%  nps=15.2M
So, now 2.5x faster.

See what I mean? There's a _lot_ you don't know about this that those of us doing this stuff do understand.


And then you talk about accurate numbers from running a total of 40 positions. A really good representative sample of a chess game. Oh dear...
Funny. I've been playing chess for 50 years now. I don't play many 200 move games. In fact I don't recall having played _any_ games over 100 moves long in all the tournaments and skittles games I have played. I excluded the first 10 moves or so because those were book moves. Hard to get a parallel speedup there, eh? And I excluded the last 10-15 moves because the game was over and the search speedup was greatly inflated once mate was found. It seems like the only viable way to answer the question, given the difficulty of accessing a 70 million buck machine in 1994, when the actual game was played in the ACM tournament.


There are two obvious ways to run a test like this...
.
.
.
Ideally one wants each position to take about as long as the other positions, so that the speedups, when averaged, don't weigh some more heavily than others. But I suppose that is yet another thing you had no clue about?
No, there are two of Bob's ways. Bob of course thinks these are the only two possible ways. This only speaks to Bob's arrogance.
Or your ignorance. I'm waiting for an alternative. I'll bet this is going to be earth-shattering in its significance. Or stupidity. I'll read on to see.


But since Bob is not a god, and his writings are not the bible (even though most of his findings are as old as the latter), there is absolutely no reason to follow them.
There is an infinite number of ways, and none of them is more correct than the others (and specifically not because Bob says the opposite).
You could, for example, run up to a fixed number of nodes each time.
First suggestion is stupidity. Now you factor out search overhead because you always stop after N nodes. Great parallel speedup to do that. Meaningless. But great. On to the next...

By running to a fixed depth you get differently shaped (and sized) trees, so you get more variety, and this is certainly not worse than always running for the same amount of time.
So early in the game, your fixed depth searches take 2 minutes. Late in the game they take 2 seconds. And you are going to take the speedup for a 2 minute computation and treat it equally to the speedup for a 2 second computation? Which would help the program the most: a good speedup on the big tree, or a good speedup on the small tree? If you search the small tree in 0 seconds, you only save 2 seconds total to help you in the game. If you search the 2 minute position in only 1 minute, a paltry 2x speedup, you just saved a minute. In a timed game of chess, which is more important?
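
The weighting point can be made concrete with invented numbers: one position that takes 120 seconds serially and gets a 2x speedup, and one that takes 2 seconds and gets a 4x speedup. The plain average of the two speedups is 3.0, but measured by total game time saved the overall effect is barely 2x, which is why easy positions should not dominate an unweighted average:

Code: Select all

#include <stdio.h>

int main(void)
{
    /* Invented serial times (seconds) and per-position speedups. */
    double serial[]  = {120.0, 2.0};
    double speedup[] = {2.0, 4.0};
    double tot_serial = 0.0, tot_parallel = 0.0, avg = 0.0;

    for (int i = 0; i < 2; i++) {
        tot_serial   += serial[i];
        tot_parallel += serial[i] / speedup[i];   /* time actually spent when parallel */
        avg          += speedup[i];
    }
    printf("unweighted average speedup: %.2f\n", avg / 2.0);                  /* 3.00  */
    printf("overall (time-weighted)   : %.2f\n", tot_serial / tot_parallel);  /* ~2.02 */
    return 0;
}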

So stupid idea #2. Got any more of these?

As far as how many runs, I addressed that in my dissertation. I ran everything on a 30-processor sequent, and did statistical analysis to determine how many repeats were needed for each number of processors to get the SD down to something acceptable. For 2 and 4 it doesn't take very many at all. 4 for 4 cpus is reasonable. For 8 processors, at least 8 for a _good_ average. And for 16, double again and the numbers are good.
Lol at some OS from the time before Linus Torvalds was born.
You do realize the recent data was on Linux in 2004? The original was done on Unix in 1994. No significant difference. In fact, for threads, the Cray was significantly better: no virtual memory, a -much- more efficient semaphore mechanism, etc. Your ignorance might be approaching boundless...
Try something from this millennium; you might be surprised by the results :D.
Was 2004 not in "this millennium"? A quad-socket, dual-core AMD Opteron box, running Linux.

Now we can definitely say "boundless". You have no clue what has been done, what has been written, or even what this experiment was about. Which is no less than I expected, of course...

Keep trying; so far your batting average is 0.000, if you actually know what that means.