Milos wrote:bob wrote:The reason these positions were used is that before I wrote the DTS paper, as opposed to my dissertation, I had been asked "OK, we see what kind of speedup you get on random positions, but how does that translate to games, where you have the hash table carrying information from one position to the next, etc?" I didn't have an answer, so I undertook the challenge to answer that question. I can look thru any longish game Crafty plays on ICC, and find over the course of 10-15 moves search depths that vary by 10 plies.
Uh where to start... so much nonsense just in one post, it's hard.
Your biggest argument is to repeat, like a parrot, how ignorant I am, which actually says a lot about how well grounded your claims are.
Let's see: somebody told you that you should follow positions from a single chess game without resetting the hash, and that this would be a representative sample. My comment is that you really had idiots on your thesis committee. More problematic is that you accepted their suggestions...
So, up until 1995 (or whenever that was published) no one had _ever_ considered the actual speedup produced over a real game. Just random positions chosen from here and there. Several had noted that "deeper is better" for parallel search, something still true today. And YOU think that trying to answer that question is not worthwhile? No point in knowing what the speedup is in a game? I mean we don't actually use these things to play games anyway, correct?
The only idiot here is arguing with me at the moment.
You don't see any points at all, obviously. Had you read any previous research papers on parallel search, you would have known that deeper searches give more accurate numbers. 15 ply searches are a fraction of a second. Time measurement noise becomes larger than the total search time. So get serious and study this area before making ridiculous statements.
And not very useful with some searches taking milliseconds...
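The measurement-noise point is simple arithmetic: a fixed absolute noise floor is a large fraction of a very short search and negligible for a long one. A minimal sketch, assuming an illustrative ~10 ms of timer/scheduling jitter (the exact figure is not from the thread):

```python
TIMER_NOISE_S = 0.01  # assumed ~10 ms of timer/scheduling jitter (illustrative)

def relative_noise(search_time_s):
    """Fraction of the measured time that could be pure noise."""
    return TIMER_NOISE_S / search_time_s

short = relative_noise(0.05)   # a 15-ply search finishing in 50 ms
long_ = relative_noise(60.0)   # a one-minute search

print(f"50 ms search: +/-{short:.0%} noise")   # 20% of the measurement
print(f"60 s search:  +/-{long_:.4%} noise")   # roughly 0.02%
```

With the same noise floor, a 50 ms measurement carries a thousand times the relative error of a 60 s one, which is why short fixed-depth searches make speedup numbers unreliable.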
Let's see further: you laugh at a 15-ply search. How long does 15 plies (with the hash reset) take on 8 cores? Milliseconds?
You must be joking; it takes at least 0.1s for every one of your 40 positions. For most positions it takes about a second. I know you can't judge things correctly any more (it obviously comes with age), but you could at least try to run it yourself before writing ridiculous things like this.
??? You have already run these positions? I can show you endgame searches that reach 40 plies in < .01 seconds, the smallest amount of time I can measure inside Crafty. Do you know anything about parallel tree search? You do realize there is overhead (computational, not extra nodes) to simply kick off parallel searches? And of course there is the extra node overhead caused by incorrect move ordering. And in a 15 ply search, in simple positions, there is more overhead involved in getting the split done than there is in doing the split search.
This really is pointless when you have no idea whatsoever about how a parallel alpha/beta search works...
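The split-overhead point is why YBW-style parallel alpha-beta implementations typically refuse to split near the leaves at all. A minimal sketch of that guard; the function name, threshold value, and parameter names are hypothetical, not Crafty's actual code:

```python
MIN_SPLIT_DEPTH = 4  # hypothetical threshold: below this remaining depth,
                     # the split bookkeeping (locks, copying the split point)
                     # costs more than the parallel work saves

def should_split(remaining_depth, idle_processors, moves_already_searched):
    """YBW-style split test: only split after the first move at this node has
    been searched serially (so the bound is trustworthy), only if helper
    processors are idle, and only if the remaining subtree is big enough to
    amortize the split overhead."""
    return (remaining_depth >= MIN_SPLIT_DEPTH
            and idle_processors > 0
            and moves_already_searched >= 1)

print(should_split(2, 3, 5))   # shallow node: never worth the split overhead
print(should_split(10, 3, 1))  # deep node, first move done, helpers idle: split
```

In a 15-ply search most nodes sit close to the leaves, so this guard disqualifies most of the tree from splitting, which is exactly why short searches show poor parallel speedups.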
In addition, in your DTS paper you hardly reached the amazing depth of an incredible 11 plies. How ridiculous must that be, then, when 15 is ridiculously small for you?
On the C90, in 1994, we were doing 11 ply searches in 3 minutes. Today, on an 8-core, I see 24-26-28 depending. 15 sounds pretty minuscule, eh? In 2004, when I ran the last set of tests using the 8-way Opteron box, 15 plies was not exactly a long search either, although nowhere near as short as it is today. I just tried a 15 ply search on my 8-core Intel box, starting position. Results:
Code:
15-> 0.32 0.36 1. Nf3 Nc6 2. Nc3 Nf6 3. e4 e5 4. d4
exd4 5. Nxd4 Bc5 6. Nxc6 bxc6 7. Bd3
O-O 8. O-O
time=0.33 mat=0 n=2128730 fh=91% nps=6.5M
6.5M nps
Then run a little longer to let the parallel search have a chance to overcome the overhead it incurs and:
Code:
23 41.31 0.22 1. e4 e5 2. Nf3 Nc6 3. Nc3 Nf6 4. d4
exd4 5. Nxd4 Bc5 6. Nxc6 bxc6 7. Bc4
O-O 8. O-O d6 9. Bg5 Be6 10. Bxe6 fxe6
11. Qf3 Bd4 12. Rfe1
23-> 48.53 0.22 1. e4 e5 2. Nf3 Nc6 3. Nc3 Nf6 4. d4
exd4 5. Nxd4 Bc5 6. Nxc6 bxc6 7. Bc4
O-O 8. O-O d6 9. Bg5 Be6 10. Bxe6 fxe6
11. Qf3 Bd4 12. Rfe1
24 48.53 1/20* 1. e4 (14.9Mnps)
time=59.88 mat=0 n=912806093 fh=91% nps=15.2M
So, now 2.5x faster.
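For anyone checking the arithmetic behind the two log lines quoted above: nps is just nodes divided by wall time, and the scaling factor is the ratio of the two, which comes out close to the 2.5x quoted.

```python
# Node counts and times taken from the two Crafty runs quoted above.
nodes_short, time_short = 2_128_730, 0.33
nodes_long,  time_long  = 912_806_093, 59.88

nps_short = nodes_short / time_short   # ~6.45M, matching "nps=6.5M"
nps_long  = nodes_long / time_long     # ~15.24M, matching "nps=15.2M"
scaling   = nps_long / nps_short       # ~2.36x raw nps improvement

print(f"{nps_short/1e6:.1f}M -> {nps_long/1e6:.1f}M nps, {scaling:.2f}x")
```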
See what I mean? There's a _lot_ you don't know about this that those of us doing this stuff do understand.
And then you talk about accurate numbers from running a total of 40 positions. A really good representative sample of a chess game. Oh dear...
Funny. I've been playing chess for 50 years now. I don't play many 200 move games. In fact I don't recall having played _any_ games over 100 moves long in all the tournaments and skittles games I have played. I excluded the first 10 moves or so because those were book moves. Hard to get a parallel speedup there, eh? And I excluded that last 10-15 moves because the game was over and the search speedup was greatly inflated once mate is found. Seems like the only viable way to answer the question, given the difficulty of accessing a 70 million buck machine in 1994 when the actual game was played in the ACM tournament.
There are two obvious ways to run a test like this...
.
.
.
Ideally one wants each position to take about as long as the others, so that the speedups, when averaged, don't weight some positions more heavily than others. But I suppose that is yet another thing you had no clue about?
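The weighting concern can be shown with two hypothetical positions: when per-position times differ wildly, the plain mean of per-position speedups disagrees with the speedup computed on total time. All numbers below are made up for illustration:

```python
# (serial_time, parallel_time) per position, in seconds -- illustrative only
positions = [
    (120.0, 60.0),  # big middlegame tree: 2.0x speedup
    (2.0,   0.25),  # tiny endgame tree: 8.0x speedup
]

# Unweighted mean of per-position speedups: (2 + 8) / 2 = 5.0x
mean_of_speedups = sum(s / p for s, p in positions) / len(positions)

# Speedup on total time: 122 / 60.25 ~= 2.02x, dominated by the long
# position -- which is what actually matters in a timed game
overall_speedup = sum(s for s, _ in positions) / sum(p for _, p in positions)

print(mean_of_speedups, round(overall_speedup, 2))
```

The small fast position inflates the unweighted mean to 5x even though the real wall-clock saving is barely 2x, which is why roughly equal-length positions make the average meaningful.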
No, there are two of Bob's ways. Bob of course thinks these are the only two possible ways. This only speaks to Bob's arrogance.
Or your ignorance. I'm waiting for an alternative. I'll bet this is going to be earth-shattering in its significance. Or stupidity. I'll read on to see.
But since Bob is not a god, and his writings are not the Bible (even though most of his findings are as old as the latter), there is absolutely no reason to follow them.
There is an infinite number of ways, and none of them is more correct than the others (and especially not just because Bob says the opposite).
You could, for example, run up to a fixed number of nodes each time.
The first suggestion is stupidity. Now you factor out search overhead, because you always stop after N nodes. Great parallel speedup if you do that. Meaningless. But great. On to the next...
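Spelled out, the objection to fixed-node testing is this: if every run stops after exactly N nodes, the extra nodes a parallel search wastes on worse move ordering never appear in the measurement, and "speedup" collapses to the raw nps ratio. The numbers below are purely illustrative:

```python
N = 100_000_000            # fixed node budget per run
serial_nps   = 2_000_000
parallel_nps = 15_000_000  # 8 cores with good raw nps scaling
overhead     = 1.30        # assumed: the parallel search really needs 30%
                           # more nodes to reach the same result

# Fixed-node "speedup": both runs do exactly N nodes, so this is just the
# nps ratio -- 7.5x, a mirage that hides the wasted work.
fixed_node_speedup = (N / serial_nps) / (N / parallel_nps)

# Honest time-to-same-result speedup: the parallel run must search
# N * overhead nodes, so the real gain is smaller.
honest_speedup = (N / serial_nps) / (N * overhead / parallel_nps)

print(round(fixed_node_speedup, 2), round(honest_speedup, 2))
```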
By running to a fixed depth you get differently shaped (and sized) trees, so you get more variety, and this is certainly no worse than always running for the same amount of time.
So early in the game, your fixed depth searches take 2 minutes. Late in the game they take 2 seconds. And you are going to take the speedup for a 2 minute computation and treat it equally to the speedup for a 2 second computation? Which would help the program the most: a good speedup on the big tree, or a good speedup on the small tree? If you search the small tree in 0 seconds, you only save 2 seconds total to help you in the game. If you search the 2 minute position in only 1 minute, a paltry 2x speedup, you just saved a minute. In a timed game of chess, which is more important?
So stupid idea #2. Got any more of these?
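The 2-minute-versus-2-second argument reduces to one formula: the wall-clock time a given speedup actually returns depends entirely on how big the tree was. A one-line sketch using the numbers from the argument above:

```python
def time_saved(serial_s, speedup):
    """Seconds of wall-clock time a parallel speedup returns on one search."""
    return serial_s - serial_s / speedup

big_tree   = time_saved(120.0, 2.0)    # "paltry" 2x on a 2-minute search: 60 s back
small_tree = time_saved(2.0, 1000.0)   # near-infinite speedup on 2 s: < 2 s back

print(big_tree, round(small_tree, 3))
```

Even a perfect speedup on the tiny tree can never return more than its 2 seconds, while a mediocre 2x on the big tree returns a full minute of clock time.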
As far as how many runs, I addressed that in my dissertation. I ran everything on a 30-processor Sequent, and did statistical analysis to determine how many repeats were needed for each number of processors to get the SD down to something acceptable. For 2 and 4 it doesn't take very many at all; 4 runs for 4 cpus is reasonable. For 8 processors, at least 8 for a _good_ average. And for 16, double that again and the numbers are good.
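Those repeat counts follow the usual statistics of a sample mean: the standard error shrinks as the square root of the number of runs, and noisier configurations (more processors, less deterministic split timing) need more repeats to reach the same error bar. A sketch with assumed per-run standard deviations (the SD values are illustrative, not from the dissertation):

```python
import math

def runs_needed(per_run_sd, target_se):
    """Smallest n such that per_run_sd / sqrt(n) <= target_se."""
    return math.ceil((per_run_sd / target_se) ** 2)

# Assumed per-run SDs of the measured speedup, growing with processor count
print(runs_needed(0.2, 0.1))   # e.g. 2-4 cpus:  4 runs
print(runs_needed(0.3, 0.1))   # e.g. 8 cpus:    9 runs
print(runs_needed(0.4, 0.1))   # e.g. 16 cpus:  16 runs
```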
Lol at some OS from a time before Linus Torvalds was even born.
You do realize the recent data was on linux in 2004? The original was done on unix in 1994. No significant difference. In fact, for threads, the Cray was significantly better with no virtual memory, -much- more efficient semaphore mechanism, etc. Your ignorance might be approaching boundless...
Try something from this millennium; you might be surprised by the results.
.
Was 2004 not from "this millennium"? A quad-socket, dual-core AMD Opteron box. Running Linux.
Now we can definitely say "boundless". You have no clue what has been done, what has been written, or even what this experiment was about. Which is no less than I expected, of course...
Keep trying; so far your batting average is .000, if you actually know what that means.