Repetition detection structure.

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

hgm
Posts: 23785
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Repetition detection structure.

Post by hgm » Sun Mar 02, 2008 7:17 pm

wgarvin wrote:An honest question for hgm.

I've seen a lot of posts from you written in this adversarial style, and I'm just wondering, why do you write like this? If the goal is to discuss things productively and share ideas and learn from each other, what is gained by the adversarial tone? It seems like most of your posts (or your posts to Bob at least) contain statements that have no purpose except to be inflammatory.
....
The problem is that when I write 1+1=2, and someone starts attacking it, there is zero to be learned from discussing it and sharing ideas. I am always willing to explain things in detail for those who are interested and might learn from it, but it has to stop somewhere. If after 500 lines of explaining how addition works, the person for whom I take the trouble of writing it insists that 1+1 is still 3, and continues to press this point in a harassing way until long after I have indicated that I am no longer interested in continuing the discussion, I see little alternative to making that point with a bunker buster.

hristo

Re: Repetition detection structure.

Post by hristo » Sun Mar 02, 2008 7:33 pm

hgm wrote:
hristo wrote: In the case of SMP, it is not immediately obvious that your approach would be better. I'm sorry, but in this case I dare not speculate. I have spent way too much time debugging threads -- so, "show me the money/profiler output"! ;-)
Well, it will obviously depend on the SMP implementation. The way I plan to implement SMP (which will of course be much more efficient than the way Bob does it :wink: ), it wouldn't matter at all. The structure will never have to be passed from one thread to another. In fact nothing would ever be passed from one thread to another, except through hiding messages in the hash table.
This is fascinating!
I don't understand how you are going to encode the 'messages' so that they are thread safe, but that is probably because I'm missing something.

I did go back and reread all of the posts related to SMP (within this thread) and you outline a simple strategy that might be beneficial when searching for and setting up a split point. You said:
My idea was that the search almost entirely consists of refutation trees, where every all-node is a possible split point. So typically, there will be a split point at each even or odd level. So as soon as you run out of unsearched moves in the splitpoint you are currently in, just follow move two levels closer to the root, or two levels deeper in the tree, and you are likely to find another one. So why backup all the way to the root, only to return to nearly the same spot in the tree?
Indeed, this sounds interesting and demonstrates the possibility of doing differential updates to the history list.
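For illustration, the two-level walk hgm describes might look something like this in C. The per-ply bookkeeping (NodeState, moves_left) is hypothetical, not taken from any actual engine; it is only a sketch of "step two plies toward the root or two plies deeper until you find an all-node with work left":

```c
#include <assert.h>

/* Hypothetical per-ply state on the search stack: is this node an
   all-node that still has unsearched moves (a usable split point)? */
typedef struct {
    int is_all_node;
    int moves_left;
} NodeState;

/* Starting from an exhausted split point at `ply`, probe two plies
   closer to the root, then two plies deeper, widening the step by
   two each round. Returns the ply of the next usable split point,
   or -1 if none exists on the current path. */
int next_split_ply(const NodeState *stack, int max_ply, int ply)
{
    for (int step = 2; ; step += 2) {
        int up   = ply - step;   /* closer to the root */
        int down = ply + step;   /* deeper in the tree */
        if (up < 0 && down > max_ply)
            return -1;           /* walked off both ends */
        if (up >= 0 && stack[up].is_all_node && stack[up].moves_left > 0)
            return up;
        if (down <= max_ply && stack[down].is_all_node && stack[down].moves_left > 0)
            return down;
    }
}
```

Because all-nodes tend to alternate with cut-nodes ply by ply, the step of two keeps the probe on the "all-node parity", which is the whole point of not backing up to the root.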

What bothers me, however, are the following aspects:
1) Some data must be shared between the threads.
2) The shared data must be protected by a mutex, or semaphore, if it is used in non-atomic operations.
3) What is the data that will have to be protected and how invasive is such protection? (!!)

In a general sense it is much better to reduce the contention for singular resources in an SMP environment. IOW, "lock -> divorce -> unlock -> split and after that the 'divorce' gets to run wild" is better than "checking for chastity too often".
If you don't copy any data during a split then it might become expensive to ensure that the split occurs without errors, unless you introduce lock/unlock in places that are not part of the split.
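One common way to keep that contention low is to make the per-move pickup at a split point a single atomic operation, reserving any mutex for the rarer setup and teardown of the split point itself. A minimal C11 sketch (the SplitPoint layout is invented for illustration):

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared split-point record. Threads claim moves via one
   atomic fetch-and-add on the move index, so the hot "give me the
   next move" path needs no lock at all. */
typedef struct {
    atomic_int next_move;   /* index of the next unsearched move */
    int move_count;         /* total moves at this node          */
} SplitPoint;

/* Returns a move index to search, or -1 once the node is exhausted.
   Safe to call concurrently from several helper threads. */
int claim_move(SplitPoint *sp)
{
    int i = atomic_fetch_add(&sp->next_move, 1);
    return (i < sp->move_count) ? i : -1;
}
```

This doesn't remove the need to protect split setup, but it does answer question 3 above in part: the data that must be shared per split can be as small as one counter.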

Now,
if you have a way of bypassing the "lock/unlock" requirement by simply using 'messages encoded inside the hash' then I suggest you consider patenting or selling this idea!!! That would be awesome!!!
hgm wrote: (Which should not cause any communication overhead, as hash probes are supposed to be cache misses in the first place. If the data is currently owned by another core, the bus-snooping unit will substitute it for the memory data, providing a shorter access time rather than a longer one.)
H.G.,
you might be correct about all of this ... but, it does sound like black-magic to me! :-)

One more thing,
I watched quite a few more games on your server (thank you) and ... well, Joker8014K (or something like that) made an illegal move this morning -- it did move a pawn from c6 to d5 without anything being on d5 to be captured.
Then another program (not sure which one) made a move with the rook a1-c1 while there was a Knight on b1. (?)

Regards,
Hristo

hristo

Re: Repetition detection structure.

Post by hristo » Mon Mar 03, 2008 12:38 am

hristo wrote: One more thing,
I watched quite a few more games on your server (thank you) and ... well, Joker8014K (or something like that) made an illegal move this morning -- it did move a pawn from c6 to d5 without anything being on d5 to be captured.
Oops ... make this from c8 to d7, if you number the verticals from 1 to 10. ("Regular chess" bias is to blame)

bob
Posts: 20642
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Repetition detection structure.

Post by bob » Mon Mar 03, 2008 2:33 am

hristo wrote:
hgm wrote:
bob wrote:Let me rephrase my comment. "What exactly are you talking about?"
Well, read back and you will know...

In short: you are claiming that 1/x is always smaller than 10/y because 1 < 10, no matter what x and y are ("regardless of investment"). That says much for your abilities in arithmetic.
H.G.,
this was not Bob's argument; even though you could infer the above, it was certainly not the central part of his argument.

(I must admit that sometimes you guys argue about strange things -- even though both of you are certifiably smart.)

For instance you said: "As the table is so small that it always resides in the L1 cache, this must be faster than the linear-array method." which presumes that the linear array doesn't fit in the cache or that in the process of searching, one would have to hit L2 or main memory. The evidence presented, by Bob, shows that L1 would be sufficient in his case -- searching only a single entry, on average, should fit in L1 just fine.
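For reference, the linear-array method under discussion is roughly the following; the names and exact bookkeeping are illustrative, not from Crafty or any particular engine. The path is a small array of Zobrist keys scanned backward from the current position to the last irreversible move, which is why, on average, only about one entry is touched and the data sits comfortably in L1:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of repetition detection by linear scan. `keys` holds the
   Zobrist key after each half-move of game plus search path, `ply`
   indexes the current position, and `last_irreversible` marks the
   last capture or pawn move (no repetition can span it). */
int is_repetition(const uint64_t *keys, int ply, int last_irreversible)
{
    /* A repetition needs the same side to move, so step back two
       plies at a time; the earliest possible match is 4 plies back. */
    for (int i = ply - 4; i >= last_irreversible; i -= 2)
        if (keys[i] == keys[ply])
            return 1;
    return 0;
}
```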

Both of you started this discussion in a rather cordial tone and then you both zeroed-in on a particular aspect of what the other side was saying that could be argued against.

For the purpose of chess programming, however, I believe that your approach (if it is a bit faster, on average, compared to the list method) might yield better chess-engine results for single CPU implementations. Based on pure speculation, having 0.1% extra time for eval and search (etc.) would be more beneficial than handling the fringe cases of the 50 move rule.
Besides, your implementation seems rather cute, just like you. ;-)

In the case of SMP, it is not immediately obvious that your approach would be better. I'm sorry, but in this case I dare not speculate. I have spent way too much time debugging threads -- so, "show me the money/profiler output"! ;-)
hgm wrote: As for the relative speed: count instructions, and you will know. Or is counting too difficult too?
Even if your code is smaller and executes a bit faster, that is not sufficient to estimate its 'cost' in execution time. For instance, if your code must be called more often (than the simple list search) and from different portions of the evaluation (or elsewhere in the program's execution), it is possible that _it_ [your code] will cause extra cache misses ... so, just counting instructions is (quite often) not enough.

(BTW, why the vitriolic remark?)


Bob,
I agree that optimization must be considered from the top ("look into the highest time-consuming functions and algorithms, first"). However, it is perfectly reasonable to consider optimizations at other levels (including the bottom) so long as such optimizations can be regression tested and are opaque with respect to the rest of the application. Always optimizing at the top can unnecessarily restrict one from gathering low-hanging fruit elsewhere.

Regards to both of you,
Hristo

My point was simply that if the goal is to make the program measurably faster, then the place to start is at the top. I clearly showed that in my case, 0.51% over a pretty long game that ended by 50-move rule, was not the place to start. And, in fact, I questioned whether the hash approach was any faster at all, giving that 0.51% as "the target to beat". If hashing is no faster, then the discussion was stupid to start with. If it is faster, it still is not worth the effort of rewriting repetition detection until all the other things have been addressed. This is not exactly low-hanging fruit. It is more a "low-hanging molecule" that is not going to offer much for the effort of picking it off...

One certainly doesn't have to make his first programming effort on the procedure at the top of the profile. I often look at the top group and figure out where I might get the best speedup the quickest and start there. For example, the work on rotated bitboards years ago was not on the code that used the most time. But by that same reasoning, I am certainly not going to the bottom of the heap and try to speed up a 0.51% module when (a) it is not clear that the proposed improvement is any faster; (b) the proposed "improvement" might actually hurt later when it is "SMP time"...
hgm wrote: I think I explained everything crystal clear, and if it is still beyond you, you will simply have to live with that...

bob
Posts: 20642
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Repetition detection structure.

Post by bob » Mon Mar 03, 2008 2:43 am

hgm wrote:
wgarvin wrote:An honest question for hgm.

I've seen a lot of posts from you written in this adversarial style, and I'm just wondering, why do you write like this? If the goal is to discuss things productively and share ideas and learn from each other, what is gained by the adversarial tone? It seems like most of your posts (or your posts to Bob at least) contain statements that have no purpose except to be inflammatory.
....
The problem is that when I write 1+1=2, and someone starts attacking it, there is zero to be learned from discussing it and sharing ideas. I am always willing to explain things in detail for those who are interested and might learn from it, but it has to stop somewhere. If after 500 lines of explaining how addition works, the person for whom I take the trouble of writing it insists that 1+1 is still 3, and continues to press this point in a harassing way until long after I have indicated that I am no longer interested in continuing the discussion, I see little alternative to making that point with a bunker buster.
The _real_ problem is that you believe every statement you make is of the 1+1 = 2 type. And _that_ is something that could be debated regularly. Your hashing approach seems to be no faster than what I am doing, 1+1 = 2 notwithstanding. If you have two threads searching at the same time, they will either each have a separate repetition mechanism, or they will have a shared mechanism with significant locking/unlocking overhead. There is no other solution, since both are searching different sub-trees at the same time and the repetitions cannot interact unless they are supposed to. Again, 1+1 = 2 notwithstanding...
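The first of those two options (a private repetition mechanism per thread) avoids locking entirely at the cost of one copy per split. A sketch, assuming the list representation discussed earlier in the thread; the size and names are illustrative:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MAX_PATH = 1024 };   /* game moves + search path, illustrative */

/* Per-thread repetition state: the path of Zobrist keys so far. */
typedef struct {
    uint64_t keys[MAX_PATH];
    int ply;                 /* index of the current position's key */
} RepList;

/* At a split point, give the helper thread a private copy of the
   path up to the split; afterwards each thread extends only its own
   copy, so the repetition check itself never needs a lock. */
void split_replist(const RepList *master, RepList *helper)
{
    memcpy(helper->keys, master->keys,
           (size_t)(master->ply + 1) * sizeof(uint64_t));
    helper->ply = master->ply;
}
```

The copy is a few kilobytes at most and happens only at splits, which are vastly rarer than repetition probes, so the one-time cost buys a lock-free hot path.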

So as long as you believe _everything_ you say is of the quality of an arithmetic "fact", you can expect arguments from time to time. Particularly since I have experience in the area of SMP search: my first parallel search was used at the 1978 ACM CCC event in Washington, DC. I've had just a "tad" of experience trying every algorithm discussed to date. You'll understand the repetition problem once you get there. Then suddenly you will see that 1+1 = 3 is going to draw fire...

hristo

Re: Repetition detection structure.

Post by hristo » Mon Mar 03, 2008 3:08 am

bob wrote:
hristo wrote: Bob,
I agree that optimization must be considered from the top ("look into the highest time-consuming functions and algorithms, first"). However, it is perfectly reasonable to consider optimizations at other levels (including the bottom) so long as such optimizations can be regression tested and are opaque with respect to the rest of the application. Always optimizing at the top can unnecessarily restrict one from gathering low-hanging fruit elsewhere.

Regards to both of you,
Hristo
My point was simply that if the goal is to make the program measurably faster, then the place to start is at the top. I clearly showed that in my case, 0.51% over a pretty long game that ended by 50-move rule, was not the place to start.
Bob,
my comments were with respect to the general principles of optimization.
In all cases the programmer makes the judgment call and decides which portions of the application should be improved, and it is a good idea to look at the top first.

bob wrote: And, in fact, I questioned whether the hash approach was any faster at all, giving that 0.51% as "the target to beat". If hashing is no faster, then the discussion was stupid to start with.
Yes, you did offer a clear example and a target to beat. ;-) (this is the beauty of the discussion .. thank you)

However, I presumed that the hash approach is faster (yes, you can rightfully tell me to "take a hike") and the reason is that taking this position makes the argument more meaningful, in the general sense, and gives a very interesting example of how difficult optimization issues can be.
bob wrote: If it is faster, it still is not worth the effort of rewriting repetition detection until all the other things have been addressed.
Here I disagree with you. The "other things" that can be improved are never-ending and it makes perfect sense to improve the little things that you are certain of, while still dwelling on the heavy/important optimization issues. This is a practical approach to improving software and it makes good sense, IMO.
bob wrote: This is not exactly low-hanging fruit. It is more a "low-hanging molecule" that is not going to offer much for the effort of picking it off...
hahahah ... this is funny, but I would take the molecule if I could, while having the brain spliced up, all the same. ;-)
bob wrote: One certainly doesn't have to make his first programming effort on the procedure at the top of the profile. I often look at the top group and figure out where I might get the best speedup the quickest and start there. For example, the work on rotated bitboards years ago was not on the code that used the most time. But by that same reasoning, I am certainly not going to the bottom of the heap and try to speed up a 0.51% module when (a) it is not clear that the proposed improvement is any faster; (b) the proposed "improvement" might actually hurt later when it is "SMP time"...
These are all valid objections and I do agree with your concerns!!!

Again,
my comments were with respect to the general issues of optimization and in this regard discouraging people from looking at the "bottom of the heap" is suboptimal. Sometimes the "bottom" is a perfectly reasonable place to enhance, the only problem is that developers might lose sight and spend copious amounts of time for little gain.

It is a tough balance and can happen at the "top of the heap" too, where someone spends years of dealing with problems that yield no real benefit. ;-)
(I've done this ... not sure about you)

Regards,
Hristo

wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 3:03 pm
Location: British Columbia, Canada

Re: Repetition detection structure.

Post by wgarvin » Mon Mar 03, 2008 6:42 pm

hristo wrote:Here I disagree with you. The "other things" that can be improved are never-ending and it makes perfect sense to improve the little things that you are certain of, while still dwelling on the heavy/important optimization issues. This is a practical approach to improving software and it makes good sense, IMO.
It's important not to lose sight of the fact that every optimization has a cost. There is a tension between "optimization" and all the other factors that lead to good code. Optimizing tends to make your code harder to read, harder to debug, harder to maintain and harder to change.

Sometimes the piece of code affected by the optimization is very small and localized and doesn't mess up the program very much, so it's worth doing even if the gain is small. Or maybe it affects a lot of code, but you can get big performance gains by doing the optimization, and so it's worth dealing with the decreased readability and maintainability. (Just remember to put comments in the optimized code so that when you come back to fix a bug in it 6 months later you can figure out wtf it does!) And still other times, the gain from the optimization is small (like, less than 0.51%?) and you are better off just writing the simple code that you will be able to maintain more easily later.

Simplicity is a very valuable thing when programming. A very valuable thing. Many people overlook it. Just remember that when you're optimizing some code and it's growing to be 2x as long and 3x as complex as the original! Optimization has costs, and you only want to pay those costs if they are justified by real and significant performance gains.

bob
Posts: 20642
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Repetition detection structure.

Post by bob » Mon Mar 03, 2008 6:53 pm

hristo wrote:
bob wrote:
hristo wrote: Bob,
I agree that optimization must be considered from the top ("look into the highest time-consuming functions and algorithms, first"). However, it is perfectly reasonable to consider optimizations at other levels (including the bottom) so long as such optimizations can be regression tested and are opaque with respect to the rest of the application. Always optimizing at the top can unnecessarily restrict one from gathering low-hanging fruit elsewhere.

Regards to both of you,
Hristo
My point was simply that if the goal is to make the program measurably faster, then the place to start is at the top. I clearly showed that in my case, 0.51% over a pretty long game that ended by 50-move rule, was not the place to start.
Bob,
my comments were with respect to the general principles of optimization.
In all cases the programmer makes the judgment call and decides which portions of the application should be improved and it is a good idea to look at the top, first.

bob wrote: And, in fact, I questioned whether the hash approach was any faster at all, giving that 0.51% as "the target to beat". If hashing is no faster, then the discussion was stupid to start with.
Yes, you did offer a clear example and a target to beat. ;-) (this is the beauty of the discussion .. thank you)

However, I presumed that the hash approach is faster (yes, you can rightfully tell me to "take a hike") and the reason is that taking this position makes the argument more meaningful, in the general sense, and gives a very interesting example of how difficult optimization issues can be.
bob wrote: If it is faster, it still is not worth the effort of rewriting repetition detection until all the other things have been addressed.
Here I disagree with you. The "other things" that can be improved are never-ending and it makes perfect sense to improve the little things that you are certain of, while still dwelling on the heavy/important optimization issues. This is a practical approach to improving software and it makes good sense, IMO.
Only if you ignore one key comment I made. The hashing approach has some issues in a parallel search. In light of the fact that (a) the repetition list is simpler; (b) it appears to be equally fast; (c) it does not involve throwing away something that is already working (the original poster was already using the repetition-list approach successfully); and, most importantly, (d) hashing introduces significant future issues, with respect to a parallel search, that the repetition list does not -- it doesn't make much sense to make the change...

I've already tried the hashing approach. Bruce used it in Ferret, and others have used it previously. What would be the point of changing from (a) to (b) if (b) is no faster and also offers some significant future issues that will have to be addressed when a parallel search is done?

That was my point. Early in a program's development, trying to optimize a 0.51% task is pointless. So much will be redesigned and rewritten, there is not much point in tackling something that may well be thrown out later anyway..
bob wrote: This is not exactly low-hanging fruit. It is more a "low-hanging molecule" that is not going to offer much for the effort of picking it off...
hahahah ... this is funny, but I would take the molecule if I could, while having the brain spliced up, all the same. ;-)
Even if you knew you would have to throw it back up later and replace it with the original? Or that you would have to later spend significant time figuring out how to make it efficient in a parallel search? :)

bob wrote: One certainly doesn't have to make his first programming effort on the procedure at the top of the profile. I often look at the top group and figure out where I might get the best speedup the quickest and start there. For example, the work on rotated bitboards years ago was not on the code that used the most time. But by that same reasoning, I am certainly not going to the bottom of the heap and try to speed up a 0.51% module when (a) it is not clear that the proposed improvement is any faster; (b) the proposed "improvement" might actually hurt later when it is "SMP time"...
These are all valid objections and I do agree with your concerns!!!

Again,
my comments were with respect to the general issues of optimization and in this regard discouraging people from looking at the "bottom of the heap" is suboptimal. Sometimes the "bottom" is a perfectly reasonable place to enhance, the only problem is that developers might lose sight and spend copious amounts of time for little gain.
The problem is this: You can only start at some point A and go up or down from there. Starting at the bottom is not the best way to do this. Unless you have already visited everything above that point and either fixed it, or decided it was optimal as is...


It is a tough balance and can happen at the "top of the heap" too, where someone spends years of dealing with problems that yield no real benefit. ;-)
(I've done this ... not sure about you)

Regards,
Hristo

I've not spent any substantial amount of time trying to optimize something that could not be optimized. Occasionally things that appeared to be more efficient turned out not to be, of course. But even a minuscule improvement on a "biggee" gives more performance than a big speedup on a 0.51% module...

michiguel
Posts: 6389
Joined: Thu Mar 09, 2006 7:30 pm
Location: Chicago, Illinois, USA

Re: Repetition detection structure.

Post by michiguel » Mon Mar 03, 2008 7:36 pm

wgarvin wrote:
hristo wrote:Here I disagree with you. The "other things" that can be improved are never-ending and it makes perfect sense to improve the little things that you are certain of, while still dwelling on the heavy/important optimization issues. This is a practical approach to improving software and it makes good sense, IMO.
It's important not to lose sight of the fact that every optimization has a cost. There is a tension between "optimization" and all the other factors that lead to good code. Optimizing tends to make your code harder to read, harder to debug, harder to maintain and harder to change.

Sometimes the piece of code affected by the optimization is very small and localized and doesn't mess up the program very much, so it's worth doing even if the gain is small. Or maybe it affects a lot of code, but you can get big performance gains by doing the optimization, and so it's worth dealing with the decreased readability and maintainability. (Just remember to put comments in the optimized code so that when you come back to fix a bug in it 6 months later you can figure out wtf it does!) And still other times, the gain from the optimization is small (like, less than 0.51%?) and you are better off just writing the simple code that you will be able to maintain more easily later.

Simplicity is a very valuable thing when programming. A very valuable thing. Many people overlook it. Just remember that when you're optimizing some code and it's growing to be 2x as long and 3x as complex as the original! Optimization has costs, and you only want to pay those costs if they are justified by real and significant performance gains.
If I were a faculty at a computer science dept., I would print this message and hand it to the students.

Putting things in terms of "costs" is an interesting way to look at it. Some costs are short term (programming time spent + debugging), but some are long term (readability, maintainability, time spent fixing bugs etc.).

Miguel
