Interesting null-move test


michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Interesting null-move test

Post by michiguel »

Don wrote:
bob wrote:
Don wrote:
bob wrote:
jdart wrote:
> If you play 30,000 games and it plays _better_, why on earth would you toss an idea out just because there is an occasional game it might lose due to the change?

I can see this point of view, but the bad thing about null-move zugzwang positions is that if your program has such a problem it can be "blind" to the right move, even with a very deep search, and can go wrong in endings where a human player can see the right way. So it's ugly. If you intend the program to be used by humans for analysis or play, this will likely annoy them if it happens. I tend to think that makes it a priority to fix.

--Jon
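For illustration, the usual workaround amounts to something like the sketch below (plain C, with made-up types and helper names, not any particular engine): refuse the null move when the side to move has nothing but king and pawns, and optionally verify a null-move cutoff with a reduced-depth search.

#include <stdbool.h>

/* Minimal sketch of a zugzwang guard for null-move pruning.
   NodeInfo and its fields are hypothetical; only the shape of the
   logic is the point. */
typedef struct {
    bool in_check;           /* side to move is in check                   */
    int  non_pawn_material;  /* knights+bishops+rooks+queens for that side */
} NodeInfo;

static bool null_move_allowed(const NodeInfo *n, int depth)
{
    if (n->in_check)
        return false;        /* a null move while in check makes no sense  */
    if (n->non_pawn_material == 0)
        return false;        /* pawn/king endings: zugzwang is likely, so
                                skip the null move (this is the fix that
                                costs a little speed elsewhere)            */
    return depth >= 2;       /* too close to the horizon, not worth it     */
}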
What about all the _other_ positions where it provides _better_ analysis by going deeper? Everything we do is a little give and take.
Personally I usually hedge my bets under the theory that a chain is only as strong as its weakest link. I hate to have some ridiculous weakness in the program even if it only shows up 1 in a million games or has less than 1 ELO impact. So if some change makes the program more correct but actually weakens it by some amount that is difficult to measure, I labor over the decision more.
None of that makes any sense. I'll bet that for _every_ feature in your chess program, I can come up with a position where it works poorly or not at all. Such is the nature of the game. Yet I doubt you would toss everything out. This idea of not doing something because it hurts a certain type of position is nothing but witchcraft/voodoo/etc. If something causes a program to win more games when it is enabled than when it is disabled, and the sample size of games is sufficiently large to eliminate random effects, then that idea ought to be enabled. Nothing else makes any sense.

I am not talking about changes that look right and appear to have no effect on overall performance, I am talking about changes that improve Elo over tens of thousands of test games. I don't care how changes affect individual test positions. That idea is so badly flawed it is not worth discussing, except for programs whose sole purpose in life is to address a particular concept such as "chest" for checkmates.

Note that I am not proposing that this null-everywhere is a good one. Only that it is a very tiny bit worse, 1-2 Elo at worst (after about 100,000 games and still counting). And that was quite surprising (that it was so close to break-even).
What I am saying is that I err on the side of a change that makes the program stronger if there is a minor slowdown. For instance there are some changes that cannot possibly make the program play weaker (such as the kvsp database) but might slow it down enough that it actually does make the program play slightly weaker in time control games. If the change hurts the program less than 1 ELO I will tend to favor keeping the change anyway, because this change is of the nature that it helps thousands of positions get scored correctly and NEVER causes one to get scored incorrectly. I would rather be correct. If the change was a definite slowdown and thus hurt by a more measurable amount, I would of course reject it. I'm not stupid - I'm only talking about really close calls here.

Some zugzwang issues can cause your program to NEVER see something it needs to see and thus affect the scalability of the program (it will never play perfect chess even with infinite hardware.) Of course that is not a practical issue, but it illustrates the point. It may make a difference with 10X more hardware for example.
That's a testable hypothesis. If it makes a difference at 10x, it will make a smaller (but still measurable) difference at 5x, which can be easily tested.
I am quite sure that you are wrong about this. You are basically claiming that all program features scale exactly the same - but I know this is not true. My claim is that not all features scale the same, and this is equivalent to saying that a feature that may help a program now could hurt it later or vice versa.

I'm going to call this my "bottleneck theory" and this is how I believe it works:

I think of zugzwang kind of like what happens when you profile a terribly optimized program. Some routine may take 5% of the execution time so you will focus on 4 other routines that together take 95% of the execution time. After you make those optimizations you may suddenly find that your previous 5% routine is the bottleneck whereas before it was not worth doing anything about.

So things like zugzwang could be a "bottleneck." Zugzwang detection may hurt your program NOW because it is a minor slowdown and fixing it is not enough to make up the difference. But fast forward 100 years, pretend there is no limit to Moore's law, and you may find that most programs are not too far away from perfect play. In such a case it's possible that the program that does not have zugzwang detection is noticeably inferior. It might be the only substantial difference between one program and another.

The things that are serious weaknesses in your program become little "bottlenecks" when the program is doing everything else well, even if they are currently unimportant. Larry and I discovered that our king safety seems to work that way. If I were playing in a 5 ply tournament with Komodo (or anything roughly equivalent in time control settings) I have proved in testing that king safety just slows the program down. At these ridiculous levels king safety is just not the "bottleneck" when it cannot even handle relatively trivial tactics. So if Komodo were running on an ancient 8086 CPU I would probably run my tests and conclude (like you would) that king safety is too much of a slowdown to pay off.

So my bottleneck theory is that trivial things wrong with your program that probably make no difference now will hurt you later.

Another example. Piece square tables. Many years ago when most micros were doing well to get depths of 4 or 5 ply, you could get away with pure piece square tables where the values that went into them were heavily preprocessed. This gives a pretty large boost in speed at the sacrifice of not doing slower, more dynamic evaluation such as proper mobility. But the bottleneck in those programs was tactics - mobility did not mean as much when it was only a matter of time before all programs made some serious tactical blunder. The programs that dominated were heavily speed oriented and the programs that tried to be really smart in evaluation suffered. But now if your program is missing a little knowledge you will get punished very brutally.
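To make the contrast concrete, here is a rough sketch of that old style (hypothetical tables, helper names, and numbers): the slow, global knowledge is folded into the piece-square tables once at the root, and the per-node evaluation degenerates into table lookups with no real mobility term at all.

#include <stdlib.h>

/* Sketch of a "preprocessed" piece-square-table evaluation.
   Values and names are invented for illustration. */
enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };

static int pst[6][64];                     /* [piece][square], centipawns */

/* Done once per root position: bake slow, global judgments (here, a toy
   "keep knights near our own king" rule) into the tables. */
static void preprocess_pst(int own_king_sq)
{
    for (int sq = 0; sq < 64; ++sq) {
        int file_dist = abs((sq & 7) - (own_king_sq & 7));
        pst[KNIGHT][sq] += 4 - file_dist;
    }
}

/* Done at every node: nothing but lookups, so it is very fast, but it
   cannot see anything (e.g. true mobility) that changes inside the
   search. */
static int evaluate(const int piece_on[64])        /* -1 means empty */
{
    int score = 0;
    for (int sq = 0; sq < 64; ++sq)
        if (piece_on[sq] >= 0)
            score += pst[piece_on[sq]][sq];
    return score;
}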

I'll even go so far as to hypothesize that over the last 4 decades program authors have "unconsciously" scaled their programs up to current hardware and we will continue to do so. Of course program quality has also improved enormously for other reasons too, but I'm saying that what used to not matter does matter now.

Another example: Do you think a program that is running on hardware capable of doing only 3 ply searches would benefit from hash tables? Even if the memory in such a machine were liberal, the benefit from transpositions would be almost non-existent and the slowdown due to the hash table operations would make the program play weaker.
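For reference, the cost side of that trade-off is just the probe/store pair in the sketch below (an always-replace table with a made-up entry layout); the benefit side, the number of transpositions actually found, is what collapses at very shallow depths.

#include <stdint.h>
#include <stddef.h>

/* Minimal transposition-table sketch: one entry per slot, always
   replace, no aging.  The entry layout is illustrative only. */
typedef struct {
    uint64_t key;    /* Zobrist hash of the position        */
    int16_t  score;
    int8_t   depth;  /* draft at which the score was found  */
    uint8_t  flag;   /* exact / lower bound / upper bound   */
} TTEntry;

#define TT_SIZE (1u << 20)                 /* about one million entries */
static TTEntry tt[TT_SIZE];

static TTEntry *tt_probe(uint64_t key)
{
    TTEntry *e = &tt[key & (TT_SIZE - 1)];
    return e->key == key ? e : NULL;
}

static void tt_store(uint64_t key, int score, int depth, int flag)
{
    TTEntry *e = &tt[key & (TT_SIZE - 1)];
    e->key   = key;
    e->score = (int16_t)score;
    e->depth = (int8_t)depth;
    e->flag  = (uint8_t)flag;
}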


I also have this feeling which I cannot prove that attention to some details may have a large effect on the ultimate scalability. I have a king and pawn vs king database compiled into the program and I have been unable to prove it helps, but it just seems like it should. My theory is that it probably helps more with depth, because as you gradually approach perfect play these little things will come into play more often. (A lot less games decided by decisive tactics early in the game.) On the other hand with deeper searches the king and pawn endings will get played better anyway - so I'm not sure how all of this interacts.
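For what it's worth, the reason such a compiled-in database is nearly free at run time is that it is just a bit lookup in memory, roughly as in the sketch below (index layout and names are made up; a real generator would fill kpk_bitbase offline).

#include <stdbool.h>
#include <stdint.h>

/* Sketch of probing a compiled-in KPvK bitbase (white wins or not).
   2 sides to move * 64 white king squares * 48 pawn squares * 64 black
   king squares = 393,216 positions, i.e. about 48 KB at one bit each. */
extern const uint8_t kpk_bitbase[393216 / 8];   /* generated offline */

static unsigned kpk_index(int stm, int wksq, int wpsq, int bksq)
{
    return (((unsigned)stm * 64 + (unsigned)wksq) * 48
            + (unsigned)(wpsq - 8)) * 64 + (unsigned)bksq;
}

static bool kpk_is_win(int stm, int wksq, int wpsq, int bksq)
{
    unsigned idx = kpk_index(stm, wksq, wpsq, bksq);
    return (kpk_bitbase[idx >> 3] >> (idx & 7)) & 1;
}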
I don't believe it helps at all. 3-4-5 piece endgame tables are a very small improvement in some positions, and hurt in others. I am one day going to precisely quantify the effect of tables on/off using the cluster. So far, testing has suggested no benefit at all unless you have monstrously fast disk drives. The slow-down hurts about as much as the increased accuracy helps.
I agree with you on this for now. But this is still probably a scalability issue. If the slowdown due to disk based endgame tables maintains the same ratio as today, I can easily imagine that in the future it might make a difference.
Why does everybody assume there must be a slowdown? If things are done properly, you can do it without any noticeable slowdown, so if there is any effect it is always a positive one.

Miguel

But I don't imagine that it will ever make a big difference, because if programs speed up that much they will indeed play these endings better anyway. Imagine it this way though: in 50 years programs become so strong that almost every game is a draw, and when one program beats another it's more likely to be due to something like one of these endings (or zugzwang) or something of this nature.


All of this is pretty much theoretical however and programs over the years will automatically scale as needed as the best program authors will always tend to implement things that actually work on the hardware of the day.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Interesting null-move test

Post by bob »

jwes wrote:
bob wrote:
Don wrote:
bob wrote:


I also have this feeling which I cannot prove that attention to some details may have a large effect on the ultimate scalability. I have a king and pawn vs king database compiled into the program and I have been unable to prove it helps, but it just seems like it should. My theory is that it probably helps more with depth, because as you gradually approach perfect play these little things will come into play more often. (A lot less games decided by decisive tactics early in the game.) On the other hand with deeper searches the king and pawn endings will get played better anyway - so I'm not sure how all of this interacts.
I don't believe it helps at all. 3-4-5 piece endgame tables are a very small improvement in some positions, and hurt in others. I am one day going to precisely quantify the effect of tables on/off using the cluster. So far, testing has suggested no benefit at all unless you have monstrously fast disk drives. The slow-down hurts about as much as the increased accuracy helps.
I agree with you on this for now. But this is still probably a scalability issue. If the slowdown due to disk based endgame tables maintains the same ratio as today, I can easily imagine that in the future it might make a difference.
I agree, but I believe it will get _worse_ and not better. Disk speeds have not come even close to keeping up with CPU speed improvements. Even good SSDs are hopelessly slow when compared to the speed of the processor, and it is falling farther behind every month.
As memory sizes grow, it becomes possible to cache more of the EGTBs, e.g. for Crafty on a 16 GB machine, you can use 8 GB for the TT, but what will you use the other 7+ GB of memory for?
The O/S uses that for filesystem cache. I could use it for EGTB cache just as easily, for a very slight gain. But this still doesn't address the issue of whether a feature scales with hardware or just suddenly "appears" at some threshold point.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Interesting null-move test

Post by bob »

michiguel wrote:
Don wrote:
bob wrote:
Don wrote:
bob wrote:
jdart wrote:
> If you play 30,000 games and it plays _better_, why on earth would you toss an idea out just because there is an occasional game it might lose due to the change?

I can see this point of view, but the bad thing about null-move zugzwang positions is that if your program has such a problem it can be "blind" to the right move, even with a very deep search, and can go wrong in endings where a human player can see the right way. So it's ugly. If you intend the program to be used by humans for analysis or play, this will likely annoy them if it happens. I tend to think that makes it a priority to fix.

--Jon
What about all the _other_ positions where it provides _better_ analysis by going deeper? Everything we do is a little give and take.
Personally I usually hedge my bets under the theory that a chain is only as strong as its weakest link. I hate to have some ridiculous weakness in the program even if it only shows up 1 in a million games or has less than 1 ELO impact. So if some change makes the program more correct but actually weakens it by some amount that is difficult to measure, I labor over the decision more.
None of that makes any sense. I'll bet that for _every_ feature in your chess program, I can come up with a position where it works poorly or not at all. Such is the nature of the game. Yet I doubt you would toss everything out. This idea of not doing something because it hurts a certain type of position is nothing but witchcraft/voodoo/etc. If something causes a program to win more games when it is enabled than when it is disabled, and the sample size of games is sufficiently large to eliminate random effects, then that idea ought to be enabled. Nothing else makes any sense.

I am not talking about changes that look right and appear to have no effect on overall performance, I am talking about changes that improve Elo over tens of thousands of test games. I don't care how changes affect individual test positions. That idea is so badly flawed it is not worth discussing, except for programs whose sole purpose in life is to address a particular concept such as "chest" for checkmates.

Note that I am not proposing that this null-everywhere is a good one. Only that it is a very tiny bit worse, 1-2 Elo at worst (after about 100,000 games and still counting). And that was quite surprising (that it was so close to break-even).
What I am saying is that I err on the side of a change that makes the program stronger if there is a minor slowdown. For instance there are some changes that cannot possibly make the program play weaker (such as the kvsp database) but might slow it down enough that it actually does make the program play slightly weaker in time control games. If the change hurts the program less than 1 ELO I will tend to favor keeping the change anyway, because this change is of the nature that it helps thousands of positions get scored correctly and NEVER causes one to get scored incorrectly. I would rather be correct. If the change was a definite slowdown and thus hurt by a more measurable amount, I would of course reject it. I'm not stupid - I'm only talking about really close calls here.

Some zugzwang issues can cause your program to NEVER see something it needs to see and thus affect the scalability of the program (it will never play perfect chess even with infinite hardware.) Of course that is not a practical issue, but it illustrates the point. It may make a difference with 10X more hardware for example.
That's a testable hypothesis. If it makes a difference at 10x, it will make a smaller (but still measurable) difference at 5x, which can be easily tested.
I am quite sure that you are wrong about this. You are basically claiming that all program features scale exactly the same - but I know this is not true. My claim is that not all features scale the same, and this is equivalent to saying that a feature that may help a program now could hurt it later or vice versa.

I'm going to call this my "bottleneck theory" and this is how I believe it works:

I think of zugzwang kind of like what happens when you profile a terribly optimized program. Some routine may take 5% of the execution time so you will focus on 4 other routines that together take 95% of the execution time. After you make those optimizations you may suddenly find that your previous 5% routine is the bottleneck whereas before it was not worth doing anything about.

So things like zugzwang could be a "bottleneck." Zugzwang detection may hurt your program NOW because it is a minor slowdown and fixing it is not enough to make up the difference. But fast forward 100 years, pretend there is no limit to Moore's law, and you may find that most programs are not too far away from perfect play. In such a case it's possible that the program that does not have zugzwang detection is noticeably inferior. It might be the only substantial difference between one program and another.

The things that are serious weaknesses in your program become little "bottlenecks" when the program is doing everything else well, even if they are currently unimportant. Larry and I discovered that our king safety seems to work that way. If I were playing in a 5 ply tournament with Komodo (or anything roughly equivalent in time control settings) I have proved in testing that king safety just slows the program down. At these ridiculous levels king safety is just not the "bottleneck" when it cannot even handle relatively trivial tactics. So if Komodo were running on an ancient 8086 CPU I would probably run my tests and conclude (like you would) that king safety is too much of a slowdown to pay off.

So my bottleneck theory is that trivial things wrong with your program that probably make no difference now will hurt you later.

Another example. Piece square tables. Many years ago when most micros were doing well to get depths of 4 or 5 ply, you could get away with pure piece square tables where the values that went into them were heavily preprocessed. This gives a pretty large boost in speed at the sacrifice of not doing slower, more dynamic evaluation such as proper mobility. But the bottleneck in those programs was tactics - mobility did not mean as much when it was only a matter of time before all programs made some serious tactical blunder. The programs that dominated were heavily speed oriented and the programs that tried to be really smart in evaluation suffered. But now if your program is missing a little knowledge you will get punished very brutally.

I'll even go so far as to hypothesize that over the last 4 decades program authors have "unconsciously" scaled their programs up to current hardware and we will continue to do so. Of course program quality has also improved enormously for other reasons too, but I'm saying that what used to not matter does matter now.

Another example: Do you think a program that is running on hardware capable of doing only 3 ply searches would benefit from hash tables? Even if the memory in such a machine were liberal, the benefit from transpositions would be almost non-existent and the slowdown due to the hash table operations would make the program play weaker.


I also have this feeling which I cannot prove that attention to some details may have a large effect on the ultimate scalability. I have a king and pawn vs king database compiled into the program and I have been unable to prove it helps, but it just seems like it should. My theory is that it probably helps more with depth, because as you gradually approach perfect play these little things will come into play more often. (A lot less games decided by decisive tactics early in the game.) On the other hand with deeper searches the king and pawn endings will get played better anyway - so I'm not sure how all of this interacts.
I don't believe it helps at all. 3-4-5 piece endgame tables are a very small improvement in some positions, and hurt in others. I am one day going to precisely quantify the effect of tables on/off using the cluster. So far, testing has suggested no benefit at all unless you have monstrously fast disk drives. The slow-down hurts about as much as the increased accuracy helps.
I agree with you on this for now. But this is still probably a scalability issue. If the slowdown due to disk based endgame tables maintains the same ratio as today, I can easily imagine that in the future it might make a difference.
Why does everybody assume there must be a slowdown? If things are done properly, you can do it without any noticeable slowdown, so if there is any effect it is always a positive one.
Simple answer. In no case is an I/O operation as efficient as accessing memory. So any initial read hurts performance, even if later references are handled via a caching strategy. If you drive the overhead toward zero, you drive the usefulness toward zero as well. It is easy to limit overhead by limiting probing, but limiting probing has its own (bad) side-effect.

I see no way around this. I don't want to probe only in the first few moves, to reduce overhead, as I now miss all the clever ways to trade into a won (or drawn) EGTB position. If I probe too deeply, I see all the clever ways to trade, if there are any, but if there are no winning plans, I might stumble into a losing plan because I am searching so slowly my opponent finds something tactically that I overlooked while wasting all the time on disk accesses.

I've seen both types of issues. It is one of the reasons I don't rely on EGTBs for the most part.
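The "limit the probing" knobs mentioned above usually amount to something like the following sketch (field and parameter names are invented): probe only when the material is within tablebase range, only when the last move actually changed the endgame class, and only when enough depth remains for the exact answer to matter.

#include <stdbool.h>

/* Sketch of typical conditions for deciding whether to probe an EGTB at
   an interior node.  Every condition trades accuracy for speed. */
typedef struct {
    int  men;                  /* total pieces+pawns+kings on the board */
    int  ply;                  /* distance from the root                */
    int  remaining_depth;      /* draft left at this node               */
    bool last_move_was_capture_or_pawn;   /* did the endgame class or
                                              pawn structure just change? */
} ProbeContext;

static bool should_probe_egtb(const ProbeContext *c,
                              int max_men, int min_remaining_depth)
{
    if (c->men > max_men)
        return false;                      /* not in the tables at all   */
    if (c->ply == 0)
        return true;                       /* always probe at the root   */
    if (!c->last_move_was_capture_or_pawn)
        return false;                      /* same class as the parent,
                                              which was already probed   */
    return c->remaining_depth >= min_remaining_depth;
}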


Miguel

But I don't imagine that it will ever make a big difference, because if programs speed up that much they will indeed play these endings better anyway. Imagine it this way though: in 50 years programs become so strong that almost every game is a draw, and when one program beats another it's more likely to be due to something like one of these endings (or zugzwang) or something of this nature.


All of this is pretty much theoretical however and programs over the years will automatically scale as needed as the best program authors will always tend to implement things that actually work on the hardware of the day.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Interesting null-move test

Post by michiguel »

bob wrote:
michiguel wrote:
Don wrote:
bob wrote:
Don wrote:
bob wrote:
jdart wrote:
> If you play 30,000 games and it plays _better_, why on earth would you toss an idea out just because there is an occasional game it might lose due to the change?

I can see this point of view, but the bad thing about null-move zugzwang positions is that if your program has such a problem it can be "blind" to the right move, even with a very deep search, and can go wrong in endings where a human player can see the right way. So it's ugly. If you intend the program to be used by humans for analysis or play, this will likely annoy them if it happens. I tend to think that makes it a priority to fix.

--Jon
What about all the _other_ positions where it provides _better_ analysis by going deeper? Everything we do is a little give and take.
Personally I usually hedge my bets under the theory that a chain is only as strong as its weakest link. I hate to have some ridiculous weakness in the program even if it only shows up 1 in a million games or has less than 1 ELO impact. So if some change makes the program more correct but actually weakens it by some amount that is difficult to measure, I labor over the decision more.
None of that makes any sense. I'll bet that for _every_ feature in your chess program, I can come up with a position where it works poorly or not at all. Such is the nature of the game. Yet I doubt you would toss everything out. This idea of not doing something because it hurts a certain type of position is nothing but witchcraft/voodoo/etc. If something causes a program to win more games when it is enabled than when it is disabled, and the sample size of games is sufficiently large to eliminate random effects, then that idea ought to be enabled. Nothing else makes any sense.

I am not talking about changes that look right and appear to have no effect on overall performance, I am talking about changes that improve Elo over tens of thousands of test games. I don't care how changes affect individual test positions. That idea is so badly flawed it is not worth discussing, except for programs whose sole purpose in life is to address a particular concept such as "chest" for checkmates.

Note that I am not proposing that this null-everywhere is a good one. Only that it is a very tiny bit worse, 1-2 Elo at worst (after about 100,000 games and still counting). And that was quite surprising (that it was so close to break-even).
What I am saying is that I err on the side of a change that makes the program stronger if there is a minor slowdown. For instance there are some changes that cannot possibly make the program play weaker (such as the kvsp database) but might slow it down enough that it actually does make the program play slightly weaker in time control games. If the change hurts the program less than 1 ELO I will tend to favor keeping the change anyway, because this change is of the nature that it helps thousands of positions get scored correctly and NEVER causes one to get scored incorrectly. I would rather be correct. If the change was a definite slowdown and thus hurt by a more measurable amount, I would of course reject it. I'm not stupid - I'm only talking about really close calls here.

Some zugzwang issues can cause your program to NEVER see something it needs to see and thus affect the scalability of the program (it will never play perfect chess even with infinite hardware.) Of course that is not a practical issue, but it illustrates the point. It may make a difference with 10X more hardware for example.
That's a testable hypothesis. If it makes a difference at 10x, it will make a smaller (but still measurable) difference at 5x, which can be easily tested.
I am quite sure that you are wrong about this. You are basically claiming that all program features scale exactly the same - but I know this is not true. My claim is that not all features scale the same, and this is equivalent to saying that a feature that may help a program now could hurt it later or vice versa.

I'm going to call this my "bottleneck theory" and this is how I believe it works:

I think of zugzwang kind of like what happens when you profile a terribly optimized program. Some routine may take 5% of the execution time so you will focus on 4 other routines that together take 95% of the execution time. After you make those optimizations you may suddenly find that your previous 5% routine is the bottleneck whereas before it was not worth doing anything about.

So things like zugzwang could be a "bottleneck." Zugzwang detection may hurt your program NOW because it is a minor slowdown and fixing it is not enough to make up the difference. But fast forward 100 years, pretend there is no limit to Moore's law, and you may find that most programs are not too far away from perfect play. In such a case it's possible that the program that does not have zugzwang detection is noticeably inferior. It might be the only substantial difference between one program and another.

The things that are serious weaknesses in your program become little "bottlenecks" when the program is doing everything else well, even if they are currently unimportant. Larry and I discovered that our king safety seems to work that way. If I were playing in a 5 ply tournament with Komodo (or anything roughly equivalent in time control settings) I have proved in testing that king safety just slows the program down. At these ridiculous levels king safety is just not the "bottleneck" when it cannot even handle relatively trivial tactics. So if Komodo were running on an ancient 8086 CPU I would probably run my tests and conclude (like you would) that king safety is too much of a slowdown to pay off.

So my bottleneck theory is that trivial things wrong with your program that probably make no difference now will hurt you later.

Another example. Piece square tables. Many years ago when most micros were doing well to get depths of 4 or 5 ply, you could get away with pure piece square tables where the values that went into them were heavily preprocessed. This gives a pretty large boost in speed at the sacrifice of not doing slower, more dynamic evaluation such as proper mobility. But the bottleneck in those programs was tactics - mobility did not mean as much when it was only a matter of time before all programs made some serious tactical blunder. The programs that dominated were heavily speed oriented and the programs that tried to be really smart in evaluation suffered. But now if your program is missing a little knowledge you will get punished very brutally.

I'll even go so far as to hypothesize that over the last 4 decades program authors have "unconsciously" scaled their programs up to current hardware and we will continue to do so. Of course program quality has also improved enormously for other reasons too, but I'm saying that what used to not matter does matter now.

Another example: Do you think a program that is running on hardware capable of doing only 3 ply searches would benefit from hash tables? Even if the memory in such a machine were liberal, the benefit from transpositions would be almost non-existent and the slowdown due to the hash table operations would make the program play weaker.


I also have this feeling which I cannot prove that attention to some details may have a large effect on the ultimate scalability. I have a king and pawn vs king database compiled into the program and I have been unable to prove it helps, but it just seems like it should. My theory is that it probably helps more with depth, because as you gradually approach perfect play these little things will come into play more often. (A lot less games decided by decisive tactics early in the game.) On the other hand with deeper searches the king and pawn endings will get played better anyway - so I'm not sure how all of this interacts.
I don't believe it helps at all. 3-4-5 piece endgame tables are a very small improvement in some positions, and hurt in others. I am one day going to precisely quantify the effect of tables on/off using the cluster. So far, testing has suggested no benefit at all unless you have monstrously fast disk drives. The slow-down hurts about as much as the increased accuracy helps.
I agree with you on this for now. But this is still probably a scalability issue. If the slowdown due to disk based endgame tables maintains the same ratio as today, I can easily imagine that in the future it might make a difference.
Why does everybody assume there must be a slowdown? If things are done properly, you can do it without any noticeable slowdown, so if there is any effect it is always a positive one.
Simple answer. In no case is an I/O operation as efficient as accessing memory. So any initial read hurts performance, even if later references are handled via a caching strategy. If you drive the overhead toward zero, you drive the usefulness toward zero as well. It is easy to limit overhead by limiting probing, but limiting probing has its own (bad) side-effect.

I see no way around this. I don't want to probe only in the first few moves, to reduce overhead, as I now miss all the clever ways to trade into a won (or drawn) EGTB position.
And if you do not probe, you miss them all. Better to get most of them, than none of them. And if you do it without loss of speed, it can only be positive. Not to mention, you will end up searching deeper, because the nicest effect of the TBs is that you prune entire branches of the tree. It is a very safe pruning technique.

Miguel
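In code, the "safe pruning" effect is nothing more than returning the exact score on a hit, as in this sketch (the opaque Position type and the helper declarations are hypothetical):

#include <stdbool.h>

typedef struct Position Position;                        /* opaque       */
extern int  count_men(const Position *pos);              /* hypothetical */
extern bool egtb_probe(const Position *pos, int *score); /* hypothetical */
extern int  alpha_beta(Position *pos, int alpha, int beta, int depth);

enum { TB_MEN = 5 };

static int search_node(Position *pos, int alpha, int beta, int depth)
{
    int tb_score;
    /* A successful probe yields an exact value, so the whole subtree
       below this node is cut off without any risk of error. */
    if (count_men(pos) <= TB_MEN && egtb_probe(pos, &tb_score))
        return tb_score;

    return alpha_beta(pos, alpha, beta, depth);   /* normal search */
}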

If I probe too deeply, I see all the clever ways to trade, if there are any, but if there are no winning plans, I might stumble into a losing plan because I am searching so slowly my opponent finds something tactically that I overlooked while wasting all the time on disk accesses.

I've seen both types of issues. It is one of the reasons I don't rely on EGTBs for the most part.


Miguel

But I don't imagine that it will ever make a big difference, because if programs speed up that much they will indeed play these endings better anyway. Imagine it this way though: in 50 years programs become so strong that almost every game is a draw, and when one program beats another it's more likely to be due to something like one of these endings (or zugzwang) or something of this nature.


All of this is pretty much theoretical however and programs over the years will automatically scale as needed as the best program authors will always tend to implement things that actually work on the hardware of the day.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Interesting null-move test

Post by bob »

michiguel wrote:
bob wrote:
michiguel wrote:
Don wrote:
bob wrote:
Don wrote:
bob wrote:
jdart wrote:
> If you play 30,000 games and it plays _better_, why on earth would you toss an idea out just because there is an occasional game it might lose due to the change?

I can see this point of view, but the bad thing about null-move zugzwang positions is that if your program has such a problem it can be "blind" to the right move, even with a very deep search, and can go wrong in endings where a human player can see the right way. So it's ugly. If you intend the program to be used by humans for analysis or play, this will likely annoy them if it happens. I tend to think that makes it a priority to fix.

--Jon
What about all the _other_ positions where it provides _better_ analysis by going deeper? Everything we do is a little give and take.
Personally I usually hedge my bets under the theory that a chain is only as strong as its weakest link. I hate to have some ridiculous weakness in the program even if it only shows up 1 in a million games or has less than 1 ELO impact. So if some change makes the program more correct but actually weakens it by some amount that is difficult to measure, I labor over the decision more.
None of that makes any sense. I'll bet that for _every_ feature in your chess program, I can come up with a position where it works poorly or not at all. Such is the nature of the game. Yet I doubt you would toss everything out. This idea of not doing something because it hurts a certain type of position is nothing but witchcraft/voodoo/etc. If something causes a program to win more games when it is enabled than when it is disabled, and the sample size of games is sufficiently large to eliminate random effects, then that idea ought to be enabled. Nothing else makes any sense.

I am not talking about changes that look right and appear to have no effect on overall performance, I am talking about changes that improve Elo over tens of thousands of test games. I don't care how changes affect individual test positions. That idea is so badly flawed it is not worth discussing, except for programs whose sole purpose in life is to address a particular concept such as "chest" for checkmates.

Note that I am not proposing that this null-everywhere is a good one. Only that it is a very tiny bit worse, 1-2 Elo at worst (after about 100,000 games and still counting). And that was quite surprising (that it was so close to break-even).
What I am saying is that I err on the side of a change that makes the program stronger if there is a minor slowdown. For instance there are some changes that cannot possibly make the program play weaker (such as the kvsp database) but might slow it down enough that it actually does make the program play slightly weaker in time control games. If the change hurts the program less than 1 ELO I will tend to favor keeping the change anyway, because this change is of the nature that it helps thousands of positions get scored correctly and NEVER causes one to get scored incorrectly. I would rather be correct. If the change was a definite slowdown and thus hurt by a more measurable amount, I would of course reject it. I'm not stupid - I'm only talking about really close calls here.

Some zugzwang issues can cause your program to NEVER see something it needs to see and thus affect the scalability of the program (it will never play perfect chess even with infinite hardware.) Of course that is not a practical issue, but it illustrates the point. It may make a difference with 10X more hardware for example.
That's a testable hypothesis. If it makes a difference at 10x, it will make a smaller (but still measurable) difference at 5x, which can be easily tested.
I am quite sure that you are wrong about this. You are basically claiming that all program features scale exactly the same - but I know this is not true. My claim is that not all features scale the same, and this is equivalent to saying that a feature that may help a program now could hurt it later or vice versa.

I'm going to call this my "bottleneck theory" and this is how I believe it works:

I think of zugzwang kind of like what happens when you profile a terribly optimized program. Some routine may take 5% of the execution time so you will focus on 4 other routines that together take 95% of the execution time. After you make those optimizations you may suddenly find that your previous 5% routine is the bottleneck whereas before it was not worth doing anything about.

So things like zugzwang could be a "bottleneck." Zugzwang detection may hurt your program NOW because it is a minor slowdown and fixing it is not enough to make up the difference. But fast forward 100 years, pretend there is no limit to Moore's law, and you may find that most programs are not too far away from perfect play. In such a case it's possible that the program that does not have zugzwang detection is noticeably inferior. It might be the only substantial difference between one program and another.

The things that are serious weaknesses in your program become little "bottlenecks" when the program is doing everything else well, even if they are currently unimportant. Larry and I discovered that our king safety seems to work that way. If I were playing in a 5 ply tournament with Komodo (or anything roughly equivalent in time control settings) I have proved in testing that king safety just slows the program down. At these ridiculous levels king safety is just not the "bottleneck" when it cannot even handle relatively trivial tactics. So if Komodo were running on an ancient 8086 CPU I would probably run my tests and conclude (like you would) that king safety is too much of a slowdown to pay off.

So my bottleneck theory is that trivial things wrong with your program that probably make no difference now will hurt you later.

Another example. Piece square tables. Many years ago when most micros were doing well to get depths of 4 or 5 ply, you could get away with pure piece square tables where the values that went into them were heavily preprocessed. This gives a pretty large boost in speed at the sacrifice of not doing slower, more dynamic evaluation such as proper mobility. But the bottleneck in those programs was tactics - mobility did not mean as much when it was only a matter of time before all programs made some serious tactical blunder. The programs that dominated were heavily speed oriented and the programs that tried to be really smart in evaluation suffered. But now if your program is missing a little knowledge you will get punished very brutally.

I'll even go so far as to hypothesize that over the last 4 decades program authors have "unconsciously" scaled their programs up to current hardware and we will continue to do so. Of course program quality has also improved enormously for other reasons too, but I'm saying that what used to not matter does matter now.

Another example: Do you think a program that is running on hardware capable of doing only 3 ply searches would benefit from hash tables? Even if the memory in such a machine were liberal, the benefit from transpositions would be almost non-existent and the slowdown due to the hash table operations would make the program play weaker.


I also have this feeling which I cannot prove that attention to some details may have a large effect on the ultimate scalability. I have a king and pawn vs king database compiled into the program and I have been unable to prove it helps, but it just seems like it should. My theory is that it probably helps more with depth, because as you gradually approach perfect play these little things will come into play more often. (A lot less games decided by decisive tactics early in the game.) On the other hand with deeper searches the king and pawn endings will get played better anyway - so I'm not sure how all of this interacts.
I don't believe it helps at all. 3-4-5 piece endgame tables are a very small improvement in some positions, and hurt in others. I am one day going to precisely quantify the effect of tables on/off using the cluster. So far, testing has suggested no benefit at all unless you have monstrously fast disk drives. The slow-down hurts about as much as the increased accuracy helps.
I agree with you on this for now. But this is still probably a scalability issue. If the slowdown due to disk based endgame tables maintains the same ratio as today, I can easily imagine that in the future it might make a difference.
Why does everybody assume there must be a slowdown? If things are done properly, you can do it without any noticeable slowdown, so if there is any effect it is always a positive one.
Simple answer. In no case is an I/O operation as efficient as accessing memory. So any initial read hurts performance, even if later references are handled via a caching strategy. If you drive the overhead toward zero, you drive the usefulness toward zero as well. It is easy to limit overhead by limiting probing, but limiting probing has its own (bad) side-effect.

I see no way around this. I don't want to probe only in the first few moves, to reduce overhead, as I now miss all the clever ways to trade into a won (or drawn) EGTB position.
And if you do not probe, you miss them all. Better to get most of them, than none of them. And if you do it without loss of speed, it can only be positive. Not to mention, you will end up searching deeper, because the nicest effect of the TBs is that you prune entire branches of the tree. It is a very safe pruning technique.
Theoretically it is not so clear. If you get a bunch of hits that don't help you find a win or draw, the reduced depth caused by the overhead might help you find a loss instead. Seen it happen plenty of times. If you can do it with no loss of speed, then it certainly works although there is still an overhead issue to get to the cached data. For 3/4/5 piece tables, I generally do a cat TB/* > /dev/null which loads 'em all into RAM. But 6 pieces eliminates that completely.
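The same trick can be done from inside the engine at startup; a sketch (error and path handling omitted, the path is whatever your tablebase directory is):

#include <stdio.h>

/* Read a tablebase file once and throw the data away, purely to pull it
   into the operating system's page cache -- the programmatic version of
   the "cat ... > /dev/null" trick mentioned above. */
static void warm_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return;
    char buf[1 << 16];
    while (fread(buf, 1, sizeof buf, f) > 0)
        ;                     /* discard; we only want it cached in RAM */
    fclose(f);
}

On Linux, posix_fadvise() with POSIX_FADV_WILLNEED asks the kernel to start the same prefetch asynchronously, and of course neither approach helps once the 6-piece tables no longer fit in RAM.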


Miguel

If I probe too deeply, I see all the clever ways to trade, if there are any, but if there are no winning plans, I might stumble into a losing plan because I am searching so slowly my opponent finds something tactically that I overlooked while wasting all the time on disk accesses.

I've seen both types of issues. It is one of the reasons I don't rely on EGTBs for the most part.


Miguel

But I don't imagine that it will ever make a big difference, because if programs speed up that much they will indeed play these endings better anyway. Imagine it this way though: in 50 years programs become so strong that almost every game is a draw, and when one program beats another it's more likely to be due to something like one of these endings (or zugzwang) or something of this nature.


All of this is pretty much theoretical however and programs over the years will automatically scale as needed as the best program authors will always tend to implement things that actually work on the hardware of the day.