Interesting null-move test

Discussion of chess software programming and technical issues.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Interesting null-move test

Post by bob »

jwes wrote:
bob wrote:
jwes wrote:
bob wrote:
hgm wrote:Let me get this straight:

You use plain null move, no verification? So if you end up in KRK, it is now a draw. And if you end up in things like KRPKR it becomes almost impossible to win, as the opponent would simply sac his Rook for the remaining Pawn, and you would not know how to checkmate him with a Rook up.
Why would that be? It finds KR vs K checkmates trivially. I just tried it in a simple position with the kings and rook in the center of the board. The evaluation knows to drive the losing king to the edge of the board, where it finds the mate quickly enough...
Since the standard method of checkmating with KR vs K involves repeated zugzwangs, it is reasonable (but apparently wrong) to assume that a search with null moves would not find these mates.
I believe it is an issue of the eval taking you in the right direction, and then any move that gives check turns off null at the next ply. So eventually you get to where you see the mate. However, KRK is a very rare ending overall...


bob wrote:

Most won Pawn endings would also be bungled; KPK is no longer a win if the passer is not out of reach.
You are focusing on positions; I am focusing on entire games. All I can say at the moment is that null-everywhere is no worse than restricting it to only being used when the side on move has a piece or more. I can run the match to 1/4 million games to get a more accurate Elo measurement, which I will start right now. But it certainly is within a couple of Elo either way.

Seems to me that this should cost you quite a bit more than 2 Elo. Rook endings are very common (> 10% ?), and not being able to win any of those should at least cost you a few percent in score, and each % corresponds to 7 Elo. So a 20-50 Elo drop would have to be expected.

The test must be flawed somehow...
The rook issue is not an issue apparently. Anyone can take current Crafty source, go to search.c, remove that single restriction on when null-move search is done (here is the code before/after):

Before (null move is tried only when the side to move still has at least one piece):

if (do_null && alpha == beta - 1 && depth > 1 && !tree->inchk[ply] &&
    TotalPieces(wtm, occupied)) {

After (restriction removed, null move is tried in every position):

if (do_null && alpha == beta - 1 && depth > 1 && !tree->inchk[ply]) {

Pretty obvious what was removed. Nothing else was changed. Same positions, same hardware, etc. I ran the above test twice and got results within 2 Elo of each other (the second run was 1 Elo better than the first, not significant)... Can't say much more other than this appears to be another episode of "myth-busters"...
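For anyone who wants to experiment with this outside of Crafty, here is a minimal sketch of the block in question. This is not Crafty's code: Position, Search(), MakeNullMove(), UnmakeNullMove(), SideHasPiece() and the reduction of 3 are placeholders standing in for whatever your own engine provides; the only point is where the single condition being tested sits.

/* Minimal null-move sketch -- not Crafty's code.  Position, Search(),
 * MakeNullMove(), UnmakeNullMove() and SideHasPiece() are placeholders for
 * your engine's own equivalents; NULL_R = 3 is just a typical reduction.   */
typedef struct Position Position;
extern int  Search(Position *pos, int alpha, int beta, int depth);
extern void MakeNullMove(Position *pos);
extern void UnmakeNullMove(Position *pos);
extern int  SideHasPiece(const Position *pos);  /* non-pawn material left?  */

enum { NULL_R = 3 };

/* Returns 1 if the null-move search fails high (caller prunes), else 0.
 * Pass restrict_to_pieces = 0 to reproduce the "null everywhere" test.     */
int TryNullMove(Position *pos, int alpha, int beta, int depth, int in_check,
                int restrict_to_pieces) {
  /* Usual preconditions: null-window node, enough depth, not in check.     */
  if (alpha != beta - 1 || depth <= 1 || in_check)
    return 0;
  /* This is the condition removed in the experiment above: skip null move
   * in pawn-only (zugzwang-prone) endings.                                 */
  if (restrict_to_pieces && !SideHasPiece(pos))
    return 0;
  MakeNullMove(pos);
  int score = -Search(pos, -beta, -beta + 1, depth - 1 - NULL_R);
  UnmakeNullMove(pos);
  return score >= beta;
}

The two versions in the test differ only in whether that second early return is present.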
It certainly could be that gains in the middlegame balance losses in the endgame. It would be interesting to run a test starting with endgame positions to see if the results are different.
They probably would be different. But then again, so would tests that start with a few pieces left, or tests that are based on positions with a forced mate. Failing in one type of position doesn't mean failing in all. That's why testing is so important: it removes bias and prejudice and lets you see the truth without any subjective coloring.
The problem with your testing method is that if a change makes a significant difference in a small minority of positions while performing identically in most positions, your testing will not recognize it unless you play a very large number of games.
Which is exactly what I am doing. Test has just passed 100K games, with the error bar pushed down to +/- 2.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Interesting null-move test

Post by michiguel »

bob wrote:
Daniel Shawul wrote:
As far as "how many times?" I have no idea. I'm only interested in the results, and so far it appears to be immaterial as to whether I restrict null to positions with at least one piece or more, or do them no matter what.
Well, R05's result was on the decline and probably ends up weakest no matter how small the margin is. Also, since I know pawn endings (KN* and KB* also) are indeed screwed up by null move, I would gladly still use that condition. Even if it ended up 2 or 3 Elo better I would avoid R05, because it really burns me when I lose games on CCT due to bad null moving in the endgame :)
I myself have tried a few similar subtle changes in the hope of getting some hidden Elo. However, the reality so far is that I have to make big changes to see any improvement at that magnitude of games, like playing with the R value, which did get me measurable Elo.
Wrong way to think about it. If you play 30,000 games and it plays _better_, why on earth would you toss an idea out just because there is an occasional game it might lose due to the change? Do you want to win just a few specific games, or win the largest number possible???
There is a chance that you are testing against a (small) set of engines that are not capable of taking advantage of the hole you are creating in yours. I would be very careful about tossing out a theoretically better approach because of an apparent tiny increase in Elo. I agree it is interesting to think about, but I would think twice before introducing the change.

Miguel

that's one reason testing is critical, and then using the results is even more critical. :)

Program            Elo  +  -  games  score  oppo  draws
Crafty-23.2R04-3  2632  3  3  30000    61%  2549    22%
Crafty-23.2R04-5  2632  3  3  30000    61%  2549    23%
Crafty-23.2R01-1  2631  3  3  30000    61%  2549    23%
Crafty-23.2R04-0  2631  3  3  30000    61%  2549    23%
Crafty-23.2R05-1  2630  3  3  30000    60%  2549    23%

So it ended up two Elo down, with an error bar of +/-3, which says that the old idea of not using null in pawn-only endgames is not nearly as important as we have always believed when using test positions to evaluate such changes...

In real games, and a _lot_ of 'em, the effect is almost too small to measure. I need to run this test with about 1/4 million games to get down to the +/- 1 Elo error range.
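As a rough sanity check on those error bars, here is a back-of-the-envelope sketch (not BayesElo's actual model; it assumes independent games, a normal approximation, and a win/draw mix roughly like the table above):

#include <math.h>
#include <stdio.h>

/* 95% confidence half-width in Elo for a match of n games with the given
 * win and draw fractions (the loss fraction is implied).                   */
static double elo_error95(double n, double win, double draw) {
  double s     = win + 0.5 * draw;                    /* mean score         */
  double var   = win + 0.25 * draw - s * s;           /* per-game variance  */
  double se    = sqrt(var / n);                       /* std. error of score*/
  double slope = 400.0 / (log(10.0) * s * (1.0 - s)); /* Elo per unit score */
  return 1.96 * se * slope;                           /* ~95% interval      */
}

int main(void) {
  /* Roughly the numbers in the thread: ~60% score, ~23% draws. */
  printf("30000 games : +/- %.1f Elo\n", elo_error95(30000.0, 0.49, 0.23));
  printf("100000 games: +/- %.1f Elo\n", elo_error95(100000.0, 0.49, 0.23));
  printf("250000 games: +/- %.1f Elo\n", elo_error95(250000.0, 0.49, 0.23));
  return 0;
}

With those inputs this comes out near +/-3.5 Elo at 30,000 games, a bit under +/-2 at 100,000 and about +/-1.2 at 250,000, the same ballpark as the figures quoted above; halving the error bar really does take roughly four times the games.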
hgm
Posts: 28386
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Interesting null-move test

Post by hgm »

Indeed, this is still an unaddressed aspect of this kind of testing. Taking the number of games to infinity might reduce the statistical noise of the result to zero. But if you don't, at the same time, take the number of starting positions and the number of opponents to infinity as well, you will freeze in the sampling noise due to opponent selection and position selection. At some point the result will be dominated by that sampling noise, and it is not really meaningful to suppress the statistical noise to a value much below that by playing more games. That would be tantamount to making a very precise measurement of an ill-defined and arbitrary quantity.

I would be surprised if the result changed by exactly the same amount against each of the opponents.
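One way to formalize that point (a sketch under the usual independence assumptions, not something anyone in the thread is actually computing) is to write the measured rating change as

\[
\hat{\Delta} = \Delta_{\text{true}} + \varepsilon_{\text{sampling}} + \varepsilon_{\text{stat}},
\qquad
\operatorname{Var}(\varepsilon_{\text{stat}}) \approx \frac{\sigma^2_{\text{game}}}{N_{\text{games}}},
\qquad
\operatorname{Var}(\varepsilon_{\text{sampling}}) \approx \frac{\sigma^2_{\text{opp}}}{N_{\text{opp}}} + \frac{\sigma^2_{\text{pos}}}{N_{\text{pos}}}.
\]

Playing more games only shrinks the first variance; once it is well below the sampling terms, additional games just measure the quirks of the chosen opponents and positions with ever greater precision.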
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Interesting null-move test

Post by Don »

bob wrote:
Don wrote:
bob wrote:
jdart wrote:> If you play 30,000 games and it plays _better_, why on earth would you toss an idea out just because there is an occasional game it might lose due to the change?

I can see this point of view, but the bad thing about null-move zugzwang positions is that if your program has such a problem it can be "blind" to the right move, even with a very deep search, and can go wrong in endings where a human player can see the right way. So it's ugly. If you intend the program to be used by humans for analysis or play, this will likely annoy them if it happens. I tend to think that makes it a priority to fix.

--Jon
What about all the _other_ positions where it provides _better_ analysis by going deeper? Everything we do is a little give and take.
Personally I usually hedge my bets under the theory that a chain is only as strong as its weakest link. I hate to have some ridiculous weakness in the program even if it only shows up 1 in a million games or has less than 1 Elo impact. So if some change makes the program more correct but actually weakens it by some amount that is difficult to measure, I labor over the decision more.
None of that makes any sense. I'll bet that for _every_ feature in your chess program, I can come up with a position where it works poorly or not at all. Such is the nature of the game. Yet I doubt you would toss everything out. This idea of not doing something because it hurts a certain type of position is nothing but witchcraft/voodoo/etc. If something causes a program to win more games when it is enabled than when it is disabled, and the sample size of games is sufficiently large to eliminate random effects, then that idea ought to be enabled. Nothing else makes any sense.

I am not talking about changes that look right and appear to have no effect on overall performance, I am talking about changes that improve Elo over tens of thousands of test games. I don't care how changes affect individual test positions. That idea is so badly flawed it is not worth discussing, except for programs whose sole purpose in life is to address a particular concept such as "chest" for checkmates.

Note that I am not proposing that this null-everywhere is a good one. Only that it is a very tiny bit worse, 1-2 Elo at worst (after about 100,000 games and still counting). And that was quite surprising (that it was so close to break-even).
What I am saying is that I err on the side of a change that makes the program more correct even if there is a minor slowdown. For instance, there are some changes that cannot possibly make the program play weaker (such as the kvsp database) but might slow it down enough that it actually does make the program play slightly weaker in time control games. If the change hurts the program by less than 1 Elo I will tend to favor keeping it anyway, because a change of this nature helps thousands of positions get scored correctly and NEVER causes one to get scored incorrectly. I would rather be correct. If the change were a definite slowdown and thus hurt by a more measurable amount, I would of course reject it. I'm not stupid - I'm only talking about really close calls here.

Some zugzwang issues can cause your program to NEVER see something it needs to see and thus affects the scalability of the program (it will never play perfect chess even with infinite hardware.) Of course that is not a practical issue, but it illustrates the point. It may make a difference with 10X more hardware for example.
That's a testable hypothesis. If it makes a difference at 10x, it will make a smaller (but still measurable) difference at 5x, which can be easily tested.
I am quite sure that you are wrong about this. You are basically claiming that all program features scale exactly the same - but I know this is not true. My claim is that not all features scale the same, and this is equivalent to saying that a feature that may help a program now could hurt it later, or vice versa.

I'm going to call this my "bottleneck theory" and this is how I believe it works:

I think of zugzwang kind of like what happens when you profile a terribly optimized program. Some routine may take 5% of the execution time so you will focus on 4 other routines that together take 95% of the execution time. After you make those optimizations you may suddenly find that your previous 5% routine is the bottleneck whereas before it was not worth doing anything about.

So things like zugzwang could be a "bottleneck." Zugzwang detection may hurt your program NOW because it is a minor slowdown and fixing it is not enough to make up the difference. But fast forward 100 years, pretend there is no limit to Moore's law, and you may find that most programs are not too far away from perfect play. In such a case it's possible that the program that does not have zugzwang detection is noticeably inferior. It might be the only substantial difference between one program and another.

The things that are serious weaknesses in your program become little "bottlenecks" when the program is doing everything else well, even if they are currently unimportant. Larry and I discovered that our king safety seems to work that way. If I were playing in a 5 ply tournament with Komodo (or anything roughly equivalent in time control settings) I have proved in testing that king safety just slows the program down. At these ridiculous levels king safety is just not the "bottleneck" when the program cannot even handle relatively trivial tactics. So if Komodo were running on an ancient 8086 CPU I would probably run my tests and conclude (like you would) that king safety is too much of a slowdown to pay off.

So my bottleneck theory is that trivial things wrong with your program that probably make no difference now will hurt you later.

Another example: piece square tables. Many years ago, when most micros were doing well to get depths of 4 or 5 ply, you could get away with pure piece square tables where the values that went into them were heavily preprocessed. This gives a pretty large boost in speed at the sacrifice of not doing slower, more dynamic evaluation such as proper mobility. But the bottleneck in those programs was tactics - mobility did not mean as much when it was only a matter of time before all programs made some serious tactical blunder. The programs that dominated were heavily speed oriented, and the programs that tried to be really smart in evaluation suffered. But now if your program is missing a little knowledge you will get punished very brutally.

I'll even go so far as to hypothesize that over the last 4 decades program authors have "unconsciously" scaled their programs up to current hardware and we will continue to do so. Of course program quality has also improved enormously for other reasons too, but I'm saying that what used to not matter does matter now.

Another example: Do you think a program that is running on hardware capable of doing only 3 ply searches would benefit from hash tables? Even if the memory in such a machine were liberal, the benefit from transpositions would be almost non-existent and the slowdown due to the hash table operations would make the program play weaker.


I also have this feeling, which I cannot prove, that attention to some details may have a large effect on the ultimate scalability. I have a king and pawn vs king database compiled into the program and I have been unable to prove it helps, but it just seems like it should. My theory is that it probably helps more with depth, because as you gradually approach perfect play these little things will come into play more often. (A lot fewer games decided by decisive tactics early in the game.) On the other hand, with deeper searches the king and pawn endings will get played better anyway - so I'm not sure how all of this interacts.
I don't believe it helps at all. 3-4-5 piece endgame tables are a very small improvement in some positions, and hurt in others. I am one day going to precisely quantify the effect of tables on/off using the cluster. So far, testing has suggested no benefit at all unless you have monstrously fast disk drives. The slow-down hurts about as much as the increased accuracy helps.
I agree with you on this for now. But this is still probably a scalability issue. If the slowdown due to disk based endgame tables maintains the same ratio as today, I can easily imagine that in the future it might make a difference. But I don't imagine that it will ever make a big difference, because if programs speed up that much they will indeed play these endings better anyway. Imagine it this way, though: in 50 years programs become so strong that almost every game is a draw, and when one program beats another it's more likely to be due to something like one of these endings (or zugzwang) or something of this nature.


All of this is pretty much theoretical, however, and programs over the years will automatically scale as needed, since the best program authors will always tend to implement things that actually work on the hardware of the day.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Interesting null-move test

Post by bob »

Don wrote:
bob wrote:
Don wrote:
bob wrote:
jdart wrote:> If you play 30,000 games and it plays _better_, why on earth would you toss an idea out just because there is an occasional game it might lose due to the change?

I can see this point of view, but the bad thing about null-move zugzwang positions is that if your program has such a problem it can be "blind" to the right move, even with a very deep search, and can go wrong in endings where a human player can see the right way. So it's ugly. If you intend the program to be used by humans for analysis or play, this will likely annoy them if it happens. I tend to think that makes it a priority to fix.

--Jon
What about all the _other_ positions where it provides _better_ analysis by going deeper? Everything we do is a little give and take.
Personally I usually hedge my bets under the theory that a chain is only as strong as its weakest link. I hate to have some ridiculous weakness in the program even if it only shows up 1 in a million games or has less than 1 Elo impact. So if some change makes the program more correct but actually weakens it by some amount that is difficult to measure, I labor over the decision more.
None of that makes any sense. I'll bet that for _every_ feature in your chess program, I can come up with a position where it works poorly or not at all. Such is the nature of the game. Yet I doubt you would toss everything out. This idea of not doing something because it hurts a certain type of position is nothing but witchcraft/voodoo/etc. If something causes a program to win more games when it is enabled than when it is disabled, and the sample size of games is sufficiently large to eliminate random effects, then that idea ought to be enabled. Nothing else makes any sense.

I am not talking about changes that look right and appear to have no effect on overall performance, I am talking about changes that improve Elo over tens of thousands of test games. I don't care how changes affect individual test positions. That idea is so badly flawed it is not worth discussing, except for programs whose sole purpose in life is to address a particular concept such as "chest" for checkmates.

Note that I am not proposing that this null-everywhere is a good one. Only that it is a very tiny bit worse, 1-2 Elo at worst (after about 100,000 games and still counting). And that was quite surprising (that it was so close to break-even).
What I am saying is that I err on the side of a change that makes the program more correct even if there is a minor slowdown. For instance, there are some changes that cannot possibly make the program play weaker (such as the kvsp database) but might slow it down enough that it actually does make the program play slightly weaker in time control games. If the change hurts the program by less than 1 Elo I will tend to favor keeping it anyway, because a change of this nature helps thousands of positions get scored correctly and NEVER causes one to get scored incorrectly. I would rather be correct. If the change were a definite slowdown and thus hurt by a more measurable amount, I would of course reject it. I'm not stupid - I'm only talking about really close calls here.

Some zugzwang issues can cause your program to NEVER see something it needs to see and thus affects the scalability of the program (it will never play perfect chess even with infinite hardware.) Of course that is not a practical issue, but it illustrates the point. It may make a difference with 10X more hardware for example.
That's a testable hypothesis. If it makes a difference at 10x, it will make a smaller (but still measurable) difference at 5x, which can be easily tested.
I am quite sure that you are wrong about this. You are basically claiming that all program features scale exactly the same - but I know this is not true. My claim is that not all features scale the same, and this is equivalent to saying that a feature that may help a program now could hurt it later, or vice versa.
The only way I am wrong is if your change behaves like a mathematical unit step function: it is zero up to one point, and instantly jumps to 1 at the critical point. I don't believe that for a minute, and could not write serious code that behaved like that if someone paid me to.

Nowhere did I say that all program features "scale the same". But I certainly did say that they "scale" and don't just suddenly "turn on". I have never seen a case where this is false, yet. And can't imagine the circumstances where it would be false unless you contrive something so that you turn a feature off below X nps, and turn it on if the speed reaches X+1, which is not exactly realistic for anything I have ever done.

I'm going to call this my "bottleneck theory" and this is how I believe it works:

I think of zugzwang kind of like what happens when you profile a terribly optimized program. Some routine may take 5% of the execution time so you will focus on 4 other routines that together take 95% of the execution time. After you make those optimizations you may suddenly find that your previous 5% routine is the bottleneck whereas before it was not worth doing anything about.

So things like zugzwang could be a "bottleneck." Zugzwang detection may hurt your program NOW because it is a minor slowdown and fixing it is not enough to make up the difference. But fast forward 100 years, pretend there is no limit to Moore's law, and you may find that most programs are not too far away from perfect play. In such a case it's possible that the program that does not have zugzwang detection is noticeably inferior. It might be the only substantial difference between one program and another.
However, back to the case at hand. How is this not going to scale? If it is important in 100 years, but immeasurable today, do you believe that at some magic point in the future it just becomes critical instantly, or does it scale up as hardware speed increases? I don't believe in the unit-step-function idea if it is applied to a chess engine.


The things that are serious weaknesses in your program become little "bottlenecks" when the program is doing everything else well, even if they are currently unimportant. Larry and I discovered that our king safety seems to work that way. If I were playing in a 5 ply tournament with Komodo (or anything roughly equivalent in time control settings) I have proved in testing that king safety just slows the program down. At these ridiculous levels king safety is just not the "bottleneck" when the program cannot even handle relatively trivial tactics. So if Komodo were running on an ancient 8086 CPU I would probably run my tests and conclude (like you would) that king safety is too much of a slowdown to pay off.
Strange results, as we have tested with millions of games and find that such eval terms help more at fast time controls than at longer ones. At longer times per move, the search gets a better chance to "see the truth" while at shallow depths it has to rely on "intuition" aka static evaluation.

So my bottleneck theory is that trivial things wrong with your program that probably make no difference now will hurt you later.
I don't disagree with that, but I disagree with the idea that this degradation is sudden rather than scaling up gradually as hardware speeds advance.

Another example: piece square tables. Many years ago, when most micros were doing well to get depths of 4 or 5 ply, you could get away with pure piece square tables where the values that went into them were heavily preprocessed. This gives a pretty large boost in speed at the sacrifice of not doing slower, more dynamic evaluation such as proper mobility. But the bottleneck in those programs was tactics - mobility did not mean as much when it was only a matter of time before all programs made some serious tactical blunder. The programs that dominated were heavily speed oriented, and the programs that tried to be really smart in evaluation suffered. But now if your program is missing a little knowledge you will get punished very brutally.

I'll even go so far as to hypothesize that over the last 4 decades program authors have "unconsciously" scaled their programs up to current hardware and we will continue to do so. Of course program quality has also improved enormously for other reasons too, but I'm saying that what used to not matter does matter now.
Unfortunately, you are making _my_ point. RE: previous paragraph, keyword: "scaled".


Another example: Do you think a program that is running on hardware capable of doing only 3 ply searches would benefit from hash tables? Even if the memory in such a machine were liberal, the benefit from transpositions would be almost non-existent and the slowdown due to the hash table operations would make the program play weaker.
Actually, yes I do believe it would help. Do not forget "extensions". And the transposition/refutation table is not the only form of "hash tables" either.
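For concreteness, here is what the "hash table operations" being weighed actually amount to in a basic implementation. This is a generic sketch, not Crafty's hashing code; the entry layout, flag names and table size are invented for the example:

#include <stdint.h>

enum { TT_BITS = 20, TT_SIZE = 1 << TT_BITS };   /* ~1M entries (example)   */
enum { FLAG_EMPTY = 0, FLAG_EXACT, FLAG_LOWER, FLAG_UPPER };

typedef struct {
  uint64_t key;     /* full Zobrist key, to detect index collisions         */
  int16_t  score;   /* score from the stored search                         */
  int8_t   depth;   /* remaining depth of the stored search                 */
  uint8_t  flag;    /* bound type                                           */
} TTEntry;

static TTEntry tt[TT_SIZE];

/* Store: one index mask and one write (always-replace scheme). */
void tt_store(uint64_t key, int depth, int score, int flag) {
  TTEntry *e = &tt[key & (TT_SIZE - 1)];
  e->key   = key;
  e->depth = (int8_t)depth;
  e->score = (int16_t)score;
  e->flag  = (uint8_t)flag;
}

/* Probe: returns 1 and fills *score if the stored entry is deep enough and
 * its bound type allows a cutoff at this alpha/beta window, 0 otherwise.   */
int tt_probe(uint64_t key, int depth, int alpha, int beta, int *score) {
  const TTEntry *e = &tt[key & (TT_SIZE - 1)];
  if (e->key != key || e->flag == FLAG_EMPTY || e->depth < depth)
    return 0;
  if (e->flag == FLAG_EXACT ||
      (e->flag == FLAG_LOWER && e->score >= beta) ||
      (e->flag == FLAG_UPPER && e->score <= alpha)) {
    *score = e->score;
    return 1;
  }
  return 0;
}

Per node that is an index mask, one memory access and a few compares; whether that cost pays for itself at 3 plies is exactly the empirical question being argued here.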



I also have this feeling, which I cannot prove, that attention to some details may have a large effect on the ultimate scalability. I have a king and pawn vs king database compiled into the program and I have been unable to prove it helps, but it just seems like it should. My theory is that it probably helps more with depth, because as you gradually approach perfect play these little things will come into play more often. (A lot fewer games decided by decisive tactics early in the game.) On the other hand, with deeper searches the king and pawn endings will get played better anyway - so I'm not sure how all of this interacts.
I don't believe it helps at all. 3-4-5 piece endgame tables are a very small improvement in some positions, and hurt in others. I am one day going to precisely quantify the effect of tables on/off using the cluster. So far, testing has suggested no benefit at all unless you have monstrously fast disk drives. The slow-down hurts about as much as the increased accuracy helps.
I agree with you on this for now. But this is still probably a scalability issue. If the slowdown due to disk based endgame tables maintains the same ratio as today, I can easily imagine that in the future it might make a difference.
I agree, but I believe it will get _worse_ and not better. Disk speeds have not come even close to keeping up with CPU speed improvements. Even good SSDs are hopelessly slow compared to the speed of the processor, and they are falling farther behind every month.
But I don't imagine that it will ever make a big difference, because if programs speed up that much they will indeed play these endings better anyway. Imagine it this way, though: in 50 years programs become so strong that almost every game is a draw, and when one program beats another it's more likely to be due to something like one of these endings (or zugzwang) or something of this nature.




You are _still_ missing the point. What if the program earns some of those draws because of the feature we are evaluating? The feature that causes it to lose a rare game here or there may also, because of the speed improvement, let it draw many games it would otherwise have lost.

I believe that the idea of discussing a "failing position" is flawed, if the goal is to play complete games. If the goal is to solve positions, testing on complete games would also be flawed. I don't care how my program does on positions, so long as every year sees us catching or passing programs that were better than us in the past.


All of this is pretty much theoretical, however, and programs over the years will automatically scale as needed, since the best program authors will always tend to implement things that actually work on the hardware of the day.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Interesting null-move test

Post by bob »

michiguel wrote:
bob wrote:
Daniel Shawul wrote:
As far as "how many times?" I have no idea. I'm only interested in the results, and so far it appears to be immaterial as to whether I restrict null to positions with at least one piece or more, or do them no matter what.
Well, R05's result was on the decline and probably ends up weakest no matter how small the margin is. Also, since I know pawn endings (KN* and KB* also) are indeed screwed up by null move, I would gladly still use that condition. Even if it ended up 2 or 3 Elo better I would avoid R05, because it really burns me when I lose games on CCT due to bad null moving in the endgame :)
I myself have tried a few similar subtle changes in the hope of getting some hidden Elo. However, the reality so far is that I have to make big changes to see any improvement at that magnitude of games, like playing with the R value, which did get me measurable Elo.
Wrong way to think about it. If you play 30,000 games and it plays _better_, why on earth would you toss an idea out just because there is an occasional game it might lose due to the change? Do you want to win just a few specific games, or win the largest number possible???
There is a chance that you are testing against a (small) set of engines that are not capable of taking advantage of the hole you are creating in yours. I would be very careful about tossing out a theoretically better approach because of an apparent tiny increase in Elo. I agree it is interesting to think about, but I would think twice before introducing the change.

Miguel
If you believe the programs you test against are representative, I don't see how this will be an issue. If you do believe your opponents are not representative, then I can't see why one would even be doing the testing in the first place. I've not seen any evidence of "steps backward" in 2 years now. And this shows up in independent testing by others in large tournaments and such.

that's one reason testing is critical, and then using the results is even more critical. :)

Program            Elo  +  -  games  score  oppo  draws
Crafty-23.2R04-3  2632  3  3  30000    61%  2549    22%
Crafty-23.2R04-5  2632  3  3  30000    61%  2549    23%
Crafty-23.2R01-1  2631  3  3  30000    61%  2549    23%
Crafty-23.2R04-0  2631  3  3  30000    61%  2549    23%
Crafty-23.2R05-1  2630  3  3  30000    60%  2549    23%

So it ended up two Elo down, with an error bar of +/-3, which says that the old idea of not using null in pawn-only endgames is not nearly as important as we have always believed when using test positions to evaluate such changes...

In real games, and a _lot_ of 'em, the effect is almost too small to measure. I need to run this test with about 1/4 million games to get down to the +/- 1 Elo error range.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Interesting null-move test

Post by bob »

hgm wrote:Indeed, this is still an unaddressed aspect of this kind of testing. Taking the number of games to infinity might reduce the statistical noise of the result to zero. But if you don't, at the same time, take the number of starting positions and the number of opponents to infinity as well, you will freeze in the sampling noise due to opponent selection and position selection. At some point the result will be dominated by that sampling noise, and it is not really meaningful to suppress the statistical noise to a value much below that by playing more games. That would be tantamount to making a very precise measurement of an ill-defined and arbitrary quantity.

I would be surprised if the result changed by exactly the same amount against each of the opponents.
I don't have an infinite number of positions, but I do have 20,000 roughly. That gives 40,000 games per opponent. I long since quit repeating positions over and over as that introduces more noise than it eliminates.
hgm
Posts: 28386
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Interesting null-move test

Post by hgm »

Well, that is definitely a good number. I don't know how much the results vary per position, but even if it were a lot, averaging over 40,000 positions reduces that variability by a factor of 200 (the error shrinks with the square root of the number of independent samples, and the square root of 40,000 is 200). But you don't have 40,000 opponents.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Interesting null-move test

Post by bob »

hgm wrote:Well, that is definitely a good number. I don't know how much the results vary per position, but even if it was a lot, averaging over 40,000 positions reduces that varability by a factor 200. But you don't have 40,000 opponents.
No, that I don't have. Which is why I have not typically been running 100k+ game matches. As you said, I don't want to measure the noise.
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Interesting null-move test

Post by jwes »

bob wrote:
Don wrote:
bob wrote:


I also have this feeling, which I cannot prove, that attention to some details may have a large effect on the ultimate scalability. I have a king and pawn vs king database compiled into the program and I have been unable to prove it helps, but it just seems like it should. My theory is that it probably helps more with depth, because as you gradually approach perfect play these little things will come into play more often. (A lot fewer games decided by decisive tactics early in the game.) On the other hand, with deeper searches the king and pawn endings will get played better anyway - so I'm not sure how all of this interacts.
I don't believe it helps at all. 3-4-5 piece endgame tables are a very small improvement in some positions, and hurt in others. I am one day going to precisely quantify the effect of tables on/off using the cluster. So far, testing has suggested no benefit at all unless you have monstrously fast disk drives. The slow-down hurts about as much as the increased accuracy helps.
I agree with you on this for now. But this is still probably a scalability issue. If the slowdown due to disk based endgame tables maintains the same ratio as today, I can easily imagine that in the future it might make a difference.
I agree, but I believe it will get _worse_ and not better. Disk speeds have not come even close to keeping up with CPU speed improvements. Even good SSDs are hopelessly slow compared to the speed of the processor, and they are falling farther behind every month.
As memory sizes grow, it becomes possible to cache more of the EGTBs. E.g., for Crafty on a 16 GB machine you can use 8 GB for the transposition table, but what will you use the other 7+ GB of memory for?