An objective test process for the rest of us?

Discussion of chess software programming and technical issues.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: An objective test process for the rest of us?

Post by bob »

diep wrote:Hi Bob,

Could you show us a picture from a tunneling electronic microscope, just so we know that you have at least seen a picture of it after writing your story about it.

Vincent
I don't have any pictures from one. I could get a couple of friends up the street to send me some if it is really important. I used to have a couple from IBM where they were using theirs to somehow drag atoms around on a surface in a paper I reviewed.

BTW what story did I "write" about it? I merely mentioned it as an invention that allowed us to see things we could not see before it came along... I have currently played over 8,000,000 test games on our cluster. Have you ever been able to run that many high-quality (not game-in-one-second) games to analyze programming changes? I haven't until now. And the non-determinism was surprising to all of us who were looking at these results (myself, Tracy, Mike and Peter, not to mention some faculty here who were interested).

If you are interested, you can certainly find enough information on the web to satisfy your curiosity I would think. It's just another tool, in a steady succession, that lets us see more and more of the world around us...

Or are you just trying to do your usual "let's divert this into something that is more suitable for arguments" type thing???
User avatar
hgm
Posts: 28354
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: An objective test process for the rest of us?

Post by hgm »

Slow vs fast is not the issue. You can compensate for the factor 6 speedup by playing at 6 times slower time control, and you would reach the same depth.

Do I understand that you will _never_ be satisfied that your SMP implementation is correct, and that you will continue to test it, even after many millions of games, out of fear that you would have missed that bug that would crash you during the tournament?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: An objective test process for the rest of us?

Post by bob »

hgm wrote:Slow vs fast is not the issue. You can compensate for the factor 6 speedup by playing at 6 times slower time control, and you would reach the same depth.

Do I understand that you will _never_ be satisfied that your SMP implementation is correct, and that you will continue to test it, even after many millions of games, out of fear that you would have missed that bug that would crash you during the tournament?
1. Sure, I can test 6x longer for Crafty. But not every position gives a 6x speedup. Some are more, some are less. But the main thing is that I am testing the SMP code reasonably frequently.

2. I am not sure that there exists a thing called "a debugged SMP search". I have done this so long, I have come to realize that the bugs can be so subtle, or so well-hidden, that they might not ever happen, or they might happen every time you run. But the point is that _any_ change to the program has the potential to cause problems in the SMP version. Most don't, but some do. And, of course, an SMP search is never finished anyway; there is always room for improvement. New architectural features (NUMA for example), new ideas, non-perfect time scaling, all cause me to make changes to this code on a regular basis.

So my testing is to help me accept/reject changes as well as to help me find bugs that were introduced. It is easy to prove a bug exists. It is impossible to prove there are no bugs.
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: An objective test process for the rest of us?

Post by jwes »

bob wrote: But time allocation is a major component of how a program behaves. How does it allocate its time per move (some spend more time right out of book, some end up spending more time right before time control because they have saved up a surplus, some don't extend on fail lows, some do, etc.) Each and every one of those ideas exerts an influence on the program, and each can cause problems in unexpected ways. I have already mentioned that I did the "trivial test" of using a specific number of nodes, but that does not produce results consistent with using time at all. Because now there are no extended searches, no "easy moves" that save some of that wasted time to use later, no time variation (we use more time on the first few moves out of book), etc.

So I can easily produce reproducible results with the specific number of nodes approach, but the results are significantly different from normal playing results, which are also significantly different from real game results which would include book, pondering and SMP on top of the normal time variation...
I think you missed the point. If your search is deterministic, you can test with any timing algorithm and then use the number of nodes searched to recreate the search while debugging.
bob wrote:But the other stuff I wouldn't consider. Not carrying hash entries from one search to the next. Not pruning based on hash information. Clearing killer and history counters even though they contain useful information. Etc... I don't see any advantage to doing any of that at all. Because when I get a move from a real search in a real game, I am not going to be able to reproduce it anyway some of the time.

Another is "In what positions will the program make poor moves ?". Here, it is obviously valuable to be able to exactly recreate the search tree.
Yes. But you are really asking "in what positions will it make poor moves when significant parts of the search are made inoperative?"
To a large extent, these search changes should not change the results of the search, only the time the search takes.
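jwes's fixed-node-count idea can be sketched concretely. The following is a hypothetical toy in Python (none of it is code from Crafty or any engine in this thread): a negamax search whose only stopping condition is a node counter. Running it twice with the same limit visits the identical tree, which is exactly what makes a node-based stop reproducible where a clock-based one is not.

```python
INF = float("inf")

class ToyPosition:
    """Stand-in for a chess position: a nested list is a node whose
    entries are the positions after each move; a bare number is a
    leaf evaluation from the side to move."""
    def __init__(self, tree):
        self.tree = tree
    def moves(self):
        return range(len(self.tree)) if isinstance(self.tree, list) else range(0)
    def make(self, move):
        return ToyPosition(self.tree[move])
    def evaluate(self):
        return self.tree if not isinstance(self.tree, list) else 0

class Searcher:
    """Negamax with alpha-beta whose only stop condition is a node count."""
    def __init__(self, node_limit):
        self.node_limit = node_limit
        self.nodes = 0
    def search(self, pos, depth, alpha=-INF, beta=INF):
        self.nodes += 1
        if self.nodes > self.node_limit:
            raise RuntimeError("node limit reached")  # deterministic cutoff
        if depth == 0 or not pos.moves():
            return pos.evaluate()
        best = -INF
        for move in pos.moves():
            score = -self.search(pos.make(move), depth - 1,
                                 -beta, -max(alpha, best))
            if score >= beta:
                return score  # fail high
            best = max(best, score)
        return best
```

Two runs with the same limit return the same score and count exactly the same nodes; two wall-clock-limited runs on a busy machine generally would not. A real engine would abort mid-iteration on the limit and fall back to the best move of the last completed iteration, which this sketch omits.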
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: An objective test process for the rest of us?

Post by bob »

jwes wrote:
bob wrote: But time allocation is a major component of how a program behaves. How does it allocate its time per move (some spend more time right out of book, some end up spending more time right before time control because they have saved up a surplus, some don't extend on fail lows, some do, etc.) Each and every one of those ideas exerts an influence on the program, and each can cause problems in unexpected ways. I have already mentioned that I did the "trivial test" of using a specific number of nodes, but that does not produce results consistent with using time at all. Because now there are no extended searches, no "easy moves" that save some of that wasted time to use later, no time variation (we use more time on the first few moves out of book), etc.

So I can easily produce reproducible results with the specific number of nodes approach, but the results are significantly different from normal playing results, which are also significantly different from real game results which would include book, pondering and SMP on top of the normal time variation...
I think you missed the point. If your search is deterministic, you can test with any timing algorithm and then use the number of nodes searched to recreate the search while debugging.
No you can't. Here's why. You test with pondering on. You set a target time of 180 seconds. Your opponent fails low and uses more time, searching for a total of 390.441 seconds before making a move. You move instantly. You have _zero_ chance to re-create that timing the next time around.

That is just one example.
bob wrote:But the other stuff I wouldn't consider. Not carrying hash entries from one search to the next. Not pruning based on hash information. Clearing killer and history counters even though they contain useful information. Etc... I don't see any advantage to doing any of that at all. Because when I get a move from a real search in a real game, I am not going to be able to reproduce it anyway some of the time.

Another is "In what positions will the program make poor moves ?". Here, it is obviously valuable to be able to exactly recreate the search tree.
Yes. But you are really asking "in what positions will it make poor moves when significant parts of the search are made inoperative?"
To a large extent, these search changes should not change the results of the search, only the time the search takes.
Again, that is wrong. Any tiny timing change in the search has several influences, from what is stored in the transposition table to what is stored in the killer moves and history counters.

The problem is that determinism is most useful precisely when it is not available. I carefully watch games that are being played in tournaments, move by move, second by second, as the game progresses, and that is where I see the things that become the subject of analysis later. And there, I have _everything_ turned on. Including SMP search, perhaps on hardware I can't even use to test later when I have time.

I don't do a lot of that on these cluster matches I play. Too much data. There I am only interested in a quantitative good/bad indication for whatever I am testing...

I find it tough to consider modifying various parts of my search so that I can deterministically play moves, and then debug all of that to make sure it doesn't break something unexpectedly, and then realize that I won't have this stuff turned off during real games, which is where I am most likely going to notice something that I want to look at later...
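The influence of carried-over killer/history state that bob describes can be illustrated with another hypothetical toy (again, not Crafty's code): a one-slot killer table that persists between searches. Searching the same position returns the same score, but the tree traversed, and hence the node count, depends on what an earlier, unrelated search left in the slot.

```python
INF = float("inf")

class Killers:
    """A single killer-move slot that persists between searches."""
    def __init__(self):
        self.move = None

def ordered(moves, killers):
    # Try the remembered killer move first, if it is legal here.
    if killers.move in moves:
        return [killers.move] + [m for m in moves if m != killers.move]
    return list(moves)

def search(tree, killers, counter, alpha=-INF, beta=INF):
    """Negamax with alpha-beta on a nested-list 'position'; leaves are
    evaluations from the side to move. counter is a one-element list
    used as a mutable node counter."""
    counter[0] += 1
    if not isinstance(tree, list):
        return tree
    for move in ordered(range(len(tree)), killers):
        score = -search(tree[move], killers, counter, -beta, -alpha)
        if score >= beta:
            killers.move = move  # remember the refutation move
            return score
        alpha = max(alpha, score)
    return alpha
```

With a fresh table, searching the position `[[4, -6], [1, 2]]` takes 7 nodes; after an unrelated first search has filled the killer slot, the same position is searched in 6 nodes with the same score, because the remembered refutation is tried first and causes an earlier cutoff. Same move played, different tree: exactly the kind of invisible coupling between searches being discussed.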
Uri Blass
Posts: 10801
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: An objective test process for the rest of us?

Post by Uri Blass »

bob wrote:
jwes wrote:
bob wrote: But time allocation is a major component of how a program behaves. How does it allocate its time per move (some spend more time right out of book, some end up spending more time right before time control because they have saved up a surplus, some don't extend on fail lows, some do, etc.) Each and every one of those ideas exerts an influence on the program, and each can cause problems in unexpected ways. I have already mentioned that I did the "trivial test" of using a specific number of nodes, but that does not produce results consistent with using time at all. Because now there are no extended searches, no "easy moves" that save some of that wasted time to use later, no time variation (we use more time on the first few moves out of book), etc.

So I can easily produce reproducible results with the specific number of nodes approach, but the results are significantly different from normal playing results, which are also significantly different from real game results which would include book, pondering and SMP on top of the normal time variation...
I think you missed the point. If your search is deterministic, you can test with any timing algorithm and then use the number of nodes searched to recreate the search while debugging.
No you can't. Here's why. You test with pondering on. You set a target time of 180 seconds. Your opponent fails low and uses more time, searching for a total of 390.441 seconds before making a move. You move instantly. You have _zero_ chance to re-create that timing the next time around.

That is just one example.
If you print the number of nodes you searched for every move into the logfile, then it is possible to reproduce the same situation.

I agree that it is more complex than the case where different searches are independent.

I prefer to have different searches be independent, because I do not want a situation where there is a mistake that I cannot reproduce. In games from an external source I do not get the number of nodes, and even with the number of nodes it is harder to reproduce the move; I may need to reproduce the whole game, which may take significant time.

Movei is not close to being a finished project, and when there are probably hundreds of Elo still to be gained, I do not want to worry about small improvements that may make it harder to make improvements later.

Uri
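Uri's log-and-replay scheme could look something like this (a hypothetical log format, not Movei's actual one): write the node count next to every move played, and later replay any single move by rerunning a node-limited search with the recorded budget.

```python
import io
import re

def log_move(log, move_no, move, nodes):
    """Append one searched move and its node count to the log."""
    log.write(f"move {move_no}: {move} nodes {nodes}\n")

def parse_log(log_text):
    """Recover (move number, move, node budget) triples for replay."""
    pat = re.compile(r"move (\d+): (\S+) nodes (\d+)")
    return [(int(m[1]), m[2], int(m[3]))
            for m in map(pat.match, log_text.splitlines()) if m]
```

For example, after `log_move(buf, 37, "e2e4", 453823632)` the parsed entry tells a debugging session to rerun move 37 with a 453823632-node budget, which, in a deterministic single-threaded search that does not carry state between moves, regenerates the identical tree.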
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: An objective test process for the rest of us?

Post by jwes »

bob wrote:
jwes wrote: I think you missed the point. If your search is deterministic, you can test with any timing algorithm and then use the number of nodes searched to recreate the search while debugging.
No you can't. Here's why. You test with pondering on. You set a target time of 180 seconds. Your opponent fails low and uses more time, searching for a total of 390.441 seconds before making a move. You move instantly. You have _zero_ chance to re-create that timing the next time around.
Then you look in your log file, see that you searched 453823632 nodes including pondering and set your program to terminate searching after 453823632 nodes. Why would that not be the same search tree if your program is deterministic and does not use information from prior searches?
bob wrote:But the other stuff I wouldn't consider. Not carrying hash entries from one search to the next. Not pruning based on hash information. Clearing killer and history counters even though they contain useful information. Etc... I don't see any advantage to doing any of that at all. Because when I get a move from a real search in a real game, I am not going to be able to reproduce it anyway some of the time.

Another is "In what positions will the program make poor moves ?". Here, it is obviously valuable to be able to exactly recreate the search tree.
Yes. But you are really asking "in what positions will it make poor moves when significant parts of the search are made inoperative?"
To a large extent, these search changes should not change the results of the search, only the time the search takes.
bob wrote:Again, that is wrong. Any tiny timing change in the search has several influences, from what is stored in the transposition table to what is stored in the killer moves and history counters.

The problem is that determinism is most useful precisely when it is not available. I carefully watch games that are being played in tournaments, move by move, second by second, as the game progresses, and that is where I see the things that become the subject of analysis later. And there, I have _everything_ turned on. Including SMP search, perhaps on hardware I can't even use to test later when I have time.

I don't do a lot of that on these cluster matches I play. Too much data. There I am only interested in a quantitative good/bad indication for whatever I am testing...

I find it tough to consider modifying various parts of my search so that I can deterministically play moves, and then debug all of that to make sure it doesn't break something unexpectedly, and then realize that I won't have this stuff turned off during real games, which is where I am most likely going to notice something that I want to look at later...
This is just your personal preference, as you play tens of thousands of test games for every tournament game and errors should be equally likely. (Except for the demo effect, where occurrence of bugs is directly related to the importance of the occasion, e.g. Windows crashing when Bill Gates first demoed it.)
What I was trying to say is that if there are problems with evaluation or extensions/reductions, they should show up roughly as often in deterministic searches as normal ones.
User avatar
hgm
Posts: 28354
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: An objective test process for the rest of us?

Post by hgm »

Indeed, it seems to be nothing but a personal preference. So far I have not heard a single valid argument in favor of it, while those on the other side of the discussion can indicate precisely what advantages would make it worthwhile.

I would not hesitate to shut down 99% of my engine if that speeds up by a factor of 2 the testing of the 1% part that I change (and thus have to test orders of magnitude more often than the rest).

That is the scientific approach: isolate the components, and understand / develop those first. Don't test a new alloy for making nuts and bolts by putting one such bolt in a Formula 1 car and measuring how many laps it can race before the wheels come off. Instead you put the lone bolt in a test bench that measures the force needed to break it, and vibrate it to follow the development of metal fatigue.

I once saw an interview with an athlete, a champion of the high jump. At some point the interviewer asked him: "So what height do you typically jump over during your training?" The interviewer was then completely baffled when the athlete answered: "I never do any jumps during training." It turned out his training consisted entirely of exercises with weights, for strengthening the various muscles that he needed to take off. It is a typical layman's misconception that you can only get better at something by doing exactly the thing that you are training for.

So the often-repeated argument, "I want my test to be as close to tournament conditions as possible" carries just about zero weight. If not a negative one.
hristo

Re: An objective test process for the rest of us?

Post by hristo »

hgm wrote: I once saw an interview with an athlete, a champion of the high jump. At some point the interviewer asked him: "So what height do you typically jump over during your training?" The interviewer was then completely baffled when the athlete answered: "I never do any jumps during training." It turned out his training consisted entirely of exercises with weights, for strengthening the various muscles that he needed to take off. It is a typical layman's misconception that you can only get better at something by doing exactly the thing that you are training for.
Right?!
Long-distance swimmers (800, 1500 meters) train by wiggling their toes in the bathtub (or under the shower), while stretching their arms and waving them about for approximately fifteen minutes and singing Nessun Dorma -- breathing is good, the toes seem agile, and the hands have no bugs, hence preparation is done.
hgm wrote: So the often-repeated argument, "I want my test to be as close to tournament conditions as possible" carries just about zero weight. If not a negative one.
Make sure your toes can move, your hands can swing, and you can inhale/exhale at a good rhythm for long periods of time; that is all you need to be a good swimmer. :-)

Regards and Good morning,
Hristo
User avatar
hgm
Posts: 28354
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: An objective test process for the rest of us?

Post by hgm »

I thought they just took pills!? Or was this the weight lifters?