xboard compliance and new feature

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

xboard compliance and new feature

Post by jshriver »

While writing cliboard and reviewing people's recommendations, I realized that something is missing from the argument list.

There is -depth, but it doesn't allow you to specify a separate depth for each engine.
Do you think this is necessary? If so, do -fdepth and -sdepth sound OK?

-Josh
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: xboard compliance and new feature

Post by hgm »

The traditional way of specifying this kind of unbalanced engine option in WinBoard is the -firstInitString / -secondInitString option, in which one can hide any WB-protocol command to be sent to the engine at startup (such as sd N).

If you think unequal-depth games are common enough to warrant dedicated options for this, I would recommend using the fully written-out substrings 'first' and 'second' in them (e.g. -firstDepth and -secondDepth), so that the options can be used by PSWBTM with the WBopt option, to be installed with the engine as an option that follows it in the tourneys it plays.
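For illustration, a minimal sketch of how such a command line could be assembled. The -firstInitString / -secondInitString option names and the WB-protocol sd command come from the discussion above; the default init string "new\nrandom\n" and the helper itself are assumptions, not part of any shipped tool.

```python
# Hypothetical helper: build an xboard command line for an unequal
# fixed-depth match by hiding the WB-protocol "sd N" command inside
# each engine's init string.
def fixed_depth_match(depth1, depth2):
    return [
        "xboard", "-mode", "TwoMachines",
        # "new\nrandom\n" is assumed to be the usual default init string
        "-firstInitString",  f"new\nrandom\nsd {depth1}\n",
        "-secondInitString", f"new\nrandom\nsd {depth2}\n",
    ]
```

The same trick works for any other startup command one might want to send without a dedicated option for it.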

Personally I consider fixed-depth games quite useless, and no longer needed since the introduction of the -firstNPS and -secondNPS options, which allow playing games governed by node count.
jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: xboard compliance and new feature

Post by jshriver »

Interesting :) I will have to look into -firstNPS and -secondNPS as well.

-Josh
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: xboard compliance and new feature

Post by bob »

jshriver wrote:Interesting :) I will have to look into -firstNPS and -secondNPS as well.

-Josh
The downside of this is that you can't measure _any_ improvement that is derived from speed enhancements. You still search the same number of nodes per search. There are hardly any significant changes to an engine that do not affect NPS at all, and for any change that does, this approach will fail to accurately inform you of the Elo gain or loss...
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: xboard compliance and new feature

Post by hgm »

Well, don't use it for measuring speed improvements, then... People weren't using results from fixed-depth games for measuring speed improvements either. Use tools for the purpose they are meant for. Using them for purposes they were not designed for can only lead to disappointment. Do you also use your car to do the laundry?

Speed improvements are so incredibly easy to measure by means other than playing games. Use a stopwatch!
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: xboard compliance and new feature

Post by bob »

hgm wrote:Well, don't use it for measuring speed improvements, then... People weren't using results from fixed-depth games for measuring speed improvements either. Use tools for the purpose they are meant for. Using them for purposes they were not designed for can only lead to disappointment. Do you also use your car to do the laundry?

Speed improvements are so incredibly easy to measure by means other than playing games. Use a stopwatch!
You do realize that when you add a bit of knowledge to the eval, you influence the program in two planes? And if you then test with a tool that only measures in one plane, you make mistakes.

How can you determine whether a slightly slower/smarter version is better than a slightly faster/dumber version if you don't play games as you normally play them? Answer: you can't. You get a distorted view that says every addition to the engine makes it play better, because you are preventing the speed penalty from influencing the final results.

There are _no_ changes that I make that don't affect both speed and smarts at the same time, and I need to know the overall effect, not just the effect of the smarts, which might be more than offset by the lack of speed were I measuring that as well.

Chess is a game of time. Not search space.
ilari
Posts: 750
Joined: Mon Mar 27, 2006 7:45 pm
Location: Finland

Re: xboard compliance and new feature

Post by ilari »

I think it's often useful to ignore speed when testing a new feature. I hate doing premature optimization, so when I implement a new feature it's likely to be slow at first. Testing the feature under normal conditions would then make it look worse than it actually is, so I prefer to use just a depth or node count limit in initial testing.

If the initial tests show promise, I can optimize the code and test again under different conditions. If they don't, I can throw the feature away without ever wasting time doing tedious optimization.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: xboard compliance and new feature

Post by bob »

ilari wrote:I think it's often useful to ignore speed when testing a new feature. I hate doing premature optimization, so when I implement a new feature it's likely to be slow at first. Testing the feature under normal conditions would then make it look worse than it actually is, so I prefer to use just a depth or node count limit in initial testing.

If the initial tests show promise, I can optimize the code and test again under different conditions. If they don't, I can throw the feature away without ever wasting time doing tedious optimization.
We are talking about two kinds of speed. Every line of code added slows the program down slightly. You hope that the added "smarts" more than offsets the loss in speed.

If I ignore speed, the vast majority of changes I make look better. But when the cost is factored in by using time to limit the games, things are much easier to analyze.

If you initially write something slow, then using nodes is a reasonable way of evaluating that change. However, at some point speed has to be factored in. 90% of the changes I test are not 100-200 line additions where speed can be improved later. Most of what I write is in its "final form" before testing starts...

And there, time is the right way to evaluate things.
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: xboard compliance and new feature

Post by hgm »

bob wrote:You do realize that when you add a bit of knowledge to the eval, you influence the program in two planes? And if you then test with a tool that only measures in one plane, you make mistakes.

How can you determine whether a slightly slower/smarter version is better than a slightly faster/dumber version if you don't play games as you normally play them? Answer: you can't. You get a distorted view that says every addition to the engine makes it play better, because you are preventing the speed penalty from influencing the final results.

There are _no_ changes that I make that don't affect both speed and smarts at the same time, and I need to know the overall effect, not just the effect of the smarts, which might be more than offset by the lack of speed were I measuring that as well.

Chess is a game of time. Not search space.
Almost every engine author realizes this very well. But the solution is totally trivial, as the NPS of the programs is usually printed as well. And if it isn't, or is too variable, you can simply look at the average time per game and normalize the result on that.

This is what I always do: make a change, measure the change in Elo, measure the extra time it took (or extra characters), and decide if the Elo gain was worth it or not.
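As a back-of-the-envelope sketch of that bookkeeping (not from the thread): if one assumes the common rule of thumb of roughly 50-70 Elo per doubling of speed, a fixed-node Elo measurement can be adjusted for a change's speed cost. The 60-Elo figure below is an assumed calibration constant, not a measured one.

```python
import math

ELO_PER_DOUBLING = 60.0  # assumed rule of thumb; calibrate for your own engine

def speed_elo(speed_ratio):
    """Elo equivalent of a pure speed change (new_nps / old_nps)."""
    return ELO_PER_DOUBLING * math.log2(speed_ratio)

def net_gain(elo_at_fixed_nodes, speed_ratio):
    """Fixed-node Elo measurement adjusted for the change's speed cost."""
    return elo_at_fixed_nodes + speed_elo(speed_ratio)

# Example: a change worth +10 Elo at fixed node counts, but costing 10%
# in speed, nets out to roughly +1 Elo: barely worth keeping.
```

This also illustrates bob's objection: at fixed nodes the change looks like a clear +10, and only the normalization step reveals how thin the real gain is.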

Of course your assumption that this would only be used to evaluate changes is already completely wrong. A major application would be to limit the opponents against which you test (but which you never change) to a certain average number of nodes per move, and play your own engine, which you did change, on CPU time (by setting /firstNPS=0).
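A minimal sketch of that asymmetric setup (the -firstNPS/-secondNPS option names are taken from this thread; the helper itself is hypothetical): the engine under test plays on real CPU time, while each fixed opponent is held to a virtual node rate matching its measured average NPS.

```python
def reference_match_args(opponent_avg_nps):
    """Hypothetical helper: engine under test on real time, opponent on nodes."""
    return [
        "xboard", "-mode", "TwoMachines",
        "-firstNPS", "0",                     # 0 = play on real CPU time
        "-secondNPS", str(opponent_avg_nps),  # opponent: fixed virtual NPS
    ]
```

Pinning the opponent's node rate to its usual average keeps its strength constant across hardware and system load, so any Elo drift measured between runs comes from the engine under test.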
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: xboard compliance and new feature

Post by bob »

hgm wrote:
bob wrote:You do realize that when you add a bit of knowledge to the eval, you influence the program in two planes? And if you then test with a tool that only measures in one plane, you make mistakes.

How can you determine whether a slightly slower/smarter version is better than a slightly faster/dumber version if you don't play games as you normally play them? Answer: you can't. You get a distorted view that says every addition to the engine makes it play better, because you are preventing the speed penalty from influencing the final results.

There are _no_ changes that I make that don't affect both speed and smarts at the same time, and I need to know the overall effect, not just the effect of the smarts, which might be more than offset by the lack of speed were I measuring that as well.

Chess is a game of time. Not search space.
Almost every engine author realizes this very well. But the solution is totally trivial, as the NPS of the programs is usually printed as well. And if it isn't, or is too variable, you can simply look at the average time per game and normalize the result on that.

This is what I always do: make a change, measure the change in Elo, measure the extra time it took (or extra characters), and decide if the Elo gain was worth it or not.

Of course your assumption that this would only be used to evaluate changes is already completely wrong. A major application would be to limit the opponents against which you test (but which you never change) to a certain average number of nodes per move, and play your own engine, which you did change, on CPU time (by setting /firstNPS=0).
(1) In complex programs, your suggestion does not work. Eval terms influence the NPS differently in different parts of the game. A static node counter will simply not be as accurate as I want in such cases.

(2) I can't see any reason to limit opponents to a specific tree space size. That also changes the way _they_ play. Some programs search 2-3x faster in endgames than in the opening. This kind of nonsense completely invalidates games where a program is artificially limited to some "average" speed that can be 2x-3x off in the right kinds of positions.

I'm not trying to somehow handicap my opponent. I'm trying to measure the best that I can do in a given amount of time against the best that they can do in a given amount of time, and then decide if my new "best" is better or worse than the previous best version.

If you want to test like that, that is certainly your choice. I'm more interested in realistic comparisons where both programs use the time as they see fit so that I don't draw an invalid conclusion because I have unintentionally finagled the timing.

NPS is not static. Testing in this way assumes that it is.