Crafty and software development

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Crafty and software development

Post by jwes »

I realized that Bob is right about this. If you calculate the difference in nps between versions, you can get essentially the same information that you get with fixed node tests. It would be interesting to see tests of the same version running at different speeds (easiest would be with different optimization settings) to calibrate the effect of small differences of speed on elo.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty and software development

Post by bob »

jwes wrote:I realized that Bob is right about this. If you calculate the difference in nps between versions, you can get essentially the same information that you get with fixed node tests. It would be interesting to see tests of the same version running at different speeds (easiest would be with different optimization settings) to calibrate the effect of small differences of speed on elo.
You can come pretty close to this. If you take the usual 2x faster = +50 Elo stronger, then each 1% change in speed is about 1/2 Elo. We've seen this pretty consistently. When we were doing the mobility cache stuff, the speed improvement was 5-6%, and it was worth right at +3 Elo...
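The rule of thumb above can be sketched two ways: a linear reading (100% of a doubling = 50 Elo, so 0.5 Elo per 1%) and a log reading, which compounds small speedups correctly. The +50-Elo-per-doubling constant is the thread's folklore figure, not a measurement.

```python
import math

def elo_linear(pct_speedup):
    # Linear interpolation: a full doubling (100%) = 50 Elo, so 0.5 Elo per 1%.
    return 0.5 * pct_speedup

def elo_log(speed_ratio):
    # Treat the Elo gain as proportional to log2 of the speed ratio, so that
    # small speedups compound correctly across repeated changes.
    return 50.0 * math.log2(speed_ratio)

# The mobility-cache case from the thread: a 5-6% speedup.
print(f"linear: {elo_linear(5.5):.1f} Elo")
print(f"log:    {elo_log(1.055):.1f} Elo")
```

Both readings land in the same ballpark as the observed +3 Elo, which is about all a rule of thumb like this can promise.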
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Crafty and software development

Post by Don »

jwes wrote:I realized that Bob is right about this. If you calculate the difference in nps between versions, you can get essentially the same information that you get with fixed node tests. It would be interesting to see tests of the same version running at different speeds (easiest would be with different optimization settings) to calibrate the effect of small differences of speed on elo.
Bob is doing more hand waving here because this is a very crude and inaccurate way to estimate speed.

Nodes per second or games based on fixed nodes are worthless because nodes aren't time. In fact it's possible (and even likely) that a change will decrease your nodes per second while actually increasing the speed of your program (not ELO, just speed.) Or vice versa. It's because anything that changes the shape of the tree is like random noise to the nodes per second, which is not fully correlated with the actual performance of the program, even apart from ELO gain.

So what Bob is saying is complete nonsense unless you are only interested in an extremely crude approximation. I think in Bob's case he is only interested in the most significant bits and as such this is probably good enough to reveal that some change had a big impact on the nodes per second.

I would remind you that even if this is good enough for your needs you must measure the nodes per second over hundreds of games. Don't just time 10 positions with a stopwatch.

I run mostly time control games, but when I want to know the speed and the ELO impact as separate components, I time every game at some fixed depth. I think this is probably the most accurate possible way to measure the pure CPU performance impact of a change, and EVEN THIS is not accurate until you have played a LOT of games. Like any other kind of testing you are dependent on the law of large numbers: you must run a lot of games to get an accurate reading, just like you have to run a lot of games to get an ELO estimate that is not too crude.
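The law-of-large-numbers point can be illustrated with simulated per-game times: the standard error of the mean timing shrinks only as the square root of the number of games played. The times below are randomly generated, not real measurements.

```python
import math
import random

def mean_and_stderr(times):
    """Sample mean and standard error of the mean of per-game times."""
    n = len(times)
    mean = sum(times) / n
    var = sum((t - mean) ** 2 for t in times) / (n - 1)  # sample variance
    return mean, math.sqrt(var / n)

random.seed(1)
for n in (10, 100, 1000):
    # Simulated per-game wall-clock times at a fixed depth, in seconds.
    times = [random.gauss(60.0, 15.0) for _ in range(n)]
    m, se = mean_and_stderr(times)
    print(f"{n:5d} games: mean {m:.1f}s +/- {se:.2f}s")
```

With noise of this size, 10 games cannot resolve a 1-2% timing difference; hundreds or thousands can.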

So I would have to say that this comes down to what you want to measure and how accurately you want to measure it. I think it gives you a real advantage if you are not oblivious to things that normal testing doesn't expose.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Crafty and software development

Post by Don »

bob wrote:
jwes wrote:I realized that Bob is right about this. If you calculate the difference in nps between versions, you can get essentially the same information that you get with fixed node tests. It would be interesting to see tests of the same version running at different speeds (easiest would be with different optimization settings) to calibrate the effect of small differences of speed on elo.
You can come pretty close to this. If you take the usual 2x faster = +50 Elo stronger, then each 1% change in speed is about 1/2 Elo. We've seen this pretty consistently. When we were doing the mobility cache stuff, the speed improvement was 5-6%, and it was worth right at +3 Elo...
Yes, I use something like this formula. Unfortunately the formula changes over the strength curve! But of course nobody questions that if you get a free speedup you have improved your program.

What I have discovered (if you read my previous post) is that NPS is pretty unreliable. It doesn't always correlate even to general program speedup. For example if I get a 1% improvement in NPS it doesn't really mean my program is 1% faster. For an example of this, if I improve the move ordering of the program it reduces the nodes per second. It may only be by 1 or 2 percent but even if I could reliably measure a 1% nodes per second improvement it would have been useless for determining that this was a good change.

But what you WILL see is that if you run 1000 fixed depth positions, they will run faster even though the NPS is less. I have never seen something like this make the program play weaker.

That's why I don't believe in fixed node testing, or using fixed nodes for anything other than bragging rights. Many authors like to play games with nodes per second accounting, but I don't even consider it valid for the SAME program, let alone comparing 2 different programs!
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty and software development

Post by bob »

Don wrote:
bob wrote:
jwes wrote:I realized that Bob is right about this. If you calculate the difference in nps between versions, you can get essentially the same information that you get with fixed node tests. It would be interesting to see tests of the same version running at different speeds (easiest would be with different optimization settings) to calibrate the effect of small differences of speed on elo.
You can come pretty close to this. If you take the usual 2x faster = +50 Elo stronger, then each 1% change in speed is about 1/2 Elo. We've seen this pretty consistently. When we were doing the mobility cache stuff, the speed improvement was 5-6%, and it was worth right at +3 Elo...
Yes, I use something like this formula. Unfortunately the formula changes over the strength curve! But of course nobody questions that if you get a free speedup you have improved your program.

What I have discovered (if you read my previous post) is that NPS is pretty unreliable. It doesn't always correlate even to general program speedup. For example if I get a 1% improvement in NPS it doesn't really mean my program is 1% faster. For an example of this, if I improve the move ordering of the program it reduces the nodes per second. It may only be by 1 or 2 percent but even if I could reliably measure a 1% nodes per second improvement it would have been useless for determining that this was a good change.

But what you WILL see is that if you run 1000 fixed depth positions, they will run faster even though the NPS is less. I have never seen something like this make the program play weaker.

That's why I don't believe in fixed node testing, or using fixed nodes for anything other than bragging rights. Many authors like to play games with nodes per second accounting, but I don't even consider it valid for the SAME program, let alone comparing 2 different programs!
This is a bit of apples and oranges. If you change algorithms, but not the shape of the tree, NPS is perfectly correlated to speed. If you change ordering ideas, then the size of the tree is the thing to measure, not the NPS. Rather than NPS I use total nodes and total time for these tests. No sense in searching a 10% smaller tree using 20% more time, of course.
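The measurement Bob describes can be sketched by aggregating total nodes and total time over a fixed position suite and comparing versions directly. The per-position (nodes, seconds) figures below are invented for illustration, not real engine output.

```python
def summarize(results):
    """Aggregate (nodes, seconds) pairs over a whole position suite."""
    nodes = sum(n for n, _ in results)
    secs = sum(t for _, t in results)
    return nodes, secs, nodes / secs  # total nodes, total time, NPS

# Invented figures for two versions of the same engine, where the "new"
# version has better move ordering (smaller tree, more time per node).
old = [(1_200_000, 1.30), (900_000, 0.95), (2_000_000, 2.10)]
new = [(1_050_000, 1.25), (800_000, 0.98), (1_800_000, 2.05)]

n0, t0, nps0 = summarize(old)
n1, t1, nps1 = summarize(new)
print(f"tree size: {100 * (1 - n1 / n0):.1f}% smaller")
print(f"time:      {100 * (1 - t1 / t0):.1f}% faster")
print(f"NPS:       {100 * (nps1 / nps0 - 1):+.1f}%")
```

With these invented numbers the new version searches a smaller tree and finishes sooner even though its NPS drops, which is exactly why total nodes and total time, not NPS, are the right figures for an ordering change.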

But for speed optimizations, NPS is the perfect measure, although time gives the same information of course.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty and software development

Post by bob »

Don wrote:
jwes wrote:I realized that Bob is right about this. If you calculate the difference in nps between versions, you can get essentially the same information that you get with fixed node tests. It would be interesting to see tests of the same version running at different speeds (easiest would be with different optimization settings) to calibrate the effect of small differences of speed on elo.
Bob is doing more hand waving here because this is a very crude and inaccurate way to estimate speed.

Nodes per second or games based on fixed nodes are worthless because nodes aren't time. In fact it's possible (and even likely) that a change will decrease your nodes per second while actually increasing the speed of your program (not ELO, just speed.) Or vice versa. It's because anything that changes the shape of the tree is like random noise to the nodes per second, which is not fully correlated with the actual performance of the program, even apart from ELO gain.

First, "bob" is a whole lot smarter than you apparently give him credit for. He fully understands that if you do purely programming optimizations, which do not change the node count whatsoever, then NPS is the _perfect_ speed measure. He also fully understands that if you change anything that alters the shape of the tree, from move ordering to extensions or reductions, then NPS is "almost" completely irrelevant and time is the only measurement of interest.

I do not see why you keep coming up with these absurd ideas you think I have. I've only been doing this for 42 years now. I pretty well understand tree search, and program optimization, and how they interact (or not). I know _exactly_ how to test any change I make, whether it be a simple speed optimization, or a search space optimization... And I would bet that 90% of the chess programmers understand this concept if they are any good at all.

First you seem to think I can't figure out if my changes have made the program faster, using timed matches. Of course not, because I don't measure pure optimization speedups that way. If you make the tree smaller, my normal cluster testing will pick that up just fine because there will be a corresponding increase in search depth and playing skill. This is all "computer chess 101" and I don't see why you want to continue to imply that I don't know how to test or evaluate changes. We're doing pretty well as far as I can determine.


So what Bob is saying is complete nonsense unless you are only interested in an extremely crude approximation. I think in Bob's case he is only interested in the most significant bits and as such this is probably good enough to reveal that some change had a big impact on the nodes per second.
No, you are building up a strawman argument, by defining something I am _not_ doing, and then criticizing me for not doing it, as if I actually were doing it. Let's get this back to reality.

I would remind you that even if this is good enough for your needs you must measure the nodes per second over hundreds of games. Don't just time 10 positions with a stopwatch.
Totally unnecessary. If you have a reasonable set of positions, including opening, middlegame, endgame, tactical, non-tactical, you don't need hundreds of games to figure out if your speed optimization is better or not. Or at least _I_ don't. You may, for all I can determine from this discussion.

I run mostly time control games, but when I want to know the speed and the ELO impact as separate components, I time every game at some fixed depth. I think this is probably the most accurate possible way to measure the pure CPU performance impact of a change, and EVEN THIS is not accurate until you have played a LOT of games. Like any other kind of testing you are dependent on the law of large numbers: you must run a lot of games to get an accurate reading, just like you have to run a lot of games to get an ELO estimate that is not too crude.
And that is utter nonsense. If you speed the program up, it will play stronger in a timed match. Plain and simple. Whether your speed up is in terms of NPS, or in terms of reduced tree size is completely irrelevant. And my testing is even better, because what do you do if you screw up reductions? You get to a fixed depth faster, but play weaker. I detect that in a heartbeat in timed testing.

If you are happy with fixed depth or fixed node testing, fine. But you are not making any sort of case at all about why my testing is flawed. I know _exactly_ what every change we make does to speed, when we are making changes that are supposed to simply speed us up without changing the node count for a fixed depth whatsoever. And I know _exactly_ what every change we make does to Elo. Nothing else counts.

So I would have to say that this comes down to what you want to measure and how accurately you want to measure it. I think it gives you a real advantage if you are not oblivious to things that normal testing doesn't expose.
If there _was_ anything that normal testing doesn't expose, that is...

Unfortunately, my definition of "normal testing" apparently is not the same as your much more narrow definition of the term. I _know_ how to test changes.
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: Crafty and software development

Post by Dirt »

bob wrote:If there _was_ anything that normal testing doesn't expose, that is...
Would Crafty be stronger if it didn't consider promoting to a bishop in the search? If it did make Crafty stronger by a few Elo, would that be enough reason to remove that ability (outside of TBs)?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty and software development

Post by bob »

Dirt wrote:
bob wrote:If there _was_ anything that normal testing doesn't expose, that is...
Would Crafty be stronger if it didn't consider promoting to a bishop in the search? If it did make Crafty stronger by a few Elo, would that be enough reason to remove that ability (outside of TBs)?
Absolutely, if it helped. But it doesn't seem to make any difference. I have tried lots of combinations, and at someone's suggestion even included promotions to knight in the q-search. That did hurt a little, as the search simply gets a little bit bigger.

The reason it doesn't make much difference, from back when I looked at it, is that most of the time the promotions are futile. e8=Q followed by Rxe8, and e8=anything is followed by Rxe8. Fortunately, right after ripping the promoted piece, we get a hash hit so the cost is really very small, if anything.
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Crafty and software development

Post by jwes »

bob wrote:
Dirt wrote:
bob wrote:If there _was_ anything that normal testing doesn't expose, that is...
Would Crafty be stronger if it didn't consider promoting to a bishop in the search? If it did make Crafty stronger by a few Elo, would that be enough reason to remove that ability (outside of TBs)?
Absolutely, if it helped. But it doesn't seem to make any difference. I have tried lots of combinations, and at someone's suggestion even included promotions to knight in the q-search. That did hurt a little, as the search simply gets a little bit bigger.

The reason it doesn't make much difference, from back when I looked at it, is that most of the time the promotions are futile. e8=Q followed by Rxe8, and e8=anything is followed by Rxe8. Fortunately, right after ripping the promoted piece, we get a hash hit so the cost is really very small, if anything.
An idea I've tried is:
If a promotion to Q fails low and the refutation is a capture of the promoted queen, don't even try the under-promotions. I don't think it can ever hurt, and it can help when capture of the under-promoted piece is not the first move tried.
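A minimal sketch of that pruning rule follows. The move representation (an (is_capture, to_square) pair for the refuting move) and all names here are invented for illustration, not Crafty's actual code.

```python
def should_try_underpromotions(q_promo_failed_low, refutation, promo_square):
    """Decide whether N/R/B promotions are worth searching after e8=Q."""
    if not q_promo_failed_low:
        # The queen promotion did not fail low, so there is no reason to
        # expect an under-promotion to do better at this node.
        return False
    is_capture, to_square = refutation
    # If the refutation simply captured the new queen on its promotion
    # square, it would capture an under-promoted piece the same way:
    # skip the under-promotions entirely.
    return not (is_capture and to_square == promo_square)

print(should_try_underpromotions(True, (True, "e8"), "e8"))   # refuted by Rxe8
print(should_try_underpromotions(True, (False, "g1"), "e8"))  # refuted elsewhere
```

Whether the rule is strictly safe in every case (for instance, a knight promotion that gives check and so cannot be answered by the same capture) is the kind of question the game testing discussed above would have to settle.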
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty and software development

Post by bob »

jwes wrote:
bob wrote:
Dirt wrote:
bob wrote:If there _was_ anything that normal testing doesn't expose, that is...
Would Crafty be stronger if it didn't consider promoting to a bishop in the search? If it did make Crafty stronger by a few Elo, would that be enough reason to remove that ability (outside of TBs)?
Absolutely, if it helped. But it doesn't seem to make any difference. I have tried lots of combinations, and at someone's suggestion even included promotions to knight in the q-search. That did hurt a little, as the search simply gets a little bit bigger.

The reason it doesn't make much difference, from back when I looked at it, is that most of the time the promotions are futile. e8=Q followed by Rxe8, and e8=anything is followed by Rxe8. Fortunately, right after ripping the promoted piece, we get a hash hit so the cost is really very small, if anything.
An idea I've tried is:
If a promotion to Q fails low and the refutation is a capture of the promoted queen, don't even try the under-promotions. I don't think it can ever hurt, and it can help when capture of the under-promoted piece is not the first move tried.
I'll try to test that and a couple of related ideas to see if there is any improvement at all...