testing eval

sje · Post by **sje** » Sun Apr 24, 2011 6:23 pm

bob wrote:
sje wrote:My memory certainly isn't as good as it used to be, so I had to check my copy of Frey's Chess Skill in Man and Machine (1st edition). And lo and behold, on page 96 in the discussion of the pawn evaluation code of Chess 4.x we see the pawn advancement bonus multiplier vector, indexed by file:
[0 0 3.9 5.4 7.0 2.3 0 0]
And that's asymmetric and would have to be reflected for a y-axis rotation score test to work.
Won't disagree, although it is a bad idea to do things that way. If you were to play a game and actually mirror left-to-right, would _your_ evaluation of the position really change? Mine wouldn't. Not one point.

The CIL toolkit's evaluator won't change either as its pawn advancement table has flank symmetry. But is that much better than the asymmetric vector in Chess 4.x, at least when measured over the space of real-world games?

For y=x and y=-x swaps, I've made a change in the toolkit that detects undeveloped minor pieces not by home rank placement but by square membership in a rim-squares-only constant bitboard. So that will have diagonal symmetry.
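To illustrate why a rim-squares mask survives the diagonal swaps while a home-rank mask does not, here is a minimal sketch. It is not the CIL toolkit's code: the square mapping (a1 = bit 0, h8 = bit 63), the `RIM_SQUARES` constant, and all function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mapping: bit index = rank * 8 + file, so a1 = 0 and h8 = 63. */
#define RIM_SQUARES UINT64_C(0xFF818181818181FF) /* ranks 1 and 8, files a and h */

/* Reflect a bitboard about the a1-h8 diagonal (y = x): file and rank swap. */
uint64_t reflect_diag(uint64_t bb)
{
    uint64_t out = 0;
    for (int sq = 0; sq < 64; sq++) {
        if ((bb >> sq) & 1) {
            int file = sq & 7, rank = sq >> 3;
            out |= UINT64_C(1) << (file * 8 + rank);
        }
    }
    return out;
}

/* Reflect about the a8-h1 diagonal (y = -x): (file, rank) -> (7-rank, 7-file). */
uint64_t reflect_antidiag(uint64_t bb)
{
    uint64_t out = 0;
    for (int sq = 0; sq < 64; sq++) {
        if ((bb >> sq) & 1) {
            int file = sq & 7, rank = sq >> 3;
            out |= UINT64_C(1) << ((7 - file) * 8 + (7 - rank));
        }
    }
    return out;
}

/* Count minor pieces that still sit on a rim square. */
int undeveloped_minors(uint64_t knights, uint64_t bishops)
{
    uint64_t bb = (knights | bishops) & RIM_SQUARES;
    int n = 0;
    while (bb) {
        bb &= bb - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Self-check: the rim mask is unchanged by both diagonal reflections. */
void check_rim_symmetry(void)
{
    assert(reflect_diag(RIM_SQUARES) == RIM_SQUARES);
    assert(reflect_antidiag(RIM_SQUARES) == RIM_SQUARES);
}
```

A home-rank mask such as `0xFF` turns into the a-file under `reflect_diag`, which is why a home-rank test would give a different count after the swap.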

But what about a rook bonus for being on an open file? Should there not also be a bonus for being on an open rank? For a position with no pawns, that would make sense.

Evert · Post by **Evert** » Sun Apr 24, 2011 7:59 pm

sje wrote:But what about a rook bonus for being on an open file? Should there not also be a bonus for being on an open rank? For a position with no pawns, that would make sense.

In a position with no pawns, it doesn't matter, right? I mean, every file is an open file in that case and so the effect of an open file bonus reduces to a bonus to the base value of the rook.

sje · Post by **sje** » Sun Apr 24, 2011 10:08 pm

Evert wrote:
sje wrote:But what about a rook bonus for being on an open file? Should there not also be a bonus for being on an open rank? For a position with no pawns, that would make sense.
In a position with no pawns, it doesn't matter, right? I mean, every file is an open file in that case and so the effect of an open file bonus reduces to a bonus to the base value of the rook.

A reflection about y=x or y=-x exchanges ranks with files. What if a rook is given a bonus based on being alone on a file but not being given a bonus when being alone on a rank?

And then there's the business of giving a seventh rank bonus.

bob · Post by **bob** » Sun Apr 24, 2011 10:36 pm

sje wrote:
bob wrote:
sje wrote:My memory certainly isn't as good as it used to be, so I had to check my copy of Frey's Chess Skill in Man and Machine (1st edition). And lo and behold, on page 96 in the discussion of the pawn evaluation code of Chess 4.x we see the pawn advancement bonus multiplier vector, indexed by file:
[0 0 3.9 5.4 7.0 2.3 0 0]
And that's asymmetric and would have to be reflected for a y-axis rotation score test to work.
Won't disagree, although it is a bad idea to do things that way. If you were to play a game and actually mirror left-to-right, would _your_ evaluation of the position really change? Mine wouldn't. Not one point.
The CIL toolkit's evaluator won't change either as its pawn advancement table has flank symmetry. But is that much better than the asymmetric vector in Chess 4.x, at least when measured over the space of real-world games?

For y=x and y=-x swaps, I've made a change in the toolkit that detects undeveloped minor pieces not by home rank placement but by square membership in a rim-squares-only constant bitboard. So that will have diagonal symmetry.

But what about a rook bonus for being on an open file? Should there not also be a bonus for being on an open rank? For a position with no pawns, that would make sense.

Not sure what you are asking. How would open-file status change, or open-rank status, if you just mirror the board?

bob · Post by **bob** » Sun Apr 24, 2011 10:38 pm

sje wrote:
Evert wrote:
sje wrote:But what about a rook bonus for being on an open file? Should there not also be a bonus for being on an open rank? For a position with no pawns, that would make sense.
In a position with no pawns, it doesn't matter, right? I mean, every file is an open file in that case and so the effect of an open file bonus reduces to a bonus to the base value of the rook.
A reflection about y=x or y=-x exchanges ranks with files. What if a rook is given a bonus based on being alone on a file but not being given a bonus when being alone on a rank?

And then there's the business of giving a seventh rank bonus.

I don't reflect diagonally; you end up with illegal positions. I suggest that you test (a) the normal board; (b) the board reflected by swapping ranks and changing the color of all pieces; (c) the normal board mirrored by swapping files only; and (d) the reflected board mirrored by swapping files only.

That doesn't produce illegal positions... (illegal because a pawn could end up on the 1st or 8th rank if you reflect diagonally...)

All I want to do is to be able to put my finger on any piece, say a knight on c3. And then reflect/mirror the position so that the knight visits c6 (different color), f6 (different color) and f3 (same color). I don't want its score to change one bit since all the other pieces are making symmetric "shifts" at the same time. That has caught a ton of bugs over the years, and I probably should run the test again soon, just to verify...
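The four-way test described above could be sketched roughly as follows. This is not Crafty's code: the `Board` layout, the piece codes, and the toy `evaluate` (material plus a file-symmetric placement bonus, scored from the side to move) are all assumptions made so the example is self-contained. In a real engine only the two transforms and the comparison would carry over.

```c
#include <assert.h> /* for callers that want to assert on the result */

enum { EMPTY = 0, PAWN = 1, KNIGHT, BISHOP, ROOK, QUEEN, KING };

typedef struct {
    int sq[64];        /* index = rank * 8 + file, a1 = 0; white > 0, black < 0 */
    int white_to_move; /* 1 when White is on move */
} Board;

/* (c) Mirror left-to-right: a<->h, b<->g, c<->f, d<->e. */
Board mirror_files(const Board *b)
{
    Board m = *b;
    for (int s = 0; s < 64; s++)
        m.sq[s ^ 7] = b->sq[s];
    return m;
}

/* (b) Swap ranks, change the color of every piece, and flip the side to move. */
Board reflect_colors(const Board *b)
{
    Board r = *b;
    for (int s = 0; s < 64; s++)
        r.sq[s ^ 56] = -b->sq[s];
    r.white_to_move = !b->white_to_move;
    return r;
}

/* Toy evaluation (hypothetical values), returned from the side to move's view. */
int evaluate(const Board *b)
{
    static const int piece_value[7] = { 0, 100, 320, 330, 500, 900, 0 };
    int score = 0;
    for (int s = 0; s < 64; s++) {
        int p = b->sq[s];
        if (p == EMPTY)
            continue;
        int own = p > 0 ? s : s ^ 56; /* square as seen from the piece's own side */
        int file = own & 7, rank = own >> 3;
        int center = file < 4 ? file : 7 - file;
        int v = piece_value[p > 0 ? p : -p] + 4 * center + 2 * rank;
        score += p > 0 ? v : -v;
    }
    return b->white_to_move ? score : -score;
}

/* Returns 1 when positions (a), (b), (c) and (d) all get the same score. */
int eval_is_symmetric(const Board *b)
{
    Board r = reflect_colors(b);   /* (b) */
    Board m = mirror_files(b);     /* (c) */
    Board rm = mirror_files(&r);   /* (d) */
    int e = evaluate(b);           /* (a) */
    return evaluate(&r) == e && evaluate(&m) == e && evaluate(&rm) == e;
}
```

Running `eval_is_symmetric` over a large set of positions and printing any position where it returns 0 is one way to use it. A knight placed on c3 (index 18) lands on c6, f3 and f6 in the three transformed boards.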

bob · Post by **bob** » Sun Apr 24, 2011 10:40 pm

sje wrote:Also, for positions with no pawns it might be useful to reflect the board about the lines y = x and y = -x to see if the evaluation changes.

It would be difficult for me to think how I might pass the first 4 tests, and then have a bug with respect to diagonal flipping.

michiguel · Post by **michiguel** » Mon Apr 25, 2011 12:04 am

bob wrote:
sje wrote:Also, for positions with no pawns it might be useful to reflect the board about the lines y = x and y = -x to see if the evaluation changes.
It would be difficult for me to think how I might pass the first 4 tests, and then have a bug with respect to diagonal flipping.

You have, for instance, a PST with values that are symmetric horizontally and vertically, but not diagonally. In the absence of pawns that is not desirable. That will be detected by Steve's test. I kind of like the idea.
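A small sketch of that case: a table that passes the file and rank mirror checks but fails both diagonal ones. The square mapping (index = rank * 8 + file), the helper names, and the example table are assumptions made for illustration, not taken from any engine in this thread.

```c
#include <assert.h> /* for callers that want to assert on the result */

/* Square transforms; index = rank * 8 + file, a1 = 0. */
int mirror_file_sq(int sq)      { return sq ^ 7; }                    /* a<->h */
int mirror_rank_sq(int sq)      { return sq ^ 56; }                   /* 1<->8 */
int reflect_diag_sq(int sq)     { return (sq & 7) * 8 + (sq >> 3); }  /* y = x */
int reflect_antidiag_sq(int sq) { return 63 - reflect_diag_sq(sq); }  /* y = -x */

/* Returns 1 if pst[sq] == pst[map(sq)] for every square. */
int pst_invariant(const int pst[64], int (*map)(int))
{
    for (int sq = 0; sq < 64; sq++)
        if (pst[sq] != pst[map(sq)])
            return 0;
    return 1;
}

/* Hypothetical table: file centrality counts twice as much as rank centrality. */
void fill_example_pst(int pst[64])
{
    for (int sq = 0; sq < 64; sq++) {
        int f = sq & 7, r = sq >> 3;
        int fc = f < 4 ? f : 7 - f; /* 0 on the a/h files, 3 on d/e */
        int rc = r < 4 ? r : 7 - r; /* 0 on ranks 1/8, 3 on ranks 4/5 */
        pst[sq] = 2 * fc + rc;
    }
}
```

With this table, `pst_invariant` returns 1 for the two mirror maps and 0 for the two diagonal maps, because a diagonal swap exchanges the file and rank terms and they carry different weights.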

Miguel

Evert · Post by **Evert** » Mon Apr 25, 2011 1:44 am

sje wrote:A reflection about y=x or y=-x exchanges ranks with files. What if a rook is given a bonus based on being alone on a file but not being given a bonus when being alone on a rank?

I guess it depends on when you call a file "open". I was going with "file contains no pawns of either colour", in which case all files are open in the absence of pawns.
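Under that definition the point can be shown in a few lines. This is a sketch only: the bitboard layout (a1 = bit 0), the function names, and the bonus value of 20 are assumptions for the example.

```c
#include <assert.h> /* for callers that want to assert on the result */
#include <stdint.h>

#define FILE_A UINT64_C(0x0101010101010101)
#define ROOK_OPEN_FILE_BONUS 20 /* hypothetical value */

/* A file is open when it holds no pawn of either colour. */
int is_open_file(uint64_t white_pawns, uint64_t black_pawns, int file)
{
    return ((white_pawns | black_pawns) & (FILE_A << file)) == 0;
}

/* Sum the open-file bonus over every rook in the `rooks` bitboard. */
int rook_open_file_score(uint64_t rooks, uint64_t white_pawns, uint64_t black_pawns)
{
    int score = 0;
    for (int sq = 0; sq < 64; sq++)
        if (((rooks >> sq) & 1) && is_open_file(white_pawns, black_pawns, sq & 7))
            score += ROOK_OPEN_FILE_BONUS;
    return score;
}
```

With both pawn bitboards empty, every rook collects the bonus wherever it stands, so the term acts as a constant added to the rook's base value and cannot break any board symmetry.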

And then there's the business of giving a seventh rank bonus.

True.
But without pawns, there is no real reason to give a "seventh rank" bonus (I actually don't give a seventh rank bonus, I give a bonus for being on the same rank as the base of the enemy pawn structure, which normally works out to the same thing of course). It may make sense to limit mobility of the enemy king, but that's not quite the same thing.

I agree though: in the absence of pawns, the evaluation probably ought to be invariant under rotation. Reasons why it may not be include weighing mobility on the enemy side of the board more heavily. It may be worth it (from a philosophical point of view) to replace such a notion with something more general. I'm not sure whether it would be worth it in terms of playing strength, though; in reality it probably doesn't matter either way, since there are often pawns on the board until the game is decided.

michiguel · Post by **michiguel** » Mon Apr 25, 2011 1:49 am

Evert wrote:
sje wrote:A reflection about y=x or y=-x exchanges ranks with files. What if a rook is given a bonus based on being alone on a file but not being given a bonus when being alone on a rank?
I guess it depends on when you call a file "open". I was going with "file contains no pawns of either colour", in which case all files are open in the absence of pawns.

And then there's the business of giving a seventh rank bonus.
True.
But without pawns, there is no real reason to give a "seventh rank" bonus (I actually don't give a seventh rank bonus, I give a bonus for being on the same rank as the base of the enemy pawn structure, which normally works out to the same thing of course).

That's the whole point. You may have inadvertently left this in the code that evaluates these endgames, and this test will warn you about it.

Miguel

It may make sense to limit mobility of the enemy king, but that's not quite the same thing.

I agree though: in the absence of pawns, the evaluation probably ought to be invariant under rotation. Reasons why it may not be include weighing mobility on the enemy side of the board more heavily. It may be worth it (from a philosophical point of view) to replace such a notion with something more general. I'm not sure whether it would be worth it in terms of playing strength, though; in reality it probably doesn't matter either way, since there are often pawns on the board until the game is decided.

Don · Post by **Don** » Wed May 04, 2011 8:29 pm

outAtime wrote:How should I go about testing that the output from a mobility score in eval is correct? How would I be able to see just this score while the engine is running?
Thanks!

You can use some of the suggestions such as reversing colors to expose a lot of bugs.

However, I'm not sure there is an easy way. Here is what I do.

I have a routine in my program called xdebug which forks xboard with the given position - simultaneously dumping any data I need to see. I also have a text representation but it's so much easier (for me) to view the current position with xboard.

I like to randomly sample, so with (for example) probability 1 in a million the program will throw up the current position of interest, in this case right after calculating mobility in the evaluation function - and on the screen print the number of squares of mobility for the various pieces. I will manually inspect that and do the count myself to see if it's correct, then move on to the next position. I may look at 100 positions or more before concluding that it might be correct.
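A rough sketch of that sampling idea follows. It is not the actual xdebug routine: the xorshift64 generator, the `MobilityCounts` struct, and the function names are assumptions, and the dump goes to a text stream rather than forking xboard.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One xorshift64 step; the state must be seeded with a nonzero value. */
uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Returns 1 roughly once in every `one_in` calls, 0 otherwise. */
int should_sample(uint64_t *state, uint64_t one_in)
{
    assert(*state != 0 && one_in != 0);
    return xorshift64(state) % one_in == 0;
}

/* Hypothetical per-piece mobility counts produced by the evaluation. */
typedef struct {
    int knight, bishop, rook, queen;
} MobilityCounts;

/* Print the position (as a FEN string) and the counts to be checked by hand. */
void dump_mobility(FILE *out, const char *fen, const MobilityCounts *m)
{
    fprintf(out, "FEN: %s\n", fen);
    fprintf(out, "mobility N=%d B=%d R=%d Q=%d\n",
            m->knight, m->bishop, m->rook, m->queen);
}
```

Inside the evaluation, right after the mobility counts are computed, the call would look something like `if (should_sample(&rng, 1000000)) dump_mobility(stderr, fen, &counts);`, where `rng`, `fen` and `counts` are whatever the engine has at hand.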

In general I also try to expose any unusual cases - which will require additional debug code. I often only stop on one color and then test the other color in a separate step, since I don't like the constant context switching my brain has to do.

Another way to get correctness that I use a lot is to write two versions of the same routine, and to write them in significantly different ways. Getting them to agree will generally expose bugs in BOTH routines. A good example of this is in Zobrist hashing. I do this:

if (p->key != slow_hash_function(p)) { printf("HEY, there is a bug here!!!!\n"); exit(1); }

p is a pointer to a position state structure and p->key is incrementally updated. But if it does not agree with the calculated hash then there is a bug. Put this after EVERY move made in the search and play a bunch of games and if it doesn't crash it might be correct.
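To make the two-versions idea concrete, here is a minimal sketch of an incrementally updated key next to a from-scratch recomputation. The `Position` layout, the piece codes 0..11, `init_zobrist`, and `make_move` are assumptions for the example. Side to move, castling rights and en passant are left out of the key, which a real engine would need to include.

```c
#include <assert.h> /* for callers that want to assert on the result */
#include <stdint.h>

enum { N_PIECES = 12, N_SQUARES = 64 };

typedef struct {
    int sq[N_SQUARES]; /* -1 = empty, otherwise a piece code 0..11 */
    uint64_t key;      /* incrementally updated hash */
} Position;

static uint64_t zobrist[N_PIECES][N_SQUARES];

/* Fill the table from a xorshift64 stream (any decent PRNG would do). */
void init_zobrist(void)
{
    uint64_t x = UINT64_C(0x9E3779B97F4A7C15);
    for (int p = 0; p < N_PIECES; p++)
        for (int s = 0; s < N_SQUARES; s++) {
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
            zobrist[p][s] = x;
        }
}

/* Slow reference version: recompute the key from the board alone. */
uint64_t slow_hash_function(const Position *p)
{
    uint64_t key = 0;
    for (int s = 0; s < N_SQUARES; s++)
        if (p->sq[s] >= 0)
            key ^= zobrist[p->sq[s]][s];
    return key;
}

/* Fast version: XOR out only what the move changes (plain moves and captures). */
void make_move(Position *p, int from, int to)
{
    int piece = p->sq[from];
    if (p->sq[to] >= 0)
        p->key ^= zobrist[p->sq[to]][to]; /* remove the captured piece */
    p->key ^= zobrist[piece][from] ^ zobrist[piece][to];
    p->sq[to] = piece;
    p->sq[from] = -1;
}

/* The check from the post, as a predicate instead of printf and exit. */
int key_is_consistent(const Position *p)
{
    return p->key == slow_hash_function(p);
}
```

Calling `key_is_consistent` after every `make_move` in the search, and stopping as soon as it returns 0, reproduces the check above. Forgetting the capture branch in `make_move` is the kind of bug it catches on the first capture.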

In general, always think in terms of trying to prove it's broken. It's a lot harder to prove it's correct. Thus you should take the point of view that it "might" be correct if you have a difficult time proving it's broken.
