I've noticed a colour bias when my program plays itself. I assume I've introduced a bug at some point, but the problem is how to identify it. My program is a web-based Java program and not particularly strong, but I find it fun to work on.
I let the program play itself and I can switch out various parts of the program (i.e. different scoring routines, etc.). I'll play 10 games with feature X enabled for white and disabled for black, then another 10 with that feature enabled for black and disabled for white. The idea is that I can test whether a feature is useful or not. If one side (say black) wins most of the games regardless of whether the feature is enabled, I report a bias.
I'm often getting a strong bias (always towards black), and it shows up for some enabled features more than others. I tried looking for common code that might be called in those cases so I could look for bugs, but there doesn't appear to be an obvious relationship.
Anybody had a similar problem or can just give me some pointers?
Mike
Weird colour bias
Moderators: hgm, Dann Corbit, Harvey Williamson
-
hgm
- Posts: 27701
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Weird colour bias
I am not sure if this is the same phenomenon you are talking about. But when testing micro-Max / Fairy-Max in self-play (identical versions, but possibly from an asymmetric initial position), I noticed that the scores would systematically differ between a position where white had the move and the colour-reversed position where black had the move.
E.g., for the standard opening position you would see that white had a certain first-move advantage and score, say, 57%, while giving black the first move (which is the same as colour-reversing, as the start position is symmetric) would produce only a 51% score. So it seemed that having the first move provided on average a 4% advantage, while the mere fact that you played white (even if black had the first move) gave you a 3% advantage. This bias consistently showed up for all initial positions, even asymmetric ones.
Of course it puzzled me how playing with black moving first could differ from playing with white. But I could finally trace it to the fact that Fairy-Max uses a mailbox board without a piece list, so it scans the board to find pieces to move. As it scans from a8 to h1, for white it would typically try pawn moves first, and for black, piece moves. In case of equal scores the order would be decisive, so white had a slight preference for pawn moves. This led to slightly more aggressive play, which was apparently on average a better strategy.
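The scan-order effect hgm describes is easy to demonstrate. A minimal sketch (my own toy mailbox board, not Fairy-Max source): scanning the opening position from a8 down to h1 meets white's pawns before white's pieces, but black's pieces before black's pawns.

```java
// Toy mailbox board (illustrative, not Fairy-Max code): index 0 is a8,
// index 63 is h1, scanned top to bottom as hgm describes.
public class ScanOrder {
    static final String BOARD =
        "rnbqkbnr" + "pppppppp" + "........" + "........" +
        "........" + "........" + "PPPPPPPP" + "RNBQKBNR";

    // First man of the given colour met when scanning a8 -> h1.
    static char firstEncountered(boolean white) {
        for (int sq = 0; sq < 64; sq++) {
            char c = BOARD.charAt(sq);
            if (c != '.' && white == Character.isUpperCase(c)) return c;
        }
        return '.';
    }

    public static void main(String[] args) {
        System.out.println("white meets first: " + firstEncountered(true));  // P (a pawn)
        System.out.println("black meets first: " + firstEncountered(false)); // r (a piece)
    }
}
```

With move lists built in that order, ties in score go to whichever move was generated first, so white breaks ties toward pawn moves and black toward piece moves.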
-
mike_bike_kite
- Posts: 98
- Joined: Tue Jul 26, 2011 12:18 am
- Location: London
Re: Weird colour bias
Very interesting. I still suspect my issue is more to do with a mistake I've made. The problem is how to find it.
-
tpetzke
- Posts: 686
- Joined: Thu Mar 03, 2011 4:57 pm
- Location: Germany
Re: Weird colour bias
Hi Mike,
I assume you have an eval term that is by mistake not symmetric. An easy example would be scoring the white queen's material as +900 cp and the black queen's material also as +900 cp (where it should score -900 cp).
In play against another engine this would show up rather quickly as a mistake; in self-play perhaps a bit later, because both sides are affected by the error: one side will try to exchange queens while the other will try to avoid that.
If you have such an error in a less influential term, you will not necessarily notice stupid play, but rather a change in the win/loss stats.
The good news is that such an error is rather easy to find.
Just implement two procedures that mirror the board horizontally and vertically (flip and flop). Then take a large set of game positions from a FEN file and call eval for the original position and for the two mirrored ones.
(Don't forget to switch the side to move when mirroring along the ranks.)
Eval should always give you the same score for all three positions.
One possible exception: if at least one side still has castling rights, mirroring along the files can lead to a score deviation, which might be OK. In such positions I mirror only along the ranks (which is more important anyway, as it spots the errors in the white/black eval).
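This flip test can be sketched in Java as follows. A toy material-only eval stands in for a real one; the names (`eval`, `flip`) and the char-array board layout are illustrative, not from Mike's program:

```java
// Sketch of the rank-mirror ("flip") symmetry test: a char[64] mailbox
// board, rank 8 first, with a toy material eval scored from the side
// to move's point of view.
public class MirrorTest {
    static int eval(char[] board, boolean whiteToMove) {
        String order = "pnbrqk";
        int[] value = {100, 300, 300, 500, 900, 0};
        int score = 0;                         // from white's point of view
        for (char c : board) {
            int i = order.indexOf(Character.toLowerCase(c));
            if (i >= 0) score += Character.isUpperCase(c) ? value[i] : -value[i];
        }
        return whiteToMove ? score : -score;   // side-to-move relative
    }

    // Mirror along the ranks and swap colours ("flip").
    static char[] flip(char[] board) {
        char[] m = new char[64];
        for (int sq = 0; sq < 64; sq++) {
            char c = board[(7 - sq / 8) * 8 + sq % 8];
            m[sq] = Character.isUpperCase(c) ? Character.toLowerCase(c)
                  : Character.isLowerCase(c) ? Character.toUpperCase(c) : c;
        }
        return m;
    }

    public static void main(String[] args) {
        // Asymmetric material: black is a pawn down.
        char[] pos = ("rnbqkbnr" + "ppppppp." + "........" + "........"
                    + "........" + "........" + "PPPPPPPP" + "RNBQKBNR").toCharArray();
        // Flip the board AND switch the side to move: the scores must match.
        int a = eval(pos, true), b = eval(flip(pos), false);
        System.out.println(a == b ? "symmetric" : "BUG: " + a + " vs " + b);
        // → symmetric
    }
}
```

Run the same check over every position in the FEN file; any position where the two scores differ pinpoints an asymmetric term.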
Thomas...
-
kbhearn
- Posts: 411
- Joined: Thu Dec 30, 2010 4:48 am
Re: Weird colour bias
First off, if you suspect you have a bug causing a colour bias:
Under the assumption that you've already tested your move generator (via perft) and thus the bias would appear in eval if anywhere:
Make a 'reverse' function, or manually feed your engine reversed FENs, and test whether evalWTM = -evalBTM for the reversed pairs. The automatic function has the advantage that, if the bias doesn't show up often, you can just run normal searches with an additional check that reverses each position and tests whether eval is as expected, logging the positions where exceptions occur so you can look into them further. If you find a position where this doesn't hold, break the eval down for each side to move and find which chunk evaluates differently for white than for black.
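Such a 'reverse' function can also be sketched at the FEN level. A minimal version of my own (it simply drops the castling and en-passant fields, so it only suits positions without them):

```java
// Colour-reverse a FEN: flip the ranks, swap piece case, and toggle
// the side to move. Castling/en-passant fields are dropped for brevity.
public class ReverseFen {
    static String reverseFen(String fen) {
        String[] parts = fen.split(" ");
        String[] ranks = parts[0].split("/");
        StringBuilder sb = new StringBuilder();
        for (int r = ranks.length - 1; r >= 0; r--) {   // ranks in reverse order
            for (char c : ranks[r].toCharArray()) {
                sb.append(Character.isUpperCase(c) ? Character.toLowerCase(c)
                        : Character.isLowerCase(c) ? Character.toUpperCase(c)
                        : c);                            // empty-square digits pass through
            }
            if (r > 0) sb.append('/');
        }
        String stm = parts[1].equals("w") ? "b" : "w";  // toggle side to move
        return sb + " " + stm + " - - 0 1";
    }

    public static void main(String[] args) {
        // King+pawn endgame: white pawn on e2 becomes a black pawn on e7.
        System.out.println(reverseFen("4k3/8/8/8/8/8/4P3/4K3 w - - 0 1"));
        // → 4k3/4p3/8/8/8/8/8/4K3 b - - 0 1
    }
}
```

Feed both FENs to the engine and compare the two evals as described above.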
But!
I'd suspect your real problem could be that you're looking at 20-game samples. That's not enough to show anything statistically, so it would really have to be something you observe about how one side plays versus the other, and not just the results, before you assume the bias you're seeing comes from a bug rather than from randomness.
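For a rough sense of why 20 games is too few (my numbers, standard binomial statistics, not from the thread): the one-sigma error of a match score over n games is sqrt(p(1-p)/n), at most sqrt(0.25/n).

```java
// Back-of-envelope: how precisely can n games measure a match score?
public class SampleSize {
    public static void main(String[] args) {
        for (int n : new int[] {20, 200, 2000}) {
            // Worst case p = 0.5; one standard deviation, in percent.
            double sigma = Math.sqrt(0.25 / n) * 100;
            System.out.printf("%4d games: +/- %.1f%% (1 sigma)%n", n, sigma);
        }
    }
}
```

At 20 games one sigma is about 11%, so even a 14-6 (70%) result is below two sigma from an even match; a genuine bias needs hundreds of games to rise reliably above the noise.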
-
Evert
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: Weird colour bias
mike_bike_kite wrote: I let the program play itself and I can switch out various parts of the program (ie different scoring routines etc). I'll play 10 games with feature X enabled for white and disabled for black. Then I'll play another 10 with that feature enabled for black and disabled for white. The idea being that I can test whether a feature is useful or not. If one side (say black) wins most of the games no matter whether the feature is enabled or not then I report a bias.
With that few games you are completely dominated by random statistical noise. Try playing a few hundred or (better) a few thousand games and see whether the bias is still there.
That said, scanning the board in a particular direction can have more of an impact on playing strength than you might think if it affects move ordering. More generally, what you should test is the following: for a given test position, print the static evaluation as well as the search result for an n-ply search. Then flip the position (reverse the colours, but also flip left/right if it makes sense to do so, i.e. there are no castling rights). The static evaluation should be the same, and the selected move from the search should be the corresponding move. If this is not the case, you have a bug somewhere.
Unless you have some deliberate asymmetries built in, but it doesn't sound like you do.
-
Codesquid
- Posts: 138
- Joined: Tue Aug 23, 2011 10:25 pm
- Location: Germany
Re: Weird colour bias
hgm wrote: I am not sure if this is the same phenomenon as you are talking about. But when testing micro-Max / Fairy-Max in self-play, (identical versions, but possibly from asymmetric initial position), I noticed that the scores would systematically differ for a position where white had the move and the color-reversed position where black had the move.
[...]
Of course it puzzled me how playing with black moving first could be different from playing with white. But I could finally trace it to the fact that Fairy-Max uses a mailbox board without piece list, so that scans the board for finding pieces to move. As it scans from a8 to h1, for white it would typically try Pawn moves first, and for black piece moves. In case of equal score the order would be decisive. So white had a slight preference for Pawn moves. This led to slightly more aggressive play, which was apparently on average a better strategy.
Very intriguing idea. I presume it also applies to bitboard-based programs, since I think the same bitscan direction is used in move generation regardless of colour.
To test that a program truly plays the same in a colour-reversed position, would it suffice to check that the total number of nodes searched to a certain depth for some test positions is identical in the normal and the reversed case? (Edit: Evert already beat me to this one)
The other open question is whether this problem actually results in a measurable Elo change, and whether one can take advantage of it.
nanos gigantium humeris insidentes