Interesting difference between variants

Evert · Post by **Evert** » Fri Nov 11, 2011 4:41 pm

Because Sjaak was having problems (in Capablanca) against Fairy-Max due to horizon effects involving hanging pieces, I decided to experiment with adding an evaluation term for hanging pieces.
In short, there is a small fixed penalty for pieces that are undefended but not attacked. There is a penalty proportional to the piece value for pieces that are attacked but not defended, and a smaller penalty (still proportional to the value of the piece) for pieces that are defended but attacked by a piece of lower value.
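
The three cases described above can be sketched roughly as follows. This is only an illustration, not Sjaak's actual code: the constants `LOOSE_PENALTY`, `HANGING_PERCENT` and `OUTGUNNED_PERCENT`, the `PieceStatus` record and the function name are all hypothetical, and the numbers are placeholders rather than tuned values.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tuning constants (centipawns / percent); the post gives no numbers.
LOOSE_PENALTY = 5        # undefended but not attacked: small fixed penalty
HANGING_PERCENT = 10     # attacked and not defended: share of the piece value
OUTGUNNED_PERCENT = 5    # defended but attacked by a cheaper piece: smaller share


@dataclass
class PieceStatus:
    value: int                         # piece value in centipawns
    defended: bool                     # at least one friendly defender
    cheapest_attacker: Optional[int]   # value of the least valuable attacker, None if not attacked


def hanging_penalty(piece: PieceStatus) -> int:
    """Return the penalty (a positive number to subtract) for one piece."""
    attacked = piece.cheapest_attacker is not None
    if not attacked:
        # Loose piece: nothing attacks it yet, but nothing defends it either.
        return 0 if piece.defended else LOOSE_PENALTY
    if not piece.defended:
        # Attacked and undefended: penalty proportional to the piece value.
        return piece.value * HANGING_PERCENT // 100
    if piece.cheapest_attacker < piece.value:
        # Defended, but a less valuable piece attacks it.
        return piece.value * OUTGUNNED_PERCENT // 100
    return 0
```

With these placeholder numbers, an undefended rook (500) attacked by a pawn costs 50, while a defended queen (900) attacked by a knight costs 45.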

Although my motivation for adding the term is problems in Capablanca, my first test was a self-play match in regular chess. The results:

Rank Name                 Elo    +    - games score oppo. draws 
   1 Sjaak 312M-hanging  2116   10   10  3196   54%  2084   23% 
   2 Sjaak 312           2084   10   10  3196   46%  2116   23%

In other words, a nice increase in strength.
However, the next stage of testing was to do the same thing for Spartan chess:

Rank Name                 Elo    +    - games score oppo. draws 
   1 Sjaak 312           2131   16   16  1538   58%  2069   10% 
   2 Sjaak 312M-hanging  2069   16   16  1538   42%  2131   10%

Not at the same accuracy yet, but it's very clear that results are completely different.

To me, this is entirely unexpected. I'm not sure what causes this difference (the last time I saw something like this, one version of the program was crashing, resulting in a massive number of time losses; that is not the case now) and I'm not sure what to do about it. An ugly option is to tune the evaluation for each variant separately, but I'd rather not do that.

Two things are different between the testing for normal chess and Spartan chess. First of all, for normal chess, I start from different opening positions. For Spartan I don't have a set of opening positions, so the engine has to figure it out on its own. Second of all, the evaluation of passed pawns is broken for the Spartan hoplite pawns, but it is exactly the same in the two versions, so I wouldn't expect this to make a difference.

Thoughts?

hgm · Post by **hgm** » Fri Nov 11, 2011 5:37 pm

Spartan Chess is an asymmetric game, so the effect on eval changes need not necessarily be the same for playing white or black. Can you split the stats into white and black games?

I could imagine that this hanging-piece penalty needs scaling with the number of opponent super-pieces. In KRPPPKRPP there should not be much danger in leaving your Rook unprotected. In KQKRPPPP, however...
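
One way such scaling could look is sketched below. Everything here is an assumption made for illustration: the set of letters counted as super-pieces, the scale table `SCALE_EIGHTHS`, and the function name. It takes only the opponent's piece letters, not the combined material string used above.

```python
# Hypothetical set of piece letters counted as "super-pieces"
# (Queen, Chancellor, Archbishop); a real engine would derive this per variant.
SUPER_PIECES = frozenset("QCA")

# Hypothetical scale factors in eighths for 0, 1 and 2+ opponent super-pieces.
SCALE_EIGHTHS = (2, 8, 12)


def scaled_hanging_penalty(base_penalty: int, opponent_pieces: str) -> int:
    """Scale a hanging-piece penalty by the number of opponent super-pieces."""
    n_super = sum(1 for p in opponent_pieces.upper() if p in SUPER_PIECES)
    scale = SCALE_EIGHTHS[min(n_super, len(SCALE_EIGHTHS) - 1)]
    return base_penalty * scale // 8
```

For a base penalty of 40, an opponent with `"KRPP"` gives 10, while an opponent with `"KQ"` gives the full 40.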

Spartan Chess is probably less tactical than FIDE Chess. The strongest pieces of the Spartans could be less dangerous to hanging pieces than a Queen, because they have fewer slider moves.

Evert · Post by **Evert** » Fri Nov 11, 2011 6:08 pm

hgm wrote:Spartan Chess is an asymmetric game, so the effect on eval changes need not necessarily be the same for playing white or black. Can you split the stats into white and black games?

It'll require a bit of Perl hacking, but yes, I could. I'll get back with the results.

hgm wrote:I could imagine that this hanging-piece penalty needs scaling with the number of opponent super-pieces. In KRPPPKRPP there should not be much danger in leaving your Rook unprotected. In KQKRPPPP, however...

Hmm... maybe.
Then again, the most severe penalty is given for leaving a major piece under attack from a minor piece, which is more likely to be a bishop than a lieutenant or captain (but perhaps more likely to be a hoplite than a pawn).

hgm wrote:Spartan Chess is probably less tactical than FIDE Chess. The strongest pieces of the Spartans could be less dangerous to hanging pieces than a Queen, because they have fewer slider moves.

Maybe.
I guess that will also be answered (after a fashion) when results are split per colour.

EDIT:
Some simple statistics: the number of white wins, black wins and their ratio:

Sjaak 312M-hanging             1115.5 / 2669 (41.79%) (303 drawn, 542 white, 422 black; w/b=1.284)
Sjaak 312                      1553.5 / 2669 (58.21%) (303 drawn, 775 white, 627 black; w/b=1.236)

So both versions win more as white than as black. The version with the hanging-piece evaluation loses more games with either colour, but relatively more as black.
Still not sure how to interpret that.
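
For reference, the figures above follow from the win and draw counts as in this small sketch (the function name and the returned keys are made up for illustration; the original numbers came from a Perl script that is not shown here):

```python
def colour_split(white_wins: int, black_wins: int, draws: int, games: int) -> dict:
    """Summarise one engine's results: total points, score percentage, white/black win ratio."""
    points = white_wins + black_wins + 0.5 * draws   # a draw counts as half a point
    return {
        "points": points,
        "score_pct": round(100.0 * points / games, 2),
        "w_over_b": round(white_wins / black_wins, 3),
    }


hanging = colour_split(white_wins=542, black_wins=422, draws=303, games=2669)
plain = colour_split(white_wins=775, black_wins=627, draws=303, games=2669)
```

A higher w/b ratio for the hanging-piece version means a larger share of its wins comes from the white side, which is another way of saying it does relatively worse as black.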

Evert · Post by **Evert** » Sun Nov 13, 2011 10:02 am

After examining some games, it seems that, as implemented, the evaluation term tends to underestimate the threat posed by the Spartan pieces (which are short-range and typically less valuable than the Persian pieces), so that as white it's too optimistic about its own position and as black it's too pessimistic.
Looking at the evaluation terms as implemented, I think I've found where things may be going wrong for unequal armies and I've modified it slightly. Running a second test now...

Ferdy · Post by **Ferdy** » Mon Nov 14, 2011 1:57 am

My code for hoplite passers is different from my code for pawn passers. To minimize code complexity I detect passers only on the 5th and 6th ranks for both sides; of course, 7th-rank pawns and hoplites are always passers to me. So from the 5th to the 7th rank I give additional points for being a passer.
I believe normal chess and Spartan chess should not be compared; they are different. A hoplite can be stopped in its tracks by guarding 2 squares (the 2 diagonal squares in front), whereas a pawn can be stopped by guarding just the 1 square in front. I guess the Spartans should use this advantage. Added to that, the Spartans have 2 kings, and this is where programs may differ big time.

Evert · Post by **Evert** » Mon Nov 14, 2011 10:31 am

Ferdy wrote:My code for hoplite passers is different from my code for pawn passers. To minimize code complexity I detect passers only on the 5th and 6th ranks for both sides; of course, 7th-rank pawns and hoplites are always passers to me. So from the 5th to the 7th rank I give additional points for being a passer.
I believe normal chess and Spartan chess should not be compared; they are different. A hoplite can be stopped in its tracks by guarding 2 squares (the 2 diagonal squares in front), whereas a pawn can be stopped by guarding just the 1 square in front. I guess the Spartans should use this advantage. Added to that, the Spartans have 2 kings, and this is where programs may differ big time.

I detected passed pawns on every rank (it's a simple bitboard test anyway), but I indeed don't want to test for passed hoplites before they reach the 6th rank or so. They're far too mobile for that. Even on the 6th rank I'll do it approximately, and give a lower bonus for "passed hoplites" than for passed pawns (since it's so easy for hoplites to be passers).
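
For readers unfamiliar with the bitboard test for ordinary pawns, a minimal sketch is given here. It is not Sjaak's code: the square numbering (a1 = 0, h8 = 63), the constants and the function names are assumptions for the example, and it covers only white FIDE pawns, not hoplites.

```python
FILE_A = 0x0101010101010101   # bitboard with every square on the a-file set
FULL = 0xFFFFFFFFFFFFFFFF     # all 64 squares


def white_front_span(square: int) -> int:
    """Squares strictly ahead of a white pawn on its own and the adjacent files."""
    file, rank = square % 8, square // 8
    files = FILE_A << file
    if file > 0:
        files |= FILE_A << (file - 1)
    if file < 7:
        files |= FILE_A << (file + 1)
    ahead = (FULL << (8 * (rank + 1))) & FULL   # every rank in front of the pawn
    return files & ahead


def is_passed_white_pawn(square: int, black_pawns: int) -> bool:
    """A white pawn is passed if no black pawn stands in its front span."""
    return (white_front_span(square) & black_pawns) == 0


E5, E7, D6, G7, E4 = 36, 52, 43, 54, 28
blocked = is_passed_white_pawn(E5, 1 << E7)             # black pawn straight ahead
guarded = is_passed_white_pawn(E5, 1 << D6)             # black pawn on an adjacent file ahead
free = is_passed_white_pawn(E5, (1 << G7) | (1 << E4))  # black pawns far away or behind
```

Here `blocked` and `guarded` are `False` and `free` is `True`. A hoplite moves differently, so the same mask would not be appropriate for it without changes.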
I agree that normal chess and Spartan chess are (very) different games, but I don't want to write variant-specific evaluation functions in Sjaak (and I'd like to avoid evaluation terms that have to be specified separately for each variant; I'd get rid of piece values if there were a reliable algorithm to predict them). Obviously there will have to be some specific terms, but I'd like these to be boolean terms that derive from the game rules.

Anyway, the idea I had seems to produce somewhat conflicting results. I no longer have the big drop in Elo for Spartan anyway, so that's a start. But before I look into it further I've decided to fix my passed hoplite code; I need to have it anyway.

Evert · Post by **Evert** » Tue Nov 15, 2011 7:21 am

Ok, now this is interesting. I "fixed" the passer evaluation for hoplites so it would only be done from the 6th rank onward (I still need to add the 5th rank, which is fairly easy but on the TODO list). Previously, due to the way the test was done, hoplites would almost never be scored as passers, even if they were.
The result? About -20 elo. Weird...
