pre-announcement: 10x8 tourney

Evert · Post by **Evert** » Wed Oct 19, 2011 5:56 pm

I hacked up my perft routine to produce likely opening positions (after 8 ply). This produces ~600000 candidates. Those will need to be screened for being tactically quiet, I'll probably not do all of them, but take a random sample and generate a few thousand more-or-less balanced positions. That'll still take a while though.
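The sampling step could be sketched roughly as follows. This is only an illustration: it assumes the candidates arrive as a stream of position strings from the modified perft, and it uses reservoir sampling (my substitution, chosen so the ~600000 candidates never have to be held in memory at once). The name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List

def reservoir_sample(candidates: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Pick up to k items uniformly at random from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    for seen, position in enumerate(candidates, start=1):
        if len(reservoir) < k:
            reservoir.append(position)      # fill the reservoir first
        else:
            j = rng.randrange(seen)         # uniform in 0..seen-1
            if j < k:
                reservoir[j] = position     # keep the new item with probability k/seen
    return reservoir
```

The expensive screening for tactically quiet, balanced positions would then only be run on the sample that comes back.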

Evert · Post by **Evert** » Wed Oct 19, 2011 10:14 pm

hgm wrote: Hmm, the draw rate in 10x8 Chess should be around 16%.
(For normal Chess I find indeed ~32%.) This is what Fairy-Max and Joker80 have in self-play. What material is typically on the board when a game ends in a draw?

It varies, but in all cases it's pretty devastating: in some cases it just fails to deliver mate within 50 moves with a major piece, in other cases I've seen it reduce to bare kings from queen+several pawns vs lone king (!).
Something, somewhere, somehow, is deeply wrong...

hgm · Post by **hgm** » Wed Oct 19, 2011 10:49 pm

Indeed. But if it cannot mate in KQK it should be easy to trace. Most engines would give you the correct mating distance and PV for a position like

[d] 10/3k6/10/4Q5/5K4/10/10/10 w - - 0 1

Even Fairy-Max gets the mate-in-7 in 9 sec on my 1.3GHz laptop, so if you get something different you can walk the tree to see where it goes wrong.
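For anyone wanting to load that position into their own code: the placement field differs from 8x8 FEN in that a run of empty squares can be a two-digit number ("10"). A minimal sketch of a parser that handles this, assuming squares are named with files a-j and ranks 1-8 (the function name `parse_placement` is made up for the example):

```python
from typing import Dict

def parse_placement(placement: str, width: int = 10, height: int = 8) -> Dict[str, str]:
    """Map square names (e.g. 'd7') to piece letters for a FEN placement field."""
    ranks = placement.split("/")
    if len(ranks) != height:
        raise ValueError(f"expected {height} ranks, got {len(ranks)}")
    board: Dict[str, str] = {}
    for row, text in enumerate(ranks):
        rank_no = height - row              # FEN lists the top rank first
        col = 0
        i = 0
        while i < len(text):
            if text[i].isdigit():
                j = i
                while j < len(text) and text[j].isdigit():
                    j += 1
                col += int(text[i:j])       # consume the whole number, e.g. "10"
                i = j
            else:
                board[f"{chr(ord('a') + col)}{rank_no}"] = text[i]
                col += 1
                i += 1
        if col != width:
            raise ValueError(f"rank {rank_no} has {col} squares, expected {width}")
    return board

board = parse_placement("10/3k6/10/4Q5/5K4/10/10/10")
```

Reading digit by digit, as an 8x8 parser would, turns "10" into one empty square followed by zero, which is why the width check at the end of each rank is worth having.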

Code: Select all

 23	+79.93	31.2M	1:06.68	e5c5 d7e6 c5c6 e6f7 f4g5 f7e7 g5f5 e7d8 c6b7 d8e8 f5e6 e8f8 b7f7
 22	+79.93	19.9M	0:42.40	e5c5 d7e6 c5c6 e6f7 f4g5 f7e7 g5f5 e7f7 c6d7 f7f8
 21	+79.93	12.5M	0:26.72	e5c5 d7e6 c5c6 e6f7 f4g5 f7e7 g5f5 e7f7 c6d7 f7f8
 20	+79.93	7.3M	0:15.85	e5c5 d7e6 c5c6 e6f7 f4g5 f7e7 g5f5
 19	+79.93	4.7M	0:10.34	e5c5 d7e6 c5c6 e6f7 f4g5 f7e7 g5f5 e7f7 c6d7 f7f8
 18	+79.93	3.9M	0:08.70	e5c5 d7e6 c5c6 e6f7 f4g5 f7e7 g5f5
 18	+9.62	2.8M	0:06.18	e5e4 d7d6 e4e5 d6c6 e5f5 c6b6 f5d5 b6c7 d5e5

Evert · Post by **Evert** » Thu Oct 20, 2011 12:26 am

Well, I seem to have solved at least some of the problem. There was a bug I introduced recently that could result in the program playing a random move (!) when mate was found. That seems to be fixed now, but there is still at least one bug lurking in there, because I've seen the engine report "mate-in-N" rather than "mated-in-N" (in other words, a positive mate score rather than a negative one) on a single ply before returning to sanity.
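A sign slip like that is easier to spot when the mate-score convention is written down in one place. The sketch below shows one common negamax convention, not necessarily the one Sjaak uses: a mated node at ply `p` scores `-MATE + p`, so shorter mates score higher for the winner. `MATE`, `MAX_PLY` and the function names are assumptions for the example.

```python
MATE = 32000      # hypothetical magnitude of "mated on the spot"
MAX_PLY = 512     # hypothetical upper bound on search depth in plies

def mated_score(ply: int) -> int:
    """Score returned at a node where the side to move is checkmated."""
    return -MATE + ply

def describe(score: int) -> str:
    """Render a root score as mate-in-N / mated-in-N (N in moves) or centipawns."""
    if score >= MATE - MAX_PLY:
        plies = MATE - score            # odd: the winner makes the last move
        return f"mate-in-{(plies + 1) // 2}"
    if score <= -MATE + MAX_PLY:
        plies = MATE + score            # even: the opponent makes the last move
        return f"mated-in-{plies // 2}"
    return f"{score} cp"
```

With this convention a mate-in-7 at the root is `MATE - 13`. An assert in the search that a mate score never changes sign between parent and child without the negation would catch the single-ply flip described above.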

The program also seems to bounce around aimlessly for far too long before driving the king to the edge, and I've seen it draw KQK by the 50-move rule at 40 moves/5 seconds, which should never happen. That, I suspect, is a problem with the tuning of the evaluation (balancing queen mobility vs. driving the king to a corner).
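To make the tuning question concrete, here is a rough sketch of a "mop-up" term for the side with mating material on a 10x8 board: it rewards pushing the lone king towards an edge and bringing the strong king closer. The weights and names are invented for illustration and are not Sjaak's evaluation; the point is only that this term has to outweigh whatever queen mobility is worth.

```python
WIDTH, HEIGHT = 10, 8    # Capablanca-sized board; squares are (file, rank), zero-based

def edge_distance(file: int, rank: int) -> int:
    """Steps from a square to the nearest board edge (0 means on the edge)."""
    return min(file, WIDTH - 1 - file, rank, HEIGHT - 1 - rank)

def king_distance(a, b) -> int:
    """Chebyshev distance: king moves from a to b on an empty board."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def mop_up_bonus(strong_king, weak_king, edge_weight: int = 20, close_weight: int = 5) -> int:
    """Bonus for the stronger side; the weights are hypothetical centipawn values."""
    max_edge = (min(WIDTH, HEIGHT) - 1) // 2     # largest possible edge distance
    max_dist = max(WIDTH, HEIGHT) - 1            # largest possible king distance
    pushed = max_edge - edge_distance(*weak_king)
    close = max_dist - king_distance(strong_king, weak_king)
    return edge_weight * pushed + close_weight * close
```

If the bonus per step towards the edge is smaller than the mobility the queen gains by wandering off, the search has no reason to make progress until the mate itself comes within the horizon.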

It indeed finds mate-in-7 fairly quickly in that position now, but it loses it immediately on the next ply and settles on mate-in-9.

Evert · Post by **Evert** » Fri Oct 21, 2011 10:55 pm

Ok, several rounds of bug-fixing later, things are looking much better and I get reasonable looking results. The draw rate for Capablanca is down to ~12%, which looks more reasonable (and there are no more problems mating a lone king). There are still some features I'd like to implement and a lot of evaluation tweaks to take care of.

One experiment that's running now is testing across variants. The current version of Sjaak measures as >100 Elo stronger at normal chess than the version that participated in Thermopilae. I'm now running a test match in Spartan chess, and it's testing at equally strong, or possibly -10 Elo (not nearly enough games so far though [EDIT: now at -4 with just over 400 games, well within the error bar]). I had assumed that playing strength in one variant correlates with playing strength in another variant (at least as long as both are similar enough that similar concepts apply), but it seems that this is not the case!
Perhaps there's a set of evaluation parameters that ends up working well across different variants; tuning will tell. What might be tripping things up now is the mobility evaluation of the Spartans, whose mobility is generally very high, and mobility is what I have been tweaking. Or it could again be a problem with not enough variety in starting positions.
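For a feel of how wide the error bar is after about 400 games, here is a back-of-the-envelope sketch. It uses the plain logistic Elo formula with a normal approximation on the match score, which is simpler than what a rating tool would do, and the win/draw/loss counts in the usage line are hypothetical, not the actual match result.

```python
import math

def elo_diff(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, margin): Elo difference and an approximate 95% half-width.

    Assumes the score is strictly between 0 and 1 (no clean sweeps).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, which is 1, 0.5 or 0.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0 - score) ** 2) / n
    stderr = math.sqrt(var / n)

    def to_elo(s: float) -> float:
        return -400 * math.log10(1 / s - 1)

    low, high = to_elo(score - z * stderr), to_elo(score + z * stderr)
    return to_elo(score), (high - low) / 2

# Hypothetical 400-game match that ends dead even.
elo, margin = elo_diff(wins=150, draws=100, losses=150)
```

For these made-up counts the margin comes out at roughly 30 Elo either way, so a measured -4 or -10 after 400 games cannot be distinguished from zero.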

Still working on getting a decent set for Capablanca to start tuning there. I'll hopefully get that done soonish, after which I might try making a similar set for Spartan just for the heck of it. Right now I'm having fun implementing attack tables.

hgm · Post by **hgm** » Fri Oct 21, 2011 11:35 pm

It would indeed be interesting if the effect differs between variants. Unless you added variant-specific evaluation, of course. Capablanca might be more similar to normal Chess than Spartan. Although King Safety might be even more critical there.

I have been running a lot of tablebases this week, to figure out which pawnless endgames are generally won and which are drawn, in order to know when to discount scores when the Pawn count goes low and you become vulnerable to piece-vs-pawn sacs. Spartan Chess is quite different from normal Chess, because so many pieces have mating potential. The second King also has a huge impact, which my tablebase generator cannot even handle yet: checking becomes a totally blunt instrument.

Evert · Post by **Evert** » Sat Oct 22, 2011 11:06 am

hgm wrote: It would indeed be interesting if the effect differs between variants. Unless you added variant-specific evaluation, of course.

I'm trying very hard to resist doing that, unless it's necessary. So far I haven't really done that (the closest thing is returning a mate score from the evaluation in Shatranj for a bare king).
Anyway, these are the results of a tuning session with normal Chess:

Code: Select all

Rank Name          Elo    +    - games score oppo. draws 
   1 Sjaak 212M   2121   14   14  1490   66%  2028   52% 
   2 Sjaak 212M2  2118   14   14  1492   65%  2028   51% 
   3 Sjaak 212M3  2114   14   14  1497   65%  2028   54% 
   4 Sjaak 212M4  2112   14   14  1497   64%  2028   52% 
   5 Sjaak 204M   2108   10   10  3012   64%  2028   53% 
   6 Sjaak 204M2  2101   12   12  2111   63%  2028   55% 
   7 Sjaak 185    2028    5    5 11099   36%  2111   53%

And this is Spartan:

Code: Select all

Rank Name          Elo    +    - games score oppo. draws 
   1 Sjaak 212M3  2128   16   16  1600   56%  2077    9% 
   2 Sjaak 212M2  2119   16   16  1599   55%  2077   10% 
   3 Sjaak 212M4  2096   16   16  1600   52%  2077    9% 
   4 Sjaak 212M   2079   16   16  1599   50%  2077   11% 
   5 Sjaak 185    2077    8    8  6398   47%  2106   10%

Look for the different 212 versions (they represent different parameters controlling the mobility evaluation).
The number of games is not that large and certainly for normal chess the top few are fairly close together. But comparing with Spartan, the best performing program on normal chess is no better than an older version at Spartan, while a version that seems a bit weaker at normal chess is clearly stronger at Spartan.
An open question I'll have to look into is why some of the games seem to have gone missing from the output. Looking at the results, the draw score for normal chess is also a bit odd, so I'll have to check that too.

EDIT: actually, there seem to be an awful lot of time-losses in the Spartan test run. I'll probably have to figure out why and re-run it before drawing any conclusions.

hgm wrote: Capablanca might be more similar to normal Chess than Spartan. Although King Safety might be even more critical there.

Well, I certainly hope so, it'll be much easier to test normal chess. King safety is one of the things that don't yet work in Capablanca though, but I'll get to it eventually.
The other thing that's sorely needed is a better evaluation of passed/candidate pawns. I'm just trying to think of a good way to do that in a way that works for other types of pawns too (although the only other relevant one that I can think of is the Berolina pawn).

hgm · Post by **hgm** » Sat Oct 22, 2011 11:53 am

Well, Berolina Pawns are really messing up anything that is known of pawn evaluation. I am not even sure the concept 'passer' is useful there. They very quickly become passers as the board grows emptier, because they can choose their own path. But even in the asymmetric case, like Spartan, it is not always obvious this is an advantage. If I have a Pawn and Hoplite in the same file, the Pawn is blocked, and the Hoplite can pass it (if there is not exactly one square between them). But when it passes it, the Pawn becomes a passer as well. So the Hoplite is still not free to make a run for it, because it is bound to blocking the Pawn.

Perhaps the best way to treat generalized pawn endings is to just double the search depth in branches with only Pawns (when the root is not a pawn ending), give advance bonuses like they are passers, and compensating penalties when they are obviously blocked where they are standing. The search will figure it out, then. Another approach would be to try Pawn pushes in QS (like captures these are irreversible).
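One way to read "double the search depth" is with fractional plies: a move made in a pawns-only node costs half a ply of remaining depth, unless the root itself was already a pawn ending. A tiny sketch of that bookkeeping, with hypothetical function names and no actual search attached:

```python
def child_depth(depth: float, pawns_only: bool, root_pawns_only: bool) -> float:
    """Remaining depth to pass to the child node after making one move."""
    cost = 0.5 if (pawns_only and not root_pawns_only) else 1.0
    return depth - cost

def plies_reachable(depth: float, pawns_only: bool, root_pawns_only: bool) -> int:
    """How many plies deep a branch goes before the remaining depth runs out."""
    plies = 0
    while depth > 0:
        depth = child_depth(depth, pawns_only, root_pawns_only)
        plies += 1
    return plies
```

A branch that enters a pawn ending with 3 plies of depth left then gets searched 6 plies deep, while a search that starts in a pawn ending is left alone.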

Ferdy · Post by **Ferdy** » Sat Oct 22, 2011 12:32 pm

Off topic but I'd like to know how many computer programs are able to play spartan chess? I will probably create one, I guess the play is interesting. Any reference for the rules? Thanks.

Evert · Post by **Evert** » Sun Oct 23, 2011 12:19 am

Evert wrote: EDIT: actually, there seem to be an awful lot of time-losses in the Spartan test run. I'll probably have to figure out why and re-run it before drawing any conclusions.

Ok, when I added support for large boards I actually broke the parsing of castling flags, which meant that they weren't cleared for black kings in Spartan chess. Hilarity ensued as the board state got corrupted, trapping the program in an endless loop and resulting in a time-loss.
It was caught immediately by an assert when I tried to debug it, but fixing it properly will take a little bit of work. For now I've just restored the correct behaviour for small boards, I'll do it properly later.
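A defensive check along these lines might look like the sketch below: after parsing, drop any castling flag whose king or rook is not actually on its home square. That automatically clears the flags for a side that cannot castle at all, such as Black in Spartan chess. The `HOME` table assumes a standard 8x8 setup and is only a placeholder; a 10x8 variant would supply its own squares. None of this is taken from Sjaak's source.

```python
from typing import Dict

# Hypothetical home squares (king, rook) per castling flag for standard 8x8 chess.
HOME = {
    "K": ("e1", "h1"), "Q": ("e1", "a1"),
    "k": ("e8", "h8"), "q": ("e8", "a8"),
}

def sanitize_castling(board: Dict[str, str], flags: str, home=HOME) -> str:
    """Keep only castling flags that the piece placement can actually support."""
    kept = []
    for flag in flags:
        if flag not in home:
            continue                        # '-' or an unknown letter
        king_sq, rook_sq = home[flag]
        king, rook = ("K", "R") if flag.isupper() else ("k", "r")
        if board.get(king_sq) == king and board.get(rook_sq) == rook:
            kept.append(flag)
    return "".join(kept) or "-"
```

Running this once on the start position, together with the assert that already caught the corruption, turns a silent endless loop into an immediate and obvious failure.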
In theory this could have affected normal chess as well, but since there it's actually possible to castle and I tend to start from late opening positions anyway, the impact there will be much reduced. The Spartan results now look a lot more reasonable and in line with what I'd expected:

Code: Select all

Rank Name         Elo    +    - games score oppo. draws 
   1 Sjaak 223M  2179   49   49   145   74%  2021   37% 
   2 Sjaak 185M  2021   49   49   145   26%  2179   37%

Still running, but I don't expect a big change here (EDIT: actually, the two versions seem to grow even further apart, now showing +180 Elo over the old version).

I also spent some time playing with the attack table. As expensive as the in-check test is, it's probably close to optimal. The attack table performs much worse (by about a factor of 2) in perft, so it's pretty atrocious for that. In actual search it's not so bad since I always do in-check tests for both sides and they're now basically free. That means I almost recover the factor of 2 I lost, but not quite. If I can use the attack tables for move generation, SEE and evaluation as well, then it might be a net gain. So far it hasn't really worked yet, which is a shame. I was hoping for it to be a bit faster. Oh well.
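For readers unfamiliar with the trade-off, here is a deliberately simplified attack map: it computes every square one side attacks, after which an in-check test for the other side is just a lookup. It uses Python sets on a 10x8 board and leaves out pawns and the variant pieces, so it says nothing about the speed of a real implementation; all names are made up for the example.

```python
from typing import Dict, Set, Tuple

Square = Tuple[int, int]                     # (file, rank), zero-based
WIDTH, HEIGHT = 10, 8

ORTHO = [(1, 0), (-1, 0), (0, 1), (0, -1)]
DIAG = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
KNIGHT = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]
# piece letter -> (step directions, whether the piece slides)
MOVES = {
    "K": (ORTHO + DIAG, False), "N": (KNIGHT, False),
    "R": (ORTHO, True), "B": (DIAG, True), "Q": (ORTHO + DIAG, True),
}

def attack_table(board: Dict[Square, str], white: bool) -> Set[Square]:
    """All squares attacked by the given side (pawns omitted for brevity)."""
    attacked: Set[Square] = set()
    for (f, r), piece in board.items():
        if piece.isupper() != white or piece.upper() not in MOVES:
            continue
        steps, slides = MOVES[piece.upper()]
        for df, dr in steps:
            x, y = f + df, r + dr
            while 0 <= x < WIDTH and 0 <= y < HEIGHT:
                attacked.add((x, y))
                if not slides or (x, y) in board:
                    break                    # leaper, or ray blocked by a piece
                x, y = x + df, y + dr
    return attacked

def in_check(board: Dict[Square, str], white: bool) -> bool:
    """True if the given side's king stands on a square the enemy attacks."""
    king = "K" if white else "k"
    enemy = attack_table(board, not white)
    return any(p == king and sq in enemy for sq, p in board.items())
```

The cost pattern matches the description above: building the table touches every piece, which hurts in perft where only legality is needed, but once built it can in principle be shared by the in-check tests, move generation, SEE and evaluation.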
