pre-announcement: 10x8 tourney
Moderator: Ras
-
Evert
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: pre-announcement: 10x8 tourney
I hacked up my perft routine to produce likely opening positions (after 8 ply). This produces ~600000 candidates. Those will need to be screened for being tactically quiet. I'll probably not do all of them, but take a random sample and generate a few thousand more-or-less balanced positions from it. That'll still take a while though.
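For illustration, the "take a random sample" step could look something like the sketch below. It assumes the candidates arrive as a stream of position strings that is too long to keep in memory comfortably; `reservoir_sample` and the fake position names are hypothetical and not taken from the actual perft code.

```python
import random

def reservoir_sample(candidates, k, seed=None):
    """Pick k items uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(candidates):
        if i < k:
            # Fill the reservoir with the first k candidates.
            sample.append(item)
        else:
            # Candidate number i replaces a stored one with probability k/(i+1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                sample[j] = item
    return sample

if __name__ == "__main__":
    # Stand-in for the real candidate positions (hypothetical names).
    fake_candidates = (f"position-{n}" for n in range(600_000))
    picked = reservoir_sample(fake_candidates, 2000, seed=1)
    print(len(picked), picked[:3])
```

The sampled positions would still have to go through the quiet/balanced screening, which is the expensive part.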
-
Evert
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: pre-announcement: 10x8 tourney
hgm wrote: Hmm, the draw rate in 10x8 Chess should be around 16%. (For normal Chess I find indeed ~32%.) This is what Fairy-Max and Joker80 have in self-play. What material is typically on the board, when a game ends in a draw?
It varies, but in all cases it's pretty devastating: in some cases it just fails to deliver mate within 50 moves with a major piece, in other cases I've seen it reduce to bare kings from queen+several pawns vs lone king (!).
Something, somewhere, somehow, is deeply wrong...
-
hgm
- Posts: 28499
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: pre-announcement: 10x8 tourney
Indeed. But if it cannot mate in KQK it should be easy to trace. Most engines would give you the correct mating distance and PV for a position like
[d] 10/3k6/10/4Q5/5K4/10/10/10 w - - 0 1
Even Fairy-Max gets the mate-in-7 in 9 sec on my 1.3GHz laptop, so if you get something different you can walk the tree to see where it goes wrong.
Code: Select all
23 +79.93 31.2M 1:06.68 e5c5 d7e6 c5c6 e6f7 f4g5 f7e7 g5f5 e7d8 c6b7 d8e8 f5e6 e8f8 b7f7
22 +79.93 19.9M 0:42.40 e5c5 d7e6 c5c6 e6f7 f4g5 f7e7 g5f5 e7f7 c6d7 f7f8
21 +79.93 12.5M 0:26.72 e5c5 d7e6 c5c6 e6f7 f4g5 f7e7 g5f5 e7f7 c6d7 f7f8
20 +79.93 7.3M 0:15.85 e5c5 d7e6 c5c6 e6f7 f4g5 f7e7 g5f5
19 +79.93 4.7M 0:10.34 e5c5 d7e6 c5c6 e6f7 f4g5 f7e7 g5f5 e7f7 c6d7 f7f8
18 +79.93 3.9M 0:08.70 e5c5 d7e6 c5c6 e6f7 f4g5 f7e7 g5f5
18 +9.62 2.8M 0:06.18 e5e4 d7d6 e4e5 d6c6 e5f5 c6b6 f5d5 b6c7 d5e5
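As an aside on the position string used above: on a 10-wide board an empty rank is written as 10, so a FEN reader that treats every digit as a separate count misreads it. Below is a minimal sketch of reading the board field, assuming that consecutive digits form a single count; `parse_board_field` is a hypothetical helper, not code from any of the engines mentioned.

```python
def parse_board_field(field, files=10, ranks=8):
    """Expand the board part of a FEN string into rows, top rank first.

    '.' marks an empty square. Consecutive digits are read as one number,
    so '10' means ten empty squares rather than one followed by zero.
    """
    rows = field.split("/")
    if len(rows) != ranks:
        raise ValueError(f"expected {ranks} ranks, got {len(rows)}")
    board = []
    for row in rows:
        squares = []
        run = ""  # digits collected so far for the current empty-square count
        for ch in row:
            if ch in "0123456789":
                run += ch
                continue
            if run:
                squares.extend("." * int(run))
                run = ""
            squares.append(ch)
        if run:
            squares.extend("." * int(run))
        if len(squares) != files:
            raise ValueError(f"rank {row!r} has {len(squares)} squares, expected {files}")
        board.append("".join(squares))
    return board

if __name__ == "__main__":
    for line in parse_board_field("10/3k6/10/4Q5/5K4/10/10/10"):
        print(line)
```

The width check per rank is cheap and catches exactly the kind of misparse that otherwise only shows up as a corrupted board much later.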
-
Evert
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: pre-announcement: 10x8 tourney
Well, I seem to have solved at least some of the problem. There was a bug I introduced recently that could result in the program playing a random move (!) when mate was found. That seems to be fixed now, but there is still at least one bug lurking in there, because I've seen the engine report "mate-in-N" rather than "mated-in-N" (in other words, a positive mate score rather than a negative one) on a single ply before returning to sanity.
The program also seems to bounce around aimlessly for far too long before driving the king to the edge, and I've seen it draw KQK by 50 move rule on 40 moves/5 seconds, which should never happen. That, I suspect, is a problem with the tuning of the evaluation (balancing queen mobility vs. driving the king to a corner).
It indeed finds mate-in-7 fairly quickly in that position now, but it loses it immediately on the next ply and settles on mate-in-9.
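For reference, this is a minimal sketch of one common sign convention for mate scores in a negamax search, where "mate-in-N" is a large positive score and "mated-in-N" a large negative one. It is not Sjaak's code: the value of `MATE`, the helper names and the toy game tree are all assumptions made for the example.

```python
MATE = 32000  # assumed magnitude of a mate score, far above any normal evaluation

def mated_score(ply):
    """Score for the side to move when it is checkmated `ply` plies from the root."""
    return -MATE + ply

def describe(score):
    """Turn a root-relative score into text, counting moves rather than plies."""
    if score > MATE - 1000:
        plies = MATE - score
        return f"mate-in-{(plies + 1) // 2}"
    if score < -MATE + 1000:
        plies = MATE + score
        return f"mated-in-{plies // 2}"
    return f"{score / 100:+.2f}"

def negamax(node, ply=0):
    """Toy negamax over a nested-list tree.

    A node is the string 'mated' (the side to move is checkmated), an int
    (static evaluation for the side to move), or a list of child nodes.
    """
    if node == "mated":
        return mated_score(ply)
    if isinstance(node, int):
        return node
    # The sign flips once per ply, so a mate for the opponent comes back negative.
    return max(-negamax(child, ply + 1) for child in node)

if __name__ == "__main__":
    print(describe(negamax(["mated", 50])))  # the root side can mate at once
    print(describe(negamax([["mated"]])))    # the root side gets mated
```

If a positive mate score shows up for the side that is being mated, one place to look is wherever scores skip the per-ply negation or get re-based on the ply, for example when they are stored in and read back from the hash table.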
-
Evert
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: pre-announcement: 10x8 tourney
Ok, several rounds of bug-fixing later, things are looking much better and I get reasonable looking results. The draw rate for Capablanca is down to ~12%, which looks more reasonable (and there are no more problems mating a lone king). There are still some features I'd like to implement and a lot of evaluation tweaks to take care of.
One experiment that's running now is testing across variants. The current version of Sjaak measures as >100 Elo stronger than the version that participated in Thermopilae at normal chess. I'm now running a test match in Spartan chess, and it's testing as equally strong, or possibly -10 Elo (not nearly enough games so far though [EDIT: now at -4 with just over 400 games, well within the error bar]). I had assumed that playing strength in one variant correlates with playing strength in another variant (at least as long as both are similar enough that similar concepts apply), but it seems that this is not the case!
Perhaps there's a set of evaluation parameters that ends up working well across different variants; tuning will tell. What might be tripping things up now is the mobility evaluation of the Spartans, which is generally very high, and mobility is what I have been tweaking. Or it could again be a problem with not enough variety in starting positions.
Still working on getting a decent set for Capablanca to start tuning there. I'll hopefully get that done soonish, after which I might try making a similar set for Spartan just for the heck of it. Right now I'm having fun implementing attack tables.
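To put a number on "well within the error bar", here is a rough sketch of how an Elo difference and an approximate 95% interval can be derived from a match result. It uses the usual logistic Elo formula and a normal approximation on the per-game score; the function names and the win/draw/loss counts in the example are made up and are not the actual match data.

```python
import math

def elo_from_score(s):
    """Elo difference that corresponds to an expected score fraction s (0 < s < 1)."""
    if not 0.0 < s < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / s - 1.0)

def match_elo(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match, with z standard errors as the interval."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    # Variance of the per-game result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * (0.0 - s) ** 2) / n
    se = math.sqrt(var / n)
    clamp = lambda x: min(max(x, 1e-6), 1.0 - 1e-6)
    return elo_from_score(s), elo_from_score(clamp(s - z * se)), elo_from_score(clamp(s + z * se))

if __name__ == "__main__":
    # Hypothetical result: 400 games, 10% draws, dead even.
    elo, low, high = match_elo(180, 40, 180)
    print(f"{elo:+.0f} Elo, 95% interval {low:+.0f} .. {high:+.0f}")
```

Under these assumptions an even 400-game match with about 10% draws has a margin of roughly ±30 Elo, so a measured -4 or -10 says very little by itself.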
-
hgm
- Posts: 28499
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: pre-announcement: 10x8 tourney
It would indeed be interesting if the effect on variants differs. Unless you added variant-specific evaluation, of course. Capablanca might be more similar to normal Chess than Spartan. Although King Safety might be even more critical there.
I have been running a lot of tablebases this week, to figure out which pawnless endgames are generally won and which are drawn, in order to know when to discount scores when the Pawn count goes low and you get vulnerable to piece-vs-pawn sacs. Spartan Chess is quite different from normal Chess, because so many pieces have mating potential. The second King also has a huge impact, which my tablebase generator cannot even handle yet: checking becomes a totally blunt instrument.
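A minimal sketch of what such a discount could look like, assuming a score in centipawns from White's point of view. The scaling factors are placeholders for illustration only; the real values, and which material combinations deserve a discount at all, are what the tablebase statistics would have to supply. `discount_for_few_pawns` is a hypothetical name.

```python
def discount_for_few_pawns(score, white_pawns, black_pawns, factors=(0.25, 0.5)):
    """Scale down the advantage of the side that is ahead when it has few Pawns.

    factors[p] is the multiplier applied when the leading side has p Pawns
    left; with len(factors) or more Pawns the score is returned unchanged.
    The default factors are illustrative assumptions, not tuned values.
    """
    if score == 0:
        return score
    leader_pawns = white_pawns if score > 0 else black_pawns
    if leader_pawns >= len(factors):
        return score
    # int() truncates toward zero, so the result is symmetric for both colours.
    return int(score * factors[leader_pawns])

if __name__ == "__main__":
    print(discount_for_few_pawns(300, 0, 3))   # leader has no Pawns left
    print(discount_for_few_pawns(300, 1, 0))   # leader has a single Pawn
    print(discount_for_few_pawns(-300, 4, 0))  # Black leads without Pawns
```

A single table keyed on Pawn count is the crudest form; in a variant like Spartan, where many pieces have mating potential, it would probably have to depend on the remaining pieces as well.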
-
Evert
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: pre-announcement: 10x8 tourney
hgm wrote: It would indeed be interesting if the effect on variants differs. Unless you added variant-specific evaluation, of course.
I'm trying very hard to resist doing that, unless it's necessary. So far I haven't really done that (the closest thing is returning a mate score from the evaluation in Shatranj for a bare king).
Anyway, these are the results of a tuning session with normal Chess:
Code: Select all
Rank Name         Elo    +    -  games  score  oppo.  draws
   1 Sjaak 212M  2121   14   14   1490    66%   2028    52%
   2 Sjaak 212M2 2118   14   14   1492    65%   2028    51%
   3 Sjaak 212M3 2114   14   14   1497    65%   2028    54%
   4 Sjaak 212M4 2112   14   14   1497    64%   2028    52%
   5 Sjaak 204M  2108   10   10   3012    64%   2028    53%
   6 Sjaak 204M2 2101   12   12   2111    63%   2028    55%
   7 Sjaak 185   2028    5    5  11099    36%   2111    53%
And these are the results for Spartan chess:
Code: Select all
Rank Name         Elo    +    -  games  score  oppo.  draws
   1 Sjaak 212M3 2128   16   16   1600    56%   2077     9%
   2 Sjaak 212M2 2119   16   16   1599    55%   2077    10%
   3 Sjaak 212M4 2096   16   16   1600    52%   2077     9%
   4 Sjaak 212M  2079   16   16   1599    50%   2077    11%
   5 Sjaak 185   2077    8    8   6398    47%   2106    10%
The number of games is not that large and certainly for normal chess the top few are fairly close together. But comparing with Spartan, the best performing program on normal chess is no better than an older version at Spartan, while a version that seems a bit weaker at normal chess is clearly stronger at Spartan.
An open question I'll have to look into is why some of the games seem to have gone missing from the output. Looking at the results, the draw score for normal chess is also a bit odd, so I'll have to check that too.
EDIT: actually, there seem to be an awful lot of time-losses in the Spartan test run. I'll probably have to figure out why and re-run it before drawing any conclusions.
hgm wrote: Capablanca might be more similar to normal Chess than Spartan. Although King Safety might be even more critical there.
Well, I certainly hope so, it'll be much easier to test normal chess. King safety is one of the things that don't yet work in Capablanca though, but I'll get to it eventually.
The other thing that's sorely needed is a better evaluation of passed/candidate pawns. I'm just trying to think of a good way to do that in a way that works for other types of pawns too (although the only other relevant one that I can think of is the berolina pawn).
-
hgm
- Posts: 28499
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: pre-announcement: 10x8 tourney
Well, Berolina Pawns are really messing up anything that is known of pawn evaluation. I am not even sure the concept 'passer' is useful there. They very quickly become passers as the board grows emptier, because they can choose their own path. But even in the asymmetric case, like Spartan, it is not always obvious this is an advantage. If I have a Pawn and Hoplite in the same file, the Pawn is blocked, and the Hoplite can pass it (if there is not exactly one square between them). But when it passes it, the Pawn becomes a passer as well. So the Hoplite is still not free to make a run for it, because it is bound to blocking the Pawn.
Perhaps the best way to treat generalized pawn endings is to just double the search depth in branches with only Pawns (when the root is not a pawn ending), give advance bonuses like they are passers, and compensating penalties when they are obviously blocked where they are standing. The search will figure it out, then. Another approach would be to try Pawn pushes in QS (like captures these are irreversible).
-
Ferdy
- Posts: 4853
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: pre-announcement: 10x8 tourney
Off topic but I'd like to know how many computer programs are able to play spartan chess? I will probably create one, I guess the play is interesting. Any reference for the rules? Thanks.
-
Evert
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: pre-announcement: 10x8 tourney
Evert wrote: EDIT: actually, there seem to be an awful lot of time-losses in the Spartan test run. I'll probably have to figure out why and re-run it before drawing any conclusions.
Ok, when I added support for large boards I actually broke the parsing of castling flags, which meant that they weren't cleared for black kings in Spartan chess. Hilarity ensued as the board state got corrupted, trapping the program in an endless loop and resulting in a time-loss.
It was caught immediately by an assert when I tried to debug it, but fixing it properly will take a little bit of work. For now I've just restored the correct behaviour for small boards, I'll do it properly later.
In theory this could have affected normal chess as well, but since there it's actually possible to castle and I tend to start from late opening positions anyway, the impact there will be much reduced. The Spartan results now look a lot more reasonable and in line with what I'd expected:
Code: Select all
Rank Name         Elo    +    -  games  score  oppo.  draws
   1 Sjaak 223M  2179   49   49    145    74%   2021    37%
   2 Sjaak 185M  2021   49   49    145    26%   2179    37%
I also spent some time playing with the attack table. As expensive as the in-check test is, it's probably close to optimal. The attack table performs much worse (by about a factor of 2) in perft, so it's pretty atrocious for that. In actual search it's not so bad since I always do in-check tests for both sides and they're now basically free. That means I almost recover the factor of 2 I lost, but not quite. If I can use the attack tables for move generation, SEE and evaluation as well, then it might be a net gain. So far it hasn't really worked yet, which is a shame. I was hoping for it to be a bit faster. Oh well.
Last edited by Evert on Sun Oct 23, 2011 12:25 am, edited 1 time in total.