Thermopylae Marathon 2011 (live!)

hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Thermopylae Marathon 2011 - Final Standings

Post by hgm »

The playoffs have finished. Fairy-Max had obtained its ticket to the playoffs at the expense of Oberon, in a thrilling game. (It seemed to be winning, then was taken by surprise by Oberon's promotion to King, but managed to draw despite the heavy material disadvantage this resulted in, by converting to a drawn KWKR end-game, thus preserving its 0.5-point lead over Oberon.) But it could not keep up with the other finalists, and only managed a single point because Spartacus went haywire and threw a game in a totally won position by sacrificing all its pieces without any purpose. (It is still a mystery why this happened; it does not reproduce from the fatal positions.)

Sjaak managed a draw against Nebiyu, and was very close to a 50-move draw against Spartacus: In a KQBKG end-game Spartacus was not able to make much progress for a long time, but then Sjaak's defenses crumbled, and it was checkmated just 3 moves before the 50-move limit. I ran this 5-men ending through my tablebase generator, and it does seem to be generally won, in a maximum of 33 moves.

So before the decisive games between the two, Nebiyu had gained 0.5 point on Spartacus, but this was not enough to close the gap entirely. A 1-1 result would thus have been enough for Spartacus to secure the title, and Nebiyu needed to win. But it went the other way: Spartacus won both games, each staying nearly equal for a long time but being decided in the early end-game. The decisive game (posted below) is a good example of how fatal the possession of a 'bad Bishop' can be, and Spartacus skillfully exploited that.

Code: Select all

Cross table, sorted by score percentage, Buchholz, SB

                              Spa Neb Sja Fai Obe Che Cat
 1. Spartacus 0.23            ### =10 011 =11 10= 111 111
                              ### 111 111 010 1   1   0     75%  22.5 (391.0, 289.8)

 2. Nebiyu 1.1                =01 ### 11= 111 111 0=1 111
                              000 ### 0=1 111 0   0   1     67%  20.0 (406.0, 243.5)

 3. Sjaak 92                  100 00= ### 1=1 101 001 110
                              000 1=0 ### 011 1   1   1     52%  15.5 (433.0, 188.5)

 4. Fairy-Max  4.8R           =00 000 0=0 ### 10= 11= 111
                              101 000 100 ### 1   1   0     43%  13.0 (448.0, 152.5)

 5. Oberon                    01= 000 010 01= ### 10= 111
                              0   1   0   0   ### 1   1     48%  11.5 (338.0, 131.5)

 6. ChessV (Spartan)          000 1=0 110 00= 01= ### 100
                              0   1   0   0   0   ### 0     31%   7.5 (354.0, 110.8)

 7. Catalyst 3                000 000 001 000 000 011 ###
                              1   0   0   1   0   1   ###   25%   6.0 (360.0,  73.5)

Code: Select all

[Event "Thermopylae Marathon 2011"]
[Site "SCHAAKPC"]
[Date "2011.02.12"]
[Round "3.2"]
[White "Nebiyu 1.1"]
[Black "Spartacus 0.23"]
[Result "0-1"]
[TimeControl "40/1440"]
[Variant "spartan"]
[Number "11"]
[Annotator "1. +0.15   1... +2.10"]

1. c4 {+0.15/16 35} Lf6 {+2.10/14 34} 2. b3 {+0.20/16 23} Lc6 {+2.44/14 32}
3. Nf3 {+0.15/18 27} Cd6 {+2.11/14 27} 4. e3 {+0.05/17 20} Hac5
{+1.91/14 27} 5. Nc3 {+0.15/18 24} Hxc4 {+2.02/15 35} 6. bxc4 {+0.05/18 19}
Hc5 {+1.95/15 35} 7. d4 {+0.20/17 37} Hxc4 {+2.00/16 35} 8. Bxc4
{+0.30/16 38} Hbd5 {+1.81/16 35} 9. Ba3 {+0.25/17 37} Ce7 {+1.89/13 35} 10.
h4 {+0.05/15 1:00} Le6 {+1.82/13 34} 11. Bb3 {+0.35/15 21} Ga8
{+1.78/13 35} 12. Bb4 {+0.40/16 35} Gb7 {+1.53/13 33} 13. Bc5 {+0.55/16 30}
Hc4 {+1.71/14 35} 14. Ne2 {+0.25/19 20} Cd5 {+1.65/14 31} 15. Bxe7
{+0.10/18 22} Wxe7 {+1.93/15 36} 16. Nf4 {+0.85/19 38} Wb4+ {+2.00/15 35}
17. Kf1 {+0.90/21 35} Led6 {+2.02/15 32} 18. Nxd5 {+0.90/20 19} Lxd5
{+2.53/14 31} 19. Rc1 {+0.90/19 42} Gb5 {+2.62/14 36} 20. Kg1 {+0.75/19 37}
Wa3 {+2.62/14 37} 21. Rb1 {+0.30/18 39} Gb4 {+2.52/15 30} 22. Ne1
{+0.00/21 25} Gc3 {+2.67/15 37} 23. Rh3 {+0.30/21 40} Le4 {+3.48/15 1:01}
24. f3 {+0.70/21 40} Lxg2 {+4.72/16 34} 25. Nxg2 {+0.70/20 37} Lb4
{+1.30/15 35} 26. Ba4 {+0.40/19 33} Ha5 {+1.96/16 31} 27. Bc6 {+0.20/22 21}
Hd3 {+1.16/15 53} 28. Be4 {+0.95/21 38} Lc4 {+1.38/15 30} 29. Rh2
{+0.15/20 56} Gc2 {+2.57/17 32} 30. Qxc2 {+0.20/24 50} Wxc2 {+2.13/18 37}
31. Rc1 {+0.30/22 30} Wa3 {+2.04/17 37} 32. Rc3 {+0.00/24 41} Wb5
{+1.85/17 37} 33. Rc1 {+0.00/22 42} He2 {+1.71/16 38} 34. Nf4 {+0.30/20 29}
Hc6 {+1.51/15 38} 35. Nxe2 {+0.10/21 42} Lxe2 {+1.34/18 37} 36. Rxc6
{+0.05/24 25} Kd7 {+1.52/18 39} 37. Rc5 {+0.10/24 26} Wa6 {+1.62/17 31} 38.
Bc6 {+0.00/22 49} Kd6 {+1.65/19 38} 39. Bb5 {+0.45/25 36} Wb7 {+2.21/20 56}
40. Bxe2 {+0.95/24 1:55} Kxc5 {-320.00/20 34} 41. dxc5 {+0.35/23 30} Wxc5
{+5.50/20 34} 42. Kh1 {-1.05/23 32} Wxe3 {+7.82/21 35} 43. Bd3
{-1.60/22 34} Wf4 {+8.23/21 26} 44. Bxh7 {-3.55/21 38} Hb4 {+6.12/21 38}
45. Be4 {-4.45/23 44} He5 {+6.70/20 49} 46. Bb7 {-5.55/23 37} Hc3
{+11.44/20 10:12} 47. h5 {-11.60/28 24} Hd2 {+14.16/21 1:31} 48. Rxd2
{-12.70/26 20} Wxd2 {+14.81/21 15} 49. a4 {-14.85/23 19} Hd4 {+15.56/21 15}
50. h6 {-16.20/21 57} Hc3 {+319.92/21 15} 51. h7 {-299.80/23 30} Kg7
{+319.93/21 13} 52. f4 {-299.88/24 22} We3 {+319.94/27 16} 53. Ba6
{-299.90/31 19} Hd2 {+319.95/30 16} 54. Bf1 {-299.92/32 27} He1=W
{+319.96/32 0.5} 55. h8=Q+ {-299.94/96 23} Kxh8 {+319.97/40 0.2} 56. Kh2
{-299.96/96 19} W1f2 {+319.98/39 0.2} 57. Bb5 {-299.98/96 20} Wxf4#
{+319.99/45 20}
{Xboard adjudication: Checkmate} 0-1
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Thermopylae Marathon 2011 - Final Standings

Post by Daniel Shawul »

Thanks a lot for a very good tournament. I hope there is another one coming, because this has been the only tournament I have actively followed for some time. Though disappointed by the way Nebiyu lost, it has uncovered lots of things to work on. We will be back ... :!: I officially side with the Persians since I hate Spartacus ;) All engines have improved since the tournament started, so I am in favor of updates during tournament breaks (cycles). Sjaak was a monster afterwards and so was Catalyst. I would have achieved that too with a 60% speed improvement from evaluating the piece-square tables incrementally rather than summing them in eval. But I had more bugs to fix, so that was not in.
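
To illustrate the incremental piece-square idea, here is a minimal sketch in Python. It is not Nebiyu's code: the table values are random placeholders, the class and method names are invented for the example, and both sides share one unmirrored table for brevity. The point is only that the running total is adjusted when a piece is put down or lifted, so eval can read it instead of looping over all pieces.

```python
# Illustrative sketch only: table values and names are made up,
# not taken from Nebiyu or any other engine in this thread.
import random

random.seed(1)
PIECE_TYPES = "PNBRQK"
# One 64-entry table per piece type, filled with placeholder values.
PST = {p: [random.randint(-20, 20) for _ in range(64)] for p in PIECE_TYPES}


class Board:
    def __init__(self):
        self.squares = {}    # square index -> (sign, piece); sign is +1 or -1
        self.pst_score = 0   # running piece-square total, seen from the +1 side

    def put(self, sign, piece, sq):
        self.squares[sq] = (sign, piece)
        self.pst_score += sign * PST[piece][sq]

    def remove(self, sq):
        sign, piece = self.squares.pop(sq)
        self.pst_score -= sign * PST[piece][sq]
        return sign, piece

    def move(self, frm, to):
        # A capture first removes the victim's contribution.
        if to in self.squares:
            self.remove(to)
        sign, piece = self.remove(frm)
        self.put(sign, piece, to)

    def pst_from_scratch(self):
        # The slow way: sum over all pieces at every evaluation.
        return sum(s * PST[p][sq] for sq, (s, p) in self.squares.items())


board = Board()
board.put(+1, "R", 0)
board.put(+1, "K", 4)
board.put(-1, "K", 60)
board.put(-1, "N", 8)
board.move(0, 8)  # rook captures knight
assert board.pst_score == board.pst_from_scratch()
```

In a real engine the same bookkeeping would sit in make/unmake move, and comparing against the from-scratch sum (as in the last line) is a cheap debugging check.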

About Nebiyu, the funny thing is that the version I have now can only score 24% against 16:xx. I don't know what is happening since I don't have the sources. I have recovered the piece-square tables, and thus the evaluation, by feeding FENs. But even after that it can't handle it, probably due to some search bug. But the old ones had most of the bugs too, so this is a mystery to me right now.
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Thermopylae Marathon 2011 - Final Standings

Post by hgm »

It's really too bad you lost the sources, as it would be very interesting to figure out what makes the difference. Yet I did not perceive the newer versions as significantly weaker against Spartacus. (Except the one that did not do the reductions.) I continued testing against 16:12:16 for quite a long time, to be sure I could compare newer Spartacus versions to the older ones, and the Spartacus that I used in the qualifier could finally score about 50% against it. Which is also what it then did against Nebiyu 1.1.

Sometimes a significant change in performance against a single engine can simply be caused by a sudden preference for a strategically poor opening line. I guess I really should make some sort of GUI book for Spartan Chess, or a test set like Nunn. Perhaps I should ask Steven Streetman, who invented Spartan Chess and plays it with humans, if he can recommend some opening lines.

Anyway, it seems a good thing to involve Nebiyu 16:12:16 in future rating tests next to Nebiyu 1.1. With a good set of opening lines, I could do a time-odds tourney where each engine appears in 3 or 4 progressively more severely handicapped versions (e.g. time-odds factors 1, 3, 9, 27), and play a huge round-robin with (say) 20 games per pairing. That should produce a reasonably accurate rating list.
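
To get a feel for the size of such a time-odds round-robin, here is a small Python sketch. The engine list is only an assumption for illustration (no participant list was fixed above), and the function name is made up. Every (engine, time-odds factor) combination is treated as a separate player, and all players meet each other, including the same engine at different handicaps.

```python
from itertools import combinations


def build_round_robin(engines, odds_factors, games_per_pairing):
    """Return the players, all pairings, and the total number of games."""
    # Each handicapped version of an engine counts as its own player.
    players = [(name, factor) for name in engines for factor in odds_factors]
    pairings = list(combinations(players, 2))
    return players, pairings, len(pairings) * games_per_pairing


# Hypothetical participant list, chosen only for the example.
engines = ["Spartacus", "Nebiyu 1.1", "Nebiyu 16:12:16", "Sjaak", "Fairy-Max"]
players, pairings, total = build_round_robin(engines, [1, 3, 9, 27], 20)
print(len(players), len(pairings), total)  # 20 190 3800
```

With five engines and four handicap levels this already comes to 3800 games, so the number of versions per engine is the main knob for keeping the tourney manageable.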
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Thermopylae Marathon 2011 - Final Standings

Post by Daniel Shawul »

Involving the author is a good idea. He has probably done some research on opening lines already, which we can use for the tournament. Also invite him to watch some of the live games and perhaps comment. I am sure there is a lot of evaluation stuff that we didn't implement or got wrong. Even looking at some human games he played would be interesting, i.e. if he has them in pgn :)

There is something wrong with the check + evasion combo, where the newer one is significantly weaker, because that is where the older one usually gets the advantage after some stale moves. I do not generate evasions in either of them, though. Only captures are tried, and if there are none, then the normal eval() is applied. I am going to try to diagnose the problem by setting up some unique position.
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: Thermopylae Marathon 2011 - Final Standings

Post by Evert »

Daniel Shawul wrote:Thanks a lot for a very good tournament. I hope there is another one coming, because this has been the only tournament I have actively followed for some time.
Hear, hear!
(Not that I tend to follow tournaments very much)
Daniel Shawul wrote:Though disappointed by the way Nebiyu lost, it has uncovered lots of things to work on. We will be back ... :!: I officially side with the Persians since I hate Spartacus ;) All engines have improved since the tournament started, so I am in favor of updates during tournament breaks (cycles). Sjaak was a monster afterwards and so was Catalyst. I would have achieved that too with a 60% speed improvement from evaluating the piece-square tables incrementally rather than summing them in eval.
Apart from fixing time control bugs, one of the major speedups for Sjaak was the inclusion of lazy evaluation in revision 100: if the material score is already way outside the alpha-beta window (and we're not mating a lone king), just return the material score. It also gained some heuristics to recognise drawn material combinations.
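
The lazy-evaluation shortcut described above can be sketched roughly as follows. This is not Sjaak's actual code: the function signature, the margin of 300 centipawns and the way the positional terms are passed in as a callback are all assumptions made for the example.

```python
LAZY_MARGIN = 300  # assumed upper bound (centipawns) on the positional terms


def evaluate(material, alpha, beta, positional_terms, mating_lone_king=False):
    """Return a score, skipping the expensive terms when they cannot matter."""
    if not mating_lone_king:
        # Even the largest positional swing could not bring the score
        # back inside the (alpha, beta) window, so material is enough.
        if material - LAZY_MARGIN >= beta or material + LAZY_MARGIN <= alpha:
            return material
    return material + positional_terms()


calls = []

def slow_terms():
    calls.append(1)  # record that the expensive part actually ran
    return 25

print(evaluate(900, -50, 50, slow_terms))  # far above beta: lazy exit, 900
print(evaluate(10, -50, 50, slow_terms))   # inside the window: full eval, 35
print(len(calls))                          # 1
```

The exception for mating a lone king matters because there the positional terms (driving the king to the edge) are what makes progress, so skipping them would leave the engine aimless.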

What I found interesting is how well Fairy-Max scores against Spartacus. It could be a statistical fluke, but if it isn't, it would be interesting to know what causes it.
Daniel Shawul wrote:About Nebiyu, the funny thing is that the version I have now can only score 24% against 16:xx. I don't know what is happening since I don't have the sources. I have recovered the piece-square tables, and thus the evaluation, by feeding FENs. But even after that it can't handle it, probably due to some search bug. But the old ones had most of the bugs too, so this is a mystery to me right now.
Ouch. I take it you don't use some form of version control? I've found that to be a real life-saver in the past.
How many games do you use for testing? I sometimes use fairly short matches of a few hundred games, which is not really enough, but it is the best I can do without a dedicated computer on the side for these things.
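
As a rough worked example of why a few hundred games is on the thin side, the sketch below estimates a 95% interval for the Elo difference from a match score. It uses the usual logistic Elo formula and a simple binomial standard error that ignores draws, so it overstates the uncertainty somewhat when many games are drawn. The function names are invented for the example.

```python
import math


def elo_from_score(p):
    """Elo difference implied by an expected score p (0 < p < 1)."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def elo_interval_95(score, games):
    """Approximate 95% interval for the Elo difference after `games` games."""
    # Binomial standard error of the score fraction (draws ignored).
    se = math.sqrt(score * (1.0 - score) / games)
    low = max(score - 1.96 * se, 1e-6)
    high = min(score + 1.96 * se, 1.0 - 1e-6)
    return elo_from_score(low), elo_from_score(high)


low, high = elo_interval_95(0.5, 300)
print(round(low), round(high))
```

For an even 300-game match this gives an interval of roughly plus or minus 40 Elo, which is larger than the gain expected from most single changes.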
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: Thermopylae Marathon 2011 - Final Standings

Post by Evert »

hgm wrote:Sometimes a significant change in performance against a single engine can simply be caused by a sudden preference for a strategically poor opening line. I guess I really should make some sort of GUI book for Spartan Chess, or a test set like Nunn. Perhaps I should ask Steven Streetman, who invented Spartan Chess and plays it with humans, if he can recommend some opening lines.
That would be interesting to know. I've played a few games against Sjaak (with white) where I tried to play a more-or-less normal active chess opening, and got bludgeoned to death because the Spartans were able to develop so much faster because they don't need to get their hoplites out of the way first.
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: Thermopylae Marathon 2011 - Final Standings

Post by Evert »

Daniel Shawul wrote:I am sure there is a lot of evaluation stuff that we didn't implement or got wrong. Even looking at some human games he played would be interesting, i.e. if he has them in pgn :)
He has some pointers on his website for strategy that I've kept in mind while writing the evaluation function (without explicitly putting Spartan-specific stuff in it).
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Thermopylae Marathon 2011 - Final Standings

Post by hgm »

Evert wrote:What I found interesting is how well Fairy-Max scores against Spartacus. It could be a statistical fluke, but if it isn't, it would be interesting to know what causes it.
Well, the point in the playoffs was totally undeserved. Spartacus was at +6.5 (which reproduces), and then at +8.5 (which reproduces, but through another move), and then it started to play ridiculous blunders: first throwing away its lethal passed Hoplite with an imagined +12 score, then sacking its General for a Rook, then giving away its Lieutenant. It looks a lot like a complete hash mixup.

Code: Select all

[Event "Thermopylae Marathon 2011"]
[Site "SCHAAKPC"]
[Date "2011.02.11"]
[Round "1.1"]
[White "Fairy-Max  4.8R"]
[Black "Spartacus 0.23"]
[Result "1-0"]
[TimeControl "40/1440"]
[Variant "spartan"]
[Number "2"]
[Annotator "1. +0.68   1... +2.18"]

1. d4 {+0.68/9 20} Hfd5 {+2.18/14 23} 2. Nc3 {+0.81/8 31} Hxd4
{+2.20/15 27} 3. Nf3 {-1.10/9 36} Hd5 {+2.08/15 50} 4. e3 {-1.33/9 39} Lc6
{+2.12/15 38} 5. Ne2 {-1.10/9 37} Lf6 {+2.21/14 25} 6. c3 {+0.95/8 22} Ce6
{+2.46/13 47} 7. cxd4 {+0.92/9 24} Ce4 {+2.44/13 31} 8. Ng3 {+0.89/9 33}
Hxd4 {+2.53/14 29} 9. Nxe4 {-1.04/10 18} Lxe4 {+2.46/15 39} 10. Nxd4
{-1.05/11 37} Cd6 {+2.54/15 32} 11. f3 {-1.09/10 35} Cxd4 {+2.69/16 23} 12.
exd4 {-1.24/11 24} Ld5 {+2.85/15 46} 13. b3 {-1.02/10 25} We6 {+2.81/15 40}
14. Be3 {+0.80/10 39} Wf5 {+2.69/15 47} 15. Qd2 {+0.90/10 39} Hg6
{+2.63/14 18} 16. O-O-O {+0.85/9 22} Wd6 {+3.40/14 56} 17. Qb2
{-1.06/10 35} Ha5 {+3.37/14 23} 18. Rd3 {+0.97/9 39} Hb4 {+3.33/14 41} 19.
Rc3 {+0.95/9 22} Lfe6 {+3.33/14 43} 20. Bf2 {-1.30/10 38} Hxb3
{+3.32/13 24} 21. axb3 {+0.90/10 49} Ga8 {+3.35/14 35} 22. Qa2
{+0.93/10 23} Wb4 {+4.30/15 19} 23. Qa1 {-1.47/10 40} Kb7 {+4.62/16 25} 24.
Kd1 {-2.04/12 28} Hb6 {+5.46/16 42} 25. Qc1 {-2.46/13 34} Ga2 {+5.69/16 49}
26. Be1 {-2.54/12 41} Wc6 {+5.60/16 22} 27. Bd2 {-2.47/10 36} Lxb3+
{+5.80/15 36} 28. Ke1 {-2.71/12 35} Wxd4 {+5.62/15 29} 29. Bd3
{-2.48/11 43} Lbd5 {+5.62/14 47} 30. Be4 {-2.59/11 23} Lxe4 {+5.97/15 34}
31. fxe4 {-2.61/11 30} Hb5 {+6.14/16 25} 32. Rf1 {-2.95/11 45} Hc4
{+6.50/17 26} 33. Rd3 {-3.26/12 24} Wc2+ {+6.54/16 18} 34. Qxc2
{-3.76/13 40} Gxc2 {+6.42/17 40} 35. Rg3 {-3.80/12 54} Kg8 {+8.45/17 24}
36. Rff3 {-3.96/12 1:10} Gb1+ {+10.14/17 18} 37. Kf2 {-4.18/11 42} Ga2
{+11.31/18 25} 38. Ke1 {-4.31/12 42} Hd3 {+12.75/16 22} 39. Rxd3
{-2.77/12 44} Gb1+ {+9.11/18 16} 40. Kf2 {-2.88/12 28} Gc2 {+8.63/16 42}
41. Rge3 {-2.57/12 25} Lc4 {+8.70/16 33} 42. Rd8 {-2.32/12 23} Hd6
{+8.82/16 1:11} 43. Kf3 {-2.11/14 35} Gxd2 {+11.69/17 26} 44. Rxg8
{-2.40/14 33} Gxe3+ {+8.26/21 27} 45. Kxe3 {+2.04/17 22} Ld4+ {+3.92/24 26}
46. Kxd4 {+5.53/14 34} Kc6 {-5.20/20 1:11} 47. Rxg7 {+6.32/13 39} Hf5
{-6.17/20 54} 48. exf5 {+14.01/14 1:01} Hbc5 {-319.91/21 30} 49. f6
{+14.39/14 19} Hb4 {-319.92/19 22} 50. f7 {+79.95/18 52} Ha3 {-319.94/19 3}
51. f8=Q {+79.96/28 31} Hb2 {-319.95/18 0.2} 52. Qe8+ {+79.97/28 20} Kb6
{-319.95/23 0.1} 53. Qb8+ {+79.98/28 6} Hc1=K {-319.96/100 25} 54. Qxb6
{+79.96/20 0.1} He5 {-319.97/100 1.2} 55. Kc3 {+79.97/28 0.1} Kd1
{-319.98/100 0.1} 56. Qe3 {+79.98/28 0.2} Hf4 {-319.99/100 0.1} 57. Qd2#
{+79.99/28}
{Xboard adjudication: Checkmate} 1-0
6k1/1k2h1h1/1h2l1h1/8/2h1P3/5RR1/g2B2PP/4K3 b - - 7 38
Spartacus plays 38... Hd3? with a +12.75 score here

6R1/1k4h1/1h1h2h1/8/2l1P3/4RK2/3g2PP/8 b - - 0 44
And a few moves later, in the position above, it plays 44... Gxe3+?? 45. Kxe3 Ld4+??

For a moment I was afraid I had completely broken Spartacus with the last fix (although I could not imagine how). But in the other games it played quite normally.

The other full point of Fairy-Max against Spartacus was deserved. But of course that was against the version that pruned Hoplite killers, and was time-handicapped as well.

[edit] Come to think of it, it looks a bit like it thinks it will be checkmated. As if its remaining King is no longer in the piece list, and thus cannot evade the unavoidable Rook check.
Richard Allbert
Posts: 792
Joined: Wed Jul 19, 2006 9:58 am

Re: Thermopylae Marathon 2011 - Final Standings

Post by Richard Allbert »

Thanks for running the tournament!

Looking forward to the next one, Catalyst will have another jump in strength, I'm sure...

And hopefully won't finish last!!

:D
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: Thermopylae Marathon 2011 - Final Standings

Post by Evert »

hgm wrote: Well, the point in the playoffs was totally undeserved. Spartacus was at +6.5 (which reproduces), and then at +8.5 (which reproduces, but through another move), and then it started to play ridiculous blunders: first throwing away its lethal passed Hoplite with an imagined +12 score, then sacking its General for a Rook, then giving away its Lieutenant. It looks a lot like a complete hash mixup.
Yikes! :shock:
I just spent the afternoon hunting bugs in Sjaak that resulted in similar play in a few test games. The problem was that it decided that two rooks were not enough to mate a lone king: the heuristic for deciding whether a piece could give mate or not incorrectly thought that a rook couldn't mate a lone king, and the evaluation decided that having only one piece type (as opposed to two pieces) was enough reason to declare a material draw (whenever the evaluation decides that the game is a draw, it discards material and mobility scores and simply returns the sum of the piece square table results, which are close to 0).

Wonder how much elo that cost...
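
For reference, the kind of check involved can be sketched like this in Python. It is a simplified illustration using orthodox pieces only, not Sjaak's actual heuristic; the table and function name are assumptions. The two points it tries to show are that the count must be of pieces rather than piece types, and that a Rook has to be flagged as able to mate on its own.

```python
# Hypothetical table: can this piece, with its king, force mate on a lone king?
CAN_MATE_ALONE = {"Q": True, "R": True, "B": False, "N": False}


def is_material_draw(attacker_pieces):
    """attacker_pieces: piece letters (no king, no pawns) of the side
    facing a lone king. Returns True only for clearly drawn material."""
    if any(CAN_MATE_ALONE[p] for p in attacker_pieces):
        return False
    # Count pieces, not piece types: two Rooks are two pieces.
    # With two or more minor pieces we stay conservative and do not
    # declare a draw here, even though some such cases are drawn.
    return len(attacker_pieces) <= 1


print(is_material_draw(["R", "R"]))  # False
print(is_material_draw(["N"]))       # True
print(is_material_draw(["B", "N"]))  # False
```

A variant engine would need such a table per piece type of the variant, which is exactly where a wrong entry can slip in unnoticed.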