STS 1.0 revisited

Guenther
Posts: 3260
Joined: Wed Oct 01, 2008 4:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: STS 1.0 revisited

Post by Guenther » Sat Jan 09, 2010 9:08 am

Thomas Mayer wrote:
...I know quite a lot about most of these engines. E.g. Mint is also overrated, it's clearly the weakest engine by a big margin. ...
Greets, Thomas

Guenther wrote:
Well, I have to disagree on this ;-) Polar is weaker than Mint in real games too.

Guenther

swami wrote:
PolarChess 1.3 = 6.5 points
Mint 2.3 = 6 points

PolarChess 1.3 rated (1676)
Mint 2.3 rated (1574)

Source: Chesswar tournament currently underway

http://www.open-aurec.com/chesswar/Ches ... 015P00.htm

Guenther wrote:
Lol, that is far too few games, but you know this.
Just look up a rating list with many more games for both.

Guenther

swami wrote:
I did a Google search, and the query returned only two results:

http://kirill-kryukov.com/chess/discuss ... ?f=7&t=495

Walter's COMP2006 (games only >30min/engine) list:

336 PolarChess 1.3                 : 1994   27  27   541    41.4 %   2054   15.2 %
337 DChess 1.02                    : 1980   28  29   533    37.0 %   2073   13.1 %
338 Mint 2.3                       : 1960   25  25   633    40.4 %   2027   15.8 %

541 games for PolarChess 1.3 and 633 games for Mint 2.3.

They are only 34 Elo apart, so one could say they are nearly equal, with PolarChess having a slight edge.

The other result that turned up in Google is Olivier's tournament.
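
As a rough check on how much a 34 Elo gap means in the COMP2006 excerpt, here is a minimal sketch. It assumes the two numbers after each rating are the upper and lower error margins; the list itself does not label its columns, so treat that reading as a guess.

```python
# Rows copied from the COMP2006 excerpt: (rating, plus, minus).
# Assumption: the two values after the rating are "+" and "-" error margins.
ratings = {
    "PolarChess 1.3": (1994, 27, 27),
    "Mint 2.3": (1960, 25, 25),
}

def interval(rating, plus, minus):
    """Return the (low, high) rating range implied by the margins."""
    return rating - minus, rating + plus

def overlaps(a, b):
    """True if two (low, high) ranges share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

polar = interval(*ratings["PolarChess 1.3"])
mint = interval(*ratings["Mint 2.3"])
gap = ratings["PolarChess 1.3"][0] - ratings["Mint 2.3"][0]

print(f"gap: {gap} Elo, PolarChess {polar}, Mint {mint}, overlap: {overlaps(polar, mint)}")
```

Under that assumption the two ranges overlap, so this list alone does not separate the engines clearly.
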
Well, the COMP base mixes everything together without checking the games
themselves or the conditions...
George Lyapko has Mint much better than Polar in a list with hundreds
of games, and here in RWBC it is the same.
I currently have a match running between the two at RWBC rapid conditions
for my rating list and, no surprise, Mint is leading 11.5 : 2.5 ;-)
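
For a sense of scale, the 11.5 : 2.5 score can be turned into a rating difference with the standard logistic Elo formula. This is only a sketch: the function name is made up for illustration, and 14 games is far too small a sample to take the result seriously.

```python
import math

def elo_diff_from_score(points, games):
    """Rating difference implied by a match score under the logistic Elo model."""
    p = points / games
    if not 0 < p < 1:
        raise ValueError("score must be strictly between 0% and 100%")
    return 400 * math.log10(p / (1 - p))

# Match score quoted above: Mint 11.5, Polar 2.5.
mint_points, polar_points = 11.5, 2.5
games = mint_points + polar_points
diff = elo_diff_from_score(mint_points, games)

print(f"{mint_points}/{games:g} -> about {diff:+.0f} Elo for Mint")
```

This comes out at roughly 265 Elo in Mint's favour for the games played so far.
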

Anyhow, thanks for your first post, because it made me notice that
something is very wrong with Mint in Olivier's tournament, and we will
need to find out what it is. I saw it making unbelievable blunders
(none of the buggy moves could be reproduced with my Mint!) and failing to
win dead won games. None of this has ever happened here.
A first rough idea is that Mint has a remarkable ponder
bug that leads to much weaker play (buggy moves) than with
ponder off. It could also be that the Mint at Olivier's is corrupted,
but all this needs to be checked. (At least it seems Mint plays weaker
at longer time controls, but of course this doesn't explain the
extraordinary blunders I have seen in Olivier's games.)

RWBC: 40/15 Ponder off (copied from a csv list)

937;Mint;_23;1773;59;58;125;54;1747;15
938;Belzebub;_065;1770;84;82;65;60;1683;15
939;Bubble;_132-120609;1770;110;119;32;30;1956;16
940;Ifrit;_b2-9;1769;113;130;34;19;2023;21
941;Ranita;_24;1769;75;76;74;44;1823;12
942;Alex;_15209;1766;135;129;21;55;1740;43
943;Embracer;_112nb;1765;65;65;88;44;1817;22
944;Ranita;_23;1764;131;129;23;50;1780;22
945;Plisk;_011;1761;112;130;32;20;2000;16
946;Heracles;_049;1761;97;101;40;38;1853;25
947;Atak;_40E;1761;80;83;64;39;1840;13
948;Grizzly;_1401b;1761;59;61;119;38;1859;12
949;Pulsar;_201f;1759;72;72;72;49;1781;26
950;JARS;_1726;1759;126;124;26;56;1714;19
951;Adam;_16;1757;104;102;41;52;1745;12
952;Ajetac;_260;1756;109;108;31;50;1771;23
953;BikJump;_13;1752;80;82;56;44;1800;27
954;WJChess;_143/144;1747;132;132;21;50;1750;24
955;Lodocase;_300607;1747;79;78;80;58;1657;13
956;Zoidberg;_01;1744;83;83;52;51;1737;29
957;Booot;_22;1742;134;133;22;48;1782;14
958;JARS;_175a;1740;71;76;88;26;1943;16
959;Kanguruh;_190;1740;107;113;32;38;1788;25
960;HoiChess;_080;1737;122;126;30;42;1813;3
961;Alex;_14010;1737;140;141;22;45;1772;9
962;Belzebub;_06;1737;132;135;20;45;1769;20
963;SilkeChess;_121209;1736;104;105;31;48;1745;26
964;Plisk;_008;1729;88;85;56;63;1614;16
965;Sungorus;_101JA;1728;105;116;32;22;1956;31
966;Ajetac;_264;1725;110;117;31;34;1857;16
967;Hopeless;_039;1725;141;131;34;71;1465;0
968;Robin;_0983;1723;90;90;52;50;1723;19
969;NewRival;_1829;1723;60;59;117;54;1683;20
970;Alfil;_507;1719;113;123;32;34;1844;6
971;TSCP;_181NetDC;1718;63;64;99;42;1785;18
972;Hoplite;_211DC;1718;74;71;84;61;1610;15
973;Gaia;_11;1715;139;137;23;57;1670;17
974;Cecir;_30;1713;86;88;47;45;1751;21
975;ChessterfieldCL;_i5a;1711;79;82;63;33;1848;19
976;Myrddin;_081a2;1710;96;93;51;66;1549;14
977;Kanguruh;_188;1710;85;85;53;51;1696;23
978;AlChess;_15b;1708;121;129;30;38;1815;10
979;ZZZZZZ;_349JA-PG;1708;137;138;32;56;1561;13
980;SEE;_067/069;1704;94;95;41;45;1759;22
981;Uragano;_041;1703;137;138;21;45;1746;14
982;Reger;_005;1702;101;95;48;67;1533;8
983;Dabbaba;_185JA;1700;137;131;25;66;1538;12
984;JSBam;_052;1698;79;82;77;40;1770;6
985;Neurosis;_13b/13c;1698;137;136;22;50;1709;18
986;HoiChess;_034;1697;119;110;33;68;1529;9
987;Amundsen;_0551gccJA;1697;132;125;32;66;1546;13
988;BrutusChess;_205;1696;96;99;41;44;1719;24
989;Frank-Walter;_105JAnb;1695;113;116;30;45;1733;17
990;Beaches;_226;1693;84;85;60;50;1686;17
991;Aldebaran;_070;1690;71;74;78;41;1723;31
992;Rocinante;_101JA;1687;109;118;34;32;1830;12
993;IQ23;_003;1686;91;92;41;43;1742;37
994;ApiChess;_134;1684;144;137;20;55;1672;20
995;HoiChess;_060618-RWBC;1683;113;113;37;53;1637;19
996;Golem;_0441;1682;67;66;108;58;1621;12
997;Gargamella;_050;1682;77;80;64;39;1769;16
998;Chenard;_20090502JA;1678;85;87;52;50;1638;23
999;ApiChess;_138;1678;122;124;21;45;1728;33
1000;WJChess;_152;1678;129;129;20;53;1654;35
1001;Crux;_50j;1677;130;145;22;30;1841;23
1002;Storm;_06;1675;80;80;63;48;1702;16
1003;Mizar;_30;1673;50;52;205;31;1827;13
1004;Chenard;_20081205JA;1669;143;199;30;8;2079;3
1005;Enigma;_113/114;1668;85;87;50;43;1723;18
1006;Adamant;_11;1666;141;137;20;55;1630;20
1007;Awesome;_173;1666;83;80;67;67;1499;15
1008;Aice;_064;1665;141;141;20;50;1671;10
1009;Vicki;_0035a;1664;79;83;66;36;1776;17
1010;Sissa;_0120;1661;114;131;37;18;1936;19
1011;Damas;_8;1658;107;117;34;28;1832;21
1012;Adamant;_17;1657;99;103;41;40;1734;17
1013;ChessRikus;_1463/1465;1655;72;75;81;39;1740;16
1014;BremboCE;_031;1651;79;80;68;46;1675;19
1015;Kanguruh;_1712;1649;145;141;20;53;1650;15
1016;PolarEngine;_13;1648;68;69;88;47;1668;14
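
To pull the Mint and PolarEngine rows out of the RWBC list, here is a minimal sketch. The field names are an assumption (rank, name, version, rating, plus, minus, games, score %, average opponent, draw %), since the list was pasted without a header row.

```python
import csv
import io

# Two rows copied from the RWBC list above.
rows = """937;Mint;_23;1773;59;58;125;54;1747;15
1016;PolarEngine;_13;1648;68;69;88;47;1668;14"""

# Assumed column meanings; the original list has no header.
fields = ["rank", "name", "version", "rating", "plus", "minus",
          "games", "score_pct", "avg_opp", "draw_pct"]

engines = {}
for rec in csv.reader(io.StringIO(rows), delimiter=";"):
    entry = dict(zip(fields, rec))
    engines[entry["name"]] = entry

gap = int(engines["Mint"]["rating"]) - int(engines["PolarEngine"]["rating"])
rank_gap = int(engines["PolarEngine"]["rank"]) - int(engines["Mint"]["rank"])

print(f"Mint is {gap} points and {rank_gap} places ahead of PolarEngine")
```
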
George Lyapko's continuous blitz tourneys: 40/5 PO

-15    276 Mint 2.3                       : 1900   86  76    72    43.1 %   1948    8.3 %
+1     277 Adamant 1.0                    : 1899   79  73    75    52.7 %   1881   14.7 %
+11    278 Damas 4                        : 1897   81  74    72    52.1 %   1883   15.3 %
-3     279 Jester 0.83                    : 1892   76  83    71    48.6 %   1902   12.7 %
+2     280 Noonian 3.0                    : 1891   79  73    75    42.0 %   1947   14.7 %
+4     281 DrunkenMaster 0.4              : 1890   72  77    75    46.0 %   1918   17.3 %
-2     282 BigLion 0.9f                   : 1889   83  76    74    44.6 %   1927    8.1 %
-5     283 Robin 0.96 beta                : 1887   74  85    72    58.3 %   1828   11.1 %
-3     284 Adam 1.6                       : 1885   71  80    72    46.5 %   1910   20.8 %
+6     285 DChess 1.0.2                   : 1881   81  72    72    39.6 %   1955   18.1 %
+1     286 KasparovX 1.0                  : 1880   70  91    72    62.5 %   1792   11.1 %
+1     287 BremboCE 0.4                   : 1879   66  83    72    49.3 %   1884   23.6 %
-2     288 Simontacchi 1.4-Decepticon     : 1871   87  71    73    39.0 %   1948   12.3 %
+4     289 PolarEngine 1.3                : 1867   81  77    72    44.4 %   1906   11.1 %
Infinite Loop: 5+3 PO

184(-6)	Mint	1776(-7)	 15.9	524	 253.5
185(-19)	Silke	1774(-39)	62.8	36	15.5
186(-14)	BSC	1774(-23)	30.5	140	60.5
187(-22)	JSBam	1774(-46)	116.5	12	4.5
188(+14)	JanWillem	1765(+71)	81.6	24	13.0
189(-5)	FAUCE	1756(+1)	28.1	168	76.0
190(-3)	Cecir	1754(+13)	44.6	72	33.5
191(-3)	Embracer	1752(+16)	24.4	220	94.5
192(-10)	Alfil	1748(-21)	62.0	40	13.5
193(+3)	Crux	1747(+35)	14.1	668	337.5
194(+7)	Ayito	1745(+51)	66.4	32	16.0
195(-20)	Surprise	1740(-48)	72.1	28	12.5
196(-11)	Tristram	1733(-18)	14.3	668	351.0
197(-7)	Simontacchi	1727(+8)	41.9	76	39.5
198(-12)	DChess	1726(-16)	14.4	668	337.0
199(-1)	smash	1725(+19)	53.3	48	28.0
200(-6)	PolarChess	1722(+8)	23.6	232	99.5 
Guenther

swami
Posts: 6536
Joined: Thu Mar 09, 2006 3:21 am

Re: STS 1.0 revisited

Post by swami » Sat Jan 09, 2010 3:42 pm

Yes, Mint does seem better than PolarChess. I just ran both engines through the test suite. Mint scored 43.6% and PolarChess scored 40.5%.

I should now rerun PolarChess with more hash, but I doubt it would help enough to overtake Mint. I forgot to change the hash size for PolarChess: I left it at the default, started the test, and was away for too long.

swami

Re: STS 1.0 revisited

Post by swami » Sat Jan 09, 2010 3:43 pm

Mint 2.3

Strategic Test Suite Conditions:
Arena GUI
10 seconds per position
800 positions
Core2Quad Q6600, 32-bit, 2 GB RAM, 2.4 GHz
Engine uses 129 MB hash.
Subject-wise Scores:
STS (v1.0) - Undermining:
47/100, Grade: D

STS (v2.1) - Open Files and Diagonals:
43/100, Grade: E

STS (v3.0) - Knight Outposts/Centralization/Repositioning:
46/100, Grade: D

STS (v4.0) - Square Vacancy:
44/100, Grade: E

STS (v5.0) - Bishop vs Knight:
59/100, Grade: C+

STS (v6.0) - Re-Capturing:
57/100, Grade: C+

STS (v7.0) - Offer of Simplification:
31/100, Grade: ?

STS (v8.1) - Advancement of f/g/h Pawns:
22/100, Grade: ?
Overall Performance:
Total Score: 349/800

Overall Average: 43.625 %

Grade: E
Regards,
Swami

swami

Re: STS 1.0 revisited

Post by swami » Sat Jan 09, 2010 3:44 pm

Polar Chess 1.3

Strategic Test Suite Conditions:
Arena GUI
10 seconds per position
800 positions
Core2Quad Q6600, 32-bit, 2 GB RAM, 2.4 GHz
Engine uses 7 MB hash.
Subject-wise Scores:
STS (v1.0) - Undermining:
39/100, Grade: ?

STS (v2.1) - Open Files and Diagonals:
41/100, Grade: E

STS (v3.0) - Knight Outposts/Centralization/Repositioning:
39/100, Grade: ?

STS (v4.0) - Square Vacancy:
44/100, Grade: E

STS (v5.0) - Bishop vs Knight:
52/100, Grade: C

STS (v6.0) - Re-Capturing:
55/100, Grade: C+

STS (v7.0) - Offer of Simplification:
37/100, Grade: ?

STS (v8.1) - Advancement of f/g/h Pawns:
17/100, Grade: ?
Overall Performance:
Total Score: 324/800

Overall Average: 40.50 %

Grade: E
Regards,
Swami
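
The totals in the two STS reports can be re-derived from the per-theme scores. A small sketch, with the numbers copied from the reports and the helper name chosen just for illustration:

```python
# Per-theme scores (out of 100 each) copied from the two STS reports,
# in order from Undermining through Advancement of f/g/h Pawns.
scores = {
    "Mint 2.3": [47, 43, 46, 44, 59, 57, 31, 22],
    "Polar Chess 1.3": [39, 41, 39, 44, 52, 55, 37, 17],
}

def summarize(theme_scores, per_theme=100):
    """Return (total, maximum, percentage) for a list of per-theme scores."""
    total = sum(theme_scores)
    maximum = per_theme * len(theme_scores)
    return total, maximum, 100 * total / maximum

for engine, theme_scores in scores.items():
    total, maximum, pct = summarize(theme_scores)
    print(f"{engine}: {total}/{maximum} = {pct:.3f} %")

# Count the themes in which Mint scored strictly higher than Polar Chess.
mint_ahead = sum(
    m > p for m, p in zip(scores["Mint 2.3"], scores["Polar Chess 1.3"])
)
print(f"Mint ahead in {mint_ahead} of {len(scores['Mint 2.3'])} themes")
```
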

Guenther

Re: STS 1.0 revisited

Post by Guenther » Mon Jan 11, 2010 12:47 pm

Hi again to all following this thread.

In the meantime I contacted Olivier, ran some tests of my own, and did some
research in Leo's old databases. Now I know what happens, but not yet why.
It is now clear that Mint is several hundred rating points weaker
when playing with ponder on! In that mode it produces incredible blunders from
time to time (actually not too rarely!) and is sometimes unable
to win from a huge advantage, just drawing by repetition instead!
I could verify this with my own eyes by looking at ponder games and the
corresponding debug files.
Again, I want to confirm that none of this happens in ponder off games.
I guess I will post some examples at the WB forum, so that this
knowledge is preserved. (I am really surprised that
I did not find any post about this issue in the archives.)

As I said, it is still unknown why this happens. Most probably it is a bug
in Mint's ponder search related to the hash tables: I already saw the blunders
in ponder on games from 2001, which should rule out
compiler issues with newer hardware etc.

Guenther
