STS 1.0 revisited

Thomas Mayer
Posts: 383
Joined: Thu Mar 09, 2006 5:45 pm
Location: Nellmersbach, Germany

Re: STS 1.0 revisited

Post by Thomas Mayer » Fri Jan 08, 2010 11:10 am

Hi Swami,

Well, in fact I must say I am impressed by the test suites, which is why I have started to put some CPU time into them. Here is the result of the same set of engines in STS2 (same conditions as in the initial post):

Engine                     Solved  Solve-Time  CEGT-Elo  WBEC-Elo
Rybka 1.0 Beta 32-bit          84         206  2815      ?
Zappa 1.0                      70         398  2573      ?
Ruffian 1.0.5                  66         417  2618      2620
Gandalf 5.1                    65         443  ?2600?    ?2650?
Quark v2.70beta                64         448  ?2550?    2447
Little Goliath 2000 v3.9       63         426  ?         ?
Aristarch 4.21                 63         444  ?2550?    ?2620?
Gromit 3.82                    60         480  ?         2478
LambChop 10.99                 57         478  ?         ?2524?
WildCat 2.79                   56         512  ?         ?
Horizon 4.1                    54         559  ?         ?2300?
Patzer 3.61                    53         523  ?         ?
King of Kings 2.40             53         528  ?2450?    ?2410?
Phalanx 22                     53         546  ?         2392
Nimzo 2000b                    51         544  ?         ?
Beowulf 2.2                    50         570  ?         ?2284?
Bringer 1.9                    48         575  ?         ?2476?
PolarEngine 1.3                45         593  ?         1648
GnuChess 4.14                  44         602  ?         ?2207?
Mint v2.3                      41         651  ?         1410
Adam 2.9                       36         692  ?         ?2050?
Celes 0.75c                    32         722  ?         2193
Gerbil 02                      31         730  ?         1963

As intended, the result is very different from the first set of positions. Of course Rybka is first, but as you said, it shouldn't be used with this test because the positions were verified with Rybka, so its score will always be extremely good. I just added it to see how much influence that has (even though you used a different version of Rybka to verify the positions, v1.0 already seems to like them a lot).

The combined results now look like this:

Engine                     STS2  STS1  STS1+2
Rybka 1.0 Beta 32-bit        84    90     174
Zappa 1.0                    70    74     144
Gandalf 5.1                  65    76     141
Aristarch 4.21               63    76     139
Ruffian 1.0.5                66    68     134
Little Goliath 2000 v3.9     63    70     133
LambChop 10.99               57    74     131
Gromit 3.82                  60    71     131
WildCat 2.79                 56    71     127
Quark v2.70beta              64    60     124
King of Kings 2.40           53    71     124
Phalanx 22                   53    66     119
Nimzo 2000b                  51    67     118
Patzer 3.61                  53    62     115
GnuChess 4.14                44    71     115
Horizon 4.1                  54    59     113
Bringer 1.9                  48    65     113
Beowulf 2.2                  50    60     110
Adam 2.9                     36    61      97
Mint v2.3                    41    51      92
PolarEngine 1.3              45    35      80
Celes 0.75c                  32    48      80
Gerbil 02                    31    45      76

IMO this is already quite close to the real relations; maybe Zappa 1.0 is a bit overrated, and Nimzo and especially Bringer a bit underrated. You might wonder why I use such old engines. Well, in fact my engine directory is not really up to date, but there is a more important reason: I know quite a lot about most of these engines. E.g. Mint is also overrated; it's clearly the weakest engine by a big margin. If I remember correctly, this is mainly a consequence of its search: a lot of techniques that the others have are not implemented. On the other hand, it clearly has more knowledge than Gerbil, which has a very reduced set of basic knowledge. So in a STRATEGIC test suite that really works, the engine with the weakest eval should be close to last, and the search itself shouldn't play as big a role as it does in tactical test suites. Of course we still have too little data, but this looks very good. I will proceed with STS 3.
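
One rough way to put a number on "quite close to the real relations" is a rank correlation between the combined STS1+2 score and a rating list. The sketch below is only an illustration: it uses the eight engines whose WBEC-Elo in the first table has no question marks, and the helper names (`average_ranks`, `spearman`) are made up for this example. With so few engines the result should not be over-interpreted.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# (STS1+2 score, WBEC-Elo) copied from the two tables in this post.
ENGINES = {
    "Ruffian 1.0.5": (134, 2620),
    "Gromit 3.82": (131, 2478),
    "Quark v2.70beta": (124, 2447),
    "Phalanx 22": (119, 2392),
    "Mint v2.3": (92, 1410),
    "PolarEngine 1.3": (80, 1648),
    "Celes 0.75c": (80, 2193),
    "Gerbil 02": (76, 1963),
}

scores = [s for s, _ in ENGINES.values()]
elos = [e for _, e in ENGINES.values()]
rho = spearman(scores, elos)
print(f"Spearman rho over {len(ENGINES)} engines: {rho:.2f}")
```

A value near 1 would mean the suite orders these engines almost exactly as the rating list does; the engines with uncertain or missing ratings are left out on purpose.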

Greets, Thomas

swami
Posts: 6536
Joined: Thu Mar 09, 2006 3:21 am

Re: STS 1.0 revisited

Post by swami » Fri Jan 08, 2010 4:44 pm

I'm glad you're enjoying this, Thomas.

I found it interesting that Mint has some knowledge, and probably more than Gerbil.

So we can safely say that search contributes mostly to tactics and evaluation, obviously, to strategy.

Looking forward to more results!
Last edited by swami on Fri Jan 08, 2010 4:47 pm, edited 1 time in total.

bob
Posts: 20687
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: STS 1.0 revisited

Post by bob » Fri Jan 08, 2010 4:47 pm

swami wrote:I'm glad you're enjoying this, Thomas.

I found the information about Mint having some knowledge and probably more than Gerbil interesting. So, therefore we can safely say that search contributes much towards tactics and evaluation obviously for strategy.
That is "news"? :)

Sounds more like something covered in computer-chess 101. :)

swami
Posts: 6536
Joined: Thu Mar 09, 2006 3:21 am

Re: STS 1.0 revisited

Post by swami » Fri Jan 08, 2010 4:50 pm

bob wrote:
swami wrote:I'm glad you're enjoying this, Thomas.

I found the information about Mint having some knowledge and probably more than Gerbil interesting. So, therefore we can safely say that search contributes much towards tactics and evaluation obviously for strategy.
That is "news"? :)

Sounds more like something covered in computer-chess 101. :)
Yeah but you will have to excuse me, since I obviously have no idea about the programming involved in designing chess engines. :wink:

But the point is, STS actually _proved_ that Mint has more knowledge than Gerbil. If it doesn't yet, then there's a need for more suites, I'd think.
Last edited by swami on Fri Jan 08, 2010 4:55 pm, edited 1 time in total.

Guenther
Posts: 3260
Joined: Wed Oct 01, 2008 4:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: STS 1.0 revisited

Post by Guenther » Fri Jan 08, 2010 4:53 pm

Thomas Mayer wrote: ...I know quite a lot about most of these engines. E.g. Mint is also overrated, it's clearly the weakest engine by a big margin. ...
Greets, Thomas
Well, I have to disagree on this ;-) Polar is weaker than Mint in real games too.

Guenther

swami
Posts: 6536
Joined: Thu Mar 09, 2006 3:21 am

Re: STS 1.0 revisited

Post by swami » Fri Jan 08, 2010 5:02 pm

Guenther wrote:
Thomas Mayer wrote: ...I know quite a lot about most of these engines. E.g. Mint is also overrated, it's clearly the weakest engine by a big margin. ...
Greets, Thomas
Well, I have to disagree on this ;-) Polar is weaker than Mint in real games too.

Guenther
PolarChess 1.3 = 6.5 points
Mint 2.3 = 6 points

Polarchess 1.3 rated (1676)
Mint 2.3 rated (1574)

Source: Chesswar tournament currently underway

http://www.open-aurec.com/chesswar/Ches ... 015P00.htm

Guenther
Posts: 3260
Joined: Wed Oct 01, 2008 4:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: STS 1.0 revisited

Post by Guenther » Fri Jan 08, 2010 9:30 pm

swami wrote:
Guenther wrote:
Thomas Mayer wrote: ...I know quite a lot about most of these engines. E.g. Mint is also overrated, it's clearly the weakest engine by a big margin. ...
Greets, Thomas
Well, I have to disagree on this ;-) Polar is weaker than Mint in real games too.

Guenther
PolarChess 1.3 = 6.5 points
Mint 2.3 = 6 points

Polarchess 1.3 rated (1676)
Mint 2.3 rated (1574)

Source: Chesswar tournament currently underway

http://www.open-aurec.com/chesswar/Ches ... 015P00.htm
Lol, that is far too few games, but you know this.
Just look up a rating list with many more games for both.

Guenther

Dann Corbit
Posts: 10267
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: STS 1.0 revisited

Post by Dann Corbit » Sat Jan 09, 2010 1:53 am

Guenther wrote:
swami wrote:
Guenther wrote:
Thomas Mayer wrote: ...I know quite a lot about most of these engines. E.g. Mint is also overrated, it's clearly the weakest engine by a big margin. ...
Greets, Thomas
Well, I have to disagree on this ;-) Polar is weaker than Mint in real games too.

Guenther
PolarChess 1.3 = 6.5 points
Mint 2.3 = 6 points

Polarchess 1.3 rated (1676)
Mint 2.3 rated (1574)

Source: Chesswar tournament currently underway

http://www.open-aurec.com/chesswar/Ches ... 015P00.htm
Lol that are much too less games, but you know this.
Just look up a rating list with much more games for both.

Guenther
For all intents and purposes they are the same strength. From my database of 12 million chess games, produced via the query:
SELECT * FROM chess.dbo.Jb where Games >= 1000 AND (name like 'Mint%' OR name like 'Polar%') order by Elo desc
We have this:
  • ID,Name,Elo,Plus,Minus,Games,PointsPercent,OpponentEloAverage,DrawPercentage
    537675,Mint 2.3,1868,20,20,1160,46%,1912,11%
    538855,PolarChess 1.3,1865,22,22,1147,30%,2082,10%
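
The "same strength" reading can be checked against the two rows above. This is a small sketch that assumes the Plus and Minus columns are error margins around the Elo value, as in typical rating-tool output; that interpretation and the helper names are not stated in the post.

```python
def elo_interval(elo, plus, minus):
    """Return (low, high), assuming Plus/Minus are margins around Elo."""
    return (elo - minus, elo + plus)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Values copied from the query output above.
mint = elo_interval(1868, 20, 20)    # Mint 2.3
polar = elo_interval(1865, 22, 22)   # PolarChess 1.3

print("Mint 2.3:", mint)
print("PolarChess 1.3:", polar)
print("Overlap:", intervals_overlap(mint, polar))
```

Under that assumption the two intervals overlap almost completely, which is consistent with treating the 3 Elo difference as noise.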

Dann Corbit
Posts: 10267
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: STS 1.0 revisited

Post by Dann Corbit » Sat Jan 09, 2010 2:00 am

Dann Corbit wrote:
Guenther wrote:
swami wrote:
Guenther wrote:
Thomas Mayer wrote: ...I know quite a lot about most of these engines. E.g. Mint is also overrated, it's clearly the weakest engine by a big margin. ...
Greets, Thomas
Well, I have to disagree on this ;-) Polar is weaker than Mint in real games too.

Guenther
PolarChess 1.3 = 6.5 points
Mint 2.3 = 6 points

Polarchess 1.3 rated (1676)
Mint 2.3 rated (1574)

Source: Chesswar tournament currently underway

http://www.open-aurec.com/chesswar/Ches ... 015P00.htm
Lol that are much too less games, but you know this.
Just look up a rating list with much more games for both.

Guenther
For all intents and purposes they are the same strength. From my database of 12 million chess games, produced via the query:
SELECT * FROM chess.dbo.Jb where Games >= 1000 AND (name like 'Mint%' OR name like 'Polar%') order by Elo desc
We have this:
  • ID,Name,Elo,Plus,Minus,Games,PointsPercent,OpponentEloAverage,DrawPercentage
    537675,Mint 2.3,1868,20,20,1160,46%,1912,11%
    538855,PolarChess 1.3,1865,22,22,1147,30%,2082,10%
Here is the output of a less restrictive query
SELECT * FROM chess.dbo.Jb where Games >= 10 AND (name like 'Mint%' OR name like 'Polar%') order by Elo desc
  • ID,Name,Elo,Plus,Minus,Games,PointsPercent,OpponentEloAverage,DrawPercentage
    266077,"Polard, Gaetan",2219,81,81,49,40%,2290,43%
    338630,"Mintel, Milan",2146,166,166,10,45%,2161,30%
    462691,Mint v2.3,2004,27,27,664,45%,2067,13%
    468005,Mintstorm,1997,209,209,11,64%,1896,0%
    474054,Mint,1988,111,111,41,27%,2207,10%
    499605,Mint(C),1947,48,48,272,46%,2031,0%
    500158,PolarEngine13,1946,94,94,45,48%,1977,16%
    501832,Mint v2.1,1943,60,60,139,34%,2108,12%
    505394,Mint_23,1937,50,50,168,41%,2021,20%
    505560,Mint23,1937,77,77,65,38%,2033,17%
    505802,PolarChess,1936,107,107,52,22%,2205,6%
    509946,polarchess 1.2,1928,120,120,34,29%,2140,6%
    512833,PolarEngine 1.2,1923,51,51,229,39%,2043,11%
    521352,PolarEngine 1.3,1906,31,31,510,42%,2000,5%
    522990,Polar Chess 1.2,1903,55,55,151,48%,1924,8%
    525125,PolarEngine 1.3 Wb2Uci,1898,174,174,18,17%,2184,11%
    526898,Mint 2.3 W2U,1894,74,74,76,46%,1953,16%
    527106,Polar Chess 1.3,1894,37,37,360,40%,2000,13%
    528060,PolarChess 1.1,1891,202,202,16,81%,1556,0%
    531295,Mint161,1884,53,53,241,21%,2195,9%
    532737,Polar 1.3,1881,111,111,35,46%,1927,6%
    536549,PolarEngine_13,1871,98,98,38,45%,1904,16%
    537675,Mint 2.3,1868,20,20,1160,46%,1912,11%
    538855,PolarChess 1.3,1865,22,22,1147,30%,2082,10%
    539983,Mint 1.61,1862,98,98,37,32%,1981,27%
    545086,PolarChess 1.2,1849,81,81,94,31%,2073,4%
    546346,Polar Chess 1.2b3,1845,112,112,35,44%,1877,14%
    548225,PolarEngine12,1840,147,147,18,53%,1820,17%
    550074,PolarChess engine 1.0 Beta (N,1834,95,95,81,21%,2169,7%
    552479,Mint 2,1827,38,38,922,9%,2357,0%
    557799,PolarChess 1.0,1809,94,94,80,38%,2007,0%
    559099,Mint 1,1805,180,180,21,14%,2155,0%
    560041,Mint_174,1801,100,100,39,40%,1892,13%
    563965,Mint 2.1,1786,147,147,16,28%,1926,19%
    564926,Mint_161,1783,106,106,32,28%,1936,6%
    566940,PolarChess 1.3 W2U,1774,76,76,71,38%,1881,14%
    568112,Mint174,1770,82,82,95,27%,2061,16%
    578291,Mint 1.74,1720,101,101,29,28%,1875,48%
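
The effect of the `Games` threshold and of the prefix match can be reproduced on a handful of the rows above. This is only a sketch: it uses Python's built-in `sqlite3` as a stand-in for the original database, a plain table named `Jb` with four of the nine columns, and a made-up helper `query_names`.

```python
import sqlite3

# A few rows copied from the output above (ID, Name, Elo, Games).
ROWS = [
    (266077, "Polard, Gaetan", 2219, 49),
    (338630, "Mintel, Milan", 2146, 10),
    (462691, "Mint v2.3", 2004, 664),
    (521352, "PolarEngine 1.3", 1906, 510),
    (537675, "Mint 2.3", 1868, 1160),
    (538855, "PolarChess 1.3", 1865, 1147),
]


def query_names(min_games):
    """Run the same WHERE clause against an in-memory copy of the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Jb (ID INTEGER, Name TEXT, Elo INTEGER, Games INTEGER)"
    )
    conn.executemany("INSERT INTO Jb VALUES (?, ?, ?, ?)", ROWS)
    cur = conn.execute(
        "SELECT Name FROM Jb WHERE Games >= ? "
        "AND (Name LIKE 'Mint%' OR Name LIKE 'Polar%') ORDER BY Elo DESC",
        (min_games,),
    )
    names = [row[0] for row in cur]
    conn.close()
    return names


print(query_names(1000))  # only the two entries with at least 1000 games
print(query_names(10))    # also matches human players such as "Polard, Gaetan"
```

The prefix patterns match anything that starts with "Mint" or "Polar", so the looser threshold pulls in human players and every spelling variant of the engine names as separate entries.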

swami
Posts: 6536
Joined: Thu Mar 09, 2006 3:21 am

Re: STS 1.0 revisited

Post by swami » Sat Jan 09, 2010 2:59 am

Guenther wrote:
swami wrote:
Guenther wrote:
Thomas Mayer wrote: ...I know quite a lot about most of these engines. E.g. Mint is also overrated, it's clearly the weakest engine by a big margin. ...
Greets, Thomas
Well, I have to disagree on this ;-) Polar is weaker than Mint in real games too.

Guenther
PolarChess 1.3 = 6.5 points
Mint 2.3 = 6 points

Polarchess 1.3 rated (1676)
Mint 2.3 rated (1574)

Source: Chesswar tournament currently underway

http://www.open-aurec.com/chesswar/Ches ... 015P00.htm
Lol that are much too less games, but you know this.
Just look up a rating list with much more games for both.

Guenther
I did a Google search, and only two results were returned for the query:

http://kirill-kryukov.com/chess/discuss ... ?f=7&t=495

Walter's COMP2006 (games only >30min/engine) list:

336 PolarChess 1.3                 : 1994   27  27   541    41.4 %   2054   15.2 %
337 DChess 1.02                    : 1980   28  29   533    37.0 %   2073   13.1 %
338 Mint 2.3                       : 1960   25  25   633    40.4 %   2027   15.8 %
541 games for Polarchess 1.3 and 633 games for Mint 2.3.

They are only 34 Elo apart, so one could say that they are nearly equal, though PolarChess has a slight edge.

The other result that turned up in Google is Olivier's tournament.
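
To get a feel for what a 34 Elo gap means, the standard logistic Elo formula gives the expected score of the higher-rated side. A minimal sketch (the function name is just for illustration):

```python
def expected_score(elo_diff):
    """Expected score for the side rated elo_diff points higher (logistic Elo)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Ratings from Walter's COMP2006 list quoted above.
gap = 1994 - 1960  # PolarChess 1.3 minus Mint 2.3
print(f"{gap} Elo gap -> expected score {expected_score(gap):.3f}")
```

That works out to roughly 55% for PolarChess, and with error margins of about 25 to 27 points on each rating the gap is small enough that "nearly equal" seems a fair summary.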
