Re: STS 1.0 revisited
Posted: Fri Jan 08, 2010 12:10 pm
Hi Swami,
well, in fact I must say I am impressed by the test suites; that's why I started to put some CPU time into them. Here is the result for the same set of engines on STS2 (same conditions as in the initial post):

Code: Select all
Engine                    Solved  Solve-Time  CEGT-Elo  WBEC-Elo
Rybka 1.0 Beta 32-bit       84      206        2815        ?
Zappa 1.0                   70      398        2573        ?
Ruffian 1.0.5               66      417        2618       2620
Gandalf 5.1                 65      443       ?2600?     ?2650?
Quark v2.70beta             64      448       ?2550?      2447
Little Goliath 2000 v3.9    63      426         ?          ?
Aristarch 4.21              63      444       ?2550?     ?2620?
Gromit 3.82                 60      480         ?         2478
LambChop 10.99              57      478         ?        ?2524?
WildCat 2.79                56      512         ?          ?
Horizon 4.1                 54      559         ?        ?2300?
Patzer 3.61                 53      523         ?          ?
King of Kings 2.40          53      528       ?2450?     ?2410?
Phalanx 22                  53      546         ?         2392
Nimzo 2000b                 51      544         ?          ?
Beowulf 2.2                 50      570         ?        ?2284?
Bringer 1.9                 48      575         ?        ?2476?
PolarEngine 1.3             45      593         ?         1648
GnuChess 4.14               44      602         ?        ?2207?
Mint v2.3                   41      651         ?         1410
Adam 2.9                    36      692         ?        ?2050?
Celes 0.75c                 32      722         ?         2193
Gerbil 02                   31      730         ?         1963

As intended, the result is very different from the one for the first set of positions. Of course Rybka is first, but as you said it shouldn't be used with this test: the positions were verified with Rybka, so its score will always be extremely good. I just added it to see how much influence that has (even though you used a different version of Rybka to verify the positions, already v1.0 seems to like them a lot).

The combined results now look like this:

Code: Select all
Engine                    STS2  STS1  STS1+2
Rybka 1.0 Beta 32-bit      84    90    174
Zappa 1.0                  70    74    144
Gandalf 5.1                65    76    141
Aristarch 4.21             63    76    139
Ruffian 1.0.5              66    68    134
Little Goliath 2000 v3.9   63    70    133
LambChop 10.99             57    74    131
Gromit 3.82                60    71    131
WildCat 2.79               56    71    127
Quark v2.70beta            64    60    124
King of Kings 2.40         53    71    124
Phalanx 22                 53    66    119
Nimzo 2000b                51    67    118
Patzer 3.61                53    62    115
GnuChess 4.14              44    71    115
Horizon 4.1                54    59    113
Bringer 1.9                48    65    113
Beowulf 2.2                50    60    110
Adam 2.9                   36    61     97
Mint v2.3                  41    51     92
PolarEngine 1.3            45    35     80
Celes 0.75c                32    48     80
Gerbil 02                  31    45     76

Imo this is already quite close to the real strength relations; maybe Zappa 1.0 is a bit overrated, and Nimzo, especially Bringer, a bit underrated. You might wonder why I use such old engines. Well, in fact my engine directory is not really up to date, but there is a more important reason: I know quite a lot about most of these engines. E.g. Mint is also overrated; it is clearly the weakest engine here, by a big margin. If I remember correctly this is mainly a consequence of its search: a lot of techniques which the others have are not implemented. On the other hand it clearly has more knowledge than Gerbil, which has only a very reduced set of basic knowledge. So in a really working STRATEGIC test suite the weakest eval should be close to last; the search itself shouldn't play as big a role as it does in tactical test suites. Of course we still have too little data, but this looks very good. I will proceed with STS 3.

Greets, Thomas
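By the way, "quite close to the real relations" can be made checkable: compare the combined solve counts against the published ratings with a rank correlation. Below is a minimal sketch in plain Python, using only the engines from the tables whose WBEC rating is listed without a '?'; the Spearman implementation is a textbook average-rank version, not from any library, and the helper names are just illustrative.

```python
from math import sqrt

# Combined STS1+STS2 solve count and WBEC Elo, copied from the tables
# above (only engines whose WBEC rating is given without a '?').
data = {
    "Ruffian 1.0.5":   (134, 2620),
    "Quark v2.70beta": (124, 2447),
    "Gromit 3.82":     (131, 2478),
    "Phalanx 22":      (119, 2392),
    "PolarEngine 1.3": (80,  1648),
    "Mint v2.3":       (92,  1410),
    "Celes 0.75c":     (80,  2193),
    "Gerbil 02":       (76,  1963),
}

def average_ranks(values):
    """Rank values ascending (1-based), giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rho = Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

scores = [v[0] for v in data.values()]
elos = [v[1] for v in data.values()]
rho = spearman(scores, elos)
print(f"Spearman rank correlation: {rho:.2f}")
```

On this small sample the rank correlation comes out around 0.81, so the combined list already tracks the WBEC order rather well, with the outliers (Celes high in Elo but low in solve count) pulling it down.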