Hard Test Suite 2008 (Vincent Lejeune) CEGT results


Heinz Van Kempen

Hard Test Suite 2008 (Vincent Lejeune) CEGT results

Post by Heinz Van Kempen »

Hi all :-) ,

10 days ago Vincent Lejeune posted his Hard Test Suite 2008 here:

http://www.talkchess.com/forum/viewtopi ... 62&t=22402

It includes more than 80 different positions to solve, covering tactical motifs such as king's attacks as well as positional decisions, fortresses, endgames and so on.

I ran the test on a Core 2 Quad E6600 (5-men tablebases) with 4 GB RAM. Each engine got a maximum of 600 seconds per position and used 1024 MB of RAM.

Not surprisingly for us, there is no correlation between good performance in position solving and how engines perform in tournaments and matches. But it might be interesting for those doing a lot of analysis. Here are the results for some engines, sorted from best to worst:

Deep Fritz 10.1
Result: 55 out of 81 = 67.9%. Average time = 80.20s / 17.43

Zappa Mexico II
Result: 51 out of 81 = 62.9%. Average time = 78.91s / 15.37

Bright 0.3d
Result: 49 out of 81 = 60.4%. Average time = 64.52s / 16.85

Hiarcs 12
Result: 47 out of 81 = 58.0%. Average time = 75.42s / 17.14

Rybka 2.3.2a x64
Result: 44 out of 81 = 54.3%. Average time = 63.74s / 19.27

Naum3.1 x64
Result: 40 out of 81 = 49.3%. Average time = 88.80s / 18.10

Glaurung 2.1 x64
Result: 36 out of 81 = 44.4%. Average time = 117.82s / 17.08

Fruit 2.4 Beta A
Result: 34 out of 81 = 41.9%. Average time = 124s / 15.70

Loop M1P
Result: 33 out of 81 = 40.7%. Average time = 121.05s / 16.03

Deep Shredder 11 x64
Result: 28 out of 81 = 34.5%. Average time = 120.84s / 14.46
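
As a quick arithmetic check, the percentages above can be recomputed from the solved counts. The sketch below assumes the figures were truncated (not rounded) to one decimal place, which is what all ten result lines are consistent with; the function name is made up for illustration.

```python
TOTAL = 81  # positions attempted by each engine

# Solved counts copied from the result lines above.
solved = {
    "Deep Fritz 10.1": 55,
    "Zappa Mexico II": 51,
    "Bright 0.3d": 49,
    "Hiarcs 12": 47,
    "Rybka 2.3.2a x64": 44,
    "Naum3.1 x64": 40,
    "Glaurung 2.1 x64": 36,
    "Fruit 2.4 Beta A": 34,
    "Loop M1P": 33,
    "Deep Shredder 11 x64": 28,
}


def percent_truncated(count, total=TOTAL):
    """Return count/total as a percentage string, truncated to one decimal."""
    tenths = count * 1000 // total  # percentage in tenths, rounded down
    return f"{tenths // 10}.{tenths % 10}"


for engine, count in solved.items():
    print(f"{engine}: {count} out of {TOTAL} = {percent_truncated(count)}%")
```

With ordinary rounding, 51 out of 81 would read 63.0% rather than 62.9%, so small differences of 0.1 between sources are not meaningful.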

More details here (scroll down for individual results on each position):

http://cegt.foren-city.de/topic,62,-har ... sults.html
Marc MP

Re: Hard Test Suite 2008 (Vincent Lejeune) CEGT results

Post by Marc MP »

Heinz Van Kempen wrote:Hi all :-) ,

...

Not surprisingly for us, there is no correlation between good performance in position solving and how engines perform in tournaments and matches. But it might be interesting for those doing a lot of analysis.
...
Hi Heinz,

There is surely a correlation between test solving and performance in engine matches. But it is not as strong as we would like, and we can't rely on it as a predictor among engines that are close in rating. Do the test with a 2300-rated engine on your list (or 2400, even 2500). I'd bet it will finish at the bottom of the list given here.

Besides: Thank you for doing the tests! Always interesting. Keep us updated with Rybka 3, of course.
Heinz Van Kempen

Re: Hard Test Suite 2008 (Vincent Lejeune) CEGT results

Post by Heinz Van Kempen »

Marc MP wrote:Besides: Thank you for doing the tests! Always interesting. Keep us updated with Rybka 3, of course.
Hi Marc :) ,

Yes, of course. I intend to post an update with Rybka 3 Default, Human and Dynamic.
jdart
Posts: 4361
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Hard Test Suite 2008 (Vincent Lejeune) CEGT results

Post by jdart »

Has anyone run these in multi-variation mode at very long time controls to see if the solutions are valid/unique? I tried a few of them but haven't done this comprehensively.

--Jon
Marc MP

Re: Hard Test Suite 2008 (Vincent Lejeune) CEGT results

Post by Marc MP »

jdart wrote:Has anyone run these in multi-variation mode at very long time controls to see if the solutions are valid/unique? I tried a few of them but haven't done this comprehensively.

--Jon
I think many of them are "speculative" in the sense that they are long-term sacrifices for "positional" compensation. For example:

[d]r1bq1rk1/1p3ppp/p1pp2n1/3N3Q/B1PPR2b/8/PP3PPP/R1B3K1 w - - 0 1 bm Rxh4

[d]r1b3k1/pp1n3p/2pbpq1r/3p4/2PPp1p1/PP2P1P1/1BQN1P1P/3RRBK1 b - - 0 1 bm Rxh2

Note: I'm not saying that the solutions are wrong. Only that I doubt we can verify them with computer analysis for the time being.
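
Before handing such records to an engine, it can help to parse them mechanically. The sketch below assumes each record has the form shown above: an optional "[d]" tag, the FEN fields, then "bm" and the expected move. The function names are made up for illustration, and the check only verifies the board layout, not whether the move is legal or best.

```python
def parse_record(line):
    """Split '[d]<FEN fields> bm <move>' into (fen, best_move)."""
    line = line.strip()
    if line.startswith("[d]"):  # forum diagram tag, not part of the FEN
        line = line[3:]
    fen, sep, best_move = line.partition(" bm ")
    if not sep:
        raise ValueError("no 'bm' field found")
    return fen.strip(), best_move.strip().rstrip(";")


def ranks_are_well_formed(fen):
    """Return True if the board field has 8 ranks of 8 squares each."""
    ranks = fen.split()[0].split("/")
    if len(ranks) != 8:
        return False
    for rank in ranks:
        # Digits count as that many empty squares, letters as one piece.
        width = sum(int(ch) if ch.isdigit() else 1 for ch in rank)
        if width != 8:
            return False
    return True


record = "[d]r1bq1rk1/1p3ppp/p1pp2n1/3N3Q/B1PPR2b/8/PP3PPP/R1B3K1 w - - 0 1 bm Rxh4"
fen, best_move = parse_record(record)
print(best_move, ranks_are_well_formed(fen))
```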
Heinz Van Kempen

Re: Hard Test Suite 2008 (Vincent Lejeune) CEGT results

Post by Heinz Van Kempen »

Hi all :) ,

While observing, I noticed that some engines sometimes found alternative moves that also gave a big advantage.

Surely the set can be improved and maybe Vincent will do this.
Marc MP

Re: Hard Test Suite 2008 (Vincent Lejeune) CEGT results

Post by Marc MP »

Heinz Van Kempen wrote:Hi all :) ,

While observing, I noticed that some engines sometimes found alternative moves that also gave a big advantage.

Surely the set can be improved and maybe Vincent will do this.
Hi again Heinz :) ,

To Vincent's credit, it is very difficult to gather and analyze "positional" sacrifices. Sometimes the GM says it is good, but the computers aren't able to see it yet. Other times, the GM says it is good and the GM is wrong!

Moreover, sometimes the sacrifice can be delayed (to increase its effectiveness) by preparatory moves. That makes the challenge of finding "best moves" even tougher, because you need positions where there are no "alternative" or "preparatory" moves.
Heinz Van Kempen

Re: Hard Test Suite 2008 (Vincent Lejeune) CEGT results

Post by Heinz Van Kempen »

Hi Marc :-),

sure, I am aware of this. Compiling such a suite, and giving proper weight to all aspects of chess, takes an enormous amount of work and care. Engines have already detected a lot of "holes" in tactical masterpiece games from the past, and will keep finding better moves, or at least better defences, even in the best books dealing with tactical shots.

In addition, as Jon proposed, every position would have to be checked for hours with some very good engines at long time controls, and so would the alternatives, since the evaluations of programs and super-GMs may still differ on certain positions. My feeling is that a solution can be considered valid or unique if there is a really big difference in evaluation between the best move and the next-best alternative after really long calculations, going deeper into the possible lines as correspondence players usually do for hours or days.
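
The criterion described here can be sketched as a simple margin check. This is only an illustration: the scores are assumed to be centipawn evaluations collected from a long multi-variation search, and the function name, the input layout and the 150-centipawn threshold are all assumptions, not values taken from the test suite or from any particular engine or GUI.

```python
def is_unique_solution(scores_cp, expected_move, min_margin_cp=150):
    """Check whether expected_move clearly beats every alternative.

    scores_cp maps each candidate move to its evaluation in centipawns,
    seen from the side to move. The threshold is an arbitrary assumption.
    """
    if expected_move not in scores_cp:
        return False
    alternatives = [s for m, s in scores_cp.items() if m != expected_move]
    if not alternatives:
        return True  # nothing to compare against
    margin = scores_cp[expected_move] - max(alternatives)
    return margin >= min_margin_cp


# Hypothetical evaluations, invented purely to show the call.
example = {"move_a": 210, "move_b": 35, "move_c": 20}
print(is_unique_solution(example, "move_a"))
```

A fixed threshold is a crude choice: in a position that is already winning, two moves can both score very high and still differ by less than the margin, so the numbers would need human review in any case.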
Marc MP

Re: Hard Test Suite 2008 (Vincent Lejeune) CEGT results

Post by Marc MP »

Heinz Van Kempen wrote:Hi Marc :-),

sure, I am aware of this. Compiling such a suite, and giving proper weight to all aspects of chess, takes an enormous amount of work and care. Engines have already detected a lot of "holes" in tactical masterpiece games from the past, and will keep finding better moves, or at least better defences, even in the best books dealing with tactical shots.

In addition, as Jon proposed, every position would have to be checked for hours with some very good engines at long time controls, and so would the alternatives, since the evaluations of programs and super-GMs may still differ on certain positions. My feeling is that a solution can be considered valid or unique if there is a really big difference in evaluation between the best move and the next-best alternative after really long calculations, going deeper into the possible lines as correspondence players usually do for hours or days.
Hi Heinz :) ,

That is a well-thought-out statement, I believe!

Have a good day,
jdart

Re: Hard Test Suite 2008 (Vincent Lejeune) CEGT results

Post by jdart »

For the record, I have validated these (they are not among the most difficult):

8/6p1/P1b1pp2/2p1p3/1k4P1/3PP3/1PK5/5B2 w - - bm Bg2

and

rq3rk1/1b1n1ppp/ppn1p3/3pP3/5B2/2NBP2P/PP2QPP1/2RR2K1 w - - bm Nxd5

--Jon