Middlegame positional test-suite

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Middlegame positional test-suite

Post by Laskos »

As with the opening positional test-suite described here:
http://www.talkchess.com/forum3/viewtop ... =2&t=61858

I have built a positional test-suite for the middlegame phase, roughly moves 15-22, based on large databases of human games. The per-position statistics from human games are weaker than in the openings, so it was harder to be confident that each chosen solution is unique. I used engines mostly to check that the positions contain no tactical complications.

The positional test-suite has 250 middlegame positions. I also combined 4 copies of it into a suite of 1000 positions, for the statistical significance of the result. Results can vary quite a bit on a single run of the test-suite (especially on many cores), so 4 runs give a more precise result. I uploaded here the positional Middlegames250 test-suite, the Middlegames1000 test-suite (4 copies combined), the old Openings200 test-suite and the Openings1000 test-suite (5 copies combined). They are the most faithfully positional test suites I am aware of, and are my creation :lol:.
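
A rough sketch of why 1000 attempts give a steadier score than 250. It assumes every attempt is an independent pass/fail trial with the same solve rate (a simple binomial model), which is only an approximation: repeating the same 250 positions averages out run-to-run noise, not the choice of positions. The 60% solve rate is a hypothetical value picked for illustration.

```python
import math

def solve_rate_standard_error(p: float, n: int) -> float:
    """Binomial standard error of a solved fraction p measured over n attempts."""
    return math.sqrt(p * (1.0 - p) / n)

# Hypothetical solve rate, chosen only for illustration.
p = 0.60

for n in (250, 1000):
    se = solve_rate_standard_error(p, n)
    print(f"n={n:4d}: solve rate {p:.0%}, 1-sigma error about {se:.1%}")
```

Under this model, four times as many attempts halve the relative error, since the error shrinks as 1/sqrt(n).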

Results on the positional Middlegames1000 suite, under these conditions:
AB engines: 4 cores of a strong i7.
Lc0: RTX 2070 GPU.
Time: solutions found within an interval of 1 to 2 seconds per position, using Polyglot's EPD testing features.

Midgames1000:

Lc0 v21.1 ID41844  737/1000
Lc0 v21.1 ID32930  730/1000
Houdini 6.03       655/1000
Houdini 6.03Tactic 635/1000
Komodo 12.3        609/1000
Stockfish_dev      584/1000
Booot 6.3.1        584/1000
Texel 1.07         561/1000
Andscacs 0.95      555/1000
Ethereal 11.25     548/1000
Fire 7.1           545/1000
Fruit 2.1          398/1000

Leela performs better on this positional test suite than the regular (AB) engines. There is also a surprise: unlike in the openings, SF_dev is not the positionally strongest AB engine in the middlegame. Houdini seems strong here. To check that my suite is not an artifact of tactics, I also included Houdini Tactical, and it performs worse than the regular Houdini. As a further sanity check I included Fruit 2.1 with its simple eval, and it performs pretty badly. Compare with the results on the positional opening test suite, with much the same engines:

Openings1000:

Lc0 v21.1 ID41844  762/1000
Lc0 v21.1 ID32930  727/1000
Stockfish_dev      574/1000
Houdini 6.03       558/1000
Komodo 12.3        556/1000
Booot 6.3.1        494/1000
Andscacs 0.95      484/1000
Ethereal 11.00     457/1000
Fire 7.1           431/1000
Texel 1.07         419/1000

Leela is still by far the best, but its positional lead in the midgames is smaller than in the openings. That is to be expected, as these are fairly common human openings, on which Leela trained more than on the more varied midgames. It is also interesting that Stockfish seems to excel positionally among AB engines in openings and in endgames, but not so much in midgames. It probably has a much better search all around, though.
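
The size of Leela's lead can be read straight off the two tables. The sketch below copies the top scores from them and computes the gap between the best Lc0 net and the best AB engine in each suite; the helper name `lead_over_best_ab` is only an illustrative choice.

```python
from typing import Dict, Tuple

# Scores copied from the two tables above (solved positions out of 1000).
midgames: Dict[str, int] = {
    "Lc0 v21.1 ID41844": 737,
    "Lc0 v21.1 ID32930": 730,
    "Houdini 6.03": 655,
    "Komodo 12.3": 609,
    "Stockfish_dev": 584,
}
openings: Dict[str, int] = {
    "Lc0 v21.1 ID41844": 762,
    "Lc0 v21.1 ID32930": 727,
    "Stockfish_dev": 574,
    "Houdini 6.03": 558,
    "Komodo 12.3": 556,
}

def lead_over_best_ab(scores: Dict[str, int]) -> Tuple[str, int]:
    """Return the best non-Lc0 engine and the best Lc0 net's lead over it."""
    best_lc0 = max(v for k, v in scores.items() if k.startswith("Lc0"))
    ab_only = [(k, v) for k, v in scores.items() if not k.startswith("Lc0")]
    best_ab_name, best_ab_score = max(ab_only, key=lambda kv: kv[1])
    return best_ab_name, best_lc0 - best_ab_score

print("Midgames:", lead_over_best_ab(midgames))  # ('Houdini 6.03', 82)
print("Openings:", lead_over_best_ab(openings))  # ('Stockfish_dev', 188)
```

So the lead is 82 positions in the midgames against 188 in the openings.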

My verrry good :lol: positional test suites are attached.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Middlegame positional test-suite

Post by Ferdy »

Started testing some engines on midgames250.epd at shorter time, single thread.
Results so far, https://fsmosca.github.io/chess-tests/
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Middlegame positional test-suite

Post by Laskos »

Ferdy wrote: Sun Apr 07, 2019 5:37 pm
Started testing some engines on midgames250.epd at shorter time, single thread.
Results so far, https://fsmosca.github.io/chess-tests/

Could you include SF_dev too? In my tests (though on 1000 positions) it seems to have improved noticeably in positional play over SF 10.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Middlegame positional test-suite

Post by Dann Corbit »

There are about 100 of the 450 positions in the test sets for which I have a different answer for best move than those provided.
I guess that many of mine are wrong (of course) but might be worth examination.
I only analyzed these with SF, so I would be very interested to see the results of LC0, Komodo, Houdini, etc.
The field cce is my own invention, and is a function of wins, losses and draws.
If there is a large number of games, and the cce differs significantly from the ce, then it bears further investigation.
They are sorted by id
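
A minimal sketch of the check described above: flag positions that have many games and a cce far from the ce. The formula behind cce is not given in this post, so the sketch takes cce as an input. The field name `games`, both thresholds and the sample records are invented for illustration and are not taken from the actual files.

```python
from typing import Dict, List

# Hypothetical thresholds, not taken from the post.
MIN_GAMES = 50     # what counts as "a large number of games"
MIN_GAP_CP = 100   # what counts as "differs significantly", in centipawns

def needs_review(record: Dict[str, int]) -> bool:
    """True if the record has many games and cce disagrees strongly with ce."""
    enough_games = record["games"] >= MIN_GAMES
    big_gap = abs(record["cce"] - record["ce"]) >= MIN_GAP_CP
    return enough_games and big_gap

# Made-up records for illustration only; "games" is an assumed field name.
records: List[Dict[str, int]] = [
    {"id": 3, "ce": 25, "cce": 180, "games": 400},   # many games, big gap
    {"id": 1, "ce": -70, "cce": -60, "games": 900},  # many games, small gap
    {"id": 2, "ce": 10, "cce": 250, "games": 5},     # big gap, too few games
]

# Keep the flagged records sorted by id, like the list itself.
flagged = sorted((r for r in records if needs_review(r)), key=lambda r: r["id"])
print([r["id"] for r in flagged])
```

Only the first made-up record passes both conditions; the other two fail on the gap and on the game count respectively.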
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Middlegame positional test-suite

Post by Laskos »

Dann Corbit wrote: Tue Apr 09, 2019 10:20 pm
There are about 100 of the 450 positions in the test sets for which I have a different answer for best move than those provided.
I guess that many of mine are wrong (of course) but might be worth examination.
I only analyzed these with SF, so I would be very interested to see the results of LC0, Komodo, Houdini, etc.
The field cce is my own invention, and is a function of wins, losses and draws.
If there is a large number of games, and the cce differs significantly from the ce, then it bears further investigation.
They are sorted by id

Thanks, I needed this help.
Jouni
Posts: 3281
Joined: Wed Mar 08, 2006 8:15 pm

Re: Middlegame positional test-suite

Post by Jouni »

Interesting test! Middlegame phase of the game, roughly moves 15-22. So after 23 it's endgame?
Jouni
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Middlegame positional test-suite

Post by Laskos »

Jouni wrote: Wed Apr 10, 2019 10:58 am
Interesting test! Middlegame phase of the game, roughly moves 15-22. So after 23 it's endgame?

:)

No, but later in the game there are not enough strong human games per position in my combined databases. Even this test-suite often has meager outcome statistics, helped only very moderately by the engines' opinions. I don't want to intervene heavily with engines in very positional test-suites: I would rather get help from my 2000 FIDE level girlfriend than from, say, Stockfish on them. But engines are of some help, at least for getting rid of tactics.
Nordlandia
Posts: 2821
Joined: Fri Sep 25, 2015 9:38 pm
Location: Sortland, Norway

Re: Middlegame positional test-suite

Post by Nordlandia »

Does this qualify as a test position?

[d]r2q2k1/p1p3pp/2p5/3p1b2/8/2P2N2/PP3KPP/R1B2B1R b - - 0 14
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Middlegame positional test-suite

Post by Dann Corbit »

Nordlandia wrote: Thu Apr 11, 2019 7:44 pm
Does this qualify as a test position?

[d]r2q2k1/p1p3pp/2p5/3p1b2/8/2P2N2/PP3KPP/R1B2B1R b - - 0 14

I guess there must be something better than Qb8.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Middlegame positional test-suite

Post by Dann Corbit »

Nordlandia wrote: Thu Apr 11, 2019 7:44 pm
Does this qualify as a test position?

[d]r2q2k1/p1p3pp/2p5/3p1b2/8/2P2N2/PP3KPP/R1B2B1R b - - 0 14

Here is what I have for this one:
r2q2k1/p1p3pp/2p5/3p1b2/8/2P2N2/PP3KPP/R1B2B1R b - - acd 50; bm Qb8; c0 "http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=32975; viewtopic.php?f=2&t=70438&sid=84b06c0cb ... 4c#p795892"; ce -70; pm Qb8; pv Qb8 b3;

I can't find anything better than Qb8, which is found immediately, and then it sticks through ply 50.
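
For anyone scripting against records like the one above, here is a minimal stdlib-only sketch of reading an EPD line into its position and opcodes (acd, bm, ce, pm, pv). Note that the c0 comment contains a semicolon inside the quotes, so a plain split on ";" would cut it in two. The function names are illustrative, the c0 text in the example is a placeholder for the shortened links, and `is_solved` assumes the engine's move is written in the same notation as the bm operand.

```python
from typing import Dict, List, Tuple

def _store(chunk: str, operations: Dict[str, str]) -> None:
    """Add one 'opcode operand' chunk to the map, dropping surrounding quotes."""
    chunk = chunk.strip()
    if chunk:
        opcode, _, operand = chunk.partition(" ")
        operations[opcode] = operand.strip().strip('"')

def parse_epd(line: str) -> Tuple[str, Dict[str, str]]:
    """Split an EPD record into its 4-field position and an opcode -> operand map."""
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    rest = fields[4] if len(fields) > 4 else ""
    operations: Dict[str, str] = {}
    current: List[str] = []
    in_quotes = False
    for ch in rest:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            # A semicolon outside quotes ends the current operation.
            _store("".join(current), operations)
            current = []
        else:
            current.append(ch)
    _store("".join(current), operations)  # tolerate a missing final ';'
    return position, operations

def is_solved(engine_move: str, operations: Dict[str, str]) -> bool:
    """True if the engine's move is one of the listed best moves (bm)."""
    return engine_move in operations.get("bm", "").split()

# Same record as above, with the c0 links replaced by a placeholder comment.
record = ('r2q2k1/p1p3pp/2p5/3p1b2/8/2P2N2/PP3KPP/R1B2B1R b - - acd 50; bm Qb8; '
          'c0 "first link; second link"; ce -70; pm Qb8; pv Qb8 b3;')
position, ops = parse_epd(record)
print(position)
print(ops)
print(is_solved("Qb8", ops))
```

The quote-aware loop is the only real design choice here; everything else is plain string splitting.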