Middlegame positional test-suite

Laskos · Post by **Laskos** » Sun Apr 07, 2019 3:39 pm

As with the opening positional test-suite described here:
http://www.talkchess.com/forum3/viewtop ... =2&t=61858

Based on large databases of human games, I built a positional test-suite for the middlegame phase of the game, roughly moves 15-22. The statistics of human games for each position are weaker than in the openings, so it was harder to be confident in the chosen unique solutions. I used engines mostly to check that the positions have no tactical complications.
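The post does not say how the unique solutions were picked from the game statistics. Purely to illustrate why thin statistics make that hard, here is a minimal Python sketch. It assumes each candidate move comes with win/draw/loss counts for the side to move. `MoveStats`, `unique_solution`, `min_games` and `min_margin` are hypothetical names and thresholds, not Laskos's actual procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoveStats:
    """Hypothetical per-move record from a human-games database (side to move's view)."""
    move: str
    wins: int
    draws: int
    losses: int

    @property
    def games(self) -> int:
        return self.wins + self.draws + self.losses

    @property
    def score(self) -> float:
        # Expected score: a win counts 1, a draw 0.5 (needs games > 0).
        return (self.wins + 0.5 * self.draws) / self.games


def unique_solution(stats: list[MoveStats], min_games: int = 50,
                    min_margin: float = 0.05) -> Optional[str]:
    """Return the best-scoring move if it is clearly ahead of the runner-up, else None.

    min_games and min_margin are illustrative thresholds only (assumptions).
    """
    eligible = sorted((s for s in stats if s.games >= min_games),
                      key=lambda s: s.score, reverse=True)
    if not eligible:
        return None  # no move has enough games to trust its score
    if len(eligible) > 1 and eligible[0].score - eligible[1].score < min_margin:
        return None  # top two moves too close: no unique solution
    return eligible[0].move


print(unique_solution([MoveStats("Nd5", 40, 30, 10), MoveStats("a4", 20, 30, 30),
                       MoveStats("h4", 3, 1, 0)]))
```

With few games per move, two candidates often land inside the margin (or below the game threshold), and the position then yields no unique answer at all.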

I have 250 middlegame positions in this positional test-suite. I also combined 4 copies of it into a suite of 1000 positions, for the statistical significance of the result. Results can vary quite a bit on a single run of the test-suite (especially on many cores), so 4 runs give a more precise result. I have uploaded here the positional Middlegames250 test-suite, the combined (4x) Middlegames1000 test-suite, the old Openings200 test-suite and the combined (5x) Openings1000 test-suite. They are the most faithfully positional test-suites I am aware of, and they are my creation.
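To get a rough feel for the error bar on a score such as 737/1000, here is a Python sketch of a normal-approximation (Wald) 95% interval, using two of the Midgames1000 scores reported in this post. It assumes every entry is an independent trial. That overstates independence, because the 1000 entries are 4 runs over the same 250 positions, so the interval likely understates the real uncertainty. `score_interval` is a hypothetical helper.

```python
import math


def score_interval(solved: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) interval for the fraction of positions solved.

    Assumes independent trials, which understates the uncertainty when the
    suite repeats the same positions several times.
    """
    p = solved / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p - half_width, p + half_width


for name, solved in [("Lc0 v21.1 ID41844", 737), ("Houdini 6.03", 655)]:
    lo, hi = score_interval(solved, 1000)
    print(f"{name}: {solved}/1000, approx. 95% interval {lo * 1000:.0f} to {hi * 1000:.0f}")
```

At these scores the half-width comes out at roughly 27 to 30 points out of 1000.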

The results using the positional Middlegames1000 suite:
AB engines on 4 strong i7 cores.
Lc0 on an RTX 2070 GPU.
Positions found in a time interval of 1 to 2 seconds per position, using the Polyglot EPD testing features.


Midgames1000:

Lc0 v21.1 ID41844  737/1000
Lc0 v21.1 ID32930  730/1000
Houdini 6.03       655/1000
Houdini 6.03Tactic 635/1000
Komodo 12.3        609/1000
Stockfish_dev      584/1000
Booot 6.3.1        584/1000
Texel 1.07         561/1000
Andscacs 0.95      555/1000
Ethereal 11.25     548/1000
Fire 7.1           545/1000
Fruit 2.1          398/1000

We see Leela performing better on this positional test suite than the regular engines. There is also a surprise: unlike in the openings, SF_dev is not the strongest engine positionally in the middlegame. Houdini seems strong here, and to check that my suite is not some artifact of tactics, I included Houdini Tactical too; it performs worse than regular Houdini. As a sanity check I also included Fruit 2.1, with its simple evaluation, and it performs quite badly. Compare these with the results on the positional openings test suite with pretty much the same engines:


Openings1000:

Lc0 v21.1 ID41844  762/1000
Lc0 v21.1 ID32930  727/1000
Stockfish_dev      574/1000
Houdini 6.03       558/1000
Komodo 12.3        556/1000
Booot 6.3.1        494/1000
Andscacs 0.95      484/1000
Ethereal 11.00     457/1000
Fire 7.1           431/1000
Texel 1.07         419/1000

We can see that Leela, although still by far the best, is not as far ahead positionally in the middlegame as it is in the openings. That is expected: these are fairly common human openings, on which Leela has trained more than on the more varied middlegames. It is also interesting that Stockfish seems to excel positionally among AB engines in the openings and in endgames, but not so much in the middlegame. Still, it probably has a much better search all around.
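The size of Leela's lead can be read straight off the two tables. This small Python sketch takes the best Lc0 score minus the best non-Lc0 score in each table; the scores are copied from the tables, and `leela_lead` is a hypothetical helper. The two suites contain different positions, so the margins are only roughly comparable.

```python
# Scores out of 1000, copied from the Midgames1000 and Openings1000 tables.
MIDGAMES = {
    "Lc0 v21.1 ID41844": 737, "Lc0 v21.1 ID32930": 730, "Houdini 6.03": 655,
    "Houdini 6.03Tactic": 635, "Komodo 12.3": 609, "Stockfish_dev": 584,
    "Booot 6.3.1": 584, "Texel 1.07": 561, "Andscacs 0.95": 555,
    "Ethereal 11.25": 548, "Fire 7.1": 545, "Fruit 2.1": 398,
}
OPENINGS = {
    "Lc0 v21.1 ID41844": 762, "Lc0 v21.1 ID32930": 727, "Stockfish_dev": 574,
    "Houdini 6.03": 558, "Komodo 12.3": 556, "Booot 6.3.1": 494,
    "Andscacs 0.95": 484, "Ethereal 11.00": 457, "Fire 7.1": 431, "Texel 1.07": 419,
}


def leela_lead(scores: dict[str, int]) -> int:
    """Best Lc0 score minus best non-Lc0 (AB) score."""
    best_lc0 = max(v for k, v in scores.items() if k.startswith("Lc0"))
    best_ab = max(v for k, v in scores.items() if not k.startswith("Lc0"))
    return best_lc0 - best_ab


print("Midgames1000 lead:", leela_lead(MIDGAMES))
print("Openings1000 lead:", leela_lead(OPENINGS))
```

This gives a lead of 82 points over Houdini in the middlegames against 188 points over Stockfish_dev in the openings.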

My verrry good positional test suites are attached.

Ferdy · Post by **Ferdy** » Sun Apr 07, 2019 5:37 pm

Started testing some engines on midgames250.epd at shorter time, single thread.
Results so far, https://fsmosca.github.io/chess-tests/

Laskos · Post by **Laskos** » Mon Apr 08, 2019 7:53 pm

Ferdy wrote: ↑Sun Apr 07, 2019 5:37 pm Started testing some engines on midgames250.epd at shorter time, single thread.
Results so far, https://fsmosca.github.io/chess-tests/

Could you include SF_dev too? In my tests (though on 1000 positions), it seems to have improved noticeably over SF 10 positionally.

Dann Corbit · Post by **Dann Corbit** » Tue Apr 09, 2019 10:20 pm

There are about 100 of the 450 positions in the test sets for which I have a different best move than the one provided.
I expect that many of mine are wrong, of course, but they might be worth examining.
I only analyzed these with SF, so I would be very interested to see the results of LC0, Komodo, Houdini, etc.
The field cce is my own invention, and is a function of wins, losses and draws.
If there is a large number of games and the cce differs significantly from the ce, then the position bears further investigation.
They are sorted by id.
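Dann does not give the cce formula, so the following is only a hypothetical stand-in. This Python sketch turns a win/draw/loss record into an expected score, maps it onto a centipawn-like scale with an Elo-style logistic curve, and flags positions with many games whose value differs a lot from the engine ce. The function names, the 400 scale, the 100-game minimum and the 100-point threshold are all assumptions, and the record must be counted from the side to move's point of view to match ce.

```python
import math


def outcome_to_cp(wins: int, draws: int, losses: int, scale: float = 400.0) -> float:
    """Hypothetical cce stand-in: expected score mapped through an Elo-style logistic curve.

    Counts are from the side to move's point of view, to match the EPD ce field.
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    score = min(max(score, 0.01), 0.99)  # keep the logarithm finite for all-win/all-loss records
    return scale * math.log10(score / (1.0 - score))


def worth_investigating(ce: float, wins: int, draws: int, losses: int,
                        min_games: int = 100, threshold: float = 100.0) -> bool:
    """Flag a position when there are many games and the outcome-based value disagrees with ce."""
    if wins + draws + losses < min_games:
        return False  # too few games for the outcome value to mean much
    return abs(outcome_to_cp(wins, draws, losses) - ce) > threshold


# Hypothetical records checked against an engine ce of -70.
print(worth_investigating(-70, 30, 40, 50))
print(worth_investigating(-70, 80, 20, 20))
```

The first record scores about 42% and lands close to -70 on this scale, so it is not flagged; the second scores 75% for the side to move and clearly contradicts a negative ce.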

Laskos · Post by **Laskos** » Tue Apr 09, 2019 10:31 pm

Dann Corbit wrote: ↑Tue Apr 09, 2019 10:20 pm There are about 100 of the 450 positions in the test sets for which I have a different best move than the one provided.
I expect that many of mine are wrong, of course, but they might be worth examining.
I only analyzed these with SF, so I would be very interested to see the results of LC0, Komodo, Houdini, etc.
The field cce is my own invention, and is a function of wins, losses and draws.
If there is a large number of games and the cce differs significantly from the ce, then the position bears further investigation.
They are sorted by id.

Thanks, I needed this help.

Jouni · Post by **Jouni** » Wed Apr 10, 2019 10:58 am

Interesting test! Middlegame phase of the game, roughly moves 15-22. So after 23 it's endgame?

Laskos · Post by **Laskos** » Wed Apr 10, 2019 12:24 pm

Jouni wrote: ↑Wed Apr 10, 2019 10:58 am Interesting test! Middlegame phase of the game, roughly moves 15-22. So after 23 it's endgame?

No; there are not enough strong human games per position later in the game in my combined databases. Even this test-suite often has meager outcome statistics, helped only moderately by engines' opinions, since I don't want to intervene heavily with engines in very positional test-suites. I would rather get help on them from my 2000 FIDE-level girlfriend than from, say, Stockfish. But engines are of some help, at least for getting rid of tactics.

Nordlandia · Post by **Nordlandia** » Thu Apr 11, 2019 7:44 pm

Does this qualify as a test position?

r2q2k1/p1p3pp/2p5/3p1b2/8/2P2N2/PP3KPP/R1B2B1R b - - 0 14

Dann Corbit · Post by **Dann Corbit** » Thu Apr 11, 2019 8:19 pm

Nordlandia wrote: ↑Thu Apr 11, 2019 7:44 pm Does this qualify as a test position?

r2q2k1/p1p3pp/2p5/3p1b2/8/2P2N2/PP3KPP/R1B2B1R b - - 0 14

I guess there must be something better than Qb8.

Dann Corbit · Post by **Dann Corbit** » Fri Apr 12, 2019 2:19 am

Nordlandia wrote: ↑Thu Apr 11, 2019 7:44 pm Does this qualify as a test position?

r2q2k1/p1p3pp/2p5/3p1b2/8/2P2N2/PP3KPP/R1B2B1R b - - 0 14

Here is what I have for this one:
r2q2k1/p1p3pp/2p5/3p1b2/8/2P2N2/PP3KPP/R1B2B1R b - - acd 50; bm Qb8; c0 "http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=32975; viewtopic.php?f=2&t=70438&sid=84b06c0cb ... 4c#p795892"; ce -70; pm Qb8; pv Qb8 b3;

I can't find anything better than Qb8, which is found immediately, and then it sticks through ply 50.
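Dann's line is an EPD record: four position fields followed by semicolon-terminated operations (acd, bm, c0, ce, pm, pv). One detail worth handling when reading such lines is that the quoted c0 comment itself contains a semicolon. Here is a minimal Python sketch, assuming moves are compared as plain SAN strings (so notation such as check suffixes would need normalizing first). `parse_epd` and `is_solved` are hypothetical helpers, not Polyglot's or any engine's implementation.

```python
def _add_operation(ops: dict[str, str], text: str) -> None:
    """Store one 'opcode operand...' chunk in ops (hypothetical helper)."""
    text = text.strip()
    if text:
        opcode, _, operand = text.partition(" ")
        ops[opcode] = operand.strip()


def parse_epd(line: str) -> tuple[str, dict[str, str]]:
    """Split an EPD record into its four position fields and an opcode -> operand dict.

    Semicolons inside double-quoted operands (such as c0 comments) are not
    treated as operation terminators.
    """
    parts = line.strip().split(maxsplit=4)
    position = " ".join(parts[:4])
    ops_text = parts[4] if len(parts) > 4 else ""
    ops: dict[str, str] = {}
    current: list[str] = []
    in_quotes = False
    for ch in ops_text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            _add_operation(ops, "".join(current))
            current = []
        else:
            current.append(ch)
    _add_operation(ops, "".join(current))  # last operation may lack a trailing ';'
    return position, ops


def is_solved(ops: dict[str, str], engine_move: str) -> bool:
    """Count a hit if the engine's move is one of the space-separated bm moves."""
    return engine_move in ops.get("bm", "").split()


DANN_LINE = (
    'r2q2k1/p1p3pp/2p5/3p1b2/8/2P2N2/PP3KPP/R1B2B1R b - - acd 50; bm Qb8; '
    'c0 "http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=32975; '
    'viewtopic.php?f=2&t=70438&sid=84b06c0cb ... 4c#p795892"; '
    'ce -70; pm Qb8; pv Qb8 b3;'
)

position, ops = parse_epd(DANN_LINE)
print(position)
print(ops["bm"], ops["ce"], ops["pv"])
print(is_solved(ops, "Qb8"))
```

Counting `is_solved` hits over all records of a suite gives a score in the same "found/total" form as the tables in the first post.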
