Middlegame positional test-suite

Laskos
Posts: 9498
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Post by Laskos » Sun Apr 07, 2019 1:39 pm

As with the opening positional test-suite described here:
http://www.talkchess.com/forum3/viewtop ... =2&t=61858

Based on large databases of human games, I built a positional test-suite for the middlegame phase, roughly moves 15-22. The statistics of human games for each position are weaker than in openings, so it was harder to be confident in the chosen unique solutions. I used engines mostly to check that the positions have no tactical complications.

I have 250 middlegame positions in this positional test-suite. I also combined 4 copies of it into a suite of 1000 positions, for the statistical significance of the result. Results can vary quite a bit on just one run of the test-suite (especially on many cores), so 4 runs give a more precise result. I uploaded here the positional Middlegames250 test-suite, the combined (4 copies) Middlegames1000 test-suite, the old Openings200 test-suite and the combined (5 copies) Openings1000 test-suite. They are the most faithfully positional test suites I am aware of, and are my creation :lol:.
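To put a rough number on why 1000 trials are steadier than 250, here is a minimal sketch. It assumes each position is an independent pass/fail trial with the same solve probability, which is only an approximation: repeated runs of the same 250 positions are correlated, so the real gain is probably smaller. The function name and the 60% solve rate are illustrative, not taken from the suite.

```python
import math

def score_standard_error(solved: int, total: int) -> float:
    """Standard error of the solved fraction, treating every position
    as an independent trial (an assumption, see the note above)."""
    p = solved / total
    return math.sqrt(p * (1.0 - p) / total)

# Hypothetical engine solving 60% of positions, on 250 and on 1000 trials.
se_250 = score_standard_error(150, 250)
se_1000 = score_standard_error(600, 1000)

print(f"250 trials:  +/- {se_250:.3f}")
print(f"1000 trials: +/- {se_1000:.3f}")
```

Under that assumption, four times as many trials halves the standard error of the score.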

The results using positional Middlegames1000 suite:
AB engines on 4 strong i7 cores.
Lc0 on RTX 2070 GPU
Positions found in time interval from 1 to 2 seconds per position using Polyglot EPD testing features.

Midgames1000:

Lc0 v21.1 ID41844  737/1000
Lc0 v21.1 ID32930  730/1000
Houdini 6.03       655/1000
Houdini 6.03Tactic 635/1000
Komodo 12.3        609/1000
Stockfish_dev      584/1000
Booot 6.3.1        584/1000
Texel 1.07         561/1000
Andscacs 0.95      555/1000
Ethereal 11.25     548/1000
Fire 7.1           545/1000
Fruit 2.1          398/1000

We see Leela performing better on this positional test suite than the regular engines. There is also a surprise: SF_dev, unlike in openings, is not the strongest of the regular engines positionally in middlegames. It seems Houdini is strong here. To check that my suite is not some artifact of tactics, I also included Houdini Tactical, and it performs worse than regular Houdini. As a sanity check I also included Fruit 2.1 with its simple eval, and it performs pretty badly. Compare with the results on the positional opening test suite with pretty much the same engines:

Openings1000:

Lc0 v21.1 ID41844  762/1000
Lc0 v21.1 ID32930  727/1000
Stockfish_dev      574/1000
Houdini 6.03       558/1000
Komodo 12.3        556/1000
Booot 6.3.1        494/1000
Andscacs 0.95      484/1000
Ethereal 11.00     457/1000
Fire 7.1           431/1000
Texel 1.07         419/1000

We can see that Leela, although still by far the best, is not as far ahead positionally in midgames as it is in openings. That is to be expected, as these are fairly common human openings, on which Leela trained more than on the more varied midgames. It is also interesting to note that Stockfish seems to excel positionally among AB engines in openings and in endgames, but not so much in midgames. But it probably has a much better search all around.

My verrry good :lol: positional test suites are attached.
Attachments
Positional.zip
Openings and Midgames, positional test-suites
(27.61 KiB) Downloaded 181 times

Ferdy
Posts: 4111
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: Middlegame positional test-suite

Post by Ferdy » Sun Apr 07, 2019 3:37 pm

Started testing some engines on midgames250.epd at shorter time, single thread.
Results so far, https://fsmosca.github.io/chess-tests/

Laskos
Posts: 9498
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Middlegame positional test-suite

Post by Laskos » Mon Apr 08, 2019 5:53 pm

Ferdy wrote:
Sun Apr 07, 2019 3:37 pm
Started testing some engines on midgames250.epd at shorter time, single thread.
Results so far, https://fsmosca.github.io/chess-tests/
Could you include SF_dev too? In my tests (but on 1000 positions), it seems to have improved noticeably in positional play over SF 10.

Dann Corbit
Posts: 10196
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Middlegame positional test-suite

Post by Dann Corbit » Tue Apr 09, 2019 8:20 pm

There are about 100 of the 450 positions in the test sets for which I have a different answer for best move than those provided.
I guess that many of mine are wrong (of course) but might be worth examination.
I only analyzed these with SF, so I would be very interested to see the results of LC0, Komodo, Houdini, etc.
The field cce is my own invention, and is a function of wins, losses and draws.
If there is a large number of games, and the cce differs significantly from the ce, then it bears further investigation.
They are sorted by id
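
The rule of thumb above (many games, and cce far from ce) could be sketched as a simple filter. This is only an illustration: the formula for cce is not given here, so the sketch takes cce as an already computed number on the same centipawn scale as ce, and the field names, thresholds and sample records are all hypothetical.

```python
def needs_review(record, min_games=100, min_gap_cp=50):
    """True when the game-statistics score (cce) and the engine score (ce)
    disagree by a wide margin and enough games back the statistics.
    Both thresholds are made-up defaults, not values from the post."""
    enough_games = record["games"] >= min_games
    large_gap = abs(record["cce"] - record["ce"]) >= min_gap_cp
    return enough_games and large_gap

# Made-up records, only to show the filter at work.
records = [
    {"id": "pos.001", "ce": 20, "cce": 25, "games": 400},
    {"id": "pos.002", "ce": -70, "cce": 60, "games": 350},
    {"id": "pos.003", "ce": 10, "cce": 150, "games": 12},
]
flagged = [r["id"] for r in records if needs_review(r)]
print(flagged)
```

The third record has a big gap but too few games, so only the second one is flagged.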
Attachments
kai-diff.7z
EPD positions with analysis where the best move differs from the value in my database. The field c3 contains the suggested bm from the
(10.46 KiB) Downloaded 65 times
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

Laskos
Posts: 9498
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Middlegame positional test-suite

Post by Laskos » Tue Apr 09, 2019 8:31 pm

Dann Corbit wrote:
Tue Apr 09, 2019 8:20 pm
There are about 100 of the 450 positions in the test sets for which I have a different answer for best move than those provided.
I guess that many of mine are wrong (of course) but might be worth examination.
I only analyzed these with SF, so I would be very interested to see the results of LC0, Komodo, Houdini, etc.
The field cce is my own invention, and is a function of wins, losses and draws.
If there is a large number of games, and the cce differs significantly from the ce, then it bears further investigation.
They are sorted by id
Thanks, I needed this help.

Jouni
Posts: 2028
Joined: Wed Mar 08, 2006 7:15 pm

Re: Middlegame positional test-suite

Post by Jouni » Wed Apr 10, 2019 8:58 am

Interesting test! Middlegame phase of the game, roughly moves 15-22. So after 23 it's endgame?
Jouni

Laskos
Posts: 9498
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: Middlegame positional test-suite

Post by Laskos » Wed Apr 10, 2019 10:24 am

Jouni wrote:
Wed Apr 10, 2019 8:58 am
Interesting test! Middlegame phase of the game, roughly moves 15-22. So after 23 it's endgame?
:)

No, there are not enough strong human games per position later in the game in my combined databases. Even this test-suite often has meager outcome statistics, helped only moderately by engine opinions. I don't want to intervene heavily with engines in very positional test-suites; I would rather get help from my 2000 FIDE level girlfriend than from, say, Stockfish on them. But engines are of some help, at least for getting rid of tactics.

Nordlandia
Posts: 2476
Joined: Fri Sep 25, 2015 7:38 pm
Location: Sortland, Norway

Re: Middlegame positional test-suite

Post by Nordlandia » Thu Apr 11, 2019 5:44 pm

Does this qualify as a test position?


Dann Corbit
Posts: 10196
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Middlegame positional test-suite

Post by Dann Corbit » Thu Apr 11, 2019 6:19 pm

Nordlandia wrote:
Thu Apr 11, 2019 5:44 pm
Does this qualify as a test position?

I guess there must be something better than Qb8.

Dann Corbit
Posts: 10196
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA

Re: Middlegame positional test-suite

Post by Dann Corbit » Fri Apr 12, 2019 12:19 am

Nordlandia wrote:
Thu Apr 11, 2019 5:44 pm
Does this qualify as a test position?

Here is what I have for this one:
r2q2k1/p1p3pp/2p5/3p1b2/8/2P2N2/PP3KPP/R1B2B1R b - - acd 50; bm Qb8; c0 "http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=32975; viewtopic.php?f=2&t=70438&sid=84b06c0cb ... 4c#p795892"; ce -70; pm Qb8; pv Qb8 b3;

I can't find anything better than Qb8, which is found immediately, and then it sticks through ply 50.
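
For anyone who wants to read such records programmatically, here is a minimal sketch of an EPD splitter using only the standard library. It assumes the usual EPD layout: four position fields, then `opcode operand;` pairs, where a quoted operand (such as the c0 comment) may itself contain semicolons. The function name is mine, and the sample line is the record above with the c0 comment left out.

```python
def parse_epd(line):
    """Split an EPD record into (position, operations).
    position is the four leading fields; operations maps opcode -> operand."""
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    ops_text = fields[4] if len(fields) > 4 else ""

    operations = {}
    current, in_quotes = [], False

    def flush():
        # Store the operation collected so far, if any.
        op = "".join(current).strip()
        if op:
            opcode, _, operand = op.partition(" ")
            operations[opcode] = operand.strip().strip('"')
        current.clear()

    for ch in ops_text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            flush()          # a semicolon outside quotes ends the operation
        else:
            current.append(ch)
    flush()                  # tolerate a missing final semicolon
    return position, operations

line = ("r2q2k1/p1p3pp/2p5/3p1b2/8/2P2N2/PP3KPP/R1B2B1R b - - "
        "acd 50; bm Qb8; ce -70; pm Qb8; pv Qb8 b3;")
position, ops = parse_epd(line)
print(position)
print(ops["bm"], ops["ce"], ops["acd"])
```

Here acd is the analysis depth, bm the best move, ce the centipawn evaluation, pm the predicted move and pv the predicted variation.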
