What's the best tactical test suite for the modern erA?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

johnhamlen65
Posts: 13
Joined: Fri May 12, 2023 10:15 am
Location: Melton Mowbray, England
Full name: John Hamlen

Re: What's the best tactical test suite for the modern erA?

Post by johnhamlen65 »

ImNotStockfish wrote: ↑Mon Aug 05, 2024 8:17 pm Decent ones are Hard Talkchess 2020 (2021 revision), TTT1, TTT2, Arasan... You can combine them to get a better sample size, but it will still be pretty small.
Bad ones are ShashChess Hardpositions2022 :lol:
Thanks ImNotStockfish. I'll check them all out 🤓👍 John
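To put a number on the "still pretty small" sample-size point, here is a minimal sketch that computes a 95% Wilson score interval for a suite solve rate. The function name `wilson_interval` and the example counts are hypothetical, chosen only for illustration; the formula itself is the standard Wilson interval for a binomial proportion.

```python
import math


def wilson_interval(solved: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a solve rate of solved/total.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if total <= 0 or not 0 <= solved <= total:
        raise ValueError("need 0 <= solved <= total and total > 0")
    p = solved / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical results: the same 70% solve rate on suites of different sizes.
    for solved, total in [(70, 100), (700, 1000)]:
        low, high = wilson_interval(solved, total)
        print(f"{solved}/{total}: {low:.3f} .. {high:.3f}")
```

With only a hundred or so positions the interval is wide enough that two engine versions a few solves apart cannot be told apart with any confidence.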
gaard
Posts: 459
Joined: Mon Jun 07, 2010 3:13 am
Location: Holland, MI
Full name: Martin W

Re: What's the best tactical test suite for the modern erA?

Post by gaard »

shawn wrote: ↑Tue Aug 06, 2024 1:02 am
gaard wrote: ↑Mon Aug 05, 2024 7:53 pm
Viz wrote: ↑Mon Aug 05, 2024 6:35 pm The best usage of test suites during development is to not use test suites at all.
It seems counterintuitive that tactical test suites would not be useful for testing tactical ability. A few engines have tactical/mate finding modes that really do work. Not everyone is competing in the Elo arms race.
https://github.com/Matthies/RubiChess/w ... uter-chess
Are you saying tactical test suites do not measure an engine's tactical ability? That is specifically what OP asked for. For Elo there are more productive methods.
shawn
Posts: 46
Joined: Fri Jun 28, 2024 9:24 am
Full name: Shawn Xu

Re: What's the best tactical test suite for the modern erA?

Post by shawn »

gaard wrote: ↑Tue Aug 06, 2024 1:25 am
shawn wrote: ↑Tue Aug 06, 2024 1:02 am
gaard wrote: ↑Mon Aug 05, 2024 7:53 pm
Viz wrote: ↑Mon Aug 05, 2024 6:35 pm The best usage of test suites during development is to not use test suites at all.
It seems counterintuitive that tactical test suites would not be useful for testing tactical ability. A few engines have tactical/mate finding modes that really do work. Not everyone is competing in the Elo arms race.
https://github.com/Matthies/RubiChess/w ... uter-chess
Are you saying tactical test suites do not measure an engine's tactical ability? That is specifically what OP asked for. For Elo there are more productive methods.
OP asked for an engine:
1. with exceptional tactical ability
2. doesn't get destroyed in 10 moves

Tactical testing suites cannot even guarantee the first (google Goodhart's law), much less the second.
gaard
Posts: 459
Joined: Mon Jun 07, 2010 3:13 am
Location: Holland, MI
Full name: Martin W

Re: What's the best tactical test suite for the modern erA?

Post by gaard »

shawn wrote: ↑Tue Aug 06, 2024 2:03 am
gaard wrote: ↑Tue Aug 06, 2024 1:25 am
shawn wrote: ↑Tue Aug 06, 2024 1:02 am
gaard wrote: ↑Mon Aug 05, 2024 7:53 pm
Viz wrote: ↑Mon Aug 05, 2024 6:35 pm The best usage of test suites during development is to not use test suites at all.
It seems counterintuitive that tactical test suites would not be useful for testing tactical ability. A few engines have tactical/mate finding modes that really do work. Not everyone is competing in the Elo arms race.
https://github.com/Matthies/RubiChess/w ... uter-chess
Are you saying tactical test suites do not measure an engine's tactical ability? That is specifically what OP asked for. For Elo there are more productive methods.
OP asked for an engine:
1. with exceptional tactical ability
2. doesn't get destroyed in 10 moves

Tactical testing suites cannot even guarantee the first (google Goodhart's law), much less the second.
Is that in another thread? What I see is:
Does anyone have any recommendations for a suite(s) of test positions which will test Tech 4's tactical ability i.e. very deep combinations which lead to material win or mate.
shawn
Posts: 46
Joined: Fri Jun 28, 2024 9:24 am
Full name: Shawn Xu

Re: What's the best tactical test suite for the modern erA?

Post by shawn »

gaard wrote: ↑Tue Aug 06, 2024 2:09 am Is that in another thread? What I see is:
Does anyone have any recommendations for a suite(s) of test positions which will test Tech 4's tactical ability i.e. very deep combinations which lead to material win or mate.
The only hope a Tech program has of defeating a top rated program is to find its way to a position where there are tactics beyond its opponent's horizon.
Here
Viz
Posts: 169
Joined: Tue Apr 09, 2024 6:24 am
Full name: Michael Chaly

Re: What's the best tactical test suite for the modern erA?

Post by Viz »

gaard wrote: ↑Tue Aug 06, 2024 1:25 am
shawn wrote: ↑Tue Aug 06, 2024 1:02 am
gaard wrote: ↑Mon Aug 05, 2024 7:53 pm
Viz wrote: ↑Mon Aug 05, 2024 6:35 pm The best usage of test suites during development is to not use test suites at all.
It seems counterintuitive that tactical test suites would not be useful for testing tactical ability. A few engines have tactical/mate finding modes that really do work. Not everyone is competing in the Elo arms race.
https://github.com/Matthies/RubiChess/w ... uter-chess
Are you saying tactical test suites do not measure an engine's tactical ability? That is specifically what OP asked for. For Elo there are more productive methods.
Tactical suites measure the ability to (surprise) solve particular tactical suites.
That doesn't translate into solving other tactical positions, or into finding tactics in real games. Tactical suites are almost always cherry-picked to be "hard to solve", so "easy to solve" positions are excluded, and you can regress on those while improving on the "hard" ones (at least, this definitely can happen).
gaard
Posts: 459
Joined: Mon Jun 07, 2010 3:13 am
Location: Holland, MI
Full name: Martin W

Re: What's the best tactical test suite for the modern erA?

Post by gaard »

Viz wrote: ↑Tue Aug 06, 2024 5:51 am
gaard wrote: ↑Tue Aug 06, 2024 1:25 am
shawn wrote: ↑Tue Aug 06, 2024 1:02 am
gaard wrote: ↑Mon Aug 05, 2024 7:53 pm
Viz wrote: ↑Mon Aug 05, 2024 6:35 pm The best usage of test suites during development is to not use test suites at all.
It seems counterintuitive that tactical test suites would not be useful for testing tactical ability. A few engines have tactical/mate finding modes that really do work. Not everyone is competing in the Elo arms race.
https://github.com/Matthies/RubiChess/w ... uter-chess
Are you saying tactical test suites do not measure an engine's tactical ability? That is specifically what OP asked for. For Elo there are more productive methods.
Tactical suites measure the ability to (surprise) solve particular tactical suites.
That doesn't translate into solving other tactical positions, or into finding tactics in real games. Tactical suites are almost always cherry-picked to be "hard to solve", so "easy to solve" positions are excluded, and you can regress on those while improving on the "hard" ones (at least, this definitely can happen).
There's no reason why, with a sufficient number of varied positions, it wouldn't translate into solving other tactical positions. This is how development and testing looked eons ago, when CPU cycles were very limited. It seldom translated into measurable gains in game play, however. You can definitely over-fit with too few or biased positions, and there are many threads here about running into that wall.
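One common guard against over-fitting to a small or biased set is to tune on one part of a combined suite and only report results on a held-out part. A minimal sketch, assuming positions are stored as EPD lines whose first four whitespace-separated fields identify the position; the function name `split_positions`, the 30% hold-out fraction and the sample lines are hypothetical choices for illustration.

```python
import hashlib


def split_positions(epd_lines, holdout_fraction=0.3):
    """Split EPD lines into (tune, holdout) lists, deterministically.

    The bucket is derived from a hash of the position fields only, so the
    same position always lands on the same side even if it appears in
    several suites under different ids.
    """
    tune, holdout = [], []
    for line in epd_lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: the first four fields are board, side to move,
        # castling rights and en passant square, as in standard EPD.
        key = " ".join(line.split()[:4])
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
        (holdout if bucket < holdout_fraction else tune).append(line)
    return tune, holdout


if __name__ == "__main__":
    # Hypothetical sample records, not taken from any real suite.
    sample = [
        'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4; id "demo.1";',
        '8/8/8/8/8/5k2/6q1/7K w - - bm Kxg2; id "demo.2";',
    ]
    tune, holdout = split_positions(sample)
    print(len(tune), "tuning positions,", len(holdout), "held-out positions")
```

If the solve rate rises on the tuning part but not on the held-out part, the change has probably fitted those particular positions instead of improving tactics in general.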
Viz
Posts: 169
Joined: Tue Apr 09, 2024 6:24 am
Full name: Michael Chaly

Re: What's the best tactical test suite for the modern erA?

Post by Viz »

gaard wrote: ↑Tue Aug 06, 2024 6:31 am It seldom translated into measurable gains in game play, however.
In-freaking-deed, although I should say "almost never".
And this is why almost no one actually uses them in the modern era of engine development.
Not because of an abundance of resources, but because they are useless.
Mind you, chess engines like Laser http://computerchess.org.uk/ccrl/4040/c ... 4-bit_4CPU were developed from scratch by 2 people on one 4-thread machine, with proper SPRT and without using suites. And for the pre-NN era its rating at the moment its authors quit the project was really respectable, somewhere near the top 10. They definitely didn't have "unlimited resources", but they used what they had wisely, and this didn't include using test suites.
The only real use for test suites is the zugzwang ones, for testing verification search, although one can argue that it doesn't really bring Elo - and indeed it doesn't.
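For readers unfamiliar with SPRT: it is a sequential test that keeps playing games until a log-likelihood ratio (LLR) crosses one of two bounds. Below is a rough sketch using the normal-approximation formula commonly seen in engine-testing tools. The function names, the Elo bounds [0, 5] and alpha = beta = 0.05 are assumptions for illustration, not details taken from Laser's development.

```python
import math


def expected_score(elo: float) -> float:
    """Logistic expected score for a given Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Approximate LLR of H1 (elo = elo1) against H0 (elo = elo0)."""
    games = wins + draws + losses
    if games == 0:
        return 0.0
    w, d = wins / games, draws / games
    score = w + d / 2.0               # mean score per game
    variance = (w + d / 4.0) - score * score
    if variance <= 0.0:
        return 0.0                    # degenerate sample, no information yet
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * variance / games)


def sprt_bounds(alpha: float = 0.05, beta: float = 0.05) -> tuple[float, float]:
    """Lower and upper LLR stopping bounds (Wald)."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def sprt_decision(wins, draws, losses, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05):
    """Return 'accept H1', 'accept H0' or 'continue' for the current results."""
    lower, upper = sprt_bounds(alpha, beta)
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"


if __name__ == "__main__":
    # Hypothetical match results for a patch tested against its base version.
    print(sprt_decision(wins=400, draws=200, losses=400))
```

The point of stopping as soon as a bound is crossed is that clearly good or clearly bad patches use few games, which is how a small team on one machine can still test every change in real play.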
Ras
Posts: 2555
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: What's the best tactical test suite for the modern erA?

Post by Ras »

The main issue with tactical test positions is that they are designed to be tricky because they contain moves that would and should normally get pruned away. This is also why you see some positions where weaker engines can do better than top engines - paradoxically because the pruning of the weaker engines isn't as good. The gain in tricky test positions is more than compensated by the loss in normal game play.

Also, the underlying logic doesn't make sense to me: somehow reaching a position where winning by a deep tactical shot out of the blue is possible. That usually just doesn't happen. You need to have some sort of advantage and pressure that then translates into tactical options, which means already having played better before that point.
Rasmus Althoff
https://www.ct800.net
Viz
Posts: 169
Joined: Tue Apr 09, 2024 6:24 am
Full name: Michael Chaly

Re: What's the best tactical test suite for the modern erA?

Post by Viz »

Also, needless to say, tactical test suites represent only a fraction of difficult tactical positions.
For example, Crystal is excellent at these suites.
But when it's kibitzing at TCEC (which it often does) while Stockfish plays, every final has around 5 games, sometimes more, with either of the following patterns:
1) Stockfish raises its evaluation rapidly, sometimes up to +2, +3 or even +5, while Crystal chills at +0.6 or so. Then, after 5 more moves are played, Crystal finally picks up the winning line and starts raising its own evaluation (the line usually involves some deep tactical motifs in all continuations);
2) Crystal shows a really low eval in something that looks like a fortress or an unwinnable endgame, Stockfish moves around, sometimes for up to 10-15 moves, then Crystal also starts to raise its eval - because the fortress was breakable / the endgame actually had a win with some strange queen dance.
Mind you, the kibitzing machine is much faster than the TCEC main machine, and Crystal also gets both Stockfish's and its opponent's time to think - but this quite often doesn't really help.
The opposite, where Stockfish is blind to a tactic Crystal sees or is stuck overevaluating a fortress, also happens, but more rarely, at least from what I've observed.
All of what I wrote leads to one conclusion - Stockfish is actually the better tactician and also understands fortresses better on average in positions that occur in real engine games. But Crystal is indeed better at solving test suites, because they include only a really small fraction of the types of positions you can get.
Also, the previous post indeed made a good point - tactics usually occur when the position is prepared for them, not out of the blue. You almost always get tactical opportunities in positions where you are better, especially at the highest level of play.