What's the best tactical test suite for the modern era?

johnhamlen65 · Post by **johnhamlen65** » Tue Aug 06, 2024 1:19 am

ImNotStockfish wrote: ↑Mon Aug 05, 2024 8:17 pm Decent ones are Hard Talkchess 2020 (2021 revision), TTT1, TTT2, Arasan... You can combine them to get a better sample size, but it will still be pretty small.
Bad ones are ShashChess Hardpositions2022.

Thanks ImNotStockfish. I'll check them all out

John
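Combining suites as suggested above mostly means concatenating the position files and dropping positions that appear in more than one suite. A minimal sketch, assuming the suites are available as EPD files with one position per line and the usual `bm` (best move) and `id` opcodes; semicolons inside quoted strings are not handled. The two positions and their `id` labels are hypothetical placeholders, not entries from any of the suites named above:

```python
# Minimal sketch: merge EPD test suites and drop duplicate positions.
# Assumption: one position per line, opcodes separated by ";".

def parse_epd(line):
    """Split an EPD line into (position_key, ops_dict)."""
    fields = line.strip().split(maxsplit=4)
    if len(fields) < 4:
        raise ValueError(f"not an EPD line: {line!r}")
    # The first four fields (placement, side to move, castling, en passant)
    # identify the position; everything after them is opcodes.
    key = " ".join(fields[:4])
    rest = fields[4] if len(fields) == 5 else ""
    ops = {}
    for op in rest.split(";"):
        op = op.strip()
        if not op:
            continue
        name, _, operand = op.partition(" ")
        ops[name] = operand.strip().strip('"')
    return key, ops


def merge_suites(*suites):
    """Combine suites (iterables of EPD lines); the first copy of a position wins."""
    merged = {}
    for suite in suites:
        for line in suite:
            if not line.strip():
                continue
            key, ops = parse_epd(line)
            merged.setdefault(key, ops)
    return merged


# Hypothetical placeholder suites (simple mates in one).
suite_a = [
    'k7/8/1K6/8/8/8/8/7Q w - - bm Qh8#; id "hypothetical.A1";',
    '7k/8/6K1/8/8/8/8/R7 w - - bm Ra8#; id "hypothetical.A2";',
]
suite_b = [
    'k7/8/1K6/8/8/8/8/7Q w - - bm Qh8#; id "hypothetical.B1";',  # same position as A1
]
merged = merge_suites(suite_a, suite_b)
print(len(merged), "unique positions")
```

Deduplicating on the first four fields ignores move counters, which EPD does not carry anyway; the merged sample is still small, as noted above.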

gaard · Post by **gaard** » Tue Aug 06, 2024 1:25 am

shawn wrote: ↑Tue Aug 06, 2024 1:02 am
gaard wrote: ↑Mon Aug 05, 2024 7:53 pm
Viz wrote: ↑Mon Aug 05, 2024 6:35 pm The best usage of test suites during development is to not use test suites at all.
It seems counterintuitive that tactical test suites would not be useful for testing tactical ability. A few engines have tactical/mate finding modes that really do work. Not everyone is competing in the Elo arms race.
https://github.com/Matthies/RubiChess/w ... uter-chess

Are you saying tactical test suites do not measure an engine's tactical ability? That is specifically what OP asked for. For Elo there are more productive methods.

shawn · Post by **shawn** » Tue Aug 06, 2024 2:03 am

gaard wrote: ↑Tue Aug 06, 2024 1:25 am
Are you saying tactical test suites do not measure an engine's tactical ability? That is specifically what OP asked for. For Elo there are more productive methods.

OP asked for an engine:
1. with exceptional tactical ability
2. that doesn't get destroyed in 10 moves

Tactical testing suites cannot even guarantee the first (google Goodhart's law), much less the second.

gaard · Post by **gaard** » Tue Aug 06, 2024 2:09 am

shawn wrote: ↑Tue Aug 06, 2024 2:03 am
OP asked for an engine:
1. with exceptional tactical ability
2. that doesn't get destroyed in 10 moves

Tactical testing suites cannot even guarantee the first (google Goodhart's law), much less the second.

Is that in another thread? What I see is:

Does anyone have any recommendations for a suite(s) of test positions which will test Tech 4's tactical ability, i.e. very deep combinations which lead to material win or mate?

shawn · Post by **shawn** » Tue Aug 06, 2024 2:14 am

gaard wrote: ↑Tue Aug 06, 2024 2:09 am Is that in another thread? What I see is:

Does anyone have any recommendations for a suite(s) of test positions which will test Tech 4's tactical ability, i.e. very deep combinations which lead to material win or mate?

The only hope a Tech program has of defeating a top rated program is to find its way to a position where there are tactics beyond its opponent's horizon.

Here

Viz · Post by **Viz** » Tue Aug 06, 2024 5:51 am

gaard wrote: ↑Tue Aug 06, 2024 1:25 am
Are you saying tactical test suites do not measure an engine's tactical ability? That is specifically what OP asked for. For Elo there are more productive methods.

Tactical suites measure the ability to... surprise... solve particular tactical suites.
It doesn't translate into solving other tactical positions (especially since tactical suites are almost always cherry-picked to be "hard to solve", so "easy to solve" positions are excluded and you can regress on those by improving on the "hard" ones; at least this definitely can happen), nor into finding tactics in real games.

gaard · Post by **gaard** » Tue Aug 06, 2024 6:31 am

Viz wrote: ↑Tue Aug 06, 2024 5:51 am
Tactical suites measure the ability to... surprise... solve particular tactical suites.
It doesn't translate into solving other tactical positions (especially since tactical suites are almost always cherry-picked to be "hard to solve", so "easy to solve" positions are excluded and you can regress on those by improving on the "hard" ones; at least this definitely can happen), nor into finding tactics in real games.

There's no reason why, with a sufficient number of varied positions, it wouldn't translate into solving other tactical positions. This is how development and testing looked eons ago when CPU cycles were very limited. It seldom translated into measurable gains in game play, however. You can definitely over-fit with too few or biased positions, and there are many threads here about running into that wall.
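To put "a sufficient number of positions" into numbers, one can attach a confidence interval to a suite's solve rate. A minimal sketch using the Wilson score interval; the solve counts are hypothetical, and the interval only reflects sampling noise, not the selection bias of a cherry-picked suite:

```python
import math


def wilson_interval(solved, total, z=1.96):
    """Wilson score interval for a solve rate (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = solved / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical results: old vs new engine version on a 100-position suite.
old_lo, old_hi = wilson_interval(60, 100)
new_lo, new_hi = wilson_interval(66, 100)
print(f"old: {old_lo:.3f}..{old_hi:.3f}  new: {new_lo:.3f}..{new_hi:.3f}")
```

With 100 positions the two intervals overlap heavily, so a 6-point gain is well within the noise. Overlap is only a rough check, not a formal significance test.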

Viz · Post by **Viz** » Tue Aug 06, 2024 6:58 am

gaard wrote: ↑Tue Aug 06, 2024 6:31 am It seldom translated into measurable gains in game play, however.

In-freaking-deed, although I should say "almost never".
And this is why almost no one actually uses them in the modern era of engine development.
Not because of an abundance of resources, but because they are useless.
Mind you, chess engines like Laser http://computerchess.org.uk/ccrl/4040/c ... 4-bit_4CPU were developed from scratch by 2 people on a single 4-thread machine, but with proper SPRT and without using suites. For the pre-NN era, its rating at the moment its authors quit the project was really respectable, somewhere near the top 10. They definitely didn't have "unlimited resources", but they used what they had wisely, and that didn't include test suites.
The only real use of test suites is zugzwang suites, for checking verification search, although one can argue that it doesn't really bring Elo, and indeed it doesn't.
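For reference, "proper SPRT" means playing games until a sequential probability ratio test accepts either H0 (the change is worth `elo0`) or H1 (it is worth `elo1`). A minimal sketch of a generalized SPRT using a normal approximation over win/draw/loss counts and the logistic Elo model. The bounds, error rates and game counts are illustrative assumptions. Real frameworks (fishtest, for example) model game pairs with pentanomial statistics, so this is not necessarily what the Laser authors ran:

```python
import math


def expected_score(elo):
    """Logistic Elo model: expected score for a given Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins, draws, losses, elo0, elo1):
    """Normal-approximation log-likelihood ratio of H1 (elo1) vs H0 (elo0)."""
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n
    # Per-game score variance estimated from the observed W/D/L frequencies.
    var = (wins * (1 - mean) ** 2 + draws * (0.5 - mean) ** 2 + losses * mean ** 2) / n
    if var == 0:
        return 0.0
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return n * ((mean - s0) ** 2 - (mean - s1) ** 2) / (2 * var)


def sprt_decision(llr, alpha=0.05, beta=0.05):
    """Compare the LLR with Wald's stopping bounds."""
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    if llr >= upper:
        return "H1 accepted"
    if llr <= lower:
        return "H0 accepted"
    return "continue"


# Hypothetical match result for a patch tested with elo0=0, elo1=5.
llr = sprt_llr(3200, 4000, 2800, elo0=0, elo1=5)
print(round(llr, 2), sprt_decision(llr))
```

The test stops as soon as either bound is crossed. This is why it uses limited hardware efficiently: clearly bad or clearly good patches are resolved early.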

Ras · Post by **Ras** » Tue Aug 06, 2024 10:25 am

The main issue with tactical test positions is that they are designed to be tricky because they contain moves that would and should normally get pruned away. This is also why you see some positions where weaker engines do better than top engines, paradoxically because the pruning of the weaker engines isn't as good. The gain in tricky test positions is more than offset by the loss in normal game play.
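The pruning point can be illustrated with a toy search. This sketch uses late move reductions (LMR) as one example of the pruning/reduction family described above. The tree, its static evals (centipawns, from the side to move's point of view) and the reduction rule (every non-first move is searched one ply shallower, with a re-search if it beats alpha) are hypothetical simplifications, not any real engine's scheme. The "sacrifice" looks bad statically (-300) but wins after a forced reply (+500):

```python
from dataclasses import dataclass, field

INF = 10**9


@dataclass
class Node:
    name: str
    static: int  # static eval in centipawns, from the side to move's point of view
    children: list = field(default_factory=list)


def search_child(child, index, depth, alpha, beta, use_lmr):
    """Search one child; with LMR, late moves first get a reduced null-window search."""
    if use_lmr and index >= 1 and depth >= 2:
        score = -negamax(child, depth - 2, -alpha - 1, -alpha, use_lmr)
        if score <= alpha:
            return score  # reduced search says "not better": keep the reduction
    return -negamax(child, depth - 1, -beta, -alpha, use_lmr)


def negamax(node, depth, alpha, beta, use_lmr):
    if depth == 0 or not node.children:
        return node.static
    best = -INF
    for i, child in enumerate(node.children):
        score = search_child(child, i, depth, alpha, beta, use_lmr)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return best


def root_search(root, depth, use_lmr):
    alpha, beta = -INF, INF
    best_move, best_score = None, -INF
    for i, child in enumerate(root.children):
        score = search_child(child, i, depth, alpha, beta, use_lmr)
        if score > best_score:
            best_move, best_score = child.name, score
        alpha = max(alpha, score)
    return best_move, best_score


# Hypothetical tree: the quiet move is ordered first, the sacrifice is "late".
quiet = Node("quiet", -50)                 # opponent to move, +50 for us
reply = Node("forced reply", 500)          # we are to move again, +500 for us
sacrifice = Node("sacrifice", 300, [reply])  # opponent to move, statically up material
root = Node("root", 0, [quiet, sacrifice])

print(root_search(root, 2, use_lmr=False))
print(root_search(root, 2, use_lmr=True))
print(root_search(root, 3, use_lmr=True))
```

At depth 2 the reduced search misses the sacrifice and prefers the quiet move. At depth 3 the reduced search sees the forced reply, fails high and triggers a full-depth re-search. Test positions are built around such moves, while in normal games the same reductions save far more time than they lose.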

Also, the underlying logic doesn't make sense to me: somehow reaching a position where a deep tactical shot wins out of the blue. However, that usually just doesn't happen. You need to have some sort of advantage and pressure that then translates into tactical options. That means having already played better before that point.

Viz · Post by **Viz** » Tue Aug 06, 2024 10:33 am

Also, needless to say, tactical test suites represent only a fraction of difficult tactical positions.
For example, Crystal is excellent on these suites.
But when it's kibitzing at TCEC (which it often does) while Stockfish plays, in every final there are about 5, sometimes more, games with one of the following patterns:
1) Stockfish raises its evaluation rapidly, sometimes up to +2, +3 or even +5, while Crystal sits at around +0.6. Then, after 5 more moves are played, Crystal finally picks up the winning line and starts raising its own evaluation (the line usually involves deep tactical motifs in all continuations);
2) Crystal shows a really low eval in something that looks like a fortress or an unwinnable endgame; Stockfish shuffles around, sometimes for 10-15 moves, then Crystal also starts to raise its eval, because the fortress was breakable or the endgame actually had a win with some strange queen dance.
Mind you, the kibitzing machine is much faster than the TCEC main machine, and Crystal also gets to think on both SF's time and its opponent's time, but this quite often doesn't really help.
The opposite, where SF is blind to a tactic Crystal sees or is stuck overevaluating a fortress, also happens, but more rarely, at least from what I've observed.
All of this leads to one conclusion: Stockfish is actually the better tactician and, on average, also understands fortresses better in positions that occur in real engine games. But Crystal is indeed better at solving test suites, because they include only a really, really small fraction of the types of positions you can get.
Also, the previous post made a good point: tactics usually occur when the position has been prepared for them, not out of the blue. You almost always get tactical opportunities in positions where you are already better, especially at the highest level of play.
