ImNotStockfish wrote: Mon Aug 05, 2024 8:17 pm
Decent ones are Hard Talkchess 2020 (2021 revision), TTT1, TTT2, Arasan... You can combine them to get a better sample size, but it will still be pretty small.
Bad ones are ShashChess Hardpositions2022.
Thanks ImNotStockfish. I'll check them all out.
John
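To put the sample-size point in numbers, here is a minimal sketch that computes a 95% Wilson score interval for a suite solve rate. The suite sizes (100 and 1000 positions) and solve counts are purely illustrative and are not the actual sizes of the suites named above; `wilson_interval` is a hypothetical helper, not part of any testing tool.

```python
import math


def wilson_interval(solved: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a solve rate; z = 1.96 gives roughly 95%."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = solved / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return center - half, center + half


# Illustrative numbers only: the same 60% solve rate on a small and a larger combined suite.
for solved, total in [(60, 100), (600, 1000)]:
    lo, hi = wilson_interval(solved, total)
    print(f"{solved}/{total}: {lo:.3f} .. {hi:.3f}")
```

With 100 positions the interval is almost 19 percentage points wide, so two engines solving 60 and 65 positions cannot be told apart; combining suites narrows the interval, but only with the square root of the number of positions.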
What's the best tactical test suite for the modern era?
Moderators: hgm, Rebel, chrisw, Ras
-
- Posts: 31
- Joined: Fri May 12, 2023 10:15 am
- Location: Melton Mowbray, England
- Full name: John Hamlen
Re: What's the best tactical test suite for the modern era?
-
- Posts: 462
- Joined: Mon Jun 07, 2010 3:13 am
- Location: Holland, MI
- Full name: Martin W
Re: What's the best tactical test suite for the modern era?
shawn wrote: Tue Aug 06, 2024 1:02 am
https://github.com/Matthies/RubiChess/w ... uter-chess
Are you saying tactical test suites do not measure an engine's tactical ability? That is specifically what the OP asked for. For Elo, there are more productive methods.
-
- Posts: 95
- Joined: Fri Jun 28, 2024 9:24 am
- Full name: Wallace Shawn
Re: What's the best tactical test suite for the modern era?
gaard wrote: Tue Aug 06, 2024 1:25 am
Are you saying tactical test suites do not measure an engine's tactical ability? That is specifically what the OP asked for. For Elo, there are more productive methods.
OP asked for an engine:
1. with exceptional tactical ability
2. that doesn't get destroyed in 10 moves
Tactical test suites cannot even guarantee the first (google Goodhart's law), much less the second.
-
- Posts: 462
- Joined: Mon Jun 07, 2010 3:13 am
- Location: Holland, MI
- Full name: Martin W
Re: What's the best tactical test suite for the modern era?
shawn wrote: Tue Aug 06, 2024 2:03 am
OP asked for an engine:
1. with exceptional tactical ability
2. that doesn't get destroyed in 10 moves
Tactical test suites cannot even guarantee the first (google Goodhart's law), much less the second.
Is that in another thread? What I see is:
"Does anyone have any recommendations for a suite (or suites) of test positions which will test Tech 4's tactical ability, i.e. very deep combinations which lead to a material win or mate?"
-
- Posts: 95
- Joined: Fri Jun 28, 2024 9:24 am
- Full name: Wallace Shawn
Re: What's the best tactical test suite for the modern era?
Here:
"The only hope a Tech program has of defeating a top-rated program is to find its way to a position where there are tactics beyond its opponent's horizon."
-
- Posts: 223
- Joined: Tue Apr 09, 2024 6:24 am
- Full name: Michael Chaly
Re: What's the best tactical test suite for the modern era?
gaard wrote: Tue Aug 06, 2024 1:25 am
Are you saying tactical test suites do not measure an engine's tactical ability? That is specifically what the OP asked for. For Elo, there are more productive methods.
Tactical suites measure the ability to... surprise... solve particular tactical suites.
That ability doesn't translate into solving other tactical positions, nor into finding tactics in real games. This is especially so because tactical suites are almost always cherry-picked to be "hard to solve", so "easy to solve" positions are excluded, and you can regress on the easy ones while improving on the hard ones (this definitely can happen).
-
- Posts: 462
- Joined: Mon Jun 07, 2010 3:13 am
- Location: Holland, MI
- Full name: Martin W
Re: What's the best tactical test suite for the modern era?
Viz wrote: Tue Aug 06, 2024 5:51 am
Tactical suites measure the ability to... surprise... solve particular tactical suites.
That ability doesn't translate into solving other tactical positions, nor into finding tactics in real games. This is especially so because tactical suites are almost always cherry-picked to be "hard to solve", so "easy to solve" positions are excluded, and you can regress on the easy ones while improving on the hard ones (this definitely can happen).
There's no reason why, with a sufficient number of varied positions, it wouldn't translate into solving other tactical positions. This is how development and testing looked eons ago, when CPU cycles were very limited. It seldom translated into measurable gains in game play, however. You can definitely overfit with too few or biased positions, and there are many threads here about running into that wall.
-
- Posts: 223
- Joined: Tue Apr 09, 2024 6:24 am
- Full name: Michael Chaly
Re: What's the best tactical test suite for the modern era?
In-freaking-deed, although I should say "almost never".
And this is why almost no one actually uses them in the modern era of engine development.
Not because of an abundance of resources, but because they are useless.
Mind you, chess engines like Laser http://computerchess.org.uk/ccrl/4040/c ... 4-bit_4CPU were developed from scratch by two people on one 4-thread machine, but with proper SPRT and without using suites. For the pre-NN era, its rating at the moment its authors quit the project was really respectable, somewhere near the top 10. They definitely didn't have "unlimited resources", but they used what they had wisely, and that didn't include test suites.
The only real use of test suites is zugzwang suites for verification search, although one can argue that this doesn't really gain Elo, and indeed it doesn't.
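Since "proper SPRT" is offered as the alternative to test suites, here is a minimal sketch of a sequential probability ratio test on game results. It is an illustration under stated assumptions, not what Laser's authors or any particular framework actually ran: it uses a normal approximation of the per-game score and the logistic Elo model, and the hypotheses elo0 = 0 and elo1 = 5 with alpha = beta = 0.05 are illustrative choices. The function names are hypothetical. Real testing frameworks typically use more refined statistics (for example, counting results over game pairs).

```python
import math


def expected_score(elo: float) -> float:
    """Logistic Elo model: expected score for a given Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Log-likelihood ratio of H1 (elo1) vs H0 (elo0), normal approximation (assumed model)."""
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score, estimated from the observed W/D/L counts.
    var = (wins * (1 - mean) ** 2 + draws * (0.5 - mean) ** 2 + losses * mean ** 2) / n
    if var == 0:
        return 0.0  # degenerate sample (all one result): no spread yet, keep testing
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return n * (s1 - s0) * (2 * mean - s0 - s1) / (2 * var)


def sprt_decision(wins: int, draws: int, losses: int, elo0: float = 0.0,
                  elo1: float = 5.0, alpha: float = 0.05, beta: float = 0.05) -> str:
    """Compare the LLR against Wald's bounds; the defaults are illustrative assumptions."""
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"


print(sprt_decision(1200, 1600, 1000))
print(sprt_decision(1000, 1600, 1200))
print(sprt_decision(5, 10, 5))
```

The test keeps playing games until the LLR crosses one of the bounds, so a patch is accepted or rejected as soon as the game results are strong enough evidence, rather than after a fixed number of games or a fixed set of positions.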
-
- Posts: 2668
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: What's the best tactical test suite for the modern era?
The main issue with tactical test positions is that they are designed to be tricky: they contain moves that would, and should, normally get pruned away. This is also why you see some positions where weaker engines do better than top engines, paradoxically because the pruning of the weaker engines isn't as good. The gain in tricky test positions is more than offset by the loss in normal game play.
Also, the underlying logic doesn't make sense to me: somehow reaching a position where a deep tactical shot out of the blue wins. That usually just doesn't happen. You need some sort of advantage and pressure that then translates into tactical options, which means having already played better before that point.
Rasmus Althoff
https://www.ct800.net
-
- Posts: 223
- Joined: Tue Apr 09, 2024 6:24 am
- Full name: Michael Chaly
Re: What's the best tactical test suite for the modern era?
Also, needless to say, tactical test suites represent only a fraction of difficult tactical positions.
For example, Crystal is excellent on these suites.
But when it is kibitzing at TCEC (which it often does) while Stockfish plays, in every final there are about 5 games, sometimes more, with one of the following patterns:
1) Stockfish raises its evaluation rapidly, sometimes up to +2, +3 or even +5, while Crystal sits at +0.6 or so. Then, about 5 moves later, Crystal finally picks up the winning line and starts raising its evaluation itself (the line usually involves deep tactical motifs in all continuations);
2) Crystal shows a really low eval in something that looks like a fortress or an unwinnable endgame; Stockfish maneuvers, sometimes for 10-15 moves, and then Crystal also starts to raise its eval, because the fortress was breakable or the endgame actually had a win with some strange queen dance.
Mind you, the kibitzing machine is much faster than the TCEC main machine, and Crystal also gets both Stockfish's and its opponent's time to think, but this quite often doesn't really help.
The opposite, where Stockfish is blind to a tactic Crystal sees or is stuck overevaluating a fortress, also happens, but more rarely, at least from what I've observed.
All of this leads to one conclusion: Stockfish is actually the better tactician and also understands fortresses better on average in the positions that occur in real engine games. But Crystal is indeed better at solving test suites, because they cover only a really small fraction of the types of positions you can get.
Also, the previous post indeed made a good point: tactics usually occur when the position has been prepared for them, not out of the blue. You almost always get tactical opportunities in positions where you are already better, especially at the highest level of play.