6 cores vs 1 core surprised me

Werewolf · Post by **Werewolf** » Sat Dec 14, 2024 1:34 pm

Ciekce wrote: ↑Sat Dec 14, 2024 10:22 am
Jouni wrote: ↑Sat Dec 14, 2024 9:29 am Maybe this confirms just that?
it confirms literally nothing, you are looking at test positions

may the gods one day grant you understanding of proper testing, as has been explained to you in any number of fora infinite times

Is the argument that test positions require searching for unusual moves that favour a broader search found in highly multi-threaded searches, but have no translation to Elo gain?

Ciekce · Post by **Ciekce** » Sat Dec 14, 2024 10:16 pm

Werewolf wrote: ↑Sat Dec 14, 2024 1:34 pm Is the argument that test positions require searching for unusual moves that favour a broader search found in highly multi-threaded searches, but have no translation to Elo gain?

it's purely that test positions have no relevance to strength

no conclusions can be drawn from performance on test suites, no matter how much certain parties on this forum cling to them

Werewolf · Post by **Werewolf** » Sun Dec 15, 2024 7:32 pm

Ciekce wrote: ↑Sat Dec 14, 2024 10:16 pm
Werewolf wrote: ↑Sat Dec 14, 2024 1:34 pm Is the argument that test positions require searching for unusual moves that favour a broader search found in highly multi-threaded searches, but have no translation to Elo gain?
it's purely that test positions have no relevance to strength

no conclusions can be drawn from performance on test suites, no matter how much certain parties on this forum cling to them

Presumably though on 1 core V 1 core, both running identical versions of Stockfish, the average solve time on a tactical test suite would be quite a good indicator of speed difference between the two machines.
I've heard the argument doesn't carry over to different thread counts (like 1 v 6) because the search profile is not the same.
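
As an illustration of the 1 core v 1 core comparison above, here is a minimal sketch of turning per-position solve times into a single speed figure. It uses the geometric mean of the per-position time ratios instead of a plain average, so that one very slow position does not dominate the result. That choice, the function name `speed_ratio`, and the sample times are all assumptions made for illustration; nothing here comes from a real test run.

```python
import math

def speed_ratio(times_a, times_b):
    """Geometric mean of per-position solve-time ratios (machine B / machine A).

    A value above 1.0 means machine A solved the suite faster on average.
    Both lists must hold positive times for the same positions, in the same order.
    """
    if len(times_a) != len(times_b) or not times_a:
        raise ValueError("need two non-empty lists of equal length")
    if any(t <= 0 for t in list(times_a) + list(times_b)):
        raise ValueError("solve times must be positive")
    log_sum = sum(math.log(tb / ta) for ta, tb in zip(times_a, times_b))
    return math.exp(log_sum / len(times_a))

# Hypothetical solve times in seconds for the same three positions.
machine_a = [1.0, 2.0, 4.0]
machine_b = [2.0, 4.0, 8.0]
print(f"machine A is about {speed_ratio(machine_a, machine_b):.2f}x faster")
```

This only says something about raw speed when both machines run the same engine version with the same thread count and hash settings, which is the case described in the post.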

Stephen Ham · Post by **Stephen Ham** » Tue Dec 17, 2024 8:13 pm

Hello Connor,

You wrote, "it's purely that test positions have no relevance to strength
no conclusions can be drawn from performance on test suites, no matter how much certain parties on this forum cling to them"

Assuming that's correct, then it's correct only for speed chess and standard time-control chess. However, as an ICCF GM, I desire the "best" analysis engine for correspondence chess. There, I think test position comparisons of engine performance is meaningful, and perhaps the only way to compare engines. For example, we've/I've seen relatively simple test positions where Stockfish (SF) fails to find solutions due to radical pruning. Yet, "weaker" chess engines will solve these same positions.

SF is designed, coded, and tested only at speed chess time-controls. To its credit, it still performs well at tournament time controls (e.g. TCEC). But even in the TCEC, moves are made in roughly three minutes/move. However in correspondence chess, the engine runs much longer per ply searched. There, a broader search is required (less pruning), along with the best evaluation function (generally the largest NNUE).

I'd thus value your opinion regarding my search for the best analysis engine. I hope you agree that test position comparisons are then perhaps the only way to measure quality.

All the best,
-Steve-

Uri Blass · Post by **Uri Blass** » Tue Dec 17, 2024 9:50 pm

Stephen Ham wrote: ↑Tue Dec 17, 2024 8:13 pm Hello Connor,

You wrote, "it's purely that test positions have no relevance to strength
no conclusions can be drawn from performance on test suites, no matter how much certain parties on this forum cling to them"

Assuming that's correct, then it's correct only for speed chess and standard time-control chess. However, as an ICCF GM, I desire the "best" analysis engine for correspondence chess. There, I think test position comparisons of engine performance is meaningful, and perhaps the only way to compare engines. For example, we've/I've seen relatively simple test positions where Stockfish (SF) fails to find solutions due to radical pruning. Yet, "weaker" chess engines will solve these same positions.

SF is designed, coded, and tested only at speed chess time-controls. To its credit, it still performs well at tournament time controls (e.g. TCEC). But even in the TCEC, moves are made in roughly three minutes/move. However in correspondence chess, the engine runs much longer per ply searched. There, a broader search is required (less pruning), along with the best evaluation function (generally the largest NNUE).

I'd thus value your opinion regarding my search for the best analysis engine. I hope you agree that test position comparisons are then perhaps the only way to measure quality.

All the best,
-Steve-

I see no proof that engines that are strong in test positions are relatively better at longer time controls.
There are many positions, not found in test suites, where some of the strong engines blunder.

Test suites are usually about positions where one move is significantly better than the rest, but some blunders in chess happen in positions where no single move is significantly better than the rest.

Edit: For correspondence games I guess that this is practically a draw, and I wonder if somebody can show me a win against Stockfish in correspondence chess from the last 2 years where the opponent always chose the latest Stockfish's move after at least an hour of search and did not use a book.

AndrewGrant · Post by **AndrewGrant** » Tue Dec 17, 2024 10:03 pm

I've posted this image a few times, from some Torch data with fixed movetime = 1000ms games.
SMP algorithms paired with NNUE things, core doublings are ALMOST as good as time doublings.

Werewolf · Post by **Werewolf** » Tue Dec 17, 2024 11:50 pm

AndrewGrant wrote: ↑Tue Dec 17, 2024 10:03 pm I've posted this image a few times, from some Torch data with fixed movetime = 1000ms games.
SMP algorithms paired with NNUE things, core doublings are ALMOST as good as time doublings.

That's better than I expected at lower core counts, but worse than I expected at 16 cores+ given we're now on Lazy SMP.

AndrewGrant · Post by **AndrewGrant** » Wed Dec 18, 2024 12:20 am

Werewolf wrote: ↑Tue Dec 17, 2024 11:50 pm
AndrewGrant wrote: ↑Tue Dec 17, 2024 10:03 pm I've posted this image a few times, from some Torch data with fixed movetime = 1000ms games.
SMP algorithms paired with NNUE things, core doublings are ALMOST as good as time doublings.
...
That's better than I expected at lower core counts, but worse than I expected at 16 cores+ given we're now on Lazy SMP.

Well, hard to say for sure. Some of the drop is due to the elo compression. The game-pair elo chart does not show quite the same drop.
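
For readers unfamiliar with the two figures mentioned above, here is a minimal sketch of how a match score maps to an Elo difference under the usual logistic model, applied once to individual games and once to game pairs. It assumes a game pair means two games from the same opening with colours reversed, scored as a single win, draw or loss. The function names and all counts are made up for illustration; this is not the tooling or the data behind the Torch chart.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score in (0, 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

def score_from_wdl(wins: int, draws: int, losses: int) -> float:
    """Average score per unit (game or game pair), counting a draw as half."""
    total = wins + draws + losses
    if total == 0:
        raise ValueError("no results")
    return (wins + 0.5 * draws) / total

# Hypothetical per-game results: a very high draw rate keeps the score,
# and therefore the Elo figure, close to even.
per_game = elo_from_score(score_from_wdl(wins=200, draws=780, losses=20))

# Hypothetical per-pair results for the same match, scored pair by pair.
per_pair = elo_from_score(score_from_wdl(wins=190, draws=300, losses=10))

print(f"per-game Elo: {per_game:.1f}, game-pair Elo: {per_pair:.1f}")
```

As draw rates climb, per-game scores bunch up near 50%, which is one way to read "elo compression"; scoring by pairs spreads the results out again.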

syzygy · Post by **syzygy** » Wed Dec 18, 2024 12:37 am

Stephen Ham wrote: ↑Tue Dec 17, 2024 8:13 pm Assuming that's correct, then it's correct only for speed chess and standard time-control chess. However, as an ICCF GM, I desire the "best" analysis engine for correspondence chess. There, I think test position comparisons of engine performance is meaningful, and perhaps the only way to compare engines. For example, we've/I've seen relatively simple test positions where Stockfish (SF) fails to find solutions due to radical pruning. Yet, "weaker" chess engines will solve these same positions.

But those "simple test positions" are cherry picked. On the vast majority of positions, SF will outperform weaker engines because that is what it means to be stronger.

SF is designed, coded, and tested only at speed chess time-controls.

No engine is more thoroughly tested than SF. The only difference is that SF's testing is public and you have all the information about it. But no other group of developers has the computing capacity that would be necessary to test patches at long time controls. It is as simple as that.

jkominek · Post by **jkominek** » Wed Dec 18, 2024 5:14 am

AndrewGrant wrote: ↑Tue Dec 17, 2024 10:03 pm I've posted this image a few times, from some Torch data with fixed movetime = 1000ms games.
SMP algorithms paired with NNUE things, core doublings are ALMOST as good as time doublings.

I can grasp the measured (blue/green) curves. What is the equation and its derivation behind the slightly sub-linear curves representing ideal scaling? Likely you've explained it before, but elsewhere.
