AndrewGrant wrote: ↑Tue Dec 17, 2024 10:03 pm
I've posted this image a few times, from some Torch data with fixed movetime = 1000ms games.
With modern SMP algorithms paired with NNUE evaluation, core doublings are ALMOST as good as time doublings.
I can grasp the measured (blue/green) curves. What is the equation and its derivation behind the slightly sub-linear curves representing ideal scaling? Likely you've explained it before, but elsewhere.
Here is an alternative display of the underlying data, which might make things more clear.
You'll see that I tested not just doublings against a fixed baseline (1v2, 1v4, 1v8, ...), but also successive doublings (1v2, 2v4, 4v8, ...), in order to establish some understanding of the compression of Elo due to the increased strength.
Here, "ideal" would be if the gain from 1v2 plus the gain from 1v4, would be equal to 1v4. So in the time doubling chart, giving 2s vs 1s, we gained 87.3 elo. When giving 4s vs 2s, we gained another 77.8 elo. Combined, we are a little over 160 elo. HOWEVER, in a 4s vs 1s test, we only managed to gain 139.4.
This data is some version of Torch, with a time control of MT=1000 as the baseline, scaled to a thread of a Ryzen 3700x when running at max concurrency.
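To make the compression concrete, here is a minimal C++ sketch of the arithmetic above; the numbers are exactly the ones quoted in the post, and nothing else is assumed.

```cpp
#include <cstdio>

int main() {
    // Gains quoted above, from the fixed-movetime Torch data.
    const double gain_2s_vs_1s = 87.3;   // Elo
    const double gain_4s_vs_2s = 77.8;   // Elo
    const double gain_4s_vs_1s = 139.4;  // Elo, measured directly

    // If Elo were perfectly additive-transitive, chained gains would sum.
    const double ideal = gain_2s_vs_1s + gain_4s_vs_2s;  // 165.1
    std::printf("ideal %.1f vs measured %.1f -> %.1f Elo of compression\n",
                ideal, gain_4s_vs_1s, ideal - gain_4s_vs_1s);
    return 0;
}
```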
Ah, okay. Thank you. I have seen this behavior before and have an explanation for the phenomenon of Elo clamping. Elo ratings are normally additive-transitive, and under reasonable assumptions they remain so over many class separations. So when that is not seen, it is worth understanding why.
Though I have never had the privilege of testing Torch myself, it is an extremely strong engine, among the top 2-3. Therefore even at your bullet time control it is performing at a high level. Combine this with a test suite drawn from, for example, Pohl's UHO 2024 8mvs_+095_+104. The weaker version of Torch, playing the white (advantageous) side of the opening, is nonetheless strong enough to almost always hold the game to at least a draw. Playing the black side, depending on relative strength, the stronger version of Torch will sometimes lose, yet will often be able to neutralize white's opening advantage down to a draw. But the weaker Torch has chess "figured out" sufficiently well not to blunder the position into a loss, no matter how much time or core advantage is given to the stronger side. Therefore the stronger side can at most score (1-0, 1/2-1/2) per game pair. Because Elo(p=0.75) = 190.8, that is your rating ceiling against any fixed opponent. Notice that at 32:1 time advantage, the greatest disparity in your data, the score delta is 190.5 Elo.
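For reference, the 190.8 figure falls straight out of the standard logistic Elo model, Elo(p) = 400 * log10(p / (1 - p)). A minimal C++ sketch of that formula and its inverse:

```cpp
#include <cmath>
#include <cstdio>

// Elo difference implied by an expected score p (0 < p < 1).
double eloFromScore(double p) {
    return 400.0 * std::log10(p / (1.0 - p));
}

// Inverse: expected score implied by an Elo difference d.
double scoreFromElo(double d) {
    return 1.0 / (1.0 + std::pow(10.0, -d / 400.0));
}

int main() {
    // Ceiling when each game pair is at best a win plus a draw: p = 0.75.
    std::printf("Elo(0.75) = %.1f\n", eloFromScore(0.75));   // 190.8
    // The 32:1 result of 190.5 Elo sits just under that ceiling.
    std::printf("p(190.5)  = %.4f\n", scoreFromElo(190.5));  // ~0.7497
    return 0;
}
```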
Let's say you expanded data collection with T=2 as the baseline. A contest of 64:2 - or 2048:2, for that matter - will also have a maximum separation of 190.8 Elo points. This is not the case when the baseline engine is weak - for example Spike 1.1, to cross-reference another thread. But chain several time-doubled Torch engines together, as you have done, and the rating ladder extends to greater separation. That ladder is labeled Ideal in your plots. (I get it now.) If you had conducted the full cross matrix of pairings and combined the results into one ratings calculation, the resultant curves would be about midway between red/orange and blue/green. The closely matched contestants pull the ratings apart, while the distant contestants pull the ratings together.
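To illustrate the "midway" claim, here is a toy C++ sketch - not anyone's actual rating tool - that fits a single set of ratings to a full cross matrix by crude gradient steps on the logistic model. The score matrix is entirely hypothetical, with the distant pairings clamped near the 0.75 ceiling as described above; the fitted ratings land between the chained-ladder sum and the direct head-to-head numbers.

```cpp
#include <cmath>
#include <cstdio>

// Expected score under the logistic Elo model for a rating gap d.
double expected(double d) { return 1.0 / (1.0 + std::pow(10.0, -d / 400.0)); }

int main() {
    const int N = 4;  // engines at 1s, 2s, 4s, 8s (hypothetical)
    // Hypothetical observed score of row vs column. Adjacent pairs imply
    // ~85 Elo per doubling; distant pairs are clamped toward the ceiling.
    double s[N][N] = {
        {0.50, 0.38, 0.31, 0.26},
        {0.62, 0.50, 0.39, 0.32},
        {0.69, 0.61, 0.50, 0.40},
        {0.74, 0.68, 0.60, 0.50},
    };
    double r[N] = {0, 0, 0, 0};
    const double step = 1.0;  // crude fixed step, Elo per unit of score error
    for (int it = 0; it < 20000; ++it)
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                if (i != j) r[i] += step * (s[i][j] - expected(r[i] - r[j]));
    for (int i = 0; i < N; ++i)
        std::printf("engine %d: %+.1f Elo\n", i, r[i] - r[0]);
    return 0;
}
```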
The nearly linear curves in the Pair Ratio Elo plots are also explainable from this point of view. Under game-pair scoring, every color-swapped game pair is available to be won by the stronger side. The Elo dynamic range is then limited by the size of the sample (the number of game pairs), and is only gradually compressed as the engines, or engine settings, become stronger in an absolute sense.
Scoring game pairs as three rather than four results has the disadvantage of losing information. But when the players are very strong and are tested from openings on the edge of win/loss, game-pair-based calculation accentuates separation in strength. I can see why measuring this way has become popular among devs and testers.
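As a concrete illustration of pair scoring, here is a minimal C++ sketch; the three-outcome collapse and all names are assumptions for illustration, not any specific tool's method.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// One color-swapped game pair, stored as the stronger side's score
// in each game: 1.0 win, 0.5 draw, 0.0 loss.
struct GamePair { double g1, g2; };

// Collapse a pair to three outcomes by net score: +1 pair win,
// 0 pair draw, -1 pair loss (the "three rather than four" scoring).
int pairOutcome(const GamePair& p) {
    double net = p.g1 + p.g2 - 1.0;   // ranges over [-1, +1]
    return (net > 0) - (net < 0);
}

// Elo from a score fraction under the usual logistic model.
double eloFromScore(double s) {
    return 400.0 * std::log10(s / (1.0 - s));
}

int main() {
    // Hypothetical results: mostly win-plus-draw pairs, some all-draw.
    std::vector<GamePair> pairs = {
        {1.0, 0.5}, {1.0, 0.5}, {0.5, 0.5}, {1.0, 0.5}, {0.5, 0.5},
    };
    int wins = 0, draws = 0, losses = 0;
    for (const auto& p : pairs) {
        int o = pairOutcome(p);
        if (o > 0) ++wins; else if (o < 0) ++losses; else ++draws;
    }
    // Pair score: wins count 1, draws 0.5, losses 0.
    double s = (wins + 0.5 * draws) / pairs.size();
    std::printf("pair score %.3f -> %.1f pair-Elo\n", s, eloFromScore(s));
    return 0;
}
```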
jkominek wrote: ↑Wed Dec 18, 2024 7:04 am
Ah, okay. Thank you. I have seen this behavior before and have an explanation for the phenomenon of Elo clamping... Because Elo(p=0.75) = 190.8, that is your rating ceiling against any fixed opponent...
Yeah, you've pretty much laid out the exact conclusions. The remark about the 75% "wall" of ~190 Elo is why I have that black line on the first chart. It is hard to pass for such strong engines.
Presumably, with a MORE balanced book, we could increase the Elo difference between the largest imbalances, though decrease it for the closer ones. This is pretty much the ONLY (good) argument I am aware of for not using Pohl's books for sufficiently big rating lists, although I've not actually seen anyone who does such a thing articulate said argument.
Is Torch 3 now abandonware, after Chess.com closed the project?
In this case it's really a pity, a complete waste of time and money.
It was so close to Stockfish 17: only 50 points behind according to the reliable SPCC rating list!
Chess.com could give it away as freeware, even without releasing the source, since using it only as an analyzer is disappointing for all the people who participated in the development.
PS: As a Chess.com bot I would suggest Shashchess instead, with multiple personalities and much more to come very soon. Andrea Manzo is also on very good terms with the ICGA and all the legendary developers who participated in this year's event.
AlexChess wrote: ↑Thu Dec 19, 2024 9:48 am
Is Torch 3 now abandonware, after Chess.com closed the project?
In this case it's really a pity, a complete waste of time and money...
From what Andrew said, what they have learned in the Torch project will be applied to chess.com bots and analysis mode - so going forward, both of these will improve.
AlexChess wrote: ↑Thu Dec 19, 2024 9:48 am
Is Torch 3 now abandonware, after Chess.com closed the project? ...
PS: As a Chess.com bot I would suggest Shashchess instead...
I would quit before I played any role in Shashchess being a part of anything at chesscom lol.
AlexChess wrote: ↑Thu Dec 19, 2024 9:48 am
Is Torch 3 now abandonware, after Chess.com closed the project? ...
From what Andrew said, what they have learned in the Torch project will be applied to chess.com bots and analysis mode...
The purpose of Torch, from the start, was never to grind the Elo ladder. The purpose was to replace Komodo with a stronger, more modern engine that was more adaptable and manageable for integration with other products. I do think it is a pity that the focus on strength ended so quickly -- we had a little over a year, with a very small team of people, and we accomplished more than anyone would have expected. But Torch itself is not abandoned. I work on things inside of, or as a result of, Torch every day, basically making Torch more and more feature-rich for human use.
AndrewGrant wrote: ↑Thu Dec 19, 2024 10:37 am
The purpose of Torch, from the start, was never to grind the Elo ladder. The purpose was to replace Komodo with a stronger, more modern engine that was more adaptable and manageable for integration with other products...
Sounds as though Torch is strongly object oriented, with discrete objects and methods that can be readily dropped into other products.
Once you have worked out how to build on, or develop from, other products, the choices to be made and the tasks to be done become quicker and easier.
The new logo for chess software development at chess.com: Torch lights the way.
AndrewGrant wrote: ↑Thu Dec 19, 2024 10:37 am
The purpose of Torch, from the start, was never to grind the Elo ladder...
Sounds as though Torch is strongly object oriented, with discrete objects and methods that can be readily dropped into other products...
I don't believe in object-oriented programming. It's more the case that there are no globals running amok in Torch; everything is contained inside classes or structs, and thus things compile nicely down to a .dll/.so, and many fully independent (except for the shared network weights) "copies" of Torch can run inside the same binary at once.
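For what it's worth, a minimal C++ sketch of the layout being described; every name here is hypothetical, and none of this is Torch's actual code. All state lives inside the engine object, the only shared data is an immutable weights blob, so any number of independent instances can coexist in one process or shared library.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical illustration of a global-free engine layout.
struct NetworkWeights {
    std::vector<float> params;   // immutable after loading, safe to share
};

class Engine {
public:
    explicit Engine(std::shared_ptr<const NetworkWeights> net)
        : net_(std::move(net)) {}

    // All search state (hash table, history, etc.) is per-instance,
    // so instances never interfere with each other.
    void newGame() { hashTable_.assign(hashTable_.size(), 0); }

private:
    std::shared_ptr<const NetworkWeights> net_;  // shared, read-only
    std::vector<uint64_t> hashTable_ = std::vector<uint64_t>(1 << 20);
};

int main() {
    // One weights blob, many fully independent "copies" of the engine
    // in the same binary -- the property described in the post above.
    auto weights = std::make_shared<const NetworkWeights>();
    Engine a(weights), b(weights), c(weights);
    a.newGame(); b.newGame(); c.newGame();
    return 0;
}
```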
AlexChess wrote: ↑Thu Dec 19, 2024 9:48 am
Is Torch 3 now abandonware, after Chess.com closed the project? ...
PS: As a Chess.com bot I would suggest Shashchess instead...
I would quit before I played any role in Shashchess being a part of anything at chesscom lol.
Are you still there? Rewrite your engine from scratch, adding AI queries to try something different (there are specific plug-ins to exploit).
Also, Ethereal hasn't been updated in months; you could do more.