I happen to find your topic more interesting than the one that is consuming the majority of the discussion on this thread, so if no one else will, I'll continue the line with a couple thoughts, before the whole thread drops off the front page.petero2 wrote: ↑Fri Nov 17, 2023 7:40 pm The quote from Nowakowski (which in turn comes from Allis, "Searching for Solutions in Games and Artificial Intelligence) also contains the clause "assuming reasonable computing resources". It seems difficult to precisely define "weakly/strongly solved" in a meaningful way that excludes a provably correct minimax algorithm that takes 1e1000 years to run on current hardware.
However, I thought it would be more interesting to discuss my proposed informal definition of "weakly solved in practice". It seems to me the minimum requirement to make such a claim is to provide a chess playing entity that no one is able to beat from the starting position no matter how many times they try. I thought it would be interesting if someone that claims that chess has been "weakly solved in practice" would explain what this alleged chess playing entity would be.
To start, what I'd like to implore is for the phrase "weakly solved in practice" to be dropped from usage. It adds confusion by mixing categories that do not go together. Chess is a mathematical entity. To say that the game has been Strongly, Weakly, or Ultra-weakly solved is to establish a property of the mathematical entity through (usually computationally expensive) deductive proof -- with the proviso of "reasonable resources" complicating things, as has been discussed elsewhere on this forum. For chess, the word solved should be allowed to associate exclusively to the realm of mathematical proofs, not letting it get tangled up with empirical measurements, however impressive and compelling the entire body of chess analysis may be. Solving is proving.
The phrase I prefer to use is "unbeatable in practice." And, I think it is not a hard claim to make. Assemble a high-end computer with today's strongest engine, front-end with an expertly constructed opening book, back-end with the full suite of endgame tablebases, put it onto a playing server at a specific time control, then dare any and all comers to defeat it from the opening position. The precise claim would then be: system V on play server W has not lost in X games over Y days, implying an estimated error rate of not larger than 10 to the minus Z mistakes per move. What is being established is an empirical lower bound against an open group of adversarial players. In the case of multiple claimants, ranking is not by Elo, but length of the undefeated streak.
To make it interesting, as they say in pool, money is put on the table. Say a prize fund it set up, and the entry fee to challenge is 1000 ducets. Funds are dispersed to undefeated players on a time schedule. If you defeat an undefeated player you win that player's money. You then become a target. I don't gamble myself and do not know to devise a successful incentive structure, but I figure something like that could work. Back in the day perhaps ICC or FICS would be the server of choice for such a King of the Hill challenge between engines. What would be the best choice today? Playchess or Lichess? I'm not sure.
Larry Kaufman has gone on the record stating that he does not think Komodo could be defeated under conditions like I have in mind. If they were willing to do that before the team and program retired then the arguments would transfer to how much is good enough? Would five years undefeated count as "unbeatable in practice". Making a declaration on a lower bound estimate is a judgement call, but at least the arguments would be transferred away from a make-up-your-own definition of weakly solving chess.
A second area of continuing discussion, I expect, would center around the system configuration. For the first claimants I'm in favor of unconstrained system construction, like in the early days of ICCA competitions when teams supplied their own hardware, search engine, opening book, and tablebases (and operator). In contrast you propose engines-only running deterministically. I'll quote you here.
I think your proposal is fine as a follow-on effort. It's not so much attempting to demonstrate that a practically unbeatable chess playing entity exists today, which I consider to be pretty easy, but rather in attempting to locate that dividing line between beatable and unbeatable.I propose the following definition. You provide a publicly available chess playing entity that behaves deterministically. For example you might specify:
* Using Stockfish 16 with specified UCI parameter settings
* Use a publicly available opening book, possibly empty
* Search limits, for example 1e9 nodes per move using a single search thread
You then claim that this chess playing entity is unbeatable when playing a game from the starting position, regardless of which color it plays. If
no one is able to prove this claim to be wrong in a long time (i.e. no one can ever beat the entity), we might consider this entity to have "weakly
solved chess in practice".
Is anyone willing to make such a claim for a deterministic publicly available chess playing entity?
Actually the opening book move selection does not have to be deterministic, but using a book that has more than one move for a position just makes it
harder to obtain an unbeatable entity, because an opponent just needs to find a refutation for one of the book moves to prove that the entity is
beatable.
For this definition to have practical value, it must be possible to run the entity on existing hardware in a reasonable amount of time, just as the
mathematical "weakly solved" definition specifies that the algorithm must be possible to run on existing hardware in reasonable amount of time.
I have some measurements that suggest where one could start looking for the minimally strong unbeatable entity (standard starting position only). This is from a power-of-2 scaling experiment I've been running using the Chess324 positions as the opening book. If I subset to the standard opening position I have some tentative dividing lines for various versions of Stockfish.
Code: Select all
18) Stockfish 16 Hash 2048 Threads 1 Nodes 134217728 3646.8 : 10 (+0,=10,-0) 50.0%
21) Stockfish 16 Hash 1024 Threads 1 Nodes 67108864 3637.3 : 18 (+0,=17,-1) 47.2% ****
27) Stockfish 15.1 Hash 512 Threads 1 Nodes 8388608 3628.2 : 122 (+33,=89,-0) 63.5%
36) Stockfish 15.1 Hash 512 Threads 1 Nodes 4194304 3609.6 : 208 (+110,=96,-2) 76.0% ****
9) Stockfish 15 NET Hash 512 Threads 1 Nodes 33554432 3662.0 : 162 (+75,=87,-0) 73.1%
24) Stockfish 15 NET Hash 512 Threads 1 Nodes 16777216 3631.7 : 188 (+84,=103,-1) 72.1% ****
17) Stockfish 14 Hash 1024 Threads 1 Nodes 67108864 3647.2 : 122 (+32,=90,-0) 63.1%
20) Stockfish 14 Hash 512 Threads 1 Nodes 33554432 3640.3 : 128 (+38,=86,-4) 63.3% ****
13) Stockfish 13 Hash 4096 Threads 1 Nodes 268435456 3650.8 : 38 (+9,=29,-0) 61.8%
8) Stockfish 13 Hash 2048 Threads 1 Nodes 134217728 3667.9 : 72 (+26,=45,-1) 67.4% ****
40) Stockfish 12 Hash 2048 Threads 1 Nodes 134217728 3603.4 : 66 (+11,=55,-0) 58.3%
31) Stockfish 12 Hash 1024 Threads 1 Nodes 67108864 3616.7 : 122 (+42,=72,-8) 63.9% ****
71) Stockfish 11 Hash 8192 Threads 1 Nodes 536870912 3460.9 : 18 (+0,=11,-7) 30.6% ****
63) Stockfish 10 Hash 8192 Threads 1 Nodes 536870912 3505.5 : 16 (+0,=14,-2) 43.8% ****
(Engine only; no endgame tablebases are configured.) But as it stands, single threaded Stockfish 16 with a fixed budget of 128M nodes per move is the target configuration to prove defeatable. In case you wonder, Stockfish 16 at 64 M nodes lost to Stockfish 15 at 256 M nodes. Stockfish 12 through 15.1 simply have not yet met a tough enough opponent to induce a mistake.
It is the contention of some here that even when granted an infinite node budget, a concerted attacker can induce the latest Stockfish to make a mistake (when run deterministically) - presumably due to the aggressive selective pruning of Stockfish. I'm not convinced of that, for the claim depends both on the properties of the engine and of the position. For the one-the-edge opening lines used in TCEC competitions, the claim is entirely believable. For, say, a mate in 5 position of king and queen versus bare king, no adversarial opponent will be able to prevent an infinite node budget Stockfish from finding the win. The standard opening position lies somewhere in between this spectrum. My matches find it to be among the most drawish of all the Chess324 openings.
Not that my opinion changes the truth, but I do not find it inconceivable that a sufficiently well budgeted finite, deterministic Stockfish can hold the opening position against an exploit-seeking algorithm, from now until the end of life on Earth. Why, by writing the previous sentence I have conceived of it, literally! But before putting that to the test, I'm interested to see how well an unconstrained system would do. Perhaps if enough antagonism builds up here in the barroom, the fight will be "taken outside" to settle the matter.