Gemini

smatovic
Posts: 3330
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Gemini

Post by smatovic »

Maybe worth mentioning that Strawberry/ChatGPT-o1 meanwhile uses some kind of reinforcement learning (think of the transition from AlphaGo to AlphaZero) plus an additional tree-of-thoughts approach for reasoning:

https://en.wikipedia.org/wiki/Reinforcement_learning

Tree of Thoughts: Deliberate Problem Solving with Large Language Models
https://arxiv.org/abs/2305.10601
Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%.
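
For intuition, here is a bare-bones sketch of the ToT control loop, my own simplification of the paper's breadth-first variant; propose() and score() are hypothetical stand-ins for LLM calls, not names from the paper:

Code: Select all

# Bare-bones Tree-of-Thoughts control loop (simplified BFS variant).
# propose() and score() are hypothetical stand-ins for LLM calls.
def tree_of_thoughts(problem, propose, score, breadth=5, depth=3):
    frontier = [""]                                   # partial "thought" chains
    for _ in range(depth):
        candidates = []
        for chain in frontier:
            for thought in propose(problem, chain):   # LLM suggests next steps
                candidates.append(chain + "\n" + thought)
        # Self-evaluate each partial chain and keep the best few (beam search).
        frontier = sorted(candidates, key=lambda c: score(problem, c),
                          reverse=True)[:breadth]
    return frontier[0] if frontier else None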
--
Srdja
towforce
Posts: 12509
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Gemini

Post by towforce »

smatovic wrote: Sat Sep 28, 2024 7:30 am Tree of Thoughts: Deliberate Problem Solving with Large Language Models
https://arxiv.org/abs/2305.10601
[...]

Very good find, and a good read! Thank you.

Users are forbidden from trying to work out how o1 works (link), but hopefully open-source systems with similar capabilities will become available in the not-too-distant future. The text above certainly looks promising in this respect. It's possible that an important reason OpenAI wants to hide how o1 works is that what they've done is actually close to what's in that paper. :)

Btw - I know it's not the point - but I have no doubt that a chess programmer could build a better Game of 24 solver without using an LLM! LLMs obviously aren't always the best solution.
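
For what it's worth, exhaustive search makes short work of it. A minimal brute-force solver, written as an illustrative sketch (exact arithmetic via Fraction; the function names are my own, not from the paper):

Code: Select all

# Brute-force Game of 24: repeatedly combine two numbers with +,-,*,/
# and recurse on the shrunken list. Fraction gives exact arithmetic.
from fractions import Fraction

def solve24(nums):
    """Return an expression string that makes 24 from nums, or None."""
    return _search([(Fraction(n), str(n)) for n in nums])

def _search(items):
    if len(items) == 1:
        val, expr = items[0]
        return expr if val == 24 else None
    for i in range(len(items)):
        for j in range(len(items)):
            if i == j:
                continue
            (a, ea), (b, eb) = items[i], items[j]
            rest = [items[k] for k in range(len(items)) if k not in (i, j)]
            cands = [(a + b, f"({ea}+{eb})"), (a - b, f"({ea}-{eb})"),
                     (a * b, f"({ea}*{eb})")]
            if b != 0:
                cands.append((a / b, f"({ea}/{eb})"))
            for val, expr in cands:
                found = _search(rest + [(val, expr)])
                if found:
                    return found
    return None

print(solve24([4, 9, 10, 13]))   # prints an expression that evaluates to 24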
Human chess is partly about tactics and strategy, but mostly about memory
towforce
Posts: 12509
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Gemini

Post by towforce »

smatovic wrote: Sat Sep 28, 2024 7:30 am Tree of Thoughts: Deliberate Problem Solving with Large Language Models
https://arxiv.org/abs/2305.10601
[...]

One more thing about this: these trees would have similarities to game trees in computer chess, in that both are generated data in tree structures. This could be an important force multiplier for chatbots going forward. Here's a brief overview of the history of game trees in chess:

* hand-coded evaluations (HCEs) used to be weak (in the sense that they misevaluated a lot of chess positions)

* game trees acted as a force multiplier for these HCEs

* likewise, HCEs compensated for the weaknesses of game trees (especially the horizon effect)

* as computers got faster, engine developers found that an enlarged game tree was more useful than snippets of knowledge in the HCE (some of that knowledge even became redundant), so knowledge was sacrificed in order to search a larger game tree

* DeepMind showed us that using an NN eval instead of an HCE gave a huge boost in playing strength, despite the fact that NNs take a lot longer to run than HCEs (and hence shrink the game tree)

* so once again, eval and game tree are acting as force multipliers for each other

Now the study linked by Srdja shows us that trees can act as force multipliers for LLMs too.
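
To make the force-multiplier point concrete, here is a toy depth-limited negamax with a pluggable evaluation; a sketch only, where pos and its methods are hypothetical stand-ins, not any real engine's code. The same eval gets stronger the deeper the tree it sits under:

Code: Select all

# Toy depth-limited negamax; pos and its methods are hypothetical stand-ins.
# The tree amplifies whatever evaluation it is given, HCE or NN alike.
def negamax(pos, depth, evaluate):
    if depth == 0 or pos.is_terminal():
        return evaluate(pos)              # leaf: fall back on static knowledge
    best = float('-inf')
    for move in pos.legal_moves():
        pos.make(move)
        best = max(best, -negamax(pos, depth - 1, evaluate))
        pos.unmake(move)
    return best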

It is crystal clear to me that there is enough similarity between chatbot progress and chess progress to merit discussion of the analogy between chess and chat!
Human chess is partly about tactics and strategy, but mostly about memory
smatovic
Posts: 3330
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Gemini

Post by smatovic »

towforce wrote: Sat Sep 28, 2024 2:43 pm One more thing about this: these trees would have similarities to game trees in computer chess...
[...]
You are not the only one to notice these similarities:

Re: Insight About Genetic (Evolutionary) Algorithms
viewtopic.php?p=949466#p949466

--
Srdja
towforce
Posts: 12509
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Gemini

Post by towforce »

smatovic wrote: Sat Sep 28, 2024 3:53 pm
towforce wrote: Sat Sep 28, 2024 2:43 pm One more thing about this: these trees would have similarities to game trees in computer chess...
[...]
You are not the only one to notice these similarities:

Re: Insight About Genetic (Evolutionary) Algorithms
viewtopic.php?p=949466#p949466

--
Srdja

I'd forgotten that gem of a thread - thank you! Exalted company indeed - Demis Hassabis. 8-)
Human chess is partly about tactics and strategy, but mostly about memory
Werewolf
Posts: 2028
Joined: Thu Sep 18, 2008 10:24 pm

Re: Gemini

Post by Werewolf »

towforce wrote: Sat Sep 28, 2024 12:31 am
Werewolf wrote: Sat Sep 21, 2024 4:45 pm I have now spent many hours writing a chess engine with 4 AIs.

There is no doubt about it: the order of coding usefulness goes like this:

1) ChatGPT-o1 Mini
2) ChatGPT-o1 Preview

Then miles behind
3) Gemini Advanced
4) Claude Sonnet 3.5

What ChatGPT-o1 seems to be doing to achieve more steps of logic is, roughly:

1. Solving a bit of what has been asked
2. Taking this bit of the solution and automatically re-prompting itself to get the next bit

So basically - using itself iteratively. Would you agree?

In case anyone is unaware, the price of using ChatGPT-o1 is a lot higher than that of other LLMs (in the sense that you are allowed far fewer prompts for your subscription fee), but it will still be the best option for some use cases.
I’m not sure what it’s doing; they seem to have disguised the workings to make them hard to copy. If anyone asks it for “trace reasoning”, they get a warning.

Both GPT-o1 and o1 mini can do things previous versions can’t, so the progress seems genuine.
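
If towforce's guess is right and o1 re-prompts itself iteratively, a toy version of that loop might look like the following; this is purely speculative, and call_llm is a hypothetical stand-in, not OpenAI's actual API:

Code: Select all

# Speculative sketch of an iterative self-prompting loop (how o1 *might*
# chain steps). call_llm is a hypothetical stand-in, not OpenAI's API.
def solve_iteratively(task, call_llm, max_steps=8):
    steps = []
    for _ in range(max_steps):
        prompt = (task + "\nProgress so far:\n" + "\n".join(steps)
                  + "\nProduce the next step, or reply FINAL: <answer>.")
        reply = call_llm(prompt)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        steps.append(reply)              # feed the partial solution back in
    return None                          # step budget exhausted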
towforce
Posts: 12509
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Gemini

Post by towforce »

Werewolf wrote: Sun Sep 29, 2024 11:15 pmI’m not sure what [GPT-o1 is] doing, they seem to have disguised the workings to make it hard to copy. If anyone asks it for “trace reasoning” they get a warning.

Both GPT-o1 and o1 mini can do things previous versions can’t, so the progress seems genuine.

I completely agree: I haven't used it, but I accept that it's very likely to be the best chatbot available right now.

Per Srdja's post above, there's a good chance that it generates a data tree for its reasoning choices, which would be a similarity with chess engines (which generate data trees called "game trees", as you probably know!). I started a new thread making the case that chatbots might follow a similar path to chess engines: that there's an upper limit to how good they can get, and that if their journey is similar, they could get near that level in 20-30 years (link). As I pointed out there, you would need limits on what you'd expect of a chatbot: chess engines have the advantage here, having a more limited domain.
Human chess is partly about tactics and strategy, but mostly about memory
jefk
Posts: 1025
Joined: Sun Jul 25, 2010 10:07 pm
Location: the Netherlands
Full name: Jef Kaan

Re: Gemini

Post by jefk »

The AIs are evolving, and the latest Q*/Strawberry/OpenAI 4o models are getting better, but for chess moves you still need search (preferably a neural-net lookup, but then still some search in the tree, simply because of game theory, e.g. von Neumann's minimax theorem).

Here's someone starting a little project/test like the one towforce also seems to want to do:
https://saychess.substack.com/p/can-i-b ... ed-opening
After a while it might work, and then hopefully better than e.g. the Chinese database (with all the drawing moves in the opening stage), because, as I've said a few times already, that nowadays isn't so good for human chess (nor for correspondence chess imo, unless you only want to play draws and/or wait for your opponent to make a severe input mistake, or pass away). For a human repertoire you need to specify how sharp you want to play, etc. (e.g. adapting to your opponent, but also taking into consideration your own style/preference of playing).
Werewolf
Posts: 2028
Joined: Thu Sep 18, 2008 10:24 pm

Re: Gemini

Post by Werewolf »

jefk wrote: Mon Sep 30, 2024 8:59 am The AIs are evolving, and the latest Q*/Strawberry/OpenAI 4o models are getting better, but for chess moves you still need search...
[...]
Well, it depends on what level you want to play at. For 1400 Elo you don't need search, and in some positions the LLM is already beyond that.
towforce
Posts: 12509
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Gemini

Post by towforce »

jefk wrote: Mon Sep 30, 2024 8:59 am ...but for chess moves you still need search... ...simply because of game theory, e.g. von Neumann's minimax theorem.

Many highly capable people have been saying this for a long time (even Donald Knuth said it in The Art of Computer Programming; it's been over 40 years since I read it, but I think his wording was something like "such [problems] defy analytical solution"). I accept that it has been proven that the number of moves rises at a polynomial rate with the size of the board (so a game of chess on, say, a 16x16 board would take a HUGE number of moves, though it would solve the draw problem for today's computers), but I don't accept that it's impossible to create heuristics that could play the game extremely well without search.
Human chess is partly about tactics and strategy, but mostly about memory