Gemini

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

towforce
Posts: 11865
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK

Re: Gemini

Post by towforce »

Vinvin wrote: Fri Feb 09, 2024 4:09 am Nice video testing Gemini Ultra : https://www.youtube.com/watch?v=gexI6Ai3X0U

"Today I own 3 cars but last year I sold 2 cars. How many cars do I own today ?"
- Gemini Ultra : 1 car (???)

"Here is a bag filled with popcorn. There is no chocolate in the bag. The bag is made of transparent plastic, so you can see what's inside. Yet, the label on the bag says 'chocolate' and not 'popcorn'. Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label. She believes that the bag is full of ..."
- Gemini Ultra : Chocolate (???)

Have you tried these on ChatGPT?

It seems churlish to expect a text generator to be able to play top level chess and solve logic puzzles. I use Bard/Gemini for success planning/coaching among other things, and it's good at that job.
The simple reveals itself after the complex has been exhausted.
Vinvin
Posts: 5239
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Gemini

Post by Vinvin »

towforce wrote: Fri Feb 09, 2024 3:47 pm
Vinvin wrote: Fri Feb 09, 2024 4:09 am Nice video testing Gemini Ultra : https://www.youtube.com/watch?v=gexI6Ai3X0U

"Today I own 3 cars but last year I sold 2 cars. How many cars do I own today ?"
- Gemini Ultra : 1 car (???)

"Here is a bag filled with popcorn. There is no chocolate in the bag. The bag is made of transparent plastic, so you can see what's inside. Yet, the label on the bag says 'chocolate' and not 'popcorn'. Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label. She believes that the bag is full of ..."
- Gemini Ultra : Chocolate (???)

Have you tried these on ChatGPT?

It seems churlish to expect a text generator to be able to play top level chess and solve logic puzzles. I use Bard/Gemini for success planning/coaching among other things, and it's good at that job.
GPT-4 gets the first one right.
On the second one, GPT-4 makes the same mistake.

Re: Gemini

Post by towforce »

Vinvin wrote: Fri Feb 09, 2024 4:09 am "Here is a bag filled with popcorn. There is no chocolate in the bag. The bag is made of transparent plastic, so you can see what's inside. Yet, the label on the bag says 'chocolate' and not 'popcorn'. Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label. She believes that the bag is full of ..."
- Gemini Ultra : Chocolate (???)

"Chocolate" is the correct answer to that variation of the question - link.

Re: Gemini

Post by towforce »

towforce wrote: Fri Feb 09, 2024 5:03 pm "Chocolate" is the correct answer to that variation of the question - link.

From that article: "This is typical of LLM’s in their current state. Compared to humans, LLM’s are maddeningly inconsistent, with concepts manifesting in one situation and disappearing in another."

Isn't that just like 20th century chess computers???

Good! That gives us the timescale for LLMs surpassing human reasoning!

Re: Gemini

Post by towforce »

Google's medical chatbot is both more accurate and nicer to speak to than human doctors - link.

That's going to save everyone a lot of money then! :D

Re: Gemini

Post by Vinvin »

towforce wrote: Fri Feb 09, 2024 5:03 pm
Vinvin wrote: Fri Feb 09, 2024 4:09 am "Here is a bag filled with popcorn. There is no chocolate in the bag. The bag is made of transparent plastic, so you can see what's inside. Yet, the label on the bag says 'chocolate' and not 'popcorn'. Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label. She believes that the bag is full of ..."
- Gemini Ultra : Chocolate (???)

"Chocolate" is the correct answer to that variation of the question - link.
So you find a transparent bag full of popcorn and you say it's full of chocolate?
What don't you understand?

Re: Gemini

Post by towforce »

Vinvin wrote: Fri Feb 09, 2024 9:36 pm
towforce wrote: Fri Feb 09, 2024 5:03 pm
Vinvin wrote: Fri Feb 09, 2024 4:09 am "Here is a bag filled with popcorn. There is no chocolate in the bag. The bag is made of transparent plastic, so you can see what's inside. Yet, the label on the bag says 'chocolate' and not 'popcorn'. Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label. She believes that the bag is full of ..."
- Gemini Ultra : Chocolate (???)

"Chocolate" is the correct answer to that variation of the question - link.
So, you find a transparent bag full of popcorn and you said it's full of chocolate ?
What don't you understand ?

In this variation of the question, she cannot see what's in the bag: the only indicator available to her is the label. All explained in the linked article in my previously quoted text above.

Re: Gemini

Post by Vinvin »

towforce wrote: Fri Feb 09, 2024 9:50 pm
Vinvin wrote: Fri Feb 09, 2024 9:36 pm
towforce wrote: Fri Feb 09, 2024 5:03 pm
Vinvin wrote: Fri Feb 09, 2024 4:09 am "Here is a bag filled with popcorn. There is no chocolate in the bag. The bag is made of transparent plastic, so you can see what's inside. Yet, the label on the bag says 'chocolate' and not 'popcorn'. Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label. She believes that the bag is full of ..."
- Gemini Ultra : Chocolate (???)

"Chocolate" is the correct answer to that variation of the question - link.
So, you find a transparent bag full of popcorn and you said it's full of chocolate ?
What don't you understand ?

In this variation of the question, she cannot see what's in the bag: the only indicator available to her is the label. All explained in the linked article in my previously quoted text above.
Oops, sorry, I made a bad copy/paste (obviously she can see what's inside). The correct version is:

"Here is a bag filled with popcorn. There is no chocolate in the bag. The bag is made of transparent plastic, so you can see what's inside. Yet, the label on the bag says 'chocolate' and not 'popcorn'. Sam finds the bag. She had never seen the bag before. She reads the label. She believes that the bag is full of ..."
- Gemini Ultra : Chocolate (???)
The video here : https://youtu.be/gexI6Ai3X0U?si=0qYr813pyTqs-ZAd&t=584
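
The two versions of the puzzle differ only in whether Sam can actually see inside the bag. Here is a minimal sketch of the answer each version calls for; the function name `sam_believes` and its parameters are made up for illustration and do not come from the video or the linked article:

```python
def sam_believes(contents: str, label: str,
                 transparent: bool, can_see_inside: bool = True) -> str:
    """Return what Sam should believe is in the bag (toy model, hypothetical)."""
    # Sam learns the real contents only if the bag is transparent
    # AND nothing stops her from looking inside.
    if transparent and can_see_inside:
        return contents
    # Otherwise the label is the only evidence she has.
    return label


# Version with "She cannot see what is inside the bag."
first_version = sam_believes("popcorn", "chocolate",
                             transparent=True, can_see_inside=False)

# Corrected version, where that sentence is removed.
second_version = sam_believes("popcorn", "chocolate", transparent=True)

print(first_version, second_version)
```

Under these assumptions the first version gives "chocolate" and the corrected version gives "popcorn", which is why the same reply can be right for one wording and wrong for the other.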

Re: Gemini

Post by towforce »

Vinvin wrote: Fri Feb 09, 2024 10:18 pm The video here : https://youtu.be/gexI6Ai3X0U?si=0qYr813pyTqs-ZAd&t=584

Interesting video. In the LLM world, there isn't an empirical measure like an Elo rating: different LLMs have different strengths. I haven't tried ChatGPT-4 (there are a couple of ways I could try it for free, I just haven't), but here are my rankings for the ones I have tried:

1. Gemini. EASILY the best for my favourite usage, which is "success coaching". The others don't come close.

2. ChatGPT 3.5. Based on a tiny sample (just a few attempts), better than Gemini at poetry.

3. Pi.ai. The best for coaching (and generally being helpful) before Gemini came along.

4. Character.ai. Great fun! I can easily understand why so many kids are hooked on it. Probably not the best for "non-fun" or "productive" activity, though.

I can readily see that people with different priorities will have different rankings, though: there are many successful products that I simply don't understand why people buy.

Re: Gemini

Post by towforce »

I forgot Microsoft Copilot. Gemini writes good code, but I'm usually going to use Copilot because it's embedded right into Visual Studio, the tool I usually use for coding.

Generally it's competent: I often rewrite what it produces to be more the way I like it - but having something to amend is a lot easier than starting from scratch. The big surprise is when writing code myself - it automatically predicts what I'm going to write, and I can accept it if it guesses correctly. Around a third of the time, it's hopelessly wrong, but around two thirds of the time it's either close or spot on - which is always impressive.

Confusingly, Microsoft is adding Copilot to almost everything now - not just coding tools. Let's hope it proves to be better than the super-annoying Clippy was! :x