Gemini

towforce
Posts: 12376
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Gemini

Post by towforce »

Werewolf wrote: Wed Nov 13, 2024 7:07 pm
Question 9 (From Test 3, number 15): What, with reference to this question, is the next number in the sequence below?
3, 3, 5, 1, 3, 4, 1, 2, 3, 4, 1, 2, ?
Answer: 4. There is no mathematical pattern; that's a red herring. The numbers represent the number of consonants in each word of the question, hence the precise wording of the question.

Damn - should have paid more attention to the wording of the question! :oops:
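
For anyone who wants to check the answer, here is a minimal Python sketch that counts the consonants in each word of the question. It assumes that only a, e, i, o and u count as vowels (so "y" would be treated as a consonant, though the question contains none), and the function name consonant_counts is just made up for illustration.

```python
import re

VOWELS = set("aeiou")  # assumption: "y" is treated as a consonant


def consonant_counts(sentence):
    """Return the number of consonants in each word of the sentence."""
    # Pull out runs of letters so punctuation such as "," and "?" is ignored.
    words = re.findall(r"[A-Za-z]+", sentence)
    return [sum(1 for ch in word.lower() if ch not in VOWELS) for word in words]


question = ("What, with reference to this question, is the next number "
            "in the sequence below?")
counts = consonant_counts(question)

print(counts[:12])  # the twelve numbers given in the puzzle
print(counts[12])   # the missing thirteenth number (for the word "sequence")
```

The first twelve counts reproduce the sequence in the question, and the thirteenth word, "sequence", has four consonants (s, q, n, c), which is where the answer 4 comes from.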

In case anyone missed the announcement, there's a new forum for AI discussion - link. This includes a discussion on the similarities between the progress of chess computers and the progress of chatbots!
Human chess is partly about tactics and strategy, but mostly about memory
Werewolf
Posts: 1996
Joined: Thu Sep 18, 2008 10:24 pm

Re: Gemini

Post by Werewolf »

Mods, feel free to move this if you wish, then.

I just tested Llama 3.1 Nemotron and it scored a terrible 35%, by the way :(
towforce
Posts: 12376
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Gemini

Post by towforce »

An experimental new version of Gemini has jumped straight to the top of the "humans like the response" league table (users are given two answers to a prompt, not told which chatbot gave each answer, and asked to choose the one they prefer).
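
The post doesn't say how the table turns those blind votes into a ranking. One common way to rank from pairwise preferences, and one that will be familiar on a chess forum, is an Elo-style rating. The sketch below assumes plain Elo with a starting rating of 1000 and K = 32; the bot names and the record_vote function are hypothetical, and the real leaderboard may well use a different method.

```python
def expected_score(rating_a, rating_b):
    """Elo expected score of A against B (a value between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def record_vote(ratings, winner, loser, k=32):
    """Update both ratings in place after a user prefers winner over loser."""
    # The winner gains more when the result was less expected.
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain


# Hypothetical chatbots, both starting from the same assumed rating.
ratings = {"bot_a": 1000.0, "bot_b": 1000.0}
record_vote(ratings, "bot_a", "bot_b")  # one user preferred bot_a's answer
print(ratings)
```

With equal starting ratings the expected score is 0.5, so a single vote moves each rating by 16 points in opposite directions.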




There are two major reasons I subscribed to Gemini: that it integrates with Google documents, and that I prefer its coaching answers. It also gives you access to NotebookLM, which is a very useful product.
Werewolf
Posts: 1996
Joined: Thu Sep 18, 2008 10:24 pm

Re: Gemini

Post by Werewolf »

towforce wrote: Fri Nov 15, 2024 9:12 pm Experimental new version of Gemini jumps straight to the top of the "humans like the response" league table (users are given two answers to a prompt, not told which chatbot gave each answer, and asked to choose the one they prefer):


I can't see this AI in the Gemini list I use. Is it regionally restricted?
towforce
Posts: 12376
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Gemini

Post by towforce »

Werewolf wrote: Sat Nov 16, 2024 7:15 pm
I can't see this AI in the Gemini list I use, is it regionally restricted?

It's available on Hugging Face, where you can try lots of different chatbots, and rate them.

Click on this link, and then click on "Leaderboard": the new Gemini is the one at the top of the list. :)

I had a go at the side-by-side challenge with a coaching question: I chose the new Gemini (1114) against mistral-large-2407. Both gave similar types of answer. Unfortunately, I knew in a heartbeat which answer was from Gemini, because I have done a lot of coaching with it. This introduced bias. I selected the one I believed to be Gemini anyway: one of my key reasons for choosing Gemini is that I like the way it coaches. To be fair, it was clearly the better answer as well.

I am impressed that the side-by-side test sample size for the leader board has millions of examples! 8-)
Werewolf
Posts: 1996
Joined: Thu Sep 18, 2008 10:24 pm

Re: Gemini

Post by Werewolf »

I was unable to get this experimental version of Gemini working, but I suppose it'll be available soon direct from Google.

What I was able to try is the new DeepSeek R1 (known as Deep Think), which is quite a shocking development. It's a reasoning model developed in China and it's clearly aimed at ChatGPT-o1 (Preview) because it uses inference - thinking - and takes 30-90 seconds per answer. What makes it so special is that:
1) It actually shows its workings and you can see it "thinking" in real time.
2) It's open source.

Point 2) could be a real blow to OpenAI because they've done everything they can to hide and disguise how ChatGPT-o1 works.

Anyway, I tested Deep Think and it scored 50% on my IQ test, which is OK.
towforce
Posts: 12376
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Gemini

Post by towforce »

Werewolf wrote: Fri Nov 22, 2024 5:27 pm
I was unable to get this experimental version of Gemini working, but I suppose it'll be available soon direct from Google.

It has now been pushed off the top of the leader board by... ...drum roll... ...a new version of itself! You can try it in the arena, side by side with another chatbot: you choose the best answer, but you won't be told which one it was (if, like me, you're familiar with Gemini, you won't need to be told).

Alternatively, you can use it directly, on its own, here - link.

Werewolf wrote: Fri Nov 22, 2024 5:27 pm
What I was able to try is the new DeepSeek R1 (known as Deep Think), which is quite a shocking development. It's a reasoning model developed in China and it's clearly aimed at ChatGPT-o1 (Preview) because it uses inference - thinking - and takes 30-90 seconds per answer. What makes it so special is that:
1) It actually shows its workings and you can see it "thinking" in real time.
2) It's open source.

Point 2) could be a real blow to OpenAI because they've done everything they can to hide and disguise how ChatGPT-o1 works.

Anyway, I tested Deep Think and it scored 50% on my IQ test, which is OK.

What did GPT-o1 score?
Werewolf
Posts: 1996
Joined: Thu Sep 18, 2008 10:24 pm

Re: Gemini

Post by Werewolf »

towforce wrote: Fri Nov 22, 2024 6:00 pm
What did GPT-o1 score?

I showed the GPT-o1 (Preview) and other results a few posts back: it scored 14/20, or 70%. Note that this is not the full GPT-o1, which hasn't been released yet. We are expecting that in a week or so, and it could be very good indeed.

Thanks for the link with the new experimental Gemini. I just tested it and it scored 11/20 or 55%.
Werewolf
Posts: 1996
Joined: Thu Sep 18, 2008 10:24 pm

Re: Gemini

Post by Werewolf »

Yesterday GPT-o1 (full version) was released. I am testing it now. It seems to combine some of the usability of GPT-4o with the intelligence of o1 (Preview). This could be useful for writing a chess program, and over Christmas I'll test this by trying to get my Connect 4 program working.

Also, there's a "Pro" subscription which allows o1 to think for longer (inference)...it's $200/month. Ouch.
towforce
Posts: 12376
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Gemini

Post by towforce »

Werewolf wrote: Fri Dec 06, 2024 9:49 am
Also, there's a "Pro" subscription which allows o1 to think for longer (inference)...it's $200/month. Ouch.

That's a new level of subscription for a chatbot. :shock:

Back in the day, high software subscription charges used to be justified on the basis of how much money they saved you. If you have a commercial need for this level of intelligence, or if you're sufficiently prosperous that this money is insignificant, then you can say, "That's a lot cheaper than employing someone" and pay the toll.

This way of justifying high prices for software ended when competition brought the price down towards what it cost to make rather than what the vendor could get away with.

However, in this case, there are actual costs to providing this level of intelligence. As time passes, providing this level of intelligence will become cheaper, and the price will fall. However, there will then be new levels of intelligence which still command this price.

The general trend is that rich people get new tech first, then it becomes affordable and everyone gets it. That seems to be where we're at with this subscription level.