Gemini

Werewolf · Post by **Werewolf** » Fri Dec 06, 2024 10:27 am

towforce wrote: ↑Fri Dec 06, 2024 10:02 am
Werewolf wrote: ↑Fri Dec 06, 2024 9:49 am Also, there's a "Pro" subscription which allows o1 to think for longer (inference)... it's $200/month. Ouch.
That's a new level of subscription for a chatbot.

Testing complete for the GPT-o1 (full) version, which scored 75% and is the new winner. Assuming my crude test is accurate (big assumption, but anyway) that puts it just eligible for Mensa - an IQ of around 150.

I wish people could see the significance of what I'm writing: in the past we easily found chess problems that were too hard for chess engines. Only in recent years has this changed, and it is now very hard to find a position Stockfish / Lc0 can't solve.
I assumed there would be a similar trend in AI and reasoning, but in the space of a few years it has become smarter than 95% of humans in the sphere of verbal reasoning. I am truly shocked. It is now getting very hard to find problems it cannot solve!

Also, the speed of GPT-o1 is staggering. It is literally 10x faster than GPT-o1 (preview).

More scary still is that I haven't been able to test the super-smart version (GPT-o1 allowed to think for longer) because of the cost.

smatovic · Post by **smatovic** » Fri Dec 06, 2024 11:23 am

You are not alone

I am regularly amazed too, in both a positive and a negative way. Then I read about things like going from a 100K-GPU data center to a 1M-GPU data center, and can only wonder where this will all lead or end... amazing times ahead, or something like that.

--
Srdja

towforce · Post by **towforce** » Fri Dec 06, 2024 2:54 pm

This reviewer doesn't think that the new GPT-o1 and GPT-o1 pro chatbots are that much better than GPT-4, so maybe don't buy a subscription just yet.

If he's right in saying that GPT-o1 pro just has a voting mechanism, that would be especially disappointing: it should be breaking the question down into steps, and resolving each of the steps separately - maybe even with some trial and error.
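For anyone unfamiliar with the idea, a voting mechanism of the kind the reviewer describes can be sketched roughly as below: ask the same question several times and keep the most common answer. This is only an illustration of majority voting in general. The names `majority_vote`, `ask_with_voting` and the `ask` callback are hypothetical, and nothing here is a claim about how GPT-o1 pro works internally.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer after light normalisation."""
    if not answers:
        raise ValueError("need at least one answer")
    # Normalise so that " 42 " and "42" count as the same vote.
    normalised = [a.strip().lower() for a in answers]
    return Counter(normalised).most_common(1)[0][0]


def ask_with_voting(ask: Callable[[str], str], question: str, n: int = 5) -> str:
    """Call a (hypothetical) chatbot function n times and vote on the replies."""
    return majority_vote([ask(question) for _ in range(n)])


# Stand-in for a real chatbot: replays canned answers in order.
canned = iter(["42", "41", " 42 ", "42", "40"])
result = ask_with_voting(lambda q: next(canned), "What is 6 x 7?", n=5)
print(result)
```

Note that every sample answers the whole question in one go; there is no step-by-step decomposition or trial and error, which is the gap described above.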

Werewolf · Post by **Werewolf** » Fri Dec 06, 2024 7:31 pm

towforce wrote: ↑Fri Dec 06, 2024 2:54 pm This reviewer doesn't think that the new GPT-o1 and GPT-o1 pro chatbots are that much better than GPT-4, so maybe don't buy a subscription just yet.

I didn't get the impression he was saying that. The main comparisons were between GPT-o1 (preview), GPT-o1 and GPT-o1 (Pro). All of these slaughter GPT-4 in maths, coding and verbal reasoning. I've tested this extensively, by trying to write a program (for coding) and with IQ tests (for maths and reasoning), and it isn't close.

His points around Pro are somewhat valid: I was hoping for more than this out of o1 Pro, to be honest. But the main thing is how many times you use it. If you can cope with 50 messages a week, stick with the $20/month option. But some people, especially companies, will use up that limit in a few hours. For the company I work for, it's far cheaper to buy the $200/month subscription than to pay a consultant.
However, it's probably not good enough yet to replace a human.

towforce · Post by **towforce** » Thu Dec 12, 2024 11:51 am

BIG Gemini update today, though Google is not trumpeting Gemini updates in the way that OpenAI does (lots of hype with every announcement).

Firstly, it now has multimodal agents:

Secondly (and I only know about this because of an info box that popped up when I opened Gemini Pro just now), there's a new feature called "Gemini Pro Deep Research" (I think it's only for subscribers right now):

1. You ask it to research an issue

2. It writes a research plan for you

3. You approve the plan

4. It spends several minutes searching the web and collating information

5. It creates a report

Just to give this a try, I asked it to research an issue that arose in the discussion about the process of selecting new moderators for CCC (if you haven't seen that, the thread has been a bit crazy in some of the directions it has taken) - testosterone levels in men.

1. My initial prompt was: Research relatively easy ways a man can increase his testosterone level.

2. Gemini quickly produced a research plan

3. I approved the research plan

4. Several minutes elapsed

5. Gemini produced its research document. I have shared this on Google Docs - link.

Werewolf · Post by **Werewolf** » Fri Dec 13, 2024 5:54 pm

Interesting.

I haven't tested the deep research option yet, but I did test Gemini 2.0 Flash. It scored 55%, the same as 1.5 Pro.

towforce · Post by **towforce** » Fri Dec 13, 2024 6:28 pm

Werewolf wrote: ↑Fri Dec 13, 2024 5:54 pm...I did test Gemini 2.0 Flash. It scored 55%, the same as 1.5 Pro.

I am confused by the current versioning. The only thing I've seen tested (at Tom's Hardware) is that 2.0 Flash is superior to 1.5.

"Flash" in chatbots means smaller, lighter and faster - and I understand that, using recent innovations, a "flash" version of a chatbot can be almost as good as the full versions.

The versions I have available are:

1.5 Pro (tackle complex tasks)
1.5 Flash (get everyday help)
1.5 Pro With Deep Research
2.0 Flash Experimental

Obviously, if you want to use the previously mentioned research functionality, you have to use the third option: clear and simple.

If, however, you want, say, personal coaching, then you just want to use the most intelligent model they have. Research means waiting several minutes, and the output is a report. This is not usually what you want for coaching, which is more likely to be a back-and-forth dialogue. But which is the most intelligent model - 1.5 Pro or 2.0 Flash?

Searching (link) brings back the answer that 2.0 Flash has capabilities that 1.5 Pro doesn't - but I'm not seeing anything being reported on intelligence. I suppose I'll just have to give it a try. My best guess is that 2.0 Flash is more intelligent.

When I did my testing several months ago, I found that Gemini was, for me, the most suitable coaching partner. I decided to stick to it, rather than keep retesting many chatbots. Now it's necessary to try different versions even under the same brand.

A positive indicator for 2.0 as a coach, though: an experimental version of Gemini is top of the leader board (link), so it's likely to be one that people like using.

It's surprising that Inflection's pi.ai isn't on the leader board: their big claim for it is that it's built to have a high emotional intelligence, so you'd expect people to like it.

Werewolf · Post by **Werewolf** » Thu Feb 20, 2025 7:06 pm

The World has a new Robot Overlord: Grok 3. This is now claimed - and not just by Elon Musk, but also by others - to be the world's smartest AI.

I tested it on the world's best hobby: chess.

The board tracking is much improved over ChatGPT (all versions) - it did not lose the thread of the game even once. Responses were instant.
Now for the chess, which is still less than stellar:

[White: Carl Bicknell]
[Black: Grok 3]

[pgn]1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Bxc6 dxc6 5. O-O f6 6. d4 exd4 7. Nxd4 c5 8. Nb3 Qxd1 9. Rxd1 Bd7 10. Bf4 O-O-O 11. Nc3 Ne7 12. Nxc5 Bc6 13. Rxd8+ Kxd8 14. Rd1+ Kc8 15. Ne6 g5 16. Rd8#[/pgn]

DomL77 · Post by **DomL77** » Thu Feb 20, 2025 10:11 pm

I'm using Grok 3 to see if it can create decent personality settings for Komodo Dragon and Rodent IV, and whether they are any good at simulating super GMs.

A few commands, such as "create an X player at his prime using classical games", should do it.

towforce · Post by **towforce** » Thu Feb 20, 2025 11:59 pm

Werewolf wrote: ↑Thu Feb 20, 2025 7:06 pm The World has a new Robot Overlord: Grok 3. This is now claimed - and not just by Elon Musk, but also by others - to be the world's smartest AI.

I tested it on the world's best hobby: chess.

The board tracking is much improved over ChatGPT (all versions) - it did not lose the thread of the game even once. Responses were instant.
Now for the chess, which is still less than stellar:

[White: Carl Bicknell]
[Black: Grok 3]

[pgn]1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Bxc6 dxc6 5. O-O f6 6. d4 exd4 7. Nxd4 c5 8. Nb3 Qxd1 9. Rxd1 Bd7 10. Bf4 O-O-O 11. Nc3 Ne7 12. Nxc5 Bc6 13. Rxd8+ Kxd8 14. Rd1+ Kc8 15. Ne6 g5 16. Rd8#[/pgn]

Hmmm... not sure that's how the Spanish is supposed to be played.
