Gemini

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

towforce
Posts: 12470
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Gemini

Post by towforce »

Werewolf wrote: Sun Aug 10, 2025 8:59 pm
towforce wrote: Sun Aug 10, 2025 10:56 am
Werewolf wrote: Sat Aug 09, 2025 6:31 pm
My feelings about ChatGPT-5 remain mixed, however, after 24 hours of work I have a functioning chess engine - my first.

Good work - well done! 8-)

You might have to do it again in a few weeks - the market is expecting GPT-5 to lose the throne very soon - link. :shock:
Gemini 3?

Writing a chess engine is less fun than I imagined, I am literally spending 95% of my time de-bugging :(

Big leaps forward are made by creative people, then the top slots are steadily taken by those who have a good mentality for patient debugging and tuning. Then another great idea comes along. Loop continues until the product is "good enough". Then it becomes a commodity product.
Human chess is partly about tactics and strategy, but mostly about memory
smatovic
Posts: 3298
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Gemini

Post by smatovic »

Werewolf wrote: Sun Aug 10, 2025 8:59 pm [...]
Writing a chess engine is less fun than I imagined, I am literally spending 95% of my time de-bugging :(
Welcome to the club :). If you continue to work on a vibe-coded engine, consider starting a dedicated TC thread for it; it would be interesting to follow the progress, with source code, over time.

--
Srdja
Vinvin
Posts: 5296
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Gemini

Post by Vinvin »

New video : Demis Hassabis on shipping momentum, better evals and world models
towforce
Posts: 12470
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Gemini

Post by towforce »

towforce wrote: Sun Aug 10, 2025 10:56 am
Good work - well done! 8-)

You might have to do it again in a few weeks - the market is expecting GPT-5 to lose the throne very soon - link. :shock:

At the time of writing, Gemini has retaken the top of the leaderboard for text (which is what the Polymarket bet above was about), but has not retaken the top of the development leaderboard.
Werewolf
Posts: 2017
Joined: Thu Sep 18, 2008 10:24 pm

Re: Gemini

Post by Werewolf »

I have now done more testing.

The top models for most uses are:

1. ChatGPT 5 Pro
2. Grok 4 Heavy (but see below)
3. Claude Opus 4.1
4. Gemini 2.5 Pro (but see below)
5. Grok 4 / ChatGPT 5 - both can be horrible.

Gemini 3 is due out soon from Google and the word on the street is it could be a big upgrade.

Grok 4 Heavy is hideously expensive as it uses, I think, 16 agents working in parallel that then confer with each other and present the best option. It is hopelessly broken and crashes very frequently. I have tried playing a game of chess with it against the Vancouver 68000 and it took 6 hours to get to move 12. Seriously. It just kept crashing and timing out over and over.

ChatGPT 5 Pro is a very different animal to ChatGPT 5. It now seems clear that ChatGPT 5 was basically a stunt to save electricity by getting the model to reply extremely fast. In other words it skips the extended reasoning step - what we call thought. ChatGPT 5 is - in my opinion - slightly worse than ChatGPT 4o and Gemini 2.5 Pro.
However, ChatGPT 5 Pro is ahead of everything else. I have tested it on coding, reasoning, you name it. The only area it's not number 1 is on synthetic benchmarks where Grok 4 Heavy wins.

Anyway, to chess. Please note this opponent is much stronger than the previous MK12 (1300 FIDE Elo). The Vancouver 68000 runs on a 12MHz Motorola 68000 processor which has been around since the 1980s. It was programmed by Richard Lang. I believe it calculates at about 1000 nps. It came out in 1991 and has a FIDE rating of around 2150 Elo. The T/C was Game in 60 minutes.
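
For a rough sense of what the gap between those two ratings means, the standard Elo expected-score formula can be sketched in Python. This is only an illustration: the function name `expected_score` is made up here, and the two ratings are the approximate figures quoted above, not measured values.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) for player A against player B
    under the standard Elo model with a 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Approximate ratings quoted in the post (assumptions, not measurements).
MK12_ELO = 1300
VANCOUVER_ELO = 2150

if __name__ == "__main__":
    score = expected_score(MK12_ELO, VANCOUVER_ELO)
    print(f"Expected score for the MK12 against the Vancouver 68000: {score:.3f}")
```

With an 850-point gap, the formula gives the weaker machine an expected score of under 1%, which is why results against the MK12 say little about how an LLM would fare against the Vancouver.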


[Date "2025.08.28"]
[White "ChatGPT 5 Pro"]
[Black "Vancouver 68000"]
[Result "0-1"]

[pgn]1. e4 e5 2. Nf3 Nf6 3. Nxe5 d6 4. Nf3 Nxe4 5. Qe2 Qe7 6. d3 Nf6 7. Qxe7+ Bxe7 8. Be2 O-O 9. O-O Nc6 10. c3 Bf5 11. d4 Rfe8 12. Nbd2 d5 13. Nh4 Bc2 14. Bd1 Bd3 15. Re1 Bd6 16. Ndf3 h6 17. Re3 Be4 18. Re1 g5 19. Nf5 Bxf5 0-1
[/pgn]
Black is winning easily.

If you're interested in Grok 4 Heavy, see below. What you see is 6 hours of its compute (in reality less than one hour, the rest were crashes). For obvious reasons the game was abandoned.

[Date "2025.08.28"]
[White "Grok 4 Heavy"]
[Black "Vancouver 68000"]


[pgn]1. e4 e5 2. Nf3 Nf6 3. Nxe5 d6 4. Nf3 Nxe4 5. d4 d5 6. Bd3 Bd6 7. O-O O-O 8. c4 c6 9. Re1 Bf5 10. Nc3 Nxc3 11. bxc3 Bxd3 12. Qxd3 Nd7 13. Bg5 f6[/pgn]

Note that neither LLM played any illegal moves.
towforce
Posts: 12470
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: Gemini

Post by towforce »

Werewolf wrote: Thu Aug 28, 2025 9:40 pm
...Gemini 3 is due out soon from Google and the word on the street is it could be a big upgrade.

Better than 50% chance it will be out by 31st October - link.
Werewolf
Posts: 2017
Joined: Thu Sep 18, 2008 10:24 pm

Re: Gemini

Post by Werewolf »

towforce wrote: Thu Aug 28, 2025 11:39 pm
Werewolf wrote: Thu Aug 28, 2025 9:40 pm
...Gemini 3 is due out soon from Google and the word on the street is it could be a big upgrade.

Better than 50% chance it will be out by 31st October - link.
I hope it comes out soon. The market needs a shake up
Vinvin
Posts: 5296
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Gemini

Post by Vinvin »

towforce wrote: Thu Aug 28, 2025 11:39 pm
Werewolf wrote: Thu Aug 28, 2025 9:40 pm
...Gemini 3 is due out soon from Google and the word on the street is it could be a big upgrade.

Better than 50% chance it will be out by 31st October - link.
I heard "release in December" in some rumours.