Werewolf wrote: ↑Sat Aug 09, 2025 6:31 pm
My feelings about ChatGPT-5 remain mixed; however, after 24 hours of work I have a functioning chess engine - my first.
Good work - well done!
You might have to do it again in a few weeks - the market is expecting GPT-5 to lose the throne very soon - link.
Gemini 3?
Writing a chess engine is less fun than I imagined, I am literally spending 95% of my time de-bugging
Big leaps forward are made by creative people, then the top slots are steadily taken by those who have a good mentality for patient debugging and tuning. Then another great idea comes along. Loop continues until the product is "good enough". Then it becomes a commodity product.
Human chess is partly about tactics and strategy, but mostly about memory
Werewolf wrote: ↑Sun Aug 10, 2025 8:59 pm
[...]
Writing a chess engine is less fun than I imagined, I am literally spending 95% of my time de-bugging
Welcome to the club. If you continue to work on a vibe-coded engine, consider starting a dedicated TC thread for it; it would be interesting to follow its progress, with source code, over time.
towforce wrote: ↑Sun Aug 10, 2025 10:56 am
Good work - well done!
You might have to do it again in a few weeks - the market is expecting GPT-5 to lose the throne very soon - link.
At the time of writing, Gemini has retaken the top of the leaderboard for text (which is what the Polymarket bet above was about), but has not retaken the development leaderboard.
1. ChatGPT 5 Pro
2. Grok 4 Heavy (but see below)
3. Claude Opus 4.1
4. Gemini 2.5 Pro (but see below)
5. Grok 4 / ChatGPT 5 - both can be horrible.
Gemini 3 is due out soon from Google and the word on the street is it could be a big upgrade.
Grok 4 Heavy is hideously expensive as it uses, I think, 16 agents working in parallel that then confer with each other and present the best option. It is hopelessly broken and crashes very frequently. I have tried playing a game of chess with it against the Vancouver 68000 and it took 6 hours to get to move 12. Seriously. It just kept crashing and timing out over and over.
ChatGPT 5 Pro is a very different animal to ChatGPT 5. It now seems clear that ChatGPT 5 was basically a stunt to save electricity by getting the model to reply extremely fast. In other words, it does no reasoning at inference time - what we call thought. ChatGPT 5 is - in my opinion - slightly worse than ChatGPT 4o and Gemini 2.5 Pro.
However, ChatGPT 5 Pro is ahead of everything else. I have tested it on coding, reasoning, you name it. The only area where it's not number 1 is synthetic benchmarks, where Grok 4 Heavy wins.
Anyway, to chess. Please note this opponent is much stronger than the previous MK12 (1300 FIDE Elo). The Vancouver 68000 runs on a 12MHz Motorola 68000, a processor that has been around since 1979. It was programmed by Richard Lang. I believe it calculates at about 1000 nps. It came out in 1991 and has a FIDE rating of around 2150 Elo. The T/C was Game in 60 minutes.
If you're interested in Grok 4 Heavy, see below. What you see is 6 hours of its compute (in reality less than one hour, the rest were crashes). For obvious reasons the game was abandoned.