Vinvin wrote: ↑Fri Feb 09, 2024 3:22 am
Paper: https://arxiv.org/pdf/2402.04494.pdf
"Lichess blitz Elo of 2895 against humans": my estimate is around top 50 FIDE blitz players.
"We also show that our model outperforms AlphaZero’s policy and value networks (without MCTS) and GPT-3.5-turbo-instruct."

Interesting, though, that the model is good at getting into won positions but not at actually winning a won position. They worked around that problem by using Stockfish to finish the game when the model cannot decide how to proceed. From the paper, page 9:
"To prevent some of these situations, we check whether the predicted scores for all top five moves lie above a win percentage of 99% and double-check this condition with Stockfish, and if so, use Stockfish’s top move (out of these) to have consistency in strategy across time-steps."
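The rule quoted above can be sketched roughly as follows. This is only my illustration of the decision logic, not the authors' code: the function names and example scores are made up, and I've omitted the "double-check with Stockfish" evaluation step, keeping just the part where Stockfish's preferred move is chosen among the model's top five once they all look won.

```python
def choose_move(model_scores, stockfish_ranking, threshold=0.99):
    """model_scores: dict mapping each candidate move to the model's
    predicted win percentage (0..1). stockfish_ranking: Stockfish's
    moves, best first. If all of the model's top five moves score above
    the threshold, defer to Stockfish's highest-ranked move among them,
    so the strategy stays consistent from one move to the next."""
    # Model's top five moves by predicted win percentage
    top_five = sorted(model_scores, key=model_scores.get, reverse=True)[:5]

    if all(model_scores[m] > threshold for m in top_five):
        # All five look winning: take Stockfish's favorite among them
        for m in stockfish_ranking:
            if m in top_five:
                return m

    # Otherwise play the model's own best move
    return top_five[0]

# Example (made-up scores): every candidate is above 99%,
# so Stockfish's top choice among them is played.
scores = {"Qe7": 0.997, "Rd8": 0.995, "Kg7": 0.993, "h5": 0.992, "a5": 0.991}
print(choose_move(scores, ["Rd8", "Qe7", "h5"]))
```

The point of deferring to Stockfish here is the problem described above: the model evaluates several moves as equally winning and can drift between plans, whereas Stockfish picks one line and converts it.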