Vinvin wrote: ↑Fri Feb 09, 2024 3:22 am
Paper: https://arxiv.org/pdf/2402.04494.pdf
"Lichess blitz Elo of 2895 against humans": my estimate is around top 50 FIDE blitz players.
"We also show that our model outperforms AlphaZero’s policy and value networks (without MCTS) and GPT-3.5-turbo-instruct."

Interesting, though, that the model is good at getting into won positions but not at actually winning a won position. They worked around that problem by using Stockfish to finish the game when the model cannot decide how to proceed. From the paper, page 9:
"To prevent some of these situations, we check whether the predicted scores for all top five moves lie above a win percentage of 99% and double-check this condition with Stockfish, and if so, use Stockfish’s top move (out of these) to have consistency in strategy across time-steps."
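The rule quoted above can be sketched roughly as follows. This is only my illustration of the decision logic, not the authors' code: the function names and example scores are made up, and I've omitted the "double-check with Stockfish" evaluation step, keeping just the part where Stockfish's preferred move is chosen among the model's top five once they all look won.

```python
def choose_move(model_scores, stockfish_ranking, threshold=0.99):
    """model_scores: dict mapping each candidate move to the model's
    predicted win percentage (0..1). stockfish_ranking: Stockfish's
    moves, best first. If all of the model's top five moves score above
    the threshold, defer to Stockfish's highest-ranked move among them,
    so the strategy stays consistent from one move to the next."""
    # Model's top five moves by predicted win percentage
    top_five = sorted(model_scores, key=model_scores.get, reverse=True)[:5]

    if all(model_scores[m] > threshold for m in top_five):
        # All five look winning: take Stockfish's favorite among them
        for m in stockfish_ranking:
            if m in top_five:
                return m

    # Otherwise play the model's own best move
    return top_five[0]

# Example (made-up scores): every candidate is above 99%,
# so Stockfish's top choice among them is played.
scores = {"Qe7": 0.997, "Rd8": 0.995, "Kg7": 0.993, "h5": 0.992, "a5": 0.991}
print(choose_move(scores, ["Rd8", "Qe7", "h5"]))
```

The point of deferring to Stockfish here is the problem described above: the model evaluates several moves as equally winning and can drift between plans, whereas Stockfish picks one line and converts it.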