The Stockfish of shogi

lkaufman · Post by **lkaufman** » Fri Mar 20, 2020 2:59 pm

aphirst wrote: ↑Fri Mar 20, 2020 9:43 am
As you can see, I get lost very quickly, so there's currently no chance of me working out what the equivalent would be for FIDE chess. I'd be very interested to see this tried out though, as NNUE-shogi engines are now obscenely powerful - orqha1018+dolphin1 is no AlphaZero but it's still a monster.

From what I read, orqha/dolphin beats Elmo by about the same margin as AlphaZero did, so what is the basis for your statement that it "is no AlphaZero"? It's different, sure, but apparently just as strong, without using a GPU. I've played lots of games with it, and while I can usually win if I'm careful with rook and bishop handicap, I almost invariably lose with rook and lance handicap. Against the top human pros, I've generally scored well with just a rook handicap. I would love to see a top pro take a bishop handicap from dolphin.

jp · Post by jp » Fri Mar 20, 2020 4:40 pm

lkaufman wrote: ↑Fri Mar 20, 2020 2:59 pm From what I read, orqha/dolphin beats Elmo by about the same margin as AlphaZero did, so what is the basis for your statement that it "is no AlphaZero"? It's different, sure, but apparently just as strong, without using a GPU.

If its results are as good on much weaker hardware, then it's actually much stronger than A0.

The Elmo/shogi community also complained about the conditions of the Elmo-A0 games, much as the SF/chess community complained about the SF-A0 games.

aphirst · Post by **aphirst** » Fri Mar 20, 2020 8:02 pm

lkaufman? As in, Larry Kaufman? I'm honoured, I'm sure.

lkaufman wrote:what is the basis for your statement that it "is no AlphaZero"? It's different, sure, but apparently just as strong, without using a GPU

It's entirely possible that my understanding of its strength is outdated. I was under the impression that the AlphaZero published ELO was still higher even than orqha1018/dolphin1, though with a much narrower gap.

On qhapaq's site the listed ELO for dolphin1/orqha1018 is 4393. AlphaZero[1] Shogi's ELO is apparently around 4400, and DeepMind's MuZero[2] apparently has a shogi ELO of about 4700. I had to eyeball those last two since I can't see an explicit ELO figure given - presumably I'm overlooking something. If someone has exact figures, I hope they can correct me.

[1] https://arxiv.org/abs/1712.01815
[2] https://arxiv.org/abs/1911.08265

(As an aside, MuZero seems to not have the rules pre-programmed. Interesting! I hadn't heard of it until I started searching for AlphaZero ELO.)

I'm certainly not qualified to judge the fairness of the ELO evaluations, and I'm aware that's a fiercely-contended (and rightfully so) aspect, but this would seem to suggest that orqha1018/dolphin1 is at least approximately level with the 2017 AlphaZero publication, but currently outclassed by MuZero by a similar degree to how AlphaZero outclassed Elmo (et al). I will of course defer to the experts - that's the reason I came here, after all.
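To put those rating gaps in concrete terms, here is a minimal sketch using the standard Elo expected-score formula. It assumes, purely for illustration, that all the quoted figures sit on one common rating scale, which is exactly the part that is contested. The name `expected_score` and the constants are chosen here for the example and do not come from either paper.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (win = 1, draw = 0.5, loss = 0) under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Figures as quoted in the post; the AlphaZero and MuZero values were eyeballed from graphs.
DOLPHIN = 4393
ALPHAZERO = 4400
MUZERO = 4700

if __name__ == "__main__":
    print(f"dolphin1/orqha1018 vs AlphaZero: {expected_score(DOLPHIN, ALPHAZERO):.3f}")
    print(f"MuZero vs AlphaZero:             {expected_score(MUZERO, ALPHAZERO):.3f}")
```

Under that assumption, a 7-point gap is close to a coin flip (an expected score of about 0.49), while a 300-point gap corresponds to an expected score of about 0.85.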

lkaufman · Post by **lkaufman** » Sat Mar 21, 2020 2:04 am

aphirst wrote: ↑Fri Mar 20, 2020 8:02 pm lkaufman? As in, Larry Kaufman? I'm honoured, I'm sure.

lkaufman wrote:what is the basis for your statement that it "is no AlphaZero"? It's different, sure, but apparently just as strong, without using a GPU
It's entirely possible that my understanding of its strength is outdated. I was under the impression that the AlphaZero published ELO was still higher even than orqha1018/dolphin1, though with a much narrower gap.

On qhapaq's site the listed ELO for dolphin1/orqha1018 is 4393. AlphaZero[1] Shogi's ELO is apparently around 4400, and DeepMind's MuZero[2] apparently has a shogi ELO of about 4700. I had to eyeball those last two since I can't see an explicit ELO figure given - presumably I'm overlooking something. If someone has exact figures, I hope they can correct me.

[1] https://arxiv.org/abs/1712.01815
[2] https://arxiv.org/abs/1911.08265

(As an aside, MuZero seems to not have the rules pre-programmed. Interesting! I hadn't heard of it until I started searching for AlphaZero ELO.)

I'm certainly not qualified to judge the fairness of the ELO evaluations, and I'm aware that's a fiercely-contended (and rightfully so) aspect, but this would seem to suggest that orqha1018/dolphin1 is at least approximately level with the 2017 AlphaZero publication, but currently outclassed by MuZero by a similar degree to how AlphaZero outclassed Elmo (et al). I will of course defer to the experts - that's the reason I came here, after all.

Yes, I'm Larry. As you say, the Elos quoted for AlphaZero and Dolphin are virtually the same, but I think the Dolphin rating may have been measured on less-than-optimal hardware; I don't think it was based on anything like a 32- or 64-core Threadripper. So, given the best PC you can buy for around $10k or so, Dolphin may already be closer to MuZero (which I hadn't heard about) than to AlphaZero.

jp · Post by jp » Sat Mar 21, 2020 8:00 am

aphirst wrote: ↑Fri Mar 20, 2020 8:02 pm On qhapaq's site the listed ELO for dolphin1/orqha1018 is 4393. AlphaZero[1] Shogi's ELO is apparently around 4400, and DeepMind's MuZero[2] apparently has a shogi ELO of about 4700. I had to eyeball those last two since I can't see an explicit ELO figure given - presumably I'm overlooking something. If someone has exact figures, I hope they can correct me.

You should be very cautious interpreting what DM claim. There's been endless discussion on this board and outside it about the conditions of AZ's games against SF8. What is less known is that similar complaints were made by computer Shogi people about the conditions of the Elmo-AZ games.

If dolphin1/orqha1018 played other Shogi engines in "fairer" conditions, that alone would put it higher than AZ.

You should not look at [1] either. If you want to read it, at least look at the published paper, which has big differences.

aphirst · Post by **aphirst** » Sat Mar 21, 2020 12:35 pm

jp wrote:You should be very cautious interpreting what DM claim.

I'm aware, hence the disclaimers I put into my previous post.

jp wrote:You should not look at [1] either. If you want to read it, at least look at the published paper, which has big differences.

OHH, I wasn't aware that the arxiv preprint was substantially different. Do I have the correct link [1] for the paper now? If so, I can make a clarification edit to my previous post.

[1] https://science.sciencemag.org/content/362/6419/1140

EDIT: It seems I'm past the cooldown for being able to edit my previous post anyway. I suspect the "report" feature is inappropriate for this. I apologise for any confusion this may cause future readers.

MonteCarlo · Post by **MonteCarlo** » Sat Mar 21, 2020 2:12 pm

aphirst wrote: ↑Fri Mar 20, 2020 8:02 pm
On qhapaq's site the listed ELO for dolphin1/orqha1018 is 4393. AlphaZero[1] Shogi's ELO is apparently around 4400, and DeepMind's MuZero[2] apparently has a shogi ELO of about 4700. I had to eyeball those last two since I can't see an explicit ELO figure given - presumably I'm overlooking something. If someone has exact figures, I hope they can correct me.

I don't see any indication of a 300 point improvement for MuZero over AlphaZero in the MuZero paper, for the following reasons (in ascending order of strength, burying the lede):

First, the authors only say that MuZero "matched" the performance of AlphaZero in Shogi, which seems very unlike them if they actually exceeded AlphaZero's performance by 300 points.

Second, AlphaZero's rating is graphed along with MuZero's (it's the horizontal orange line) and I see no indication that MuZero's blue line ever exceeded the AlphaZero orange line at all, much less by 300 points.

Third, the json for the graphs is published with the paper (board_game_elos.json in ancillary files). It shows that in Shogi, the AlphaZero line was 4666, while MuZero maxed out at 4646.
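As a rough illustration of why a 20-point gap reads as "matched", the sketch below converts the two figures above into an expected score and estimates how many games it would take to tell that score apart from 50%. It assumes the standard Elo model, no draws, and a simple normal approximation; the function names and the 1.96 threshold are choices made for this example, not anything taken from the paper or its JSON file.

```python
import math


def expected_score(diff: float) -> float:
    """Expected score for the side rated `diff` Elo points higher (standard Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def games_to_resolve(diff: float, z: float = 1.96) -> int:
    """Rough number of decisive games before the expected score lies z standard errors from 50%.

    Assumes independent games with no draws (a simplification).
    """
    if diff == 0:
        raise ValueError("a zero rating gap can never be resolved")
    p = expected_score(diff)
    return math.ceil(z * z * p * (1.0 - p) / (p - 0.5) ** 2)


# Values reported above from board_game_elos.json.
ALPHAZERO_SHOGI = 4666
MUZERO_SHOGI_MAX = 4646

if __name__ == "__main__":
    gap = ALPHAZERO_SHOGI - MUZERO_SHOGI_MAX
    print(f"gap: {gap} Elo, expected score: {expected_score(gap):.3f}")
    print(f"games needed (approx.): {games_to_resolve(gap)}")
```

With these assumptions the stronger side is expected to score roughly 53%, and it takes on the order of a thousand games to separate that from an even match.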

aphirst · Post by **aphirst** » Sat Mar 21, 2020 3:27 pm

Evidently I eyeballed it wrong. Thanks for providing clarification!

jp · Post by jp » Mon Mar 23, 2020 2:29 pm

aphirst wrote: ↑Sat Mar 21, 2020 12:35 pm

How much have you used other shogi engines?
