The Stockfish of shogi

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Fabian Fichter
Posts: 50
Joined: Mon Dec 12, 2016 2:14 pm

Re: The Stockfish of shogi

Post by Fabian Fichter »

In Fairy-Stockfish the base piece values are the same for all variants; they are only adjusted for a few rules that can heavily influence the dynamics, such as losing-chess rules, piece drops, and board size (for sliders). E.g., for drop games the piece values are scaled by a v_max/(v_max+v) formula, where v_max is around three times the value of a queen, so pieces with a high value v lose relative strength in drop games. Additionally, the piece values are halved for drop games to give a more natural scale for thresholds in futility pruning, razoring, SEE, etc., but this of course does not change their relative values.
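A rough sketch of that scaling in Python (the constants here are illustrative centipawn guesses, not Fairy-Stockfish's actual numbers):

```python
# Sketch of the drop-game scaling described above: v -> v * v_max / (v_max + v),
# then halved for a more natural pruning-threshold scale. All values are
# illustrative assumptions, not the engine's real constants.
QUEEN = 2500
V_MAX = 3 * QUEEN  # roughly three times a queen's value

def drop_value(v):
    """Scaled-then-halved piece value for drop variants."""
    return round(v * V_MAX / (V_MAX + v) / 2)

for name, v in [("pawn", 100), ("knight", 325), ("rook", 500), ("queen", QUEEN)]:
    print(f"{name}: {v} -> {drop_value(v)}")
```

Note the relative effect: with these assumed inputs the queen falls from 25 pawns to roughly 19 pawns, so high-valued pieces lose relative strength in drop games, as described.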

In crazyhouse, Fairy-Stockfish is only around 200 Elo weaker (~100 of which is due to speed) than the multi-variant Stockfish used on lichess, for which we heavily tuned dozens/hundreds of parameters (including piece values) specifically for crazyhouse, so the generic adaptations already seem to work well. However, I have little doubt that playing strength could be increased a lot for shogi by improving the evaluation, especially since the Stockfish-based engines using evaluation files show that the same search engine can yield a ~1500 Elo stronger engine; but currently my focus is on adding more features/variants.

Back to the original topic of the thread, I found some more info on the most recent generation of evaluation files for shogi at https://github.com/ynasu87/nnue/blob/ma ... s/nnue.pdf (in Japanese).
Abstract
Most of the strongest shogi programs nowadays employ a linear evaluation function, which is computationally efficient but lacks nonlinear modeling capability. This report presents a new class of neural-network-based nonlinear evaluation functions for computer shogi, called NNUE (Efficiently Updatable Neural-Network-based evaluation functions). NNUE evaluation functions are designed to run efficiently on CPU using various acceleration techniques, including incremental computation. The first shogi program with a NNUE evaluation function, the end of genesis T.N.K.evolution turbo type D, will be unveiled at the 28th World Computer Shogi Championship.
GregNeto
Posts: 35
Joined: Thu Sep 08, 2016 8:07 am

Re: The Stockfish of shogi

Post by GregNeto »

May I point to AobaZero? I don't know how strong it is.

http://www.yss-aya.com/aobazero/index_e.html

Hiroshi's Computer Shogi and Go site is a treasure for me; the samples for the UEC Computer Go workshop are great.

http://www.yss-aya.com/
User avatar
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: The Stockfish of shogi

Post by Ovyron »

Raphexon wrote: Wed Jan 08, 2020 1:42 pm Fairly sure her networks store chess piece values.
Sure, but the value depends on the position; it'd be something like "this Bishop here is useless, so it has some 0.80 value, and this other Bishop is a killer, so it's worth 6.00", not like A/B's "all Bishops are 3.00, but this one is locked in, so let's subtract some 2.20 penalty, and the other one is great, so let's add some 3.00 bonus." I guess my question is how strongly a chess engine can play without the starting base piece values, and whether in games like shogi (where pieces might be more valuable in the hand than on the board) a different approach from base piece values would work better.
Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: The Stockfish of shogi

Post by Raphexon »

Ovyron wrote: Wed Jan 08, 2020 5:11 pm
Raphexon wrote: Wed Jan 08, 2020 1:42 pm Fairly sure her networks store chess piece values.
Sure, but the value depends on the position, it'd be something like "this Bishop here is useless so it has some 0.80 value and this other Bishop is a killer so it's worth 6.00" [...]
Sounds like a piece square table.
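For readers following along, a piece-square table in a classical evaluator looks roughly like this (a toy sketch with invented numbers, not any real engine's tables):

```python
# Toy piece-square table: a fixed base value plus a square-dependent
# bonus, as opposed to a network producing a fully position-dependent
# value. All numbers are invented for illustration.
BISHOP_BASE = 300  # centipawns

def centrality_bonus(rank, file):
    """Central squares earn a bonus; edges and corners earn nothing."""
    dist = max(abs(rank - 3.5), abs(file - 3.5))  # 0.5 (center) .. 3.5 (corner)
    return int(10 * (3.5 - dist))

BISHOP_PST = [[centrality_bonus(r, f) for f in range(8)] for r in range(8)]

def bishop_value(rank, file):
    return BISHOP_BASE + BISHOP_PST[rank][file]

print(bishop_value(3, 3), bishop_value(0, 0))  # prints: 330 300
```

The point of the contrast in the quote: here every bishop starts from the same 3.00-ish base and gets a square-dependent adjustment, whereas a network can in principle assign each bishop a value conditioned on the whole position.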
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: The Stockfish of shogi

Post by lkaufman »

Fabian Fichter wrote: Wed Jan 08, 2020 1:52 pm In Fairy-Stockfish the base piece values are the same for all variants, they are only adjusted for a few rules that can heavily influence dynamics like losing chess rules, piece drops, and board size (for sliders). [...]
Are there any chess engines using this NNUE? Is shogi programming now ahead of chess programming?
Komodo rules!
Fabian Fichter
Posts: 50
Joined: Mon Dec 12, 2016 2:14 pm

Re: The Stockfish of shogi

Post by Fabian Fichter »

lkaufman wrote: Wed Jan 08, 2020 7:29 pm
Fabian Fichter wrote: Wed Jan 08, 2020 1:52 pm [...]
Are there any chess engines using this NNUE? Is shogi programming now ahead of chess programming?
Yes, it seems like that is the most recent standard; "Kristallweizen" apparently also uses it: https://github.com/Tama4649/Kristallweizen/
See also http://www.uuunuuun.com/single-post/201 ... -v2019-May
there are remarkable developments in shogi engines. The strongest software at that time had a rating of 4150, but now it reaches R4400! (We note that the Elo rating of top human players is about R3100.) This dramatic change comes from the invention of a neural-network system for evaluating the position. It should be distinguished from software that uses deep learning, such as AlphaZero. Instead, people use shallow layers, which are manageable on a CPU. This new method was invented by Yu Nasu, who belonged to the team "the end of genesis T.N.K.evolution turbo type D" at the World Computer Shogi Championship 2018 (WCSC28). While the evaluation file in my last instruction (based on the so-called KPPT format) is considerably large (nearly 1 GB), the new evaluation file (in the so-called NNUE format) is merely 64 MB, yet the latter is much stronger than the previous format!
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: The Stockfish of shogi

Post by Daniel Shawul »

This sounds very interesting! Stockfish should definitely try out the NNUE evaluation function to catch up to Lc0 with regard to evaluation.
How is this NNUE trained? Was a bigger net with convolutions trained first and the result distilled into this shallow net?
Direct training of a shallow neural net is often weaker than one distilled from a bigger net ...
GregNeto
Posts: 35
Joined: Thu Sep 08, 2016 8:07 am

Re: The Stockfish of shogi

Post by GregNeto »

Some older info for the non-programmers:

Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: The Stockfish of shogi

Post by Gian-Carlo Pascutto »

I don't speak Japanese but their architecture seems to be:

W1 = 125388 x 256
W2 = 512 x 32
W3 = 32 x 32
W4 = 32 x 1

And I can make some educated guesses: they exploit the fact that W1 doesn't change much to compute the result of that layer incrementally, which is coincidentally the heaviest layer. The other layers have a structure that's perfect for SIMD optimization.

However, much of the devil is in the details, I'm sure. They also seem to exploit the fact that the white or black inputs to W1 don't change per turn, so you can just flip them after a move, but wouldn't captures be an issue there?

Also, *what* are the inputs exactly? This is critical. 125388 inputs is a lot, so this is in itself the product of something. Piece-on-square x piece-on-square?

Note also that the W1 output does not match up with the W2 input; it's only half.
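The incremental trick guessed at above can be sketched as follows (toy dimensions and NumPy in place of real SIMD code; this is an interpretation of the guess, not the actual NNUE implementation, which also has cases that force a full accumulator refresh):

```python
import numpy as np

# Toy version of the incremental first-layer computation: the input to W1
# is a sparse binary feature vector, so after a move we only add/subtract
# the rows for the few features that flipped, instead of redoing the full
# matrix product. Dimensions are illustrative, not the real 125388 x 256.
N_FEATURES, N_HIDDEN = 1000, 16
rng = np.random.default_rng(0)
W1 = rng.standard_normal((N_FEATURES, N_HIDDEN))

def full_accumulate(active_features):
    """Recompute the first-layer accumulator from scratch."""
    return W1[sorted(active_features)].sum(axis=0)

def incremental_update(acc, added, removed):
    """Update the accumulator only for the features that changed."""
    for f in added:
        acc = acc + W1[f]
    for f in removed:
        acc = acc - W1[f]
    return acc

acc = full_accumulate({3, 42, 500})
# A "move": piece-feature 42 disappears, feature 43 appears; a capture
# would simply add one more entry to the removed list.
acc = incremental_update(acc, added=[43], removed=[42])
assert np.allclose(acc, full_accumulate({3, 43, 500}))
```

The win is that a move flips only a handful of features, so the heaviest layer costs a few row additions per move instead of a 125388-row matrix product.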
User avatar
aphirst
Posts: 8
Joined: Thu Mar 19, 2020 3:24 pm
Full name: Adam Hirst

Re: The Stockfish of shogi

Post by aphirst »

I'm a (bad) amateur shōgi player, and I came upon this thread while trying to find information about the NNUE format for shogi engines, having already asked myself the following two questions:
  • "How come this approach isn't being used for FIDE chess?"
  • "How many parameters would be needed in an NNUE nn.bin for FIDE chess?"
For the second one, it's necessary to understand where the magic number 125388 for the shogi evaluation comes from, but even after attempting to read the (earlier-linked) NNUE.pdf paper, I remain ignorant of how it is derived from the board state. To be honest, I haven't even been able to work out how the older KKP/KPP formats work; English documentation is between scarce and nonexistent.

For the 125388 I tried to break it down into prime factors, (2^2) * (3^6) * 43, and then tried to make some very basic observations:
  • 81 squares = 9*9 = 3^4 factor for whatever the per-square representation is
  • empty + (black, white)*(pawn, tokin, lance, lance+, knight, knight+, silver, silver+, gold, bishop, bishop+, rook, rook+, king) = 1 + (2*14) = 29 options per square; one could either be wasteful and use a sparse 29-bit pattern per square OR a compact 5-bit encoding (2^5 = 32 > 29)
  • Whose turn it is = 1 bit
  • Something for the piece stands = 2*(38, all pieces minus kings) or a bit-representation of the number of each piece-in-hand?
As you can see, I get lost very quickly, so there's currently no chance of me working out the equivalent for FIDE chess. I'd be very interested to see it tried, though, as NNUE shogi engines are now obscenely powerful; orqha1018+dolphin1 is no AlphaZero, but it's still a monster.
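For what it's worth, the factorization above checks out, and the 81 board squares do appear as a factor (a quick check; this doesn't by itself reveal the feature encoding):

```python
# Verify the prime factorization of 125388 given earlier in the thread.
def factorize(n):
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

print(factorize(125388))  # {2: 2, 3: 6, 43: 1}, i.e. (2^2) * (3^6) * 43
print(125388 // 81)       # 1548, so 125388 = 81 * 1548: consistent with a factor of 81 squares
```

So the input count is 81 times 1548, which at least fits the guess that one factor of the encoding ranges over the 81 squares of the board.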