Improving positional play

R. Tomasi · Post by **R. Tomasi** » Thu Nov 25, 2021 8:02 pm

Madeleine Birchfield wrote: ↑Thu Nov 25, 2021 6:29 pm
mvanthoor wrote: ↑Thu Nov 25, 2021 2:53 pm Still, I stay with my original statement: if you want to become a better chess programmer, it's best to first implement a good handcrafted evaluation.
One does not program chess, one programs a chess engine. Everybody who writes an engine is a chess engine programmer, nobody is ever a 'chess' programmer. Unless I suppose if one works on a command line interface or GUI for humans or engines to play chess, which would make the people over at Lichess and chess.com and Chessbase and BanksiaGUI the 'chess programmers'.

I am sorry, but that is about as pedantic as it gets. Just about everyone here on these boards would call what we do chess-programming, and everyone understands precisely that what we all mean by that is any programming related to chess, including programming chess engines. That's why it's called the chess-programming-wiki and not the chess-engine-programming-wiki. I for one will happily refer to myself as a chess-programmer who is a (small and insignificant) part of the chess programming community. And I understand very well what Marcel is saying when he talks about becoming a better chess programmer - I assume everyone who is not being willfully ignorant does, too.

/rant: off

mvanthoor · Post by **mvanthoor** » Thu Nov 25, 2021 11:07 pm

Madeleine Birchfield wrote: ↑Thu Nov 25, 2021 6:29 pm One does not program chess, one programs a chess engine. Everybody who writes an engine is a chess engine programmer, nobody is ever a 'chess' programmer.

I disagree. If someone writes an engine and then tacks a neural net on top of it (even when writing it himself), he may have written a chess engine... or more accurately, he has written a search engine that happens to evaluate chess positions. If he had written the move generator to emit checkers moves instead of chess moves, the same search engine could be used to put a checkers NN on top. Does this programmer actually understand what evaluation terms make up a good *chess position evaluation* ?

Don't get me wrong, I'm not completely against neural networks; far from it. I expect Rustic to have its own someday, if I ever get the time to research and implement it. However, I'm still of the opinion that everybody who implements a neural network should at least have written a (simple) HCE evaluation and tuned it with Texel tuning.

Going for the NNUE directly is trying to race before you can even drive.

I've seen people who are wizards when it comes to writing web applications. A week, and you have a fully functional prototype; but in the end, it turns out they used a massive amount of libraries. Some of the best application programmers I encountered turned out to be completely dead in the water when something had to be done for which there was no library.

Therefore I think you should write a (simple) HCE at least once to know what sort of stuff goes into an evaluation function.

BBC is a poor example of implementing neural networks. The BBC author himself said that he wasn't interested in learning how to write his own inference code and architecture for BBC nor how to train his own nets, and was called out by the community for using Stockfish's net and Stockfish's architecture and Daniel Shawul's inference code. I highly doubt he even understands how NNUE works.

And because of it, BBC is the *PERFECT* example of what some people do. Just write an alpha/beta routine, slap a NNUE network on top of it, and presto, there's your 3000 Elo engine.

And those engines are what I would expect the Blunder author to follow if they were to implement neural networks in Blunder. However, the Blunder author said earlier in the thread that they didn't want to move to neural networks yet, as they wanted to get king safety and a few other features fully implemented in Blunder's handcrafted evaluation function first.

And the same goes for my engine. I'm not even going to consider multi-threading before I hit at least 2850, let alone neural networks. I want to first implement a classic chess engine that I could use for analysis if I wanted to, even when running on one thread.

You don't need a good HCE to generate training positions from that data. Blunder's current handcrafted evaluation function is good enough.

OK. Good to know.

algerbrex · Post by **algerbrex** » Fri Nov 26, 2021 10:23 pm

mvanthoor wrote: ↑Thu Nov 25, 2021 5:21 pm The only problem is... I'm not Karpov. I hate this style of playing. I like tactical positions. However, in the middle game, Rustic 4 is already able to go like 9-10 moves deep, and I can't keep up. I can maybe reach 7-8 on a good day in a tactical position, and then stuff starts to become unclear to me. So I often try, but I can't out-tactic engines that can go more than ~8 moves deep in the middle game.

The same happens with the NM's or IM's. They can probably do tactics up to move 9-10, maybe 11-12 for the stronger IM's, but as Blunder can already go to depth 14-15, it's game over for the IM if he gets into a position where the outcome is decided by tactics. You just can't keep up with a computer when it comes to calculation, so you have to use knowledge... but at some point, the evaluation will have enough knowledge to rival that of an IM + all the better tactical capability, and then you'll have a grandmaster-level engine.

I also enjoy tactical games much more than positional ones, and I can often spot some pretty nice 2-3 move tactics, but I'm long past the point where I can keep up with Blunder tactically. It's sad to say, but I struggled to keep up when Blunder was only searching 6-8 moves ahead. Forget it now that it's starting to see 15 moves deep.

mvanthoor wrote: ↑Thu Nov 25, 2021 5:21 pm (Have you ever seen Nakamura do tactics on Youtube during analysis? My god.... he about starts to calculate AT move 12 or thereabouts. Still, Stockfish which easily goes down to move 24+ in the middle game, even in bullet, will take Nakamura down in short order...)

Nakamura's, and most GM's ability to calculate that I've seen is ridiculous to me. I remember a couple of days ago there was a video by GothamChess where he went over a clip where Kasparov demonstrated how he calculated several different lines to beat Karpov. His ability to correctly see forcing continuations to their conclusions was crazy to me.

Still though, as you said, of course, it's still nothing compared to modern top engines being able to see ahead twice as far.

mvanthoor wrote: ↑Thu Nov 25, 2021 5:21 pm What you may have also seen is me mentioning pawn this, pawn that, open lines, etc... lots of things depend on the pawns. Also, pawns don't often move, so things like open lines now are probably the same as open lines in the next move.

My first step after implementing mobility (somewhere in Rustic 36, at the speed which I'm going... ) will be a pawn hash table. I'm not going to recalculate all that pawn stuff over and over again. Fortunately I already have a generic TT which can store any kind of data

If you get going with pawn stuff, do the pawn TT first, because recalculating stuff in the evaluation is massively expensive. If something is calculated in the evaluation and you can keep it incrementally, then do so. I added a "detect bishop pair" function to improve draw detection (1), which I can also use in the evaluation to see if the bishop pair bonus applies. I'm thinking about just keeping that as a variable so I don't have to determine if the bishop pair is on the board, over and over again... I just change this when a bishop gets captured or someone promotes to a bishop.

(1) (You can checkmate with the bishop pair against a lone king, but not with 5 bishops on the same colored square. The bishop pair is having at least two bishops, each on different colored squares, or you have a redundant piece with regard to checkmating.)
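As a rough illustration of the bishop pair definition in footnote (1), here is a minimal Python sketch. The bitboard layout (bit 0 = a1, bit 63 = h8) and the helper names `square`, `bitboard` and `has_bishop_pair` are assumptions made for this example, not Rustic's actual code.

```python
# Assumed layout: bit 0 = a1, bit 1 = b1, ..., bit 63 = h8.
# a1 is a dark square, so a square is dark when (file + rank) is even.
DARK_SQUARES = sum(1 << sq for sq in range(64) if (sq % 8 + sq // 8) % 2 == 0)
LIGHT_SQUARES = ~DARK_SQUARES & 0xFFFF_FFFF_FFFF_FFFF


def square(name: str) -> int:
    """Convert a square name such as 'c1' to an index 0..63."""
    return (ord(name[0]) - ord("a")) + 8 * (int(name[1]) - 1)


def bitboard(*names: str) -> int:
    """Build a bitboard with one bit set per named square."""
    board = 0
    for name in names:
        board |= 1 << square(name)
    return board


def has_bishop_pair(bishops: int) -> bool:
    """True if one side's bishops cover both square colours."""
    return bool(bishops & DARK_SQUARES) and bool(bishops & LIGHT_SQUARES)
```

With this definition, `has_bishop_pair(bitboard("c1", "f1"))` is true, while any number of bishops that all stand on dark squares gives false. Keeping the result as a flag that is updated only on bishop captures and bishop promotions, as described above, avoids recomputing it in every evaluation call.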

That sounds like a good plan and is something I've been considering myself, since I've started playing around with pawn structure eval terms, and so far I've gotten about ~16 Elo from them.
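For readers who have not built one yet, the pawn hash table idea could be sketched roughly like this in Python. Everything here (`PawnHashTable`, `pawn_key`, the always-replace scheme, the table size) is a hypothetical illustration and not taken from Rustic or Blunder.

```python
import random

# Hypothetical Zobrist numbers for pawns only: [colour][square].
_rng = random.Random(2021)
PAWN_ZOBRIST = [[_rng.getrandbits(64) for _ in range(64)] for _ in range(2)]


def pawn_key(white_pawns, black_pawns) -> int:
    """Hash that depends only on where the pawns are (square indices 0..63)."""
    key = 0
    for sq in white_pawns:
        key ^= PAWN_ZOBRIST[0][sq]
    for sq in black_pawns:
        key ^= PAWN_ZOBRIST[1][sq]
    return key


class PawnHashTable:
    """Fixed-size, always-replace cache for pawn structure scores."""

    def __init__(self, size: int = 1 << 16):
        self.size = size
        self.keys = [None] * size
        self.scores = [0] * size

    def probe_or_compute(self, key: int, compute):
        """Return the cached score for key, calling compute() only on a miss."""
        index = key % self.size
        if self.keys[index] == key:
            return self.scores[index]
        score = compute()
        self.keys[index] = key
        self.scores[index] = score
        return score
```

Because pawns move rarely, most positions in a search share a pawn key with a position seen earlier, so the expensive pawn structure terms are computed once and then looked up. In a real engine the key would normally be updated incrementally in make/unmake rather than rebuilt from scratch as in `pawn_key` above.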

I've hit a bit of a roadblock with Blunder with king safety, and I've been stubbornly trying to get that to work for the past couple of days. At this point, I've accepted that I'm a pretty lousy tuner, and I'll probably just need to use texel tuning since evaluation terms are always so orthogonal.

I tried tuning my current scheme using Blunder's current eval values, and it looked like it would be a 40+ Elo improvement, but by the end of 4000+ games, there was no difference in performance. I believe the issue at this point is that I need to move beyond just using quiet-labeled.epd from Zurichess and use another dataset, since what's probably happening is that instead of tuning the eval parameters to better values, I'm simply overfitting them to the same data.

But before I use another dataset, I'm going to try to give my tuner a facelift and rework it to use some form of gradient descent or particle swarms like Erik did in Madchess. I'd also like to wait till my new laptop arrives. It should be a pretty nice step up from this one with an AMD Ryzen 7 4700U, 8 cores, and 8 threads. Should make tuning a good deal faster.

algerbrex · Post by **algerbrex** » Fri Nov 26, 2021 10:34 pm

Madeleine Birchfield wrote: ↑Thu Nov 25, 2021 6:29 pm One does not program chess, one programs a chess engine. Everybody who writes an engine is a chess engine programmer, nobody is ever a 'chess' programmer. Unless I suppose if one works on a command line interface or GUI for humans or engines to play chess, which would make the people over at Lichess and chess.com and Chessbase and BanksiaGUI the 'chess programmers'.

While I suppose this is technically correct, I understood what Marcel was getting at with the phrasing he used, so I don't think it's too big of a deal.

Madeleine Birchfield wrote: ↑Thu Nov 25, 2021 6:29 pm Handcrafted evaluation, along with Texel tuning or Andrew Grant's gradient descent, is simply another category of machine learning models used in chess engine programming, similar to what was used in machine learning models in natural language processing or artificial computer vision before they similarly adopted neural networks in those fields.

I have noticed this relationship, and the first way I understood texel tuning was in rough machine learning terms.

Madeleine Birchfield wrote: ↑Thu Nov 25, 2021 6:29 pm ...and was called out by the community for using Stockfish's net and Stockfish's architecture and Daniel Shawul's inference code...

I wasn't aware he was called out? I thought Maksim made it clear in the videos that he only included Stockfish nets for illustrative purposes? I didn't think he was trying to claim some sort of originality in the videos, or with his release of BBC + NNUE.

Madeleine Birchfield wrote: ↑Thu Nov 25, 2021 6:29 pm And those engines are what I would expect the Blunder author to follow if they were to implement neural networks in Blunder. However, the Blunder author said earlier in the thread that they didn't want to move to neural networks yet, as they wanted to get king safety and a few other features fully implemented in Blunder's handcrafted evaluation function first.

Yup, that's the plan. My goal is to make everything original and use my own data from a version of Blunder with a good, well-tuned HCE.

mvanthoor wrote: ↑Thu Nov 25, 2021 2:53 pm I think it's better to implement a good HCE and then play lots of games in gauntlets against many engines and then generate training positions from that data. Then you _will_ understand where the evaluation terms come from and the data will be your own engine's, so it won't lose all of its playing style.

Madeleine Birchfield wrote: ↑Thu Nov 25, 2021 6:29 pm You don't need a good HCE to generate training positions from that data. Blunder's current handcrafted evaluation function is good enough. Some of Stockfish's nets were trained using SimpleEval, which has no positional evaluation terms, only material evaluation terms, Seer's nets were trained using Lichess data, and both Leela's and Slowchess's current nets were trained starting from random play.

Now, this was something I didn't know? Interesting. Were those games using SimpleEval just from self-play games?

Uri Blass · Post by **Uri Blass** » Sat Nov 27, 2021 5:13 am

Madeleine Birchfield wrote: ↑Thu Nov 25, 2021 2:38 pm
Uri Blass wrote: ↑Wed Nov 24, 2021 11:38 pm
Madeleine Birchfield wrote: ↑Wed Nov 24, 2021 2:43 pm
Uri Blass wrote: ↑Wed Nov 24, 2021 7:16 am I disagree and for me it is not understanding.
For me, understanding is something where you can explain why, and not hide behind the words "neural network".

If your engine has a static evaluation of +3 for White and you have no idea why, because from your point of view the position is equal, then it is not positional understanding from my point of view.

I hate seeing a huge score without understanding the reason for it, and I believe such a huge score is not useful from a practical point of view, because in a game there is a big probability that the opponent will have no idea how to convert it into a victory. So from my point of view, old engines are better for analysis in many positions: they can suggest a move without understanding that it is losing, and there is a good chance that a human opponent will not understand how to win against it either. A new engine assumes I am playing against super players and, if everything loses, can suggest a move that gives no practical chances.
That is all equally applicable for any closed source engine, where the average user does not have any access to the source code of the engine. The evaluation function is still a black box, regardless of whether it uses a neural network or only handcrafted evaluation, and the average user simply cannot explain anything beyond hiding behind the words "evaluation function".

And that is even true of open source engines; most people do not have the time or resources to open Stockfish's source code and see how its evaluation function works, so for most people, the evaluation function is functionally a black box to them as well.

There are many positions that handcrafted evaluation using engines would evaluate at +3 but any semi-decent chess player would tell you is an equal position.

There is a big difference between a chess engine or chess player understanding a position and a user/developer understanding how the evaluation function works in an engine, and the former is far more important to the average chess player than the latter. Grandmasters these days like Carlsen, Caruana, and Dubov have all said at some point that after the top engines adopted neural networks, the engines were able to understand positions found in e.g. the French and the KID much better than before. No grandmaster cares about how Stockfish's handcrafted evaluation works.
My experience is that I misunderstand the evaluations of the new engines more often.
With the old engines, cases where the engine said +5 and I had no idea why were rare.

With the new engines, cases where the engine says +5 and I have no idea why are common.
I suspect that if you want to predict the result of a game between humans with a FIDE rating of 2000, the evaluation of Fruit 2.1 after searching to depth 10, translated to an expected result, is going to give you a better prediction than the evaluation of the new engines regardless of depth, and it may be interesting to check it.
As mvanthoor said, I think that has more to do with the fact that newer engines have stronger searches that allow them to search more deeply than older engines.

I don't think so, and I can give examples of huge scores that I do not understand even at small depths.

[fen]r1b2rk1/bp3p2/p1pp1q1p/4pNp1/3PPnP1/1BP1BQP1/PP3P2/R4RK1 b - - 0 19[/fen]

Stockfish already sees more than +3 at small depths, whereas Fruit does not see a decisive advantage for White.
From a human point of view it is not an endgame, material is equal, and there are too many pieces on the board to be sure about anything.
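The "material is equal" part of that claim can be checked directly from the FEN. The following is a small Python sketch; the piece values are conventional centipawn numbers chosen for illustration and are not taken from Stockfish or Fruit.

```python
# Illustrative centipawn values (an assumption, not any engine's real values).
PIECE_VALUES = {"p": 100, "n": 320, "b": 330, "r": 500, "q": 900, "k": 0}


def material_balance(fen: str) -> int:
    """White material minus Black material, read from the FEN board field."""
    board_field = fen.split()[0]
    balance = 0
    for ch in board_field:
        if ch.isalpha():  # digits are empty squares, '/' separates ranks
            value = PIECE_VALUES[ch.lower()]
            balance += value if ch.isupper() else -value
    return balance


FEN = "r1b2rk1/bp3p2/p1pp1q1p/4pNp1/3PPnP1/1BP1BQP1/PP3P2/R4RK1 b - - 0 19"
print(material_balance(FEN))
```

Both sides have a queen, two rooks, two bishops, a knight and eight pawns, so the balance comes out as 0. Whatever the engine is seeing in this position, it is not material.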

Stockfish_14.1_win_x64_avx2:
NNUE evaluation using nn-13406b1dcbe0.nnue enabled
1/4 00:00 932 932k -1.24 Bc8xf5 g4xf5 e5xd4
2/2 00:00 2k 1,518k -1.24 Bc8xf5 g4xf5
3/3 00:00 2k 2,280k -2.84 Bc8xf5 g4xf5 e5xd4
4/5 00:00 6k 3,104k -2.21 Nf4-d3 Kg1-g2
5/7 00:00 11k 5,445k -3.47 Nf4-d3 Ra1-d1 Nd3xb2 Rd1-d2 Bc8xf5
6/6 00:00 14k 4,790k -3.47 Nf4-d3 Ra1-d1 Nd3xb2 Rd1-d2 Bc8xf5 g4xf5
7/7 00:00 34k 6,861k -3.73 Nf4-d3 Ra1-d1 Bc8xf5 g4xf5 e5xd4 c3xd4 Nd3xb2
8/10 00:00 60k 8,538k -3.73 Nf4-g6 Ra1-d1 Bc8xf5 g4xf5 e5xd4 c3xd4 Ng6-e7 Kg1-g2 d6-d5
9/12 00:00 143k 7,512k -3.87 Bc8xf5 g4xf5 e5xd4 c3xd4 Nf4-g6 Ra1-d1 Rf8-e8 Bb3-c2 Ng6-f8 Kg1-g2 Nf8-d7 Rf1-h1
10/15 00:00 208k 8,010k -4.19 Bc8xf5 g4xf5 e5xd4 c3xd4 Nf4-g6 Ra1-d1 Rf8-e8 Kg1-g2 Ng6-e7 Bb3-c2 c6-c5 e4-e5 d6xe5 d4xc5
11/13 00:00 296k 7,585k -4.35 Bc8xf5 g4xf5 e5xd4 c3xd4 Nf4-g6 Qf3-h5 Ng6-e7 Bb3-c2 Kg8-g7 Ra1-d1 c6-c5 f2-f4 g5xf4 Be3xf4
12/13 00:00 344k 7,645k -4.40 Bc8xf5 g4xf5 Nf4-g6 Qf3-h5 Ng6-e7 Bb3-c2 e5xd4 c3xd4 Kg8-g7 Ra1-d1 c6-c5 f2-f4 g5xf4
13/19 00:00 791k 7,601k -4.33 Bc8xf5 g4xf5 c6-c5 d4xc5 d6xc5 Rf1-d1 Ra8-d8 a2-a4 Qf6-g7 g3xf4 e5xf4
14/18 00:00 864k 7,718k -3.97 Bc8xf5 g4xf5 c6-c5 d4xe5 d6xe5 Ra1-d1 Kg8-h7 Rd1-d7 c5-c4 Bb3xc4 g5-g4 Qf3xg4 Ba7xe3 f2xe3 Rf8-g8
15/26 00:00 1,702k 7,953k -4.13 Bc8xf5 g4xf5 e5xd4 c3xd4 Nf4-g6 Bb3-c2 Ng6-e7 Qf3-h5 Rf8-e8 Ra1-d1 c6-c5 e4-e5 d6xe5 d4xe5 Qf6xe5 Rd1-d7
16/27 00:00 2,102k 7,842k -4.17 Bc8xf5 g4xf5 Nf4-g6 Qf3-h5 Ng6-e7 Kg1-g2 e5xd4 c3xd4 d6-d5 f2-f4 Ba7xd4 f4xg5 Qf6-e5 Be3xd4 Qe5xd4 Kg2-h3 Qd4-h8 f5-f6 Ne7-g6 Kh3-g2
17/23 00:00 4,253k 7,705k -3.53 Nf4-e6 d4xe5 d6xe5 Be3xa7 Ra8xa7 Ra1-d1 Ne6-g7 Rd1-d6 Bc8-e6 Rf1-d1 Ra7-a8 Bb3xe6 f7xe6 Kg1-g2 Ng7-e8
18/28 00:00 6,115k 7,711k -3.78 Nf4-e6 Ra1-d1 Ne6-g7 d4xe5 d6xe5 Rd1-d6 Bc8-e6 Be3xa7 Ra8xa7 Qf3-e3 Ra7-a8 Rf1-d1 Ng7-e8 Rd6-d2 Ra8-c8 Qe3-b6 Rc8-c7 Rd2-d8 Rc7-c8 Rd8xc8 Be6xc8 Qb6-b4
19/30 00:00 6,470k 7,757k -3.63 Nf4-e6 d4xe5 d6xe5 Be3xa7 Ra8xa7 Ra1-d1 Ne6-g7 Rd1-d6 Bc8-e6 Qf3-e3 Ra7-a8 Rf1-d1 Ng7-e8 Rd6-d2 Ra8-c8 Qe3-b6 Rc8-b8 Qb6-a7 Ne8-g7 Rd2-d6 h6-h5 Nf5xg7 Kg8xg7 g4xh5
20/28 00:00 6,686k 7,775k -3.68 Nf4-e6 d4xe5 d6xe5 Be3xa7 Ra8xa7 Ra1-d1 Ne6-g7 Rd1-d6 Bc8-e6 Qf3-e3 Ra7-a8 Rf1-d1 Ng7-e8 Rd6-d2 Ra8-b8 Qe3-a7 Ne8-g7 Rd2-d6 h6-h5 Nf5xg7 Kg8xg7 g4xh5 Kg7-h6 g3-g4 Qf6-f4 Bb3xe6
21/28 00:00 7,461k 7,845k -3.76 Nf4-e6 d4xe5 d6xe5 Be3xa7 Ra8xa7 Ra1-d1 Ne6-g7 Rd1-d6 Bc8-e6 Qf3-e3 Ra7-a8 Rf1-d1 Ng7-e8 Rd6-d2 Ra8-c8 Qe3-b6 Rc8-b8 Qb6-c5 b7-b6 Qc5-a3 Be6xf5 g4xf5 a6-a5 Rd2-d7 g5-g4 Rd1-d6 Ne8xd6 Rd7xd6
22/39- 00:03 26,559k 7,689k -5.15 Nf4-e6 Ra1-d1
22/45 00:04 38,004k 7,671k -5.13 Bc8xf5 g4xf5 Nf4-g6 Ra1-d1 Rf8-e8 Qf3-h5 Ng6-f4 g3xf4 e5xf4 Qh5-g6+ Qf6xg6 f5xg6 Kg8-g7 g6xf7 Re8xe4 Be3-c1 d6-d5 Bb3-c2 Re4-e7 Rf1-e1 Kg7xf7 Bc2-g6+ Kf7-f6 Re1xe7 Kf6xe7 b2-b3 Ke7-f6 Bg6-h5
23/36- 00:05 43,737k 7,629k -5.21 Bc8xf5 g4xf5
23/38- 00:06 46,936k 7,654k -5.29 Bc8xf5 g4xf5
23/38- 00:08 63,535k 7,667k -5.60 Bc8xf5 g4xf5
23/38 00:09 71,047k 7,656k -5.67 Bc8xf5 g4xf5 Nf4-g6 Ra1-d1 Rf8-e8 Qf3-h5 Ng6-f4 g3xf4 e5xf4 Qh5-g6+ Qf6xg6 f5xg6 Kg8-g7 g6xf7 Re8xe4 Be3-c1 d6-d5 Rf1-e1 Re4xe1+ Rd1xe1 Kg7xf7 Bb3-d1 Kf7-f6 Bd1-h5 a6-a5 Kg1-f1 c6-c5 d4xc5 Ba7xc5 Re1-d1 Ra8-d8 Kf1-g2 d5-d4
24/33+ 00:09 71,060k 7,656k -5.52 Bc8xf5
24/33+ 00:09 71,065k 7,656k -5.43 Bc8xf5
24/33 00:09 71,413k 7,657k -5.46 Bc8xf5 g4xf5 Nf4-g6 Ra1-d1 Rf8-e8 Qf3-h5 Ng6-f4 g3xf4 e5xf4 Qh5-g6+ Qf6xg6 f5xg6 Kg8-g7 g6xf7 Re8xe4 Be3-c1 d6-d5 Rd1-e1 Re4xe1 Rf1xe1 Kg7xf7 Bb3-d1 Kf7-f6 Bd1-h5 Ra8-g8 Kg1-g2 c6-c5 d4xc5
25/33 00:10 77,823k 7,671k -5.46 Bc8xf5 g4xf5 Nf4-g6 Ra1-d1 Rf8-e8 Qf3-h5 Ng6-f4 g3xf4 e5xf4 Qh5-g6+ Qf6xg6 f5xg6 Kg8-g7 g6xf7 Re8xe4 Be3-c1 d6-d5 Rd1-e1 Re4xe1 Rf1xe1 Kg7xf7 Bb3-d1 Kf7-f6 Bd1-h5 Ra8-g8 Kg1-g2 Rg8-d8 b2-b3 c6-c5 d4xc5 Ba7xc5 Bc1-b2 g5-g4

mvanthoor · Post by **mvanthoor** » Sat Nov 27, 2021 3:16 pm

algerbrex wrote: ↑Fri Nov 26, 2021 10:23 pm Nakamura's, and most GM's ability to calculate that I've seen is ridiculous to me. I remember a couple of days ago there was a video by GothamChess where he went over a clip where Kasparov demonstrated how he calculated several different lines to beat Karpov. His ability to correctly see forcing continuations to their conclusions was crazy to me.

Cool video, wasn't it? It's from around the time when Kasparov was only in his mid-twenties. The one thing I always find funny is the glee with which he points out that "This defense doesn't work! <enter 14 moves> See?" He really seems to love chess. (Who wouldn't, if you could play like that?)

I tried tuning my current scheme using Blunder's current eval values ... I'm simply overfitting them to the same data.

Probably. Also note that there is only so much you can do with PST's / material value. There's a limit of the information you can encode in only so few evaluation terms. It may be better to just extend the search function first, because that works with _any_ evaluation. I'm going to try and postpone enhancing the evaluation for as long as possible, except for maybe mobility, because that's quite easy. (The easiest version is just counting available pseudo-legal moves: more is mostly better.)
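To make the "easiest version" of mobility concrete, here is a hedged Python sketch that counts pseudo-legal knight moves. The board representation (squares as `(file, rank)` tuples, own pieces as a set) and the name `knight_mobility` are assumptions for this example only; a bitboard engine would use precomputed attack tables and a popcount instead.

```python
# The eight knight jumps as (file delta, rank delta).
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def knight_mobility(square, own_pieces) -> int:
    """Count pseudo-legal knight moves from square.

    square is (file, rank) with both in 0..7; own_pieces is a set of
    (file, rank) squares occupied by the same side. Pseudo-legal means
    that pins and checks are ignored.
    """
    file, rank = square
    count = 0
    for df, dr in KNIGHT_JUMPS:
        target = (file + df, rank + dr)
        on_board = 0 <= target[0] < 8 and 0 <= target[1] < 8
        if on_board and target not in own_pieces:
            count += 1
    return count
```

A knight on d4 with nothing in its way gets 8, a knight on a1 gets 2, and each friendly piece on a target square lowers the count by one. The evaluation would then add some small weight per available move, with that weight being one more parameter for the tuner.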

But before I use another dataset, I'm going to try to give my tuner a facelift and rework it to use some form of gradient descent or particle swarms like Erik did in Madchess.

I don't know if I'll ever get to that point without finding some very clear explanations of how to do it. I don't have the maths background to just read a mathematical paper and then convert it into software, as I don't understand half of the mathematical notation. To be able to understand the subject, someone would actually have to explain to me what the formulas do. _Then_ I could probably write some code for that.

I'd also like to wait till my new laptop arrives. It should be a pretty nice step up from this one with an AMD Ryzen 7 4700U, 8 cores, and 8 threads. Should make tuning a good deal faster.

I still have a small computer (an Intel NUC) to convert from Windows to Linux to run a virtual piano. (This computer may be in service for a _long_ time to come; long after Windows 10 goes out of support.) After that, I'll probably order the parts for a new computer: AMD 5700G (8 cores), Gigabyte ITX board, 16 GB RAM, an SSD and a small form factor case. That system will be three times as fast as my current computer, so I can off-load all of Rustic's testing and tuning to it.

Somewhere in 2023, after the release of Debian Bookworm 12, I'll probably be building a new workstation on the basis of AMD Zen 4... if Intel _still_ doesn't have a decent successor for the 10980XE.

chrisw · Post by **chrisw** » Sun Nov 28, 2021 5:20 pm

Uri Blass wrote: ↑Sat Nov 27, 2021 5:13 am
Madeleine Birchfield wrote: ↑Thu Nov 25, 2021 2:38 pm
Uri Blass wrote: ↑Wed Nov 24, 2021 11:38 pm
Madeleine Birchfield wrote: ↑Wed Nov 24, 2021 2:43 pm
Uri Blass wrote: ↑Wed Nov 24, 2021 7:16 am I disagree and for me it is not understanding.
For me, understanding is something where you can explain why, and not hide behind the words "neural network".

If your engine has a static evaluation of +3 for White and you have no idea why, because from your point of view the position is equal, then it is not positional understanding from my point of view.

I hate seeing a huge score without understanding the reason for it, and I believe such a huge score is not useful from a practical point of view, because in a game there is a big probability that the opponent will have no idea how to convert it into a victory. So from my point of view, old engines are better for analysis in many positions: they can suggest a move without understanding that it is losing, and there is a good chance that a human opponent will not understand how to win against it either. A new engine assumes I am playing against super players and, if everything loses, can suggest a move that gives no practical chances.
That is all equally applicable for any closed source engine, where the average user does not have any access to the source code of the engine. The evaluation function is still a black box, regardless of whether it uses a neural network or only handcrafted evaluation, and the average user simply cannot explain anything beyond hiding behind the words "evaluation function".

And that is even true of open source engines; most people do not have the time or resources to open Stockfish's source code and see how its evaluation function works, so for most people, the evaluation function is functionally a black box to them as well.

There are many positions that handcrafted evaluation using engines would evaluate at +3 but any semi-decent chess player would tell you is an equal position.

There is a big difference between a chess engine or chess player understanding a position and a user/developer understanding how the evaluation function works in an engine, and the former is far more important to the average chess player than the latter. Grandmasters these days like Carlsen, Caruana, and Dubov have all said at some point that after the top engines adopted neural networks, the engines were able to understand positions found in e.g. the French and the KID much better than before. No grandmaster cares about how Stockfish's handcrafted evaluation works.
My experience is that I misunderstand the evaluations of the new engines more often.
With the old engines, cases where the engine said +5 and I had no idea why were rare.

With the new engines, cases where the engine says +5 and I have no idea why are common.
I suspect that if you want to predict the result of a game between humans with a FIDE rating of 2000, the evaluation of Fruit 2.1 after searching to depth 10, translated to an expected result, is going to give you a better prediction than the evaluation of the new engines regardless of depth, and it may be interesting to check it.
As mvanthoor said, I think that has more to do with the fact that newer engines have stronger searches that allow them to search more deeply than older engines.
I don't think so, and I can give examples of huge scores that I do not understand even at small depths.

[fen]r1b2rk1/bp3p2/p1pp1q1p/4pNp1/3PPnP1/1BP1BQP1/PP3P2/R4RK1 b - - 0 19[/fen]

Stockfish already sees more than +3 at small depths, whereas Fruit does not see a decisive advantage for White.
From a human point of view it is not an endgame, material is equal, and there are too many pieces on the board to be sure about anything.

The NN is holistic and doesn't split its eval into positional and material. If you as a human try to interpret it as a dichotomy, you won't get an understanding.
Secondly, the eval doesn't describe the position, it describes the position at the end of the PV, and iterations 6, 7, 8 and 9 all show more or less the same thing. Black trades bishop for knight (giving up the bishop pair) and can't find anywhere good for his knight: if it takes on b2, it gets trapped by Rd1-d2; if it retreats to g6, it gets hemmed in by the pawn structure and has no good squares. Exchanges in the centre just cede White a big pawn centre. White has the beginnings of an attack on the king. Too many problems for Black; I agree with the NN, White probably wins.


algerbrex · Post by **algerbrex** » Mon Nov 29, 2021 12:40 am

mvanthoor wrote: ↑Sat Nov 27, 2021 3:16 pm Cool video, wasn't it? It's from around the time when Kasparov was only in his mid-twenties. The one thing I always find funny is the glee with which he points out that "This defense doesn't work! <enter 14 moves> See?" He really seems to love chess. (Who wouldn't, if you could play like that?)

Yup, it really puts the level of grandmasters into perspective. And it makes you appreciate how they're still able to contend with weaker engines that can sometimes spot tactics twice as far ahead.

mvanthoor wrote: ↑Sat Nov 27, 2021 3:16 pm Probably. Also note that there is only so much you can do with PST's / material value. There's a limit of the information you can encode in only so few evaluation terms. It may be better to just extend the search function first, because that works with _any_ evaluation. I'm going to try and postpone enhancing the evaluation for as long as possible, except for maybe mobility, because that's quite easy. (The easiest version is just counting available pseudo-legal moves: more is mostly better.)

Right, that's what I've realized too. PST's are nice, but they're not the be-all and end-all of evaluating a position, and in some areas (e.g. king safety), they're pretty useless.

Since I've been focusing a good deal on the search for the past several months, however, I'll probably keep working on the evaluation. There's still a ton of room for improvement in the search, of course, but I'd like to try to keep my improvements roughly equal between the two and not let the disparity get too large.

I have wondered, though, how strong an engine I could create with minimal evaluation terms, and perhaps at some point in the future I'd like to experiment with this, maybe in the form of a new engine. For Blunder I'm going to use any evaluation terms I feel would be helpful, but for a future engine, it might be fun to see how well I could get the engine to perform, especially positionally, while only looking at, say, king safety, pawn structure, piece mobility, and piece values.

mvanthoor wrote: ↑Sat Nov 27, 2021 3:16 pm I don't know if I'll ever get to that point without finding some very clear explanations of how to do it. I don't have the maths background to just read a mathematical paper and then convert it into software, as I don't understand half of the mathematical notation. To be able to understand the subject, someone would actually have to explain to me what the formulas do. _Then_ I could probably write some code for that.

Yep, that's a problem I've encountered. I've been able to find some decent resources it seems, however, and I think I've started to wrap my head around correctly computing the partial derivative of the mean-square error cost function. So for the next couple of days, I'll probably work on getting gradient descent implemented into the tuner, which would be a pretty nice improvement over the plain local optimization.
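For anyone else stuck at the same point, the gradient of the mean-square error can be written down in a few lines of Python. This is a simplified sketch under stated assumptions: the evaluation is a plain linear sum of weights times features, the sigmoid is the natural-base logistic with a scaling constant `k` (Texel's original formulation uses a base-10 form), and the names `sigmoid`, `mse`, `gradient` and `tune` are made up for the example. It is not Blunder's or Ethereal's tuner.

```python
import math


def sigmoid(q: float, k: float) -> float:
    """Map an evaluation q to an expected game result in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-k * q))


def evaluate(weights, features) -> float:
    """Linear evaluation: sum of weight * feature coefficient."""
    return sum(w * x for w, x in zip(weights, features))


def mse(weights, samples, k: float) -> float:
    """E = (1/N) * sum_i (R_i - sigmoid(q_i))^2 over (features, result) pairs."""
    total = 0.0
    for features, result in samples:
        total += (result - sigmoid(evaluate(weights, features), k)) ** 2
    return total / len(samples)


def gradient(weights, samples, k: float):
    """dE/dw_j = -(2/N) * sum_i (R_i - s_i) * s_i * (1 - s_i) * k * x_ij."""
    grad = [0.0] * len(weights)
    for features, result in samples:
        s = sigmoid(evaluate(weights, features), k)
        common = -2.0 * (result - s) * s * (1.0 - s) * k
        for j, x in enumerate(features):
            grad[j] += common * x
    return [g / len(samples) for g in grad]


def tune(weights, samples, k: float = 1.0, learning_rate: float = 1.0, steps: int = 200):
    """Plain batch gradient descent: step against the gradient each iteration."""
    weights = list(weights)
    for _ in range(steps):
        grad = gradient(weights, samples, k)
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights
```

The key step is the chain rule: the derivative of the sigmoid is `k * s * (1 - s)`, and the derivative of a linear evaluation with respect to one weight is simply that weight's feature coefficient. A quick sanity check for any hand-derived gradient is to compare it against a finite difference of `mse` for one weight at a time.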

Madeleine Birchfield · Post by **Madeleine Birchfield** » Mon Nov 29, 2021 1:01 am

Andrew Grant has a paper on gradient descent in tuning and C++ code/implementation of the gradient descent (and an application to king safety terms in the evaluation) in the paper: https://github.com/AndyGrant/Ethereal/b ... Tuning.pdf

algerbrex · Post by **algerbrex** » Mon Nov 29, 2021 12:42 pm

Madeleine Birchfield wrote: ↑Mon Nov 29, 2021 1:01 am Andrew Grant has a paper on gradient descent in tuning and C++ code/implementation of the gradient descent (and an application to king safety terms in the evaluation) in the paper: https://github.com/AndyGrant/Ethereal/b ... Tuning.pdf

Thanks yeah, I've also been making use of that paper. I did want to make sure that I understood where his derivations in the paper came from, however, which is why I took the time to calculate them myself.

I think for the most part I've understood the correct derivatives. The only issue I've encountered now is how to collect the coefficients used efficiently and cleanly. For now, I've just settled on tracing them in Blunder's evaluation.
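To illustrate what tracing coefficients in the evaluation could mean, here is a toy Python sketch. The position format (a dict of per-side term counts), the term names and the weights are all hypothetical and exist only for this example; Blunder's real evaluation is written in Go and is structured differently.

```python
# Hypothetical weights for a tiny linear evaluation.
WEIGHTS = {"pawn": 100, "knight": 320, "bishop_pair": 30, "doubled_pawn": -15}


def evaluate(position, trace=None) -> int:
    """Score from White's point of view.

    position is assumed to look like
    {"white": {"pawn": 8, ...}, "black": {"pawn": 7, ...}}.
    If trace is a dict, it is filled with term -> coefficient, where the
    coefficient is (White's count - Black's count) for that term.
    """
    score = 0
    for term, weight in WEIGHTS.items():
        coeff = position["white"].get(term, 0) - position["black"].get(term, 0)
        score += weight * coeff
        if trace is not None and coeff != 0:
            trace[term] = coeff  # store only non-zero entries to keep it sparse
    return score


position = {
    "white": {"pawn": 8, "knight": 1, "bishop_pair": 1},
    "black": {"pawn": 7, "knight": 2, "doubled_pawn": 1},
}
trace = {}
score = evaluate(position, trace)
```

The point of the trace is that the score can be rebuilt as the sum of `WEIGHTS[term] * trace[term]`, so the tuner can run the evaluation once per training position, keep only the sparse coefficients, and then recompute scores and gradients for new weights without touching the board again.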
