Improving positional play

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Improving positional play

Post by algerbrex »

jdart wrote: Wed Nov 24, 2021 12:14 pm King safety is somewhat difficult. If you tune it up too high, the program will sacrifice pieces for an attack, and the attack may be unsound. Another problem is blocked positions. If there is what looks to the program like an attack, but the opposing side can't actually penetrate the pawn shield, then the score shouldn't be too high. Lacking a NN, the next best thing is to employ automated parameter tuning to find the right values for the eval weights.
Thanks. At this point, I share the sentiment that automated parameter tuning is the way to go. Is something like Texel tuning what you had in mind? If so, would general positions work here, or would we need some positions specifically related to king safety in there?

To tune all of my eval weights, I've currently only used Zurichess's quiet-labeled.epd training set. Any other recommendations for what might work well?
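
For reference, the core of what I mean by Texel tuning is minimizing the error between the eval's predicted game result and the actual results over a big set of labeled positions. A minimal sketch of that objective in Go (the names and the feature-vector representation are made up for illustration; Blunder's real eval obviously isn't a plain dot product):

```go
package main

import (
	"fmt"
	"math"
)

// Position is a hypothetical training record: a feature vector extracted
// from an EPD entry plus the game result (1.0 white win, 0.5 draw, 0.0 loss).
type Position struct {
	Features []float64
	Result   float64
}

// evaluate stands in for the engine's static eval in centipawns; here it is
// just a dot product of the features and the weights being tuned.
func evaluate(p Position, weights []float64) float64 {
	s := 0.0
	for i, f := range p.Features {
		s += f * weights[i]
	}
	return s
}

// sigmoid maps a centipawn score to an expected result in [0, 1]. k is a
// scaling constant fitted once, before the weights themselves are tuned.
func sigmoid(score, k float64) float64 {
	return 1.0 / (1.0 + math.Pow(10, -k*score/400.0))
}

// meanSquaredError is the quantity Texel tuning minimizes: the average
// squared difference between predicted and actual game results.
func meanSquaredError(positions []Position, weights []float64, k float64) float64 {
	sum := 0.0
	for _, p := range positions {
		d := p.Result - sigmoid(evaluate(p, weights), k)
		sum += d * d
	}
	return sum / float64(len(positions))
}

func main() {
	// Toy data: one "position" with two features (say, pawn and knight
	// count difference) that white went on to win.
	positions := []Position{{Features: []float64{1, 0}, Result: 1.0}}
	weights := []float64{100, 300}
	fmt.Println(meanSquaredError(positions, weights, 1.0))
}
```

Tuning then just means nudging each weight up or down and keeping the change whenever the error drops, or computing gradients of this same error and doing gradient descent.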
jdart wrote: Wed Nov 24, 2021 12:14 pm I will also mention, as I have said here before: a major cause of program weaknesses is bugs. You can add all the program features you like but if there are fundamental bugs, then you may not get the performance you might expect. That is why "perft" and other sanity checks are important. So is making sure every piece of the program does exactly what it's designed to do, and in line with good practice.
Thanks, I'll also double-check my code. I did already go through and verify that my eval was symmetrical on several hundred positions as well.
Madeleine Birchfield
Posts: 512
Joined: Tue Sep 29, 2020 4:29 pm
Location: Dublin, Ireland
Full name: Madeleine Birchfield

Re: Improving positional play

Post by Madeleine Birchfield »

Uri Blass wrote: Wed Nov 24, 2021 7:16 am I disagree; for me it is not understanding.
For me, understanding is something you can explain the why of, not something you hide behind the words "neural network".

If your engine has a static evaluation of +3 for white and you have no idea why, because from your point of view the position is equal, then that is not positional understanding from my point of view.

I hate seeing a huge score without understanding the reason for it, and I believe such a huge score is not good from a practical point of view: in a game there is a big probability that the opponent will have no idea how to convert the huge score into a victory. So from my point of view, old engines are better for analysis in many positions, because they can suggest a move without realizing it is losing, and there is then a good chance the human opponent will not find the win; whereas a new engine, which assumes it is playing against super players, may suggest a move that gives no practical chances if everything loses.
That is all equally applicable to any closed-source engine, where the average user does not have any access to the engine's source code. The evaluation function is still a black box, regardless of whether it uses a neural network or only handcrafted evaluation, and the average user simply cannot explain anything beyond hiding behind the words "evaluation function".

And that is even true of open-source engines; most people do not have the time or resources to open Stockfish's source code and see how its evaluation function works, so for most people the evaluation function is functionally a black box as well.

There are many positions that engines using handcrafted evaluation would score at +3 but that any semi-decent chess player would tell you are equal.

There is a big difference between a chess engine or chess player understanding a position and a user/developer understanding how an engine's evaluation function works, and the former is far more important to the average chess player than the latter. Grandmasters these days like Carlsen, Caruana, and Dubov have all said at some point that after the top engines adopted neural networks, the engines were able to understand positions found in openings such as the French and the KID much better than before. No grandmaster cares about how Stockfish's handcrafted evaluation works.
Uri Blass
Posts: 10794
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Improving positional play

Post by Uri Blass »

Madeleine Birchfield wrote: Wed Nov 24, 2021 2:43 pm
Uri Blass wrote: Wed Nov 24, 2021 7:16 am I disagree; for me it is not understanding.
For me, understanding is something you can explain the why of, not something you hide behind the words "neural network".

If your engine has a static evaluation of +3 for white and you have no idea why, because from your point of view the position is equal, then that is not positional understanding from my point of view.

I hate seeing a huge score without understanding the reason for it, and I believe such a huge score is not good from a practical point of view: in a game there is a big probability that the opponent will have no idea how to convert the huge score into a victory. So from my point of view, old engines are better for analysis in many positions, because they can suggest a move without realizing it is losing, and there is then a good chance the human opponent will not find the win; whereas a new engine, which assumes it is playing against super players, may suggest a move that gives no practical chances if everything loses.
That is all equally applicable to any closed-source engine, where the average user does not have any access to the engine's source code. The evaluation function is still a black box, regardless of whether it uses a neural network or only handcrafted evaluation, and the average user simply cannot explain anything beyond hiding behind the words "evaluation function".

And that is even true of open-source engines; most people do not have the time or resources to open Stockfish's source code and see how its evaluation function works, so for most people the evaluation function is functionally a black box as well.

There are many positions that engines using handcrafted evaluation would score at +3 but that any semi-decent chess player would tell you are equal.

There is a big difference between a chess engine or chess player understanding a position and a user/developer understanding how an engine's evaluation function works, and the former is far more important to the average chess player than the latter. Grandmasters these days like Carlsen, Caruana, and Dubov have all said at some point that after the top engines adopted neural networks, the engines were able to understand positions found in openings such as the French and the KID much better than before. No grandmaster cares about how Stockfish's handcrafted evaluation works.
My experience is that I have more trouble understanding the evaluations of the new engines.
With the old engines, cases where the engine said +5 and I had no idea why were rare.

With the new engines, cases where the engine says +5 and I have no idea why are common.
I suspect that if you want to predict the result of a game between humans with a FIDE rating of 2000, then the evaluation of Fruit 2.1 after searching to depth 10, translated into an expected result, will give you a better prediction than the evaluation of the new engines at any depth. It may be interesting to check this.
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Improving positional play

Post by mvanthoor »

Madeleine Birchfield wrote: Wed Nov 24, 2021 12:29 am The biggest way to improve the positional play of your engine is to use a neural network for the opening and middlegame.
No, it isn't, because in that case you'll be implementing something you won't ever really understand. The best way is to write a decent hand-crafted evaluation first and THEN implement a neural network, using your own games as training data.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Improving positional play

Post by mvanthoor »

algerbrex wrote: Tue Nov 23, 2021 11:02 pm Of course, I'm aware of PSQTs to help improve positional play, and I've already tuned a pretty decent set for Blunder. But does anyone have some recommendations on how to further improve engine play in closed positions? It's already hard enough to get Blunder to play sound quiet moves in normal positions, let alone closed ones. That seems to be the usual trend. Blunder can spot mid-level tactics pretty well, but it will still "blunder" when it's forced to play a careful positional game.
This was a classic anti-computer chess game:

- Close the position, making sure you have a space advantage (closed position + space advantage = beneficial for attacker)
- Sacrifice something, destroying king safety, especially if you have, or can easily create open lines
- Attack!

Because of the space advantage and open lines the attacker has more mobility and more tactical capabilities.

The problem is that Blunder gets its current rating from tactical play, as is the case for most lower-to-intermediate engines. A pawn is worth 100cp, which is so much that it outweighs any positional elements in the position. In short: because of the sacrifice on h6, Blunder gained an advantage of +100cp, and it does not yet have enough positional awareness to see that white has massive compensation for the material deficit.

The solution is not to try and focus on anything specific to make "Blunder play better in position type X". That is not going to work. Just add more positional knowledge:

- Bishop pair
- Mobility
- Space (available behind your pawns)
- Doubled / Tripled pawns
- (Half)open lines / rooks on (half)open lines
- Passed pawn, defended passed pawn
- King safety

If you add enough positional terms and then tune the values coming out of these terms, Blunder will be able to see the compensation gained from these terms and thus be able to also see positional sacrifices; both for itself and the opponent.

For example: if the engine is cramped (not enough mobility, not enough space behind its pawns) and the penalty it gets for that is large, it could decide to sacrifice a pawn (-100cp), but gain mobility (+50cp), two rooks on open lines (+40cp), the bishop pair (+20cp), and give the opponent a doubled pawn (+20 for Blunder, or -20 for the opponent). That adds up to +130cp of positional gains, more than offsetting the -100cp pawn, so it will sacrifice the pawn to break open the position and create counterplay or initiative.
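
In code, all of this just means the evaluation becomes a sum of independent terms, so a material deficit can be outweighed by positional gains. A rough sketch (Go for concreteness; the term functions are stubs returning the illustrative numbers above rather than doing any real board analysis, and none of this is Rustic's or Blunder's actual code):

```go
package main

import "fmt"

// Board would be the engine's position type; left empty here because the
// term functions below just return illustrative constants.
type Board struct{}

// Each term returns a signed centipawn score from the side-to-move's point
// of view. The values are the ones from the example above, not tuned weights.
func material(b *Board) int         { return -100 } // sacrificed a pawn
func mobility(b *Board) int         { return 50 }   // freer pieces
func rooksOnOpenFiles(b *Board) int { return 40 }   // two rooks on open lines
func bishopPair(b *Board) int       { return 20 }
func doubledPawns(b *Board) int     { return 20 } // opponent got a doubled pawn

// evaluate is just the sum of independent terms, so a material deficit can
// be outweighed by the positional gains.
func evaluate(b *Board) int {
	return material(b) + mobility(b) + rooksOnOpenFiles(b) +
		bishopPair(b) + doubledPawns(b)
}

func main() {
	fmt.Println(evaluate(&Board{})) // -100 + 50 + 40 + 20 + 20 = +30
}
```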

Stockfish, for example, often amazes me. The position is completely equal in material, and it _SEEMS_ to be completely reasonable for both sides, and still Stockfish says (for example) +180cp for white, even though I can't see why. Probably Stockfish can see deep enough to evaluate that there is _something_ in the position that will gain it a huge positional or material advantage down the road.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Madeleine Birchfield
Posts: 512
Joined: Tue Sep 29, 2020 4:29 pm
Location: Dublin, Ireland
Full name: Madeleine Birchfield

Re: Improving positional play

Post by Madeleine Birchfield »

Uri Blass wrote: Wed Nov 24, 2021 11:38 pm
Madeleine Birchfield wrote: Wed Nov 24, 2021 2:43 pm
Uri Blass wrote: Wed Nov 24, 2021 7:16 am I disagree; for me it is not understanding.
For me, understanding is something you can explain the why of, not something you hide behind the words "neural network".

If your engine has a static evaluation of +3 for white and you have no idea why, because from your point of view the position is equal, then that is not positional understanding from my point of view.

I hate seeing a huge score without understanding the reason for it, and I believe such a huge score is not good from a practical point of view: in a game there is a big probability that the opponent will have no idea how to convert the huge score into a victory. So from my point of view, old engines are better for analysis in many positions, because they can suggest a move without realizing it is losing, and there is then a good chance the human opponent will not find the win; whereas a new engine, which assumes it is playing against super players, may suggest a move that gives no practical chances if everything loses.
That is all equally applicable to any closed-source engine, where the average user does not have any access to the engine's source code. The evaluation function is still a black box, regardless of whether it uses a neural network or only handcrafted evaluation, and the average user simply cannot explain anything beyond hiding behind the words "evaluation function".

And that is even true of open-source engines; most people do not have the time or resources to open Stockfish's source code and see how its evaluation function works, so for most people the evaluation function is functionally a black box as well.

There are many positions that engines using handcrafted evaluation would score at +3 but that any semi-decent chess player would tell you are equal.

There is a big difference between a chess engine or chess player understanding a position and a user/developer understanding how an engine's evaluation function works, and the former is far more important to the average chess player than the latter. Grandmasters these days like Carlsen, Caruana, and Dubov have all said at some point that after the top engines adopted neural networks, the engines were able to understand positions found in openings such as the French and the KID much better than before. No grandmaster cares about how Stockfish's handcrafted evaluation works.
My experience is that I have more trouble understanding the evaluations of the new engines.
With the old engines, cases where the engine said +5 and I had no idea why were rare.

With the new engines, cases where the engine says +5 and I have no idea why are common.
I suspect that if you want to predict the result of a game between humans with a FIDE rating of 2000, then the evaluation of Fruit 2.1 after searching to depth 10, translated into an expected result, will give you a better prediction than the evaluation of the new engines at any depth. It may be interesting to check this.
As mvanthoor said, I think that has more to do with the fact that newer engines have stronger searches that allow them to search more deeply than older engines.
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Improving positional play

Post by mvanthoor »

Madeleine Birchfield wrote: Thu Nov 25, 2021 2:38 pm As mvanthoor said, I think that has more to do with the fact that newer engines have stronger searches that allow them to search more deeply than older engines.
Stockfish searches to depth 15 in less than a second on my 2018 mid-range phone... 20 years ago, that sort of time to depth was the expectation for a high-end computer.

Still, I stay with my original statement: if you want to become a better chess programmer, it's best to first implement a good handcrafted evaluation. We saw what happened with Maksim's BBC. That engine was only 2100, but after he stuck NNUE on top of it using a Stockfish net, it was close to 3000.

Blunder is close to 2400 already. If you'd stick a NNUE SF net on top of that and gain 800 Elo, just like BBC, the engine would be 3200 Elo. And what then... add some more search functionality to hit 3300 and be done? Then you'd have an engine that does half of its evaluation in a way you don't understand, and you'll have learned nothing. (PS: before Maksim takes my head off: I know NNUE was implemented in BBC to explain how this is done, not to get the engine to ~3000 as fast as possible.)

I think it's better to implement a good HCE, play lots of games in gauntlets against many engines, and then generate training positions from that data. Then you _will_ understand where the evaluation terms come from, and the data will be your own engine's, so it won't lose all of its playing style.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Improving positional play

Post by algerbrex »

mvanthoor wrote: Thu Nov 25, 2021 11:31 am This was a classic anti-computer chess game:

- Close the position, making sure you have a space advantage (closed position + space advantage = beneficial for attacker)
- Sacrifice something, destroying king safety, especially if you have, or can easily create open lines
- Attack!

Because of the space advantage and open lines the attacker has more mobility and more tactical capabilities.

The problem is that Blunder gets its current rating from tactical play;
I completely agree. I've had several strong players besides the NM tell me that they pretty easily got a superior position in the opening, but ended up losing because Blunder was able to find counter-play in the form of clever tactics.
mvanthoor wrote: Thu Nov 25, 2021 11:31 am as is the case for most lower-to-intermediate engines. A pawn is worth 100cp, which is so much that it outweighs any positional elements in the position. In short: because of the sacrifice on h6, Blunder gained an advantage of +100cp, and it does not yet have enough positional awareness to see that white has massive compensation for the material deficit.
Yup, that about sums up the issue. I will say, however, that Blunder does have some understanding of relative material value. For example, the current Texel-tuned values Blunder uses in its evaluation have a pawn valued at 89 centipawns in the middlegame, and depending on the square it's sitting on, a pawn might be worth as little as 60 centipawns. So I do think there's some level of understanding of compensation, particularly during the middlegame. I have seen it play games where it will accept temporarily giving up a pawn in order to get ahead in development, and eventually either win it back or gain some other obvious form of compensation (e.g. a very exposed enemy king).
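
Concretely, the effective value of a pawn comes from blending a middlegame and an endgame score (base value plus the PSQT entry for its square) by game phase. A toy sketch of that interpolation in Go (the 89 is the middlegame pawn value I mentioned, while the endgame and PSQT numbers are made up for illustration, not my actual tuned values):

```go
package main

import "fmt"

// maxPhase is an illustrative game-phase cap, e.g. counted from the
// non-pawn material still on the board.
const maxPhase = 24

// taperedScore blends a middlegame and an endgame score by game phase:
// phase == maxPhase means "pure middlegame", phase == 0 "pure endgame".
func taperedScore(mg, eg, phase int) int {
	return (mg*phase + eg*(maxPhase-phase)) / maxPhase
}

func main() {
	pawnMG, pawnEG := 89, 110 // base pawn values (EG value illustrative)
	psqtMG, psqtEG := -25, -5 // PSQT entry for some poor pawn square

	// In the pure middlegame this pawn is worth 89 - 25 = 64cp...
	fmt.Println(taperedScore(pawnMG+psqtMG, pawnEG+psqtEG, maxPhase))
	// ...while in the pure endgame it would be worth 105cp.
	fmt.Println(taperedScore(pawnMG+psqtMG, pawnEG+psqtEG, 0))
}
```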

The issue, however, as you pointed out, is that it still doesn't understand other kinds of compensation very well, and the positional understanding it does have is still very basic.
mvanthoor wrote: Thu Nov 25, 2021 11:31 am The solution is not to try and focus on anything specific to make "Blunder play better in position type X". That is not going to work. Just add more positional knowledge:

- Bishop pair
- Mobility
- Space (available behind your pawns)
- Doubled / Tripled pawns
- (Half)open lines / rooks on (half)open lines
- Passed pawn, defended passed pawn
- King safety

If you add enough positional terms and then tune the values coming out of these terms, Blunder will be able to see the compensation gained from these terms and thus be able to also see positional sacrifices; both for itself and the opponent.

For example: if the engine is cramped (not enough mobility, not enough space behind its pawns) and the penalty it gets for that is large, it could decide to sacrifice a pawn (-100cp), but gain mobility (+50cp), two rooks on open lines (+40cp), the bishop pair (+20cp), and give the opponent a doubled pawn (+20 for Blunder, or -20 for the opponent). That adds up to +130cp of positional gains, more than offsetting the -100cp pawn, so it will sacrifice the pawn to break open the position and create counterplay or initiative.
Thanks, I see your point, especially with king safety. And honestly, that's the biggest issue I saw with the game. Yes, Blunder got itself into a bad position, but the mistake that lost on the spot was creating a pawn hook and allowing its king to become exposed and eventually hunted down.

I still haven't quite gotten king safety working to the level I'd like, but I can already tell the difference it makes. While testing another configuration, I watched it play a game yesterday against a version without any king safety, and it happily gave up a pawn, and a rook for a bishop, because it correctly realized its opponent's king was weak.

That's the kind of play I'd eventually like to see in Blunder consistently.
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Improving positional play

Post by mvanthoor »

algerbrex wrote: Thu Nov 25, 2021 4:57 pm I completely agree. I've had several strong players besides the NM tell me that they pretty easily got a superior position in the opening, but ended up losing because Blunder was able to find counter-play in the form of clever tactics.
Tell me about it. I'm not a master-level player at 2000 Elo, but I'm sure I could beat engines up to 2400 CCRL or thereabouts if I just STOPPED trying to out-tactic the computer.

I can crush Rustic 4 (~2160 CCRL) if I just close the position and don't give the engine any way of gaining a tactical advantage. I just keep advancing square by square, keeping the position as closed as possible, gaining space behind my pawns. When I have enough space, I can transfer my pieces to where the enemy king is... but because:

1 - Because of its lack of positional knowledge, the engine doesn't see what I'm doing. From its point of view, I'm just shuffling pieces behind my pawns.
2 - Even if it COULD understand, it wouldn't be able to react because it has no room to move.

As soon as my pieces are in position, it's just... tick, tock, open file, *BOOM*.

I can go 60 moves against Rustic 4 where it says "about equal", and then have the game be over at move 65.

The only problem is... I'm not Karpov. I hate this style of playing. I like tactical positions. However, in the middle game, Rustic 4 is already able to go some 9-10 moves deep, and I can't keep up. I can maybe reach 7-8 on a good day in a tactical position, and then things start to become unclear to me. So I often try, but I can't out-tactic engines that can go more than ~8 moves deep in the middle game.

The same happens with the NMs or IMs. They can probably do tactics up to move 9-10, maybe 11-12 for the stronger IMs, but as Blunder can already go to depth 14-15, it's game over for the IM if he gets into a position where the outcome is decided by tactics. You just can't keep up with a computer when it comes to calculation, so you have to use knowledge... but at some point, the evaluation will have enough knowledge to rival that of an IM, plus all the better tactical capability, and then you'll have a grandmaster-level engine.

(Have you ever seen Nakamura do tactics on YouTube during analysis? My god... he just about starts calculating at move 12 or thereabouts. Still, Stockfish, which easily goes down to move 24+ in the middle game even in bullet, will take Nakamura down in short order...)
algerbrex wrote: Thu Nov 25, 2021 4:57 pm Thanks, I see your point, especially with king safety. And honestly, that's the biggest issue I saw with the game. Yes, Blunder got itself into a bad position, but the mistake that lost on the spot was creating a pawn hook and allowing its king to become exposed and eventually hunted down.
What you may also have noticed is me mentioning pawn this, pawn that, open lines, etc... lots of things depend on the pawns. Also, pawns don't move often, so things like open lines now are probably still open lines on the next move.

My first step after implementing mobility (somewhere in Rustic 36, at the rate I'm going... :lol: ) will be a pawn hash table. I'm not going to recalculate all that pawn stuff over and over again. Fortunately I already have a generic TT which can store any kind of data 8-)

If you get going with pawn stuff, do the pawn TT first, because recalculating stuff in the evaluation is massively expensive. If something is calculated in the evaluation and you can keep it incrementally, then do so. I added a "detect bishop pair" function to improve draw detection (1), which I can also use in the evaluation to see if the bishop pair bonus applies. I'm thinking about just keeping that as a variable so I don't have to determine if the bishop pair is on the board, over and over again... I just change this when a bishop gets captured or someone promotes to a bishop.

(1) (You can checkmate with the bishop pair against a lone king, but not with 5 bishops that all sit on squares of the same color. Having the bishop pair means having at least two bishops on different-colored squares; otherwise you have a redundant piece as far as checkmating is concerned.)
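
For the pawn TT itself, the idea is nothing fancier than a small always-replace cache keyed on a Zobrist hash of the pawns alone; a rough sketch (Go for concreteness; the names, sizing, and replacement scheme are illustrative, not Rustic's actual code):

```go
package eval

// Pawn-structure terms are expensive to recompute, but pawns move rarely,
// so the evaluated score can be cached, keyed on a pawn-only Zobrist hash.

type pawnEntry struct {
	key   uint64 // full pawn-only Zobrist key, to detect index collisions
	score int    // cached pawn-structure score (could also be an mg/eg pair)
	used  bool
}

// PawnTable is a simple fixed-size, always-replace cache.
type PawnTable struct {
	entries []pawnEntry
}

func NewPawnTable(numEntries int) *PawnTable {
	return &PawnTable{entries: make([]pawnEntry, numEntries)}
}

// Probe returns the cached score for this pawn structure, if present.
func (t *PawnTable) Probe(pawnKey uint64) (score int, ok bool) {
	e := t.entries[pawnKey%uint64(len(t.entries))]
	if e.used && e.key == pawnKey {
		return e.score, true
	}
	return 0, false
}

// Store caches the freshly evaluated pawn-structure score (always replace).
func (t *PawnTable) Store(pawnKey uint64, score int) {
	t.entries[pawnKey%uint64(len(t.entries))] = pawnEntry{key: pawnKey, score: score, used: true}
}
```

In the evaluation you probe this table before computing the doubled/passed/isolated-pawn terms, and only fall through to the full computation (followed by a Store) on a miss.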
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Madeleine Birchfield
Posts: 512
Joined: Tue Sep 29, 2020 4:29 pm
Location: Dublin, Ireland
Full name: Madeleine Birchfield

Re: Improving positional play

Post by Madeleine Birchfield »

mvanthoor wrote: Thu Nov 25, 2021 2:53 pm Still, I stay with my original statement: if you want to become a better chess programmer, it's best to first implement a good handcrafted evaluation.
One does not program chess, one programs a chess engine. Everybody who writes an engine is a chess engine programmer; nobody is ever a 'chess' programmer. Unless, I suppose, one works on a command-line interface or GUI for humans or engines to play chess, which would make the people over at Lichess and chess.com and Chessbase and BanksiaGUI the 'chess programmers'.

Handcrafted evaluation, along with Texel tuning or Andrew Grant's gradient descent, is simply another category of machine learning model used in chess engine programming, similar to the models used in natural language processing or computer vision before those fields likewise adopted neural networks.
mvanthoor wrote: Thu Nov 25, 2021 2:53 pm We saw what happened with Maksim's BBC. That engine was only 2100, but after he stuck NNUE on top of it using a Stockfish net, it was close to 3000.

Blunder is close to 2400 already. If you'd stick a NNUE SF net on top of that and gain 800 Elo, just like BBC, the engine would be 3200 Elo. And what then... add some more search functionality to hit 3300 and be done? Then you'd have an engine that does half of its evaluation in a way you don't understand, and you'll have learned nothing. (PS: before Maksim takes my head off: I know NNUE was implemented in BBC to explain how this is done, not to get the engine to ~3000 as fast as possible.)
BBC is a poor example of implementing neural networks. The BBC author himself said that he wasn't interested in learning how to write his own inference code and architecture for BBC, nor in training his own nets, and he was called out by the community for using Stockfish's net, Stockfish's architecture, and Daniel Shawul's inference code. I highly doubt he even understands how NNUE works. As a result, the section about using Stockfish's NNUE in his BBC tutorial is fairly useless for chess engine programmers who actually wish to implement their own neural networks.

A better example is Seer, which never had a handcrafted evaluation function: Connor McMonigle implemented an NNUE architecture by himself in Seer before NNUE was even merged into Stockfish, then wrote his own trainer and used reinforcement learning to train his neural network. Ethereal, Koivisto, Berserk, Halogen, Zahak, Komodo Dragon, Revenge, and Arasan have likewise all written their own inference code and architectures and trained their own nets. None of them have seen the insane jump in Elo that would have come from simply plugging a Stockfish net into their engine. Those engines are what I would expect the Blunder author to follow if they were to implement neural networks in Blunder. However, the Blunder author said earlier in the thread that they didn't want to move to neural networks yet, as they wanted to get king safety and a few other features fully implemented in Blunder's handcrafted evaluation function first.
mvanthoor wrote: Thu Nov 25, 2021 2:53 pm I think it's better to implement a good HCE, play lots of games in gauntlets against many engines, and then generate training positions from that data. Then you _will_ understand where the evaluation terms come from, and the data will be your own engine's, so it won't lose all of its playing style.
You don't need a good HCE to generate training data; Blunder's current handcrafted evaluation function is good enough. Some of Stockfish's nets were trained using SimpleEval, which has only material terms and no positional terms; Seer's nets were trained using Lichess data; and both Leela's and Slowchess's current nets were trained starting from random play.

And playing style is not defined by an engine's handcrafted evaluation. Slowchess and Stoofvlees still have very distinctive playing styles compared to many other engines out there, even though both now use neural networks that are completely unrelated to the handcrafted evaluations they used to have.