Stockfish Development Ver 8/25 - The decay sets in


CornfedForever
Posts: 650
Joined: Mon Jun 20, 2022 4:08 am
Full name: Brian D. Smith

Stockfish Development Ver 8/25 - The decay sets in

Post by CornfedForever »

I....'think' I get what this is trying to do, but I am wondering if it really gets them anything. Thoughts?
Does this 'decay' ...ahem...simply smell or does it smell ever so slightly like ShashChess?

***********************************************************************************
Author: Stéphane Nicolet
Date: Thu Aug 24 08:11:17 2023 +0200
Timestamp: 1692857477

Play turbulent when defending, simpler when attacking

This patch decays a little the evaluation (up to a few percent) for
positions which have a large complexity measure (material imbalance,
positional compensations, etc).

This may have nice consequences on the playing style, as it modifies
the search differently for attack and defense, both effects being
desirable:

- to see the effect on positions when Stockfish is defending, let us
suppose for instance that the side to move is Stockfish and the nnue
evaluation on the principal variation is -100 : this patch will decay
positions with an evaluation of -103 (say) to the same level, provided
they have huge material imbalance or huge positional compensation.
In other words, chaotic positions with an evaluation of -103 are now
comparable in our search tree to stable positions with an evaluation
of -100, and chaotic positions with an evaluation of -102 are now
preferred to stable positions with an evaluation of -100.

- the effect on positions when Stockfish is attacking is the opposite.
Let us suppose for instance that the side to move is Stockfish and the
nnue evaluation on the principal variation is +100 : this patch will
decay the evaluation to +97 if the positions on the principal variation
have huge material imbalance or huge positional compensation. In other
words, stable positions with an evaluation of +97 are now comparable
in our search tree to chaotic positions with an evaluation of +100,
and stable positions with an evaluation of +98 are now preferred to
chaotic positions with an evaluation of +100.

So the effect of this small change of evaluation on the playing style
is that Stockfish should now play a little bit more turbulent when
defending, and choose slightly simpler lines when attacking.

passed STC:
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 268448 W: 68713 L: 68055 D: 131680 Elo +0.85
Ptnml(0-2): 856, 31514, 68943, 31938, 973
https://tests.stockfishchess.org/tests/ ... 12526653ed

passed LTC:
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 141060 W: 36066 L: 35537 D: 69457 Elo +1.30
Ptnml(0-2): 71, 15179, 39522, 15666, 92
https://tests.stockfishchess.org/tests/ ... 7747553725

closes https://github.com/official-stockfish/S ... /pull/4762
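
The Elo figures in the two test summaries above can be reproduced from the W/L/D counts alone. The sketch below assumes the usual logistic Elo model applied to the average score per game; the function name is made up for illustration, and Fishtest's own reporting (which also uses the pentanomial pair counts) may differ in detail.

```python
import math

def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Logistic Elo difference implied by a win/loss/draw record."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # average points per game
    return -400.0 * math.log10(1.0 / score - 1.0)

# W/L/D counts copied from the STC and LTC summaries in the commit message
stc = elo_from_wld(68713, 68055, 131680)
ltc = elo_from_wld(36066, 35537, 69457)
print(f"STC: {stc:+.2f}  LTC: {ltc:+.2f}")  # prints "STC: +0.85  LTC: +1.30"
```

The two numbers come from separate tests of the same patch at different time controls, so they are two estimates of one gain rather than amounts to add together.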
Stephen Ham
Posts: 2504
Joined: Wed Mar 08, 2006 9:40 pm
Location: Eden Prairie, Minnesota
Full name: Stephen Ham

Stockfish Development Ver 8/25 - The decay sets in

Post by Stephen Ham »

Hi Brian,

I agree. I've long assumed that the confusing posts at SF's website are due to its authors being non-Anglophones. Their English communications still require translations. But I appreciate that they're doing the best they can in a foreign language.

Like you, I think I know what Stéphane Nicolet (surely a Francophone) tried to communicate. But if my interpretation of his post is correct, I question how it's even rational.

For example, "Decay the evaluation..." - what the heck does that mean? Reduce the evaluation?
"...chaotic positions...". Does he mean complex?
"...play a little bit more turbulent..." - again, what's that mean? How do "chaotic positions" differ from those he deems "turbulent"? My guess is he wants to increase complexity when defending. But, how is that rational? The NNUE evaluation is what it is. It's objective and empirical output from an engine far stronger than any human. Instead, changing the output seems arbitrary and "human". How does it make sense to select a line with a lower evaluation?

Nonetheless, their Fishtest results suggest that their modification raises the Elo by nearly 3 points, so it doesn't seem to harm SF's results. Still, these Fishtest tests are run at super-fast time-controls, even their LTC. I wonder what a more realistic time-control result is. Nonetheless, I understand that they lack the time for such tests. So, I'm willing to accept this result.

Regarding playing style, many of us think SF's playing style is "safety first" and thus too drawish. I'd instead prefer to see SF always select more complex lines ("chaotic"?, "turbulent"?) when the options are roughly equivalent. Being the strongest chess engine, all of its opponents are inferior. So, it should always avoid draws as much as possible.
Uri Blass
Posts: 11081
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Stockfish Development Ver 8/25 - The decay sets in

Post by Uri Blass »

Stephen Ham wrote: Sun Aug 27, 2023 2:38 am Hi Brian,

I agree. I've long assumed that the confusing posts at SF's website are due to its authors being non-Anglophones. Their English communications still require translations. But I appreciate that they're doing the best they can in a foreign language.

Like you, I think I know what Stéphane Nicolet (surely a Francophone) tried to communicate. But if my interpretation of his post is correct, I question how it's even rational.

For example, "Decay the evaluation..." - what the heck does that mean? Reduce the evaluation?
"...chaotic positions...". Does he mean complex?
"...play a little bit more turbulent..." - again, what's that mean? How do "chaotic positions" differ from those he deems "turbulent"? My guess is he wants to increase complexity when defending. But, how is that rational? The NNUE evaluation is what it is. It's objective and empirical output from an engine far stronger than any human. Instead, changing the output seems arbitrary and "human". How does it make sense to select a line with a lower evaluation?

Nonetheless, their Fishtest results suggest that their modification raises the Elo by nearly 3 points, so it doesn't seem to harm SF's results. Still, these Fishtest tests are run at super-fast time-controls, even their LTC. I wonder what a more realistic time-control result is. Nonetheless, I understand that they lack the time for such tests. So, I'm willing to accept this result.

Regarding playing style, many of us think SF's playing style is "safety first" and thus too drawish. I'd instead prefer to see SF always select more complex lines ("chaotic"?, "turbulent"?) when the options are roughly equivalent. Being the strongest chess engine, all of its opponents are inferior. So, it should always avoid draws as much as possible.

I believe that Stockfish's NNUE evaluation is relatively poor when one side has a big material advantage.
It is known that it is better to trade pieces when you have a material advantage, but Stockfish does not know this: even with an advantage of a queen and two rooks, it does not evaluate less material as better, in spite of the fact that I am relatively sure White mates in fewer moves with less material on the board.

FEN: 1nb1kbn1/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQ - 0 1

Stockfish_23082406_x64_avx2:
NNUE evaluation using nn-c38c3d8d3920.nnue
1/1 00:00 759 127k +11.99 Nb1-a3
2/2 00:00 899 150k +14.63 Nb1-a3 g7-g6
3/3 00:00 935 156k +14.85 Nb1-a3 d7-d6 Na3-b5
4/4 00:00 960 160k +14.85 Nb1-a3 d7-d6 Na3-b5
5/4 00:00 989 165k +14.85 Nb1-a3 d7-d6 Na3-b5
6/4 00:00 2k 71k +14.85 Nb1-a3 d7-d6 Na3-b5 d6-d5
7/6 00:00 3k 100k +14.55 Nb1-a3 e7-e6 Na3-b5 Ng8-f6 c2-c3
8/8 00:00 8k 194k +14.48 Nb1-a3 e7-e5 Na3-b5 Ke8-d8 c2-c3 e5-e4
9/10 00:00 9k 211k +14.35 Nb1-a3 e7-e5 Na3-b5 Ke8-d8 c2-c3 e5-e4 Qd1-c2 a7-a6
10/9 00:00 10k 224k +13.88 Nb1-a3 e7-e5 Na3-b5 a7-a6 Nb5-a7 Bf8-c5
11/9 00:00 16k 296k +13.59 Nb1-a3 e7-e5 Na3-b5 a7-a6 Nb5-c3 d7-d5 d2-d3
12/13 00:00 64k 634k +14.16 Nb1-c3 d7-d5 a2-a3 e7-e5 e2-e3 Ng8-f6 Ng1-e2 e5-e4 Ra1-a2 c7-c6
13/13 00:00 111k 936k +13.83 Nb1-c3 e7-e5 g2-g3 d7-d5 Ra1-b1
14/11 00:00 221k 1,392k +13.57 Nb1-c3 e7-e5 g2-g3 d7-d5 a2-a3 Ng8-f6 Ra1-a2
15/18 00:00 801k 2,609k +13.50 Nb1-c3 e7-e5 a2-a4 Ng8-f6 e2-e4 d7-d5 e4xd5 Ke8-d8 Qd1-e2 e5-e4 Qe2-e3
16/23 00:00 1,752k 2,975k +13.45 Nb1-c3 d7-d5 Ra1-b1 e7-e5 b2-b4 Ng8-f6 Rb1-b3 e5-e4 e2-e3 Bf8-e7 h2-h3
17/18 00:00 2,019k 3,026k +13.75 Nb1-c3 d7-d5 e2-e4 e7-e6 a2-a3 d5xe4 d2-d4 Ng8-f6
18/21 00:00 2,344k 3,036k +13.48 Nb1-c3 d7-d5 e2-e4 d5xe4 d2-d3 e4-e3 Bc1xe3 g7-g6 d3-d4 Ng8-f6 Ra1-c1 Nf6-g4
19/33 00:01 3,409k 3,244k +13.59 Nb1-a3 e7-e6 Ra1-b1 Bf8xa3 b2xa3 Nb8-c6 Ng1-f3 Ng8-e7 c2-c3 e6-e5
20/25 00:01 4,754k 3,318k +13.67 Nb1-a3 e7-e5 Ra1-b1 d7-d5 c2-c3 Bf8xa3 b2xa3 Nb8-c6 Rb1-b2 e5-e4 d2-d3 e4-e3 Ng1-f3 Bc8-g4 Bc1xe3 Bg4xf3
21/24 00:01 5,934k 3,343k +13.56 Nb1-a3 d7-d5 Ra1-b1 e7-e5 c2-c3 Ng8-f6 b2-b4 Nb8-c6 Rb1-b2 e5-e4 f2-f3 Nc6xb4 c3xb4 Ke8-d8 Qd1-b3 Bf8-d6
22/27 00:02 10,217k 3,616k +13.75 d2-d4 e7-e5 d4-d5 Ng8-f6 a2-a3 Nf6-e4 f2-f3 d7-d6 f3xe4 Bf8-e7 Ng1-f3
23/30 00:07 23,071k 3,265k +13.63 d2-d4 d7-d5 Nb1-c3 e7-e6 a2-a3 Ng8-f6 f2-f3 c7-c5 e2-e3 c5xd4 e3xd4 Nb8-c6 Bf1-b5 Bf8-e7 Qd1-e2 a7-a6 Bb5-a4 b7-b5 Ba4-b3 Nc6xd4
24/30 00:09 32,000k 3,342k +13.60 d2-d4 d7-d5 Nb1-c3 e7-e5 d4xe5 Bf8-b4 Bc1-d2 c7-c6 a2-a3 Bb4-e7 Bd2-c1 Bc8-g4 h2-h3 Bg4-h5 g2-g4 d5-d4 Nc3-e4
25/36 00:13 44,213k 3,374k +13.58 e2-e4 d7-d5 e4xd5 e7-e6 d2-d4 Ng8-f6 Nb1-d2 e6xd5 Nd2-b3 Bf8-e7 c2-c3 Bc8-g4 Ng1-f3 Nf6-e4 Bc1-f4 Bg4xf3 g2xf3 Ne4xf2 Ke1xf2 Ke8-f8 Bf4xc7 h7-h5
26/41 00:19 67,303k 3,461k +13.46 e2-e4 d7-d5 e4xd5 e7-e6 d5xe6 Bc8xe6 Nb1-c3 Nb8-c6 Ng1-e2 Ng8-f6 a2-a3 Nf6-d5 Ne2-g3 Bf8-d6 Nc3xd5 Be6xd5 d2-d4 Bd6xg3 h2xg3 Ke8-d7 Rh1xh7 Kd7-e8 Qd1-e2+ Ke8-d7
27/35 00:19 68,593k 3,463k +13.53 e2-e4 d7-d5 e4xd5 Ng8-f6 Nb1-c3 c7-c6 Ng1-e2 c6xd5 d2-d4 Nb8-c6 a2-a3 e7-e5 d4xe5 Nc6xe5 Ne2-d4 Bf8-c5 Nd4-b3 Bc5xf2+ Ke1xf2 Bc8-g4 Qd1-e1
28/42 00:23 84,116k 3,509k +13.46 e2-e4 d7-d5 e4xd5 Ng8-f6 Nb1-c3 c7-c6 Ng1-e2 c6xd5 d2-d4 Nb8-c6 a2-a3 e7-e5 d4xe5 Nc6xe5 Ne2-d4 Bf8-c5 Nd4-b3 Bc5-e7 Bf1-b5+ Ke8-f8 Bc1-g5 Bc8-g4 Qd1-d2 Ne5-c4 Bb5xc4 d5xc4
29/41 00:27 99,603k 3,588k +13.46 e2-e4 d7-d5 e4xd5 Ng8-f6 Nb1-c3 c7-c6 d2-d4 c6xd5 a2-a3 Nb8-c6 f2-f3 e7-e6 Ra1-a2 Bf8-e7 b2-b4 e6-e5 d4xe5 Nc6xe5 Bc1-f4 Ne5-c6 Nc3-b5 Nf6-h5 Bf4-d6 Be7xd6 Nb5xd6+ Ke8-e7 Nd6xc8+ Ke7-f8 c2-c3 d5-d4 Bf1-b5 a7-a6 Nc8-b6

I removed bishops and knights and the evaluation became smaller instead of bigger.



FEN: 4k3/pppppppp/8/8/8/8/PPPPPPPP/R2QK2R w KQ - 0 1

Stockfish_23082406_x64_avx2:
NNUE evaluation using nn-c38c3d8d3920.nnue
1/1 00:00 10k 605k +7.78 e2-e3
2/2 00:00 10k 614k +11.99 f2-f3
3/3 00:00 11k 618k +12.13 f2-f3 c7-c5 c2-c3
4/4 00:00 11k 626k +12.58 c2-c3 c7-c6
5/5 00:00 11k 634k +11.67 c2-c3 c7-c6 Qd1-b3 d7-d5 Qb3xb7
6/6 00:00 11k 644k +11.67 c2-c3 c7-c6 Qd1-b3 d7-d5 Qb3xb7 Ke8-f8
7/8 00:00 13k 749k +12.13 f2-f3 f7-f5 c2-c3 b7-b5 Qd1-b3
8/6 00:00 14k 781k +10.27 f2-f3 Ke8-d8
9/7 00:00 19k 913k +11.24 f2-f3 d7-d5 Rh1-f1 Ke8-d7 Rf1-f2 Kd7-d8
10/8 00:00 21k 968k +11.71 c2-c3 Ke8-d8 Qd1-a4 d7-d5 Qa4xa7
11/8 00:00 40k 1,446k +12.23 c2-c3 Ke8-d8 Qd1-a4 d7-d5 Rh1-f1
12/14 00:00 164k 2,075k +12.43 f2-f3 a7-a6 Rh1-f1 f7-f6 c2-c3 c7-c6 a2-a3 Ke8-f7 g2-g3
13/12 00:00 320k 2,442k +12.50 f2-f3 a7-a6 Rh1-f1 d7-d5 c2-c3 f7-f5 Rf1-f2 Ke8-d7
14/20 00:00 790k 2,753k +12.43 f2-f3 d7-d5 Rh1-f1 c7-c6 c2-c3 f7-f5 Ra1-b1 Ke8-f7 f3-f4 g7-g6 e2-e3
15/18 00:00 967k 2,819k +12.29 f2-f3 d7-d5 Rh1-f1 f7-f6 c2-c3 Ke8-f7 Rf1-f2 f6-f5 f3-f4 g7-g5 Qd1-c2 Kf7-g8 Qc2-a4
16/21 00:01 2,429k 1,239k +12.32 f2-f3 d7-d5 Rh1-f1 f7-f6 Rf1-f2 Ke8-f7 c2-c3 f6-f5 Ra1-b1 Kf7-g8 Qd1-a4 c7-c6 Qa4-d1 f5-f4 c3-c4
17/25 00:07 6,306k 871k +12.62 a2-a4 c7-c5 c2-c4 d7-d5 Ra1-a3 d5xc4 Ra3-c3 a7-a6 Rh1-f1 f7-f5 f2-f4 a6-a5 Rc3xc4 h7-h6
18/26 00:10 9,544k 894k +12.62 a2-a4 d7-d5 Rh1-f1 c7-c5 Ra1-a3 f7-f5 f2-f4 Ke8-f7 Ra3-g3 h7-h5 Rg3-h3 c5-c4 c2-c3 Kf7-e8 a4-a5 Ke8-f7 Rh3-e3
19/25 00:11 10,844k 939k +12.76 a2-a4 d7-d5 Rh1-f1 c7-c5 f2-f4 c5-c4 Ra1-a3 f7-f5 Ra3-h3 Ke8-f7 a4-a5 Kf7-g8 c2-c3 h7-h5 Rh3xh5
20/29 00:16 15,417k 916k +12.68 a2-a4 d7-d5 Ra1-a3 c7-c5 Rh1-f1 c5-c4 f2-f4 g7-g6 Ra3-h3 h7-h5 c2-c3 f7-f5 a4-a5 Ke8-f7 Rh3-e3 a7-a6 Re3-g3 e7-e6 Rg3-h3 Kf7-g8 Rh3-e3 Kg8-f7
21/33 00:24 21,948k 892k +12.86 a2-a4 d7-d5 f2-f4 c7-c5 Rh1-f1 a7-a5 Ra1-a3 g7-g6 Ra3-h3 f7-f5 d2-d4 c5-c4 Qd1-c1 h7-h5 Rh3-c3 Ke8-f7 Rf1-f3 e7-e6 Rc3-e3 Kf7-f6 Re3-e5
22/33 00:32 28,481k 882k +12.95 a2-a4 d7-d5 f2-f4 c7-c5 Rh1-f1 f7-f6 Ra1-a3 c5-c4 a4-a5 Ke8-f7 d2-d4 e7-e6 Ra3-c3 f6-f5 Rc3-h3 Kf7-g8 Rh3-e3 h7-h5 Qd1-c1 g7-g6 Rf1-f3 b7-b5
23/28 00:34 32,693k 942k +12.94 a2-a4 d7-d5 Ra1-a3 c7-c6 f2-f4 c6-c5 Rh1-f1 c5-c4 Ra3-e3 f7-f5 c2-c3 h7-h5 a4-a5 Ke8-f7 Re3-e5 g7-g6 Re5-e3 e7-e6 Qd1-a4
24/32 00:37 44,307k 1,177k +12.93 a2-a4 d7-d5 Ra1-a3 c7-c6 f2-f4 c6-c5 Rh1-f1 a7-a5 e2-e3 e7-e6 e3-e4 d5xe4 Qd1-e2 Ke8-f8 Qe2xe4 f7-f5
25/36 00:49 101,537k 2,045k +13.12 a2-a4 d7-d5 f2-f4 c7-c5 Rh1-f1 c5-c4 Ra1-a3 d5-d4 Ra3-h3 e7-e6 c2-c3 Ke8-f8 c3xd4 Kf8-g8 Qd1-c1 g7-g6
26/42 01:04 161,425k 2,509k +13.04 a2-a4 d7-d5 Ra1-a3 e7-e5 Rh1-f1 Ke8-f8 e2-e4 Kf8-g8 e4xd5 e5-e4 Qd1-e2 h7-h6 g2-g4 f7-f6 f2-f4 e4xf3/ep
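
For reference, the two positions above can be compared by raw material. The sketch below is a rough illustration only: it assumes the conventional 1/3/3/5/9 piece values, which are not what NNUE uses, and the helper name is made up.

```python
# Conventional piece values (an assumption for illustration, not NNUE's)
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def material(fen: str) -> tuple:
    """Return (white, black) material totals from the board field of a FEN."""
    board = fen.split()[0]
    white = sum(PIECE_VALUES[c.lower()] for c in board if c.isupper())
    black = sum(PIECE_VALUES[c] for c in board if c.islower())
    return white, black

full = "1nb1kbn1/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQ - 0 1"
reduced = "4k3/pppppppp/8/8/8/8/PPPPPPPP/R2QK2R w KQ - 0 1"

for fen in (full, reduced):
    w, b = material(fen)
    print(f"white={w} black={b} diff={w - b} total={w + b}")
# prints "white=39 black=20 diff=19 total=59"
# then   "white=27 black=8 diff=19 total=35"
```

White's material edge is the same 19 points in both positions; only the total material on the board drops, which is the comparison the two searches are making.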
carldaman
Posts: 2287
Joined: Sat Jun 02, 2012 2:13 am

Re: Stockfish Development Ver 8/25 - The decay sets in

Post by carldaman »

Stephen Ham wrote: Sun Aug 27, 2023 2:38 am
Regarding playing style, many of us think SF's playing style is "safety first" and thus too drawish. I'd instead prefer to see SF always select more complex lines ("chaotic"?, "turbulent"?) when the options are roughly equivalent. Being the strongest chess engine, all of its opponents are inferior. So, it should always avoid draws as much as possible.
Completely agree about the SF playing style being safety first, having the strongest engine ever playing as if not to lose.
Something to be improved upon there!
syzygy
Posts: 5801
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish Development Ver 8/25 - The decay sets in

Post by syzygy »

Stephen Ham wrote: Sun Aug 27, 2023 2:38 am For example, "Decay the evaluation..." - what the heck does that mean? Reduce the evaluation?
"...chaotic positions...". Does he mean complex?
"...play a little bit more turbulent..." - again, what's that mean? How do "chaotic positions" differ from those he deems "turbulent"? My guess is he wants to increase complexity when defending. But, how is that rational? The NNUE evaluation is what it is. It's objective and empirical output from an engine far stronger than any human. Instead, changing the output seems arbitrary and "human". How does it make sense to select a line with a lower evaluation?
Chaotic/turbulent/complex all mean the same thing here.

From how I understand the commit message, the problem lies with the choice of the terms "attack" and "defend". Instead of "attacking" and "defending" read: "when being ahead" and "when being behind" (in evaluation).

So when SF has the choice between two positions evaluated as -100cp and -102cp according to the NNUE eval, it may now pick the -102cp position if that position is "more complex". This is achieved by making a small adjustment to the NNUE score based on "material imbalance, positional compensations, etc." (which may e.g. adjust the -102cp score to -99cp and leave the -100cp score at -100cp).
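
A toy version of that adjustment, to make the arithmetic concrete. This is only a sketch of the idea as described in the commit message, not Stockfish's code: the 3% cap, the complexity scale from 0 to 1, and all names are assumptions made for illustration.

```python
def decayed_eval(nnue_cp: float, complexity: float, max_decay: float = 0.03) -> float:
    """Shrink an evaluation toward zero by up to max_decay.

    complexity is a hypothetical measure in [0, 1]: 0 = stable, 1 = chaotic.
    """
    c = min(max(complexity, 0.0), 1.0)
    return nnue_cp * (1.0 - max_decay * c)

def pick(candidates):
    """Choose the (eval, complexity) pair with the best adjusted score."""
    return max(candidates, key=lambda ec: decayed_eval(*ec))

# Behind: the chaotic -102 (adjusted to about -98.9) beats the stable -100.
print(pick([(-100, 0.0), (-102, 1.0)]))  # prints "(-102, 1.0)"

# Ahead: the stable +98 beats the chaotic +100 (adjusted to about +97).
print(pick([(100, 1.0), (98, 0.0)]))     # prints "(98, 0.0)"
```

Because the score is scaled toward zero, the same rule helps complex positions when the evaluation is negative and penalizes them when it is positive, which gives the two opposite effects described in the commit message.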
Stephen Ham
Posts: 2504
Joined: Wed Mar 08, 2006 9:40 pm
Location: Eden Prairie, Minnesota
Full name: Stephen Ham

Stockfish Development Ver 8/25 - The decay sets in

Post by Stephen Ham »

Thank you Syzygy. Well done!

Your proposed translation communicates far better than what's posted at SF's website. I also agree that the website's word choices of attacking/defending were poor, and should instead be replaced by whether SF's self-evaluation is positive or negative. The message is now clear.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Stockfish Development Ver 8/25 - The decay sets in

Post by dangi12012 »

syzygy wrote: Tue Aug 29, 2023 1:31 am So when SF has the choice between two positions evaluated as -100cp and -102cp according to the NNUE eval, it may now pick the -102cp position if that position is "more complex". This is achieved by making a small adjustment to the NNUE score based on "material imbalance, positional compensations, etc." (which may e.g. adjust the -102cp score to -99cp and leave the -100cp score at -100cp).
If you have the choice, you take your opponent into a deep dark forest to beat them up when the alternative is certain defeat.
You trade certain+worse against probably+worse.

Shows the need for more dimensions in the score than 1. CP is ambiguous in that regard.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
abgursu
Posts: 92
Joined: Thu May 14, 2020 3:34 pm
Full name: A. B. Gursu

Re: Stockfish Development Ver 8/25 - The decay sets in

Post by abgursu »

carldaman wrote: Sun Aug 27, 2023 6:24 pm
Stephen Ham wrote: Sun Aug 27, 2023 2:38 am
Regarding playing style, many of us think SF's playing style is "safety first" and thus too drawish. I'd instead prefer to see SF always select more complex lines ("chaotic"?, "turbulent"?) when the options are roughly equivalent. Being the strongest chess engine, all of its opponents are inferior. So, it should always avoid draws as much as possible.
Completely agree about the SF playing style being safety first, having the strongest engine ever playing as if not to lose.
Something to be improved upon there!
Maybe bringing back the contempt?
I've tested it myself with a few different values and observed that most of the time it didn't end up badly: it sometimes had a hard time against official SF but played in a much more complex (sometimes more fortress-like) style overall, and I didn't notice any significant change in results. But bringing it back to Stockfish officially would require much more serious and in-depth testing than what I've done, probably against multiple opponents of different strengths, to see the effect objectively.
syzygy
Posts: 5801
Joined: Tue Feb 28, 2012 11:56 pm

Re: Stockfish Development Ver 8/25 - The decay sets in

Post by syzygy »

Stephen Ham wrote: Tue Aug 29, 2023 8:43 pm Thank you Syzygy. Well done!

Your proposed translation communicates far better than what's posted at SF's website. I also agree that the website's word choices of attacking/defending were poor, and should instead be replaced by whether SF's self-evaluation is positive or negative. The message is now clear.
And I'll now have a look at the code to make sure it corresponds to my understanding of the commit message :-)
alvinypeng
Posts: 36
Joined: Thu Mar 03, 2022 7:29 am
Full name: Alvin Peng

Re: Stockfish Development Ver 8/25 - The decay sets in

Post by alvinypeng »

dangi12012 wrote: Wed Aug 30, 2023 5:37 pm
syzygy wrote: Tue Aug 29, 2023 1:31 am So when SF has the choice between two positions evaluated as -100cp and -102cp according to the NNUE eval, it may now pick the -102cp position if that position is "more complex". This is achieved by making a small adjustment to the NNUE score based on "material imbalance, positional compensations, etc." (which may e.g. adjust the -102cp score to -99cp and leave the -100cp score at -100cp).
If you have the choice, you take your opponent into a deep dark forest to beat them up when the alternative is certain defeat.
You trade certain+worse against probably+worse.

Shows the need for more dimensions in the score than 1. CP is ambiguous in that regard.
Replacing a scalar CP score with a WDL vector seems to be an obvious area to experiment for improvement. I imagine it would be a lot more painful to write a HCE that outputs WDL instead of CP. But with NNUEs, training a WDL evaluation is just a softmax classification problem.

Of course, you would need to convert the WDL into a scalar value eventually in order for AB search to work.
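
A minimal sketch of that last step, assuming a network head that outputs three raw logits for win, draw and loss. The conversion shown (softmax, then expected score, then log-odds) is one common choice rather than anything Stockfish is known to do, and the `scale` constant and function names are arbitrary placeholders.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def wdl_to_scalar(win_logit, draw_logit, loss_logit, scale=100.0):
    """Collapse a WDL distribution into one number an alpha-beta search can compare."""
    w, d, l = softmax([win_logit, draw_logit, loss_logit])
    expected = w + 0.5 * d                       # expected game score in [0, 1]
    eps = 1e-9                                   # keep the log-odds finite
    expected = min(max(expected, eps), 1.0 - eps)
    return scale * math.log(expected / (1.0 - expected))

print(wdl_to_scalar(2.0, 0.0, -1.0) > 0)   # prints "True": win-heavy outlook
print(wdl_to_scalar(-1.0, 0.0, 2.0) < 0)   # prints "True": loss-heavy outlook
```

Two positions with the same expected score but very different draw probabilities collapse to the same scalar here, which is exactly the information a single CP number cannot carry.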