So what do we miss in the traditional evaluation?


Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: So what do we miss in the traditional evaluation?

Post by Ferdy »

Patrice Duhamel wrote: Sun Jan 31, 2021 11:49 am Can we learn something from NNUE networks, using some kind of "reverse engineering"?

Maybe it's a bad idea, but for example: taking thousands of winning positions with passed pawns, looking at the highest values in the hidden layers, and then looking at the inputs used to produce those values, is it possible to find new ideas for traditional evaluation?
What I am working on is a plug-in for Lakas and Optuna: instead of running engine vs engine matches, I just let the engine analyze the training positions at a certain depth or time. The training positions are just <epd>,<nnue_eval> pairs. The error is engine_eval - nnue_eval; square it over all positions to get the MSE. This MSE is sent to one of the optimizers of Lakas/Optuna, which is then asked for the best params (for the engine) to try next. The optimizer does its job of minimizing the MSE by giving us the next params to try.
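A rough sketch of that loop in Python, using Optuna directly and python-chess to talk to the engine; the engine path and the option names (PassedPawnMg, MobilityWeight) are invented for illustration, and the actual Lakas plug-in may differ:

```python
import chess
import chess.engine
import optuna

ENGINE_PATH = "./myengine"  # hypothetical UCI engine exposing its eval terms as options

# (epd, nnue_eval in centipawns); in practice thousands of lines from a file
TRAINING = [
    ("4k3/8/8/8/8/8/4P3/4K3 w - -", 150),
]

def objective(trial):
    # hypothetical tunable eval terms; use your engine's real option names
    params = {
        "PassedPawnMg": trial.suggest_int("PassedPawnMg", 0, 200),
        "MobilityWeight": trial.suggest_int("MobilityWeight", 0, 100),
    }
    engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
    engine.configure(params)
    se = 0.0
    for epd, nnue_eval in TRAINING:
        board, _ = chess.Board.from_epd(epd)
        info = engine.analyse(board, chess.engine.Limit(depth=4))
        cp = info["score"].pov(chess.WHITE).score(mate_score=32000)
        se += (cp - nnue_eval) ** 2
    engine.quit()
    return se / len(TRAINING)  # the MSE the optimizer minimizes

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=200)  # each trial = one param set suggested by Optuna
print(study.best_params)
```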
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: So what do we miss in the traditional evaluation?

Post by maksimKorzh »

When I watch NNUE vs HCE matches I feel like NNUE follows some hidden patterns in developing a position; I guess that's the result of supervised learning. If we take human players, knowing patterns, aka middlegame plans, is what makes a GM different from an amateur: both know what is "good" and what is "bad", but the difference is in the application of that knowledge. From an ML perspective it gets even clearer: HCE can only handle linear dependencies between static eval and results, while NNUE, thanks to its hidden layers, is capable of revealing non-linear dependencies, e.g. non-deterministic behavior in similar positions. These hidden interactions are what I called "patterns" initially. So bearing in mind how crucial following these patterns is in human games, it seems very natural that machines make use of them as well.
Collingwood
Posts: 89
Joined: Sat Nov 09, 2019 3:24 pm
Full name: .

Re: So what do we miss in the traditional evaluation?

Post by Collingwood »

maksimKorzh wrote: Sun Jan 31, 2021 10:40 pm From an ML perspective it gets even clearer: HCE can only handle linear dependencies between static eval and results, while NNUE, thanks to its hidden layers, is capable of revealing non-linear dependencies, e.g. non-deterministic behavior in similar positions.
Nonlinear dependencies have nothing to do with non-deterministic behavior in this context.
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: So what do we miss in the traditional evaluation?

Post by maksimKorzh »

Collingwood wrote: Mon Feb 01, 2021 6:39 am Nonlinear dependencies have nothing to do with non-deterministic behavior in this context.
Sorry, I just used the wrong terms. I simply meant that non-linear dependencies let us reveal non-linear correlations between various position parameters, which is impossible with a linear approach, because it's the same as trying to solve the XOR gate with a single-layer perceptron.
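The XOR point is easy to demonstrate in a few lines of numpy: no single linear threshold unit classifies all four XOR cases, while one hand-crafted hidden layer does:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR

# Single-layer perceptron: output = step(w.x + b). Brute-force a grid of
# weights and biases; no setting gets all four points right.
best = 0
for w1 in np.linspace(-2, 2, 41):
    for w2 in np.linspace(-2, 2, 41):
        for b in np.linspace(-2, 2, 41):
            pred = (X @ np.array([w1, w2]) + b > 0).astype(int)
            best = max(best, int((pred == y).sum()))
print("best single-layer accuracy:", best, "/ 4")  # prints 3 / 4

# One hidden layer with hand-crafted weights: h1 = OR, h2 = AND,
# output = h1 AND NOT h2, which is exactly XOR.
H = (X @ np.array([[1, 1], [1, 1]]).T + np.array([-0.5, -1.5]) > 0).astype(int)
out = (H @ np.array([1, -1]) - 0.5 > 0).astype(int)
print("hidden-layer output:", out)  # [0 1 1 0] == XOR
```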
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: So what do we miss in the traditional evaluation?

Post by Gerd Isenberg »

maksimKorzh wrote: Mon Feb 01, 2021 2:11 pm Sorry, I just used the wrong terms. I simply meant that non-linear dependencies let us reveal non-linear correlations between various position parameters, which is impossible with a linear approach, because it's the same as trying to solve the XOR gate with a single-layer perceptron.
Very good analogy.
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: So what do we miss in the traditional evaluation?

Post by maksimKorzh »

Gerd Isenberg wrote: Mon Feb 01, 2021 9:31 pm Very good analogy.
Thanks Gerd, I've learned a couple of smart words recently)))
Branko Radovanovic
Posts: 89
Joined: Sat Sep 13, 2014 4:12 pm
Location: Zagreb, Croatia
Full name: Branko Radovanović

Re: So what do we miss in the traditional evaluation?

Post by Branko Radovanovic »

Ferdy wrote: Sun Jan 31, 2021 1:59 pm What I am working on is a plug-in for Lakas and Optuna: instead of running engine vs engine matches, I just let the engine analyze the training positions at a certain depth or time. The training positions are just <epd>,<nnue_eval> pairs. The error is engine_eval - nnue_eval; square it over all positions to get the MSE. This MSE is sent to one of the optimizers of Lakas/Optuna, which is then asked for the best params (for the engine) to try next. The optimizer does its job of minimizing the MSE by giving us the next params to try.
To me, this is a very interesting approach. It's essentially mentor-assisted evaluation function optimization (see https://elidavid.com/pubs/mentor.pdf). I imagine I'd do it by implementing every eval term known to man (taking them, for example, from L. Tsvetkov's compendium). Once you calculate all the features for, say, 1M positions, and you have the corresponding NNUE evals as the target variable, it's easy to use an RMSE-minimizing method (GLM comes to mind; I wouldn't mess with GAs like the paper does) to find optimal weights for all these features. This is effectively tuning, but without any games actually being played.
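As a minimal sketch of the fitting step (with random stand-ins for the real data; in practice X would come from a feature extractor and y from the NNUE evals):

```python
import numpy as np

# Stand-ins for the real data.
n_positions, n_terms = 100_000, 100   # e.g. terms from Tsvetkov's compendium
rng = np.random.default_rng(0)
X = rng.normal(size=(n_positions, n_terms))  # one row of eval-term values per position
y = rng.normal(size=n_positions)             # NNUE eval (cp) of each position

# Ordinary least squares: min_w ||Xw - y||^2, so w[i] is the tuned weight of term i.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
rmse = np.sqrt(np.mean((X @ w - y) ** 2))
print(f"RMSE of the linear eval vs. NNUE: {rmse:.2f}")
```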

My guess is NNUE is so good because it captures non-linear aspects of the position but, above that, it also very likely models very complex interactions between features. This is also why NNUE evals are probably very hard to reverse-engineer using straightforward machine learning methods; human expertise is definitely needed (that is true for machine learning in general, though).

I firmly believe that in principle it should be possible to create a traditional eval at least as good as the NNUE eval (which means better than NNUE at the same speed, or as good but faster). NNUEs are better not because they're inherently more powerful; it's just that their training is vastly more efficient than endless SPRT/tuning runs. What traditional eval needs is a radically different optimization framework.
Harald
Posts: 318
Joined: Thu Mar 09, 2006 1:07 am

Re: So what do we miss in the traditional evaluation?

Post by Harald »

Branko Radovanovic wrote: To me, this is a very interesting approach. It's essentially mentor-assisted evaluation function optimization (see https://elidavid.com/pubs/mentor.pdf). [...] This is effectively tuning, but without any games actually being played.
Been there. Done that. (The first half without the NNUE training.)
It seems you have seen the other thread
Playing with "The Secret of Chess"
http://www.talkchess.com/forum3/viewtop ... =2&t=76453

With that I am now doing some Texel Tuning.
Branko Radovanovic
Posts: 89
Joined: Sat Sep 13, 2014 4:12 pm
Location: Zagreb, Croatia
Full name: Branko Radovanović

Re: So what do we miss in the traditional evaluation?

Post by Branko Radovanovic »

Harald wrote: Wed Feb 03, 2021 1:45 pm Been there. Done that. (The first half without the NNUE training.)
It seems you have seen the other thread
Playing with "The Secret of Chess"
http://www.talkchess.com/forum3/viewtop ... =2&t=76453

With that I am now doing some Texel Tuning.
Wow... I haven't seen that thread! I'll leave a comment there too.

The main difference between mentor-assisted optimization and Texel tuning is the target variable: the eval from another engine vs. the game result. The former is a bit of a "copycat" technique. Not sure which one would produce better results.
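For concreteness, here is a minimal sketch of the two objectives side by side, with a hypothetical eval_fn(position, w) standing in for a linear eval with weight vector w:

```python
import numpy as np

def sigmoid_winprob(eval_cp, K=1.13):
    # Texel tuning's usual mapping from a centipawn score to an expected
    # score in [0, 1]; K is the scaling constant fitted to the data.
    return 1.0 / (1.0 + 10.0 ** (-K * eval_cp / 400.0))

def texel_error(w, positions, results, eval_fn):
    # target = game result (0, 0.5, 1)
    pred = np.array([sigmoid_winprob(eval_fn(p, w)) for p in positions])
    return np.mean((np.array(results) - pred) ** 2)

def mentor_error(w, positions, mentor_evals, eval_fn):
    # target = another engine's eval in cp (the "copycat" variant)
    pred = np.array([eval_fn(p, w) for p in positions])
    return np.mean((np.array(mentor_evals) - pred) ** 2)
```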
gaard
Posts: 447
Joined: Mon Jun 07, 2010 3:13 am
Location: Holland, MI
Full name: Martin W

Re: So what do we miss in the traditional evaluation?

Post by gaard »

Ferdy wrote: Fri Jan 29, 2021 3:57 pm We have the passed pawn, mobility, king safety, piece values, PST, piece threats, etc. There must be something big that we failed to include in the traditional eval, as the gap between NNUE and non-NNUE is huge.
I'm interested in finding equal positions where the NNUE and non-NNUE evaluations differ greatly. Maybe if enough positions were collected, a discernible pattern could be found; maybe not.
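One way to mine for such positions, sketched with python-chess and a Stockfish build from that era that still exposed the "Use NNUE" UCI option; the engine path, depth, and thresholds here are arbitrary choices:

```python
import chess
import chess.engine

def eval_cp(engine, board, depth=12):
    # shallow search score from White's point of view, in centipawns
    info = engine.analyse(board, chess.engine.Limit(depth=depth))
    return info["score"].pov(chess.WHITE).score(mate_score=32000)

nnue = chess.engine.SimpleEngine.popen_uci("./stockfish")
hce = chess.engine.SimpleEngine.popen_uci("./stockfish")
nnue.configure({"Use NNUE": True})
hce.configure({"Use NNUE": False})

disagreements = []
with open("positions.epd") as f:
    for line in f:
        board, _ = chess.Board.from_epd(line.strip())
        n, h = eval_cp(nnue, board), eval_cp(hce, board)
        # "equal" by HCE, but judged very differently by NNUE
        if abs(h) < 30 and abs(n - h) > 100:
            disagreements.append((board.epd(), n, h))

nnue.quit()
hce.quit()
for epd, n, h in disagreements:
    print(f"{epd}  nnue={n}  hce={h}")
```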