Orion 0.7 : NNUE experiment

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Orion 0.7 : NNUE experiment

Post by Frank Quisinsky »

Hi there,

The problem with NNUE is that it's harder to compare engine strength with non-NNUE engines.

The consequence is that all engines using NNUE should be run in their own rating list.

That means:
All engines should have the same conditions for engine testing (tournaments, rating-list systems).
All with the same hash-table size, all with 4-men or all with 5-men tablebases, all with the same book, all with contempt = 0, and so on.

In my opinion it makes no sense to test NNUE engines against engines that don't use NNUE.

Or I could make the chaos complete and cause even bigger confusion by testing some engines with 4-men tablebases and others with 5-men, or giving one group of engines 1 minute per move and another group 10 minutes per move, and putting it all in one list.

For engine testing:
All engines should have the same conditions, or nothing can be compared with anything else.

If programmers go down the NNUE road, we still have the last non-NNUE version of each of these engines for testing.
I think the NNUE engines (no matter whether a programmer goes his own way here or uses the well-known sources) should run in a new rating list.

It makes no sense to compare:
Andscacs 1.95 non-NNUE with Igel 2.7.0 NNUE.

It's really a big problem, because a lot of engine programmers have strong engines but are no longer working on them.

For a modern way of engine development (I think NNUE is interesting) we should also go a modern way of testing. I will do that once more NNUE engines are available.

Best
Frank
User avatar
Sylwy
Posts: 4465
Joined: Fri Apr 21, 2006 4:19 pm
Location: IASI - the historical capital of MOLDOVA
Full name: SilvianR

Re: Orion 0.7 : NNUE experiment

Post by Sylwy »

Frank Quisinsky wrote: Mon Aug 24, 2020 3:02 pm Hi there,

The problem with NNUE is that it's harder to compare engine strength with non-NNUE engines.

The consequence is that all engines using NNUE should be run in their own rating list.
Gens una sumus - the motto of FIDE, the World Chess Federation. Literally "we are one people", it is usually translated into English as "we are one family". Not two, or three, or four...


Maybe just better and clearer test conditions.

For years the rating lists have been full of NN-GPU chess engines, without any problem. Take a look at the current top of the CCRL Blitz rating list:

[screenshot: current top of the CCRL Blitz rating list]

And yes, the era of classical chess engines is over. In time, they will all end up at the bottom of the rating lists. That's life. Only collectors remain. I'm one. :wink:
Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Orion 0.7 : NNUE experiment

Post by Frank Quisinsky »

Hi,

I'm not sure about "the era of classical chess engines is over".
Maybe for you personally?

Classical chess engines = wrong definition.
Right definition = computer chess engines.

The extract from the CCRL rating list is not very interesting,
because most of those engines are the same thing ...
It is very bad for all the other engines in the list, because the final result of such testing methods is Elo inflation.
Reading the fortune in cards is maybe more interesting than reading Elo here?

But I also think that most of the engines still under development will use NNUE.
Most of the programmers interested in NNUE are programmers who also use other well-known ideas.
That explains why many engines today start out at 2500 Elo or higher.

If I remember correctly, the best amateurs around the year 2000 had about 2400 Elo.
Surely 50 or more geniuses were not born in the last 20 years.

I think the new era is more about using well-known ideas.
Here I can agree 100%.
And NNUE is a fantastic example of that.
It can be used with a clear conscience.

Back when Fruit became available ...
Using Fruit code without saying anything about it was a gentleman's crime.
Many programmers would rather use something with a clear conscience than commit a gentleman's crime.

That's my own opinion about it.
Again: that 50 or more geniuses were born ...
I am far from believing it.

Again ...
I don't think we need chess programs stronger than Shredder 12, at around 2800 Elo, for playing against ourselves.
No human has a chance against programs above 2850 Elo.

For analysing chess games, stronger chess programs are interesting.
But I don't need 200 of them.

Sure, if all the NNUE programs later end up in their own rating list, most people will only be interested in testing places 1 to maybe 6-8, because all of them are 400-700 Elo stronger than World Champion Carlsen. What can people do with that? People can say, looking at Nakamura vs. Carlsen, "so many mistakes in these games". That's what people do with it, without understanding the chess these engines can play.

NNUE isn't the end of the era of _classical chess engines_?!

"Classical chess engines" is the wrong term; "computer chess engines" is more correct.
NNUE and Lc0 are more like special computer chess engines, or we could say "modern chess engines, mostly using well-known ideas".

For Lc0: I am thinking of the book on this topic (the Nightmare programmer gave it to others in Leiden in 1998 or 1999).
Not new ideas ... so it's better to write:

Modern chess engines: mostly using well-known ideas.
Classical chess engines, or better "computer chess engines": programs where the programmer used his own ideas.

Copy and paste in chess programming ...
Not an art in my eyes.

Best
Frank
User avatar
Sylwy
Posts: 4465
Joined: Fri Apr 21, 2006 4:19 pm
Location: IASI - the historical capital of MOLDOVA
Full name: SilvianR

Re: Orion 0.7 : NNUE experiment

Post by Sylwy »

Frank Quisinsky wrote: Mon Aug 24, 2020 4:30 pm Hi,

I'm not sure about "the era of classical chess engines is over".
Maybe for you personally?

Classical chess engines = wrong definition.
Right definition = computer chess engines.
Yes, I'm sure it's over. A few classic chess engines (100% handcrafted evaluation) will still appear, but fewer and fewer.
Computer chess engines of all sorts (handcrafted evaluation, hybrid, pure NNUE, NN-GPU and many more) will exist for a long time. The classic ones (again, the handcrafted-evaluation ones and even the hybrids) will soon be history. Just my opinion!

Best
SilvianR
User avatar
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Orion 0.7 : NNUE experiment

Post by MikeB »

Frank Quisinsky wrote: Mon Aug 24, 2020 3:02 pm Hi there,

The problem with NNUE is that it's harder to compare engine strength with non-NNUE engines.

The consequence is that all engines using NNUE should be run in their own rating list.

That means:
All engines should have the same conditions for engine testing (tournaments, rating-list systems).
All with the same hash-table size, all with 4-men or all with 5-men tablebases, all with the same book, all with contempt = 0, and so on.

In my opinion it makes no sense to test NNUE engines against engines that don't use NNUE.

Or I could make the chaos complete and cause even bigger confusion by testing some engines with 4-men tablebases and others with 5-men, or giving one group of engines 1 minute per move and another group 10 minutes per move, and putting it all in one list.

For engine testing:
All engines should have the same conditions, or nothing can be compared with anything else.

If programmers go down the NNUE road, we still have the last non-NNUE version of each of these engines for testing.
I think the NNUE engines (no matter whether a programmer goes his own way here or uses the well-known sources) should run in a new rating list.

It makes no sense to compare:
Andscacs 1.95 non-NNUE with Igel 2.7.0 NNUE.

It's really a big problem, because a lot of engine programmers have strong engines but are no longer working on them.

For a modern way of engine development (I think NNUE is interesting) we should also go a modern way of testing. I will do that once more NNUE engines are available.

Best
Frank
Hi Frank - your argument is a little bit silly -- how about separate ratings for engines using check extensions and not using check extensions? It's a slippery slope, my friend, and it simply doesn't make sense. You're going to see an explosion of NN engines - with open-source SF, as opposed to closed-source AlphaZero - it is the new paradigm.

All the best,
MikeB
Frank Quisinsky
Posts: 6808
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Orion 0.7 : NNUE experiment

Post by Frank Quisinsky »

Hi Mike,

the arguments are facts.
But nobody will read them, and for sure none of the puzzle programmers will read them.

"Puzzle programmers" isn't a term I coined myself.
It's a term used by hardcore programmers.
Hardcore programmers do the main work, the puzzle programmers use it.

That is the reason I will not speak of a new era in computer chess if "puzzle programmers" try to create it.
Without the main work programmers have done over the years, there would be no sources for the puzzle programmers.

"Hybrid engines" is also a better term for what we currently call "NNUE".
NNUE is a bad combination of letters ...

Better is:
Stockfish Hybrid ...
Pedone Hybrid ...
Minic Hybrid ...
Igel Hybrid ...

A new word that nobody understands at first and that looks complicated on a second reading is more or less the style of our younger generation. With full power, _new things_ are supposed to be made more interesting to others!

The younger generation often thinks ...
Let us give it a new word nobody understands, and we can spark the interest of others faster.
In reality, a very old marketing trick!

We already have a word for it:
"Hybrid"

And for the group of hybrid engines we will need a new rating list later.
Mixing hybrid engines with computer chess engines is a bad idea.

Best
Frank
David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Re: Orion 0.7 : NNUE experiment

Post by David Carteau »

Sylwy wrote: Mon Aug 24, 2020 2:47 pm Waiting for your NNUE engine !

https://www.lrz.de/services/compute/cou ... tro_AI.pdf
Thanks a lot for sharing !

On my side, I managed to implement a very preliminary version of my neural network trainer based on PBIL. I'm not even sure that it works: there are probably tons of bugs, but let's see!
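
For anyone wondering what PBIL looks like in practice: the textbook binary version is only a few lines. Below is a generic sketch (Python, with a user-supplied fitness function), purely illustrative and not Orion's actual trainer:

Code:

import random

def pbil(fitness, n_bits, pop_size=50, lr=0.1, generations=1000):
    # Textbook binary PBIL: keep a probability vector, sample a population
    # from it, and nudge the vector toward the best individual found.
    prob = [0.5] * n_bits                        # start with no bias
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        pop = [[1 if random.random() < p else 0 for p in prob]
               for _ in range(pop_size)]
        leader = max(pop, key=fitness)           # best individual this generation
        f = fitness(leader)
        if f > best_fit:
            best, best_fit = leader, f
        # move each probability toward the corresponding bit of the leader
        prob = [(1 - lr) * p + lr * b for p, b in zip(prob, leader)]
    return best, best_fit

The bit strings are then decoded into network weights by whatever encoding the trainer uses.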

First results in a few days (or weeks ?!) :)

User avatar
Sylwy
Posts: 4465
Joined: Fri Apr 21, 2006 4:19 pm
Location: IASI - the historical capital of MOLDOVA
Full name: SilvianR

Re: Orion 0.7 : NNUE experiment

Post by Sylwy »

David Carteau wrote: Wed Aug 26, 2020 2:31 pm
Sylwy wrote: Mon Aug 24, 2020 2:47 pm Waiting for your NNUE engine !

https://www.lrz.de/services/compute/cou ... tro_AI.pdf
Thanks a lot for sharing !

On my side, I managed to implement a very preliminary version of my neural network trainer based on PBIL. I'm not even sure that it works: there are probably tons of bugs, but let's see!

First results in a few days (or weeks ?!) :)

More:
https://www.eso.org/sci/meetings/2019/A ... Intro3.pdf
https://www.eso.org/sci/meetings/2019/A ... Intro4.pdf
David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Re: Orion 0.7 : NNUE experiment

Post by David Carteau »

Thx for sharing !

Building a neural network trainer is a lot more complex than implementing NNUE :)

First, I reused my PBIL framework, which I had used for the v0.6 evaluation tuning. I have no doubt that it works. But despite several attempts, I didn't manage to obtain a network that produced a satisfying eval (the average gap per position against nn-82215d0fd0df.nnue was around 2.00, that is to say 200 cp).

Then I tried to understand what back-propagation was and implemented what seems to be stochastic gradient descent (SGD). I must admit that I'm not sure about the way I computed the different gradients, notably because of the NNUE architecture and its "feature transformer" layer, which is used twice (once for the side to move, once for the other player). The trainer seemed to work, but was terribly slow and didn't perform better than the PBIL version.
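
To show why the gradients are awkward: the same feature-transformer weights feed both perspectives before the two halves are concatenated, so they receive gradient contributions twice per position. A rough numpy sketch of that kind of forward pass (dimensions, names and the clipping range are illustrative, not the exact Stockfish architecture):

Code:

import numpy as np

def nnue_forward(own_feats, opp_feats, W_ft, b_ft, W1, b1, W2, b2, W_out, b_out):
    # Illustrative NNUE-style forward pass. The same feature-transformer
    # weights (W_ft, b_ft) are applied to both perspectives, so during
    # back-propagation their gradient accumulates from both halves.
    clip = lambda x: np.clip(x, 0, 127)          # clipped ReLU
    acc_own = clip(W_ft @ own_feats + b_ft)      # accumulator, side to move
    acc_opp = clip(W_ft @ opp_feats + b_ft)      # accumulator, other player
    x = np.concatenate([acc_own, acc_opp])
    x = clip(W1 @ x + b1)
    x = clip(W2 @ x + b2)
    return float(W_out @ x + b_out)              # scalar evaluation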

I finally decided to go back to PBIL. I think I have a hint as to why the algorithm converges so fast: after inspecting the results, I discovered that many of the solutions found (weights + biases) led to many of the layers' outputs being "saturated" (i.e. greater than 127). I also found solutions where all positions received exactly the same evaluation!
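
A simple way to detect that failure mode would be to count the activations that leave the int8-friendly range and, for instance, add that count as a penalty to the PBIL fitness. Just a rough idea, not something implemented in Orion:

Code:

def saturation_count(layer_outputs, limit=127):
    # Count pre-clipping activations outside the quantisation range;
    # a large count means the candidate weights drive the net into
    # saturation, where all positions tend to get the same evaluation.
    return sum(1 for layer in layer_outputs for v in layer if abs(v) > limit)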

I'm currently working on finding a way to avoid this kind of behaviour... fingers crossed :)
David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Re: Orion 0.7 : NNUE experiment

Post by David Carteau »

For those who are interested, I have just released the result of my NNUE experiment with Orion: see here.

As regards my attempt to build a network trainer, I'm not sure I'm making progress :) I first thought that I could avoid premature convergence of the optimisation by setting the number of correctly signed (positive/negative) evaluation scores as the primary objective (and then, as a secondary objective, the sum of squared errors). In my dataset, more than 90% of the scores are signed, the rest being equal to 0.00 (0 cp). I didn't manage to get beyond (precisely) 41.6% of correctly signed scores. Again, the algorithm seemed to converge too quickly to a suboptimal solution.
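
In code, that two-level objective can be expressed as a tuple so that the sign count always dominates the squared error. A minimal sketch (net_eval, positions and targets are placeholders, not Orion's actual interfaces):

Code:

def fitness(weights, positions, targets, net_eval):
    # Lexicographic objective: maximise the number of correctly signed
    # evaluations first, then minimise the sum of squared errors.
    preds = [net_eval(weights, p) for p in positions]
    correct_sign = sum(1 for p, t in zip(preds, targets)
                       if t != 0 and (p > 0) == (t > 0))
    sse = sum((p - t) ** 2 for p, t in zip(preds, targets))
    return (correct_sign, -sse)    # Python compares tuples element by element

A candidate with more correctly signed scores always wins, and the squared error only breaks ties.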

I'm now trying another approach, which is terribly slow but seems (for now) to work: I built a first net that was able to approximate the expected evaluation scores on 20 positions. When the approximation was good enough (average error per score lower than or equal to 0.20, i.e. 20 cp), I increased the number of positions to evaluate, and so on... It seems to work, but I need to find another solution, because after two days I have only reached... 80 positions (with PBIL) and 112 positions (with SGD)!
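
For reference, the loop behind that idea is simple, which is also why it is so slow. A rough sketch (optimise and avg_error are placeholder callbacks into the trainer, not actual Orion code):

Code:

def grow_training_set(optimise, avg_error, total, start=20, step=20,
                      threshold=0.20, max_rounds=1_000_000):
    # Curriculum idea: optimise on the first n positions only; once the
    # average error per position is <= threshold (0.20 = 20 cp), add more.
    n = start
    for _ in range(max_rounds):
        if n >= total:
            break
        optimise(n)                    # one optimisation pass on n positions
        if avg_error(n) <= threshold:  # current subset approximated well enough
            n += step
    return n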